Recursively download website files using WGET

Jul 28, 2013

I use the following command to recursively download a bunch of files from a website to my local machine. It works well on open directories of files, such as the auto-generated directory listings served by the Apache web server.

The following can be added to your .bash_profile or .bashrc script, depending on which your OS/distro recommends:

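# Recursively download everything below the given URL into the current directory.
# -r: recurse; -nH: no hostname directory; --no-parent: never ascend; --reject: skip index pages.
function download-web() {
    wget -r -nH --no-parent --reject='index.html*' "$@"
}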

To invoke the command, you run it like so:

download-web http://www.example.com/path/to/files/

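It will then download everything linked from the first page, following each child path, into the current directory, recreating the URL's path (path/to/files/) beneath it. By default wget recurses at most five levels deep. Because of --no-parent it will not download anything above the starting directory, so the URL should end with a trailing slash: without one, wget treats the last path segment as a file name and uses its parent as the starting directory. It also will not keep a local copy of those index.html files (or index.html?blah=blah, which get pretty annoying); wget still fetches them to find links, then deletes them.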
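If you would rather have the files land directly in the current directory instead of under path/to/files/, wget's --cut-dirs option can strip the leading directories. The following is a rough sketch, not part of the original function: the names url_dir_depth and download_web_flat are made up for illustration, and it assumes the argument is a plain directory URL with no query string. It also appends the trailing slash that --no-parent depends on.

```shell
# Hypothetical helper: count the directory components in a URL's path.
# Assumes a plain directory URL such as http://www.example.com/path/to/files/
url_dir_depth() (
    path="${1#*://}"                  # drop the scheme, e.g. "http://"
    case "$path" in
        */*) path="${path#*/}" ;;     # drop the host name
        *)   path="" ;;               # host only, no path
    esac
    path="${path%/}"                  # drop one trailing slash
    if [ -z "$path" ]; then
        echo 0
    else
        # Components = number of remaining slashes + 1.
        echo $(( $(printf '%s' "$path" | tr -cd '/' | wc -c) + 1 ))
    fi
)

# Hypothetical variant of download-web: same options, plus --cut-dirs so the
# remote directory prefix is not recreated locally.
download_web_flat() (
    url="${1%/}/"                     # make sure the URL ends in exactly one slash
    wget -r -nH --no-parent --cut-dirs="$(url_dir_depth "$url")" \
        --reject='index.html*' "$url"
)
```

Running download_web_flat http://www.example.com/path/to/files would pass --cut-dirs=3 to wget, so a remote path/to/files/sub/a.txt would be saved as sub/a.txt. Underscores are used in the names so the sketch also loads in a plain POSIX sh.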

This is defined as a bash function rather than a simple alias so that the URL, along with any extra wget options you add, is passed through explicitly via "$@". It should work fine on both OS X and Linux. If you are using OS X, you can follow my guide for Installing WGET on OS X.
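Since wget may not be installed on OS X, one option is to have the function check for it first. This is only a sketch: require_cmd and download_web_checked are hypothetical names, and the wget options are simply copied from the function above.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 is not installed or not on your PATH" >&2
        return 1
    fi
}

# Same wget options as download-web, but bail out early when wget is missing.
download_web_checked() {
    require_cmd wget || return 1
    wget -r -nH --no-parent --reject='index.html*' "$@"
}
```

With this in place, a machine without wget prints a short message and returns a non-zero status instead of the shell's generic "command not found" error.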

Tags: #linux #macos #cli #scraping

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.