Got a big Web download job? You'll need the right tools, and HTTrack and SiteSucker are two of the best available. And they're free.

The other day I was checking out an excellent free book, Python for Informatics: Exploring Information by Charles Severance, which is available on the site PythonLearn. The site describes itself as a “set of generic Python Learning Resources to allow self-paced learning of the Python Language.” Based on the materials from several courses at the University of Michigan, the site is a treasure trove for learning Python or sprucing up your Python-fu.

I decided to download the sample code from the book but, unfortunately, the samples are spread across about 65 individual files plus five subdirectories, each of which contains around half a dozen more files. Sitting there and manually downloading each file wasn’t something I particularly wanted to do, so I looked for a tool to do the job for me and came up with a couple of interesting choices.

First, there’s HTTrack, a “free (GPL, libre/free software) and easy-to-use offline browser utility” which:

…  allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
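To make the idea of recursive mirroring concrete, here is a minimal Python sketch that uses only the standard library. It is not how HTTrack is implemented; the names `extract_links`, `local_path`, `fetch_bytes`, and `mirror` are my own inventions for illustration. It assumes pages are UTF-8 HTML with an `.html` or `.htm` suffix (or directory-style URLs), and it skips the things a real tool handles, such as robots.txt, rate limiting, and rewriting links for offline browsing.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs referenced by the page."""
    parser = LinkCollector()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, link)).url for link in parser.links]


def local_path(url, root):
    """Map a URL to a file path under root, keeping the site's directories."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    return Path(root) / parsed.netloc / path


def fetch_bytes(url):
    """Default fetcher: a plain GET using the standard library."""
    with urlopen(url) as response:
        return response.read()


def mirror(start_url, root, fetch=fetch_bytes, max_files=100):
    """Breadth-first copy of files on the start URL's host into root."""
    host = urlparse(start_url).netloc
    queue, seen, saved = [start_url], {start_url}, []
    while queue and len(saved) < max_files:
        url = queue.pop(0)
        data = fetch(url)
        target = local_path(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        saved.append(target)
        # Only HTML pages are scanned for further links (an assumption of this sketch).
        if target.suffix in (".html", ".htm"):
            for link in extract_links(data.decode("utf-8", "replace"), url):
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append(link)
    return saved
```

Calling something like `mirror("http://www.example.com/code/", "mirror")` would save the files it finds under a local `mirror` directory, one folder per path segment. The `fetch` parameter is there so the crawl logic can be tried out against canned pages instead of a live server.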

HTTrack is great: it has lots of useful features, including sophisticated file type download options, and it’s easy to install … at least, easy under Windows (where it’s known as WinHTTrack). For the version that runs on OS X, BSD, and Linux (called WebHTTrack), you’ve got to compile the code or, on a Mac, have MacPorts installed (which in turn requires Xcode), either of which can send you down “a maze of twisty little passages, all alike ...” if you’re trying to get the job done quickly. As I mainly use OS X, I wanted an easier solution ...

Some more research revealed a free OS X and iOS app called SiteSucker that turned out to be just what I needed. SiteSucker asynchronously copies:

… the site's Web pages, images, backgrounds, movies, and other files to your local hard drive, duplicating the site's directory structure. Just enter a URL (Uniform Resource Locator), press return, and SiteSucker can download an entire Web site.

SiteSucker can be used to make local copies of Web sites. By default, SiteSucker "localizes" the files it downloads, allowing you to browse a site offline, but it can also download sites without modification.

You can save all the information about a download in a document. This allows you to create a document that you can use to perform the same download whenever you want. If SiteSucker is in the middle of a download when you choose the Save command, SiteSucker will pause the download and save its status with the document. When you open the document later, you can restart the download from where it left off by pressing the Resume button.

SiteSucker can also be controlled by AppleScript and there’s a utility called SuckList that creates “lists of numerically indexed URLs and drive SiteSucker to download the files in the list. It can also drive SiteSucker using a manually produced list.”
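If you're wondering what a list of numerically indexed URLs looks like, here is a rough Python sketch of the idea. It is only an illustration and says nothing about SuckList's actual file format or options; the function name `indexed_urls`, the `{n}` placeholder, and the example URL are all made up.

```python
def indexed_urls(template, start, stop, width=0):
    """Yield URLs with {n} replaced by start..stop (inclusive), zero-padded to width."""
    for n in range(start, stop + 1):
        yield template.format(n=str(n).zfill(width))


if __name__ == "__main__":
    # Hypothetical URL pattern, purely for illustration.
    for url in indexed_urls("http://www.example.com/photos/img{n}.jpg", 1, 5, width=3):
        print(url)
```

A list like this could then be fed to whatever download tool you prefer, one URL per line.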

Of course, after spending the time researching and installing download tools, I discovered that all of the Python coding examples are available on another page on the PythonLearn site as a ZIP file.

If you know of any other useful site download tools for Windows, OS X, or Linux that you’ve used, let me know.

