Automated Downloading with Wget
Date
2012-06-27
Authors
Milligan, Ian
Publisher
The Editorial Board of the Programming Historian
Abstract
Wget is a useful program, run through your computer’s command line, for retrieving online material.
It can be useful in the following situations:
Retrieving or mirroring (creating an exact copy of) an entire website. This website might contain historical documents, or it may simply be your own personal website that you want to back up. One command can download the entire site onto your computer.
Downloading specific files in a website’s hierarchy (all pages within a certain part of a website, such as every page contained within the /papers/ directory).
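The two situations above might look roughly like the following at the command line. This is a minimal sketch, not the lesson's own commands: the address http://example.org is a hypothetical placeholder, and the wait time and rate limit are assumed values chosen only to be polite to the server. The block prints the commands for review rather than running them.

```shell
# Hypothetical target site (an assumption); replace it with a site you are
# permitted to download from.
SITE="http://example.org"

# Situation 1: mirror an entire website.
# --mirror turns on recursion and timestamping; --wait pauses 2 seconds
# between requests; --limit-rate caps bandwidth at about 20 KB/s.
MIRROR_CMD="wget --mirror --wait=2 --limit-rate=20k $SITE/"

# Situation 2: download only what sits under /papers/.
# --recursive follows links; --no-parent stops wget from climbing above
# the /papers/ directory into the rest of the site.
PAPERS_CMD="wget --recursive --no-parent --wait=2 --limit-rate=20k $SITE/papers/"

# Print the commands so they can be checked before anything is downloaded.
echo "$MIRROR_CMD"
echo "$PAPERS_CMD"
```

Once a command looks right, it can be pasted into the terminal to run it. The trailing slash on the /papers/ address matters: it tells wget to treat papers as a directory rather than a single file.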
In this lesson, we will work through three quick examples of how you might use wget in your own work. At the end of the lesson, you will be able to quickly download large amounts of information from the Internet in an automated fashion. If you find a repository of online historical information, instead of right-clicking on every file and saving it to build your dataset, you will have the skills to craft a single command to do so.
Description
This article, published by the Editorial Board of the Programming Historian, is made available under a Creative Commons Attribution 2.0 Generic License. Available at: http://programminghistorian.org/lessons/automated-downloading-with-wget
Keywords
Wget, Automated downloading, Website mirroring, Website retrieving