Automated Downloading with Wget

Date

2012-06-27

Authors

Milligan, Ian

Publisher

The Editorial Board of the Programming Historian

Abstract

Wget is a useful program, run through your computer's command line, for retrieving online material. It can be useful in the following situations:

- Retrieving or mirroring (creating an exact copy of) an entire website. This website might contain historical documents, or it may simply be your own personal website that you want to back up. A single command can download the entire site onto your computer.
- Downloading specific files in a website's hierarchy (all pages within a certain part of a website, such as every page contained within the /papers/ directory).

In this lesson, we will work through three quick examples of how you might use wget in your own work. By the end of the lesson, you will be able to quickly download large amounts of information from the Internet in an automated fashion. If you find a repository of online historical information, instead of right-clicking on every file and saving it to build your dataset, you will have the skills to craft a single command to do so.
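
As a quick illustration of the kind of command the lesson teaches, the two invocations below sketch the tasks described above. The URL example.org is a placeholder rather than a site from the lesson, and the flags shown are standard wget options: -r enables recursive retrieval, --no-parent keeps wget from climbing above the starting directory, and -w inserts a polite pause (in seconds) between requests.

    # Mirror an entire website into the current directory
    wget -r --no-parent -w 2 http://example.org/

    # Retrieve only the pages beneath the /papers/ directory
    wget -r --no-parent -w 2 http://example.org/papers/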

Description

This article, published by the Editorial Board of the Programming Historian, is made available under a Creative Commons Attribution 2.0 Generic License. Available at: http://programminghistorian.org/lessons/automated-downloading-with-wget

Keywords

Wget, Automated downloading, Website mirroring, Website retrieval
