Using curl and wget commands to download pages from web sites

The curl and the wget commands make it easy to download content from web sites.

One of the most versatile tools for collecting data from a server is curl. The “url” portion of the name properly suggests that the command is built to locate data through the URL (uniform resource locator) that you provide. And it doesn’t just communicate with web servers. It supports a wide variety of protocols, including HTTP, HTTPS, FTP, FTPS, SCP, SFTP and more. The wget command, though similar in some ways to curl, primarily supports the HTTP, HTTPS and FTP protocols.
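The set of protocols depends on how a particular curl binary was built, and curl --version prints them on a line beginning with "Protocols:". The following is a minimal sketch, assuming a bash shell; supports_protocol is a hypothetical helper name, not something curl provides.

```shell
# supports_protocol NAME
# Succeeds (exit status 0) if the installed curl lists NAME on the
# "Protocols:" line of its --version output. Hypothetical helper.
supports_protocol() {
    local proto=$1
    curl --version | awk -v p="$proto" '
        /^Protocols:/ {
            # Fields 2..NF are the protocol names
            for (i = 2; i <= NF; i++) if ($i == p) found = 1
        }
        END { exit !found }'
}

# Example use:
#   if supports_protocol sftp; then echo "sftp is available"; fi
```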

Using the curl command

You might use the curl command to:

Download files from the internet
Run tests to ensure that the remote server is doing what is expected
Do some debugging on various problems
Log errors for later analysis
Back up important files from the server
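As one illustration of the testing use case in the list above, curl can report just the HTTP status code of a response. This is a sketch only, assuming a bash shell; check_status is a made-up helper name, and the options used (-s, -o and -w with the %{http_code} variable) are standard curl options.

```shell
# check_status URL
# Prints "OK <url>" when the server answers 200, otherwise
# "FAIL <url> (HTTP <code>)" and returns 1. Hypothetical helper.
check_status() {
    local url=$1 code
    # -s: no progress meter; -o /dev/null: discard the page body;
    # -w '%{http_code}': print only the numeric response code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = "200" ]; then
        echo "OK $url"
    else
        echo "FAIL $url (HTTP $code)"
        return 1
    fi
}

# Example use, appending failures to a log for later analysis:
#   check_status https://example.com/ || echo "$(date) check failed" >> curl_errors.log
```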

Probably the most obvious thing to do with the curl command is to download a page from a web site for review on the command line. To do this, just enter “curl” followed by the URL of the web site like this (the content below is truncated):

$ curl https://www.networkworld.com/category/linux/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  124k    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0


   
  
…

You’ll see some timing data plus the content. To save the content to a file, redirect the output to a file using a command like this:

$ curl https://www.networkworld.com/category/linux/ > linux.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  124k  100  124k    0     0  23339      0  0:00:05  0:00:05 --:--:-- 30035

The downloaded file can then be viewed on your system using cat or more to see the HTML content, or opened in a browser to view the web page.
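Instead of shell redirection, curl's own -o option can name the output file, and -f makes curl return a failure status on HTTP errors rather than saving an error page. The sketch below assumes a bash shell; save_page is a hypothetical wrapper name.

```shell
# save_page URL FILE
# Downloads URL into FILE and reports the result. Hypothetical helper.
save_page() {
    local url=$1 outfile=$2
    # -f: fail on HTTP errors; -s: silent; -S: still show error messages;
    # -L: follow redirects; -o: write to the named file
    if curl -fsSL -o "$outfile" "$url"; then
        echo "saved $url to $outfile"
    else
        echo "download failed: $url" >&2
        return 1
    fi
}

# Example use:
#   save_page https://www.networkworld.com/category/linux/ linux.html
```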

The command below grabs a single HTML file.

$ curl https://www.networkworld.com/video/series/8559/2-minute-linux-tips > linux_tips.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 79873  100 79873    0     0  56780      0  0:00:01  0:00:01 --:--:-- 56808

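Any sequence of blank lines can be reduced to one with a command like the one below. Note that uniq also collapses any other identical adjacent lines, and that the output must go to a different file, because redirecting onto the input file would empty it before it is read.

$ uniq linux_tips.html > linux_tips_squeezed.html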
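As an alternative sketch (a swapped-in technique, not the command used above), cat -s squeezes only runs of empty lines and leaves other repeated lines alone. The example builds its own small sample file, so the file names are made up, and it writes each result to a separate file rather than back onto the input.

```shell
# Create a sample file: two identical <li> lines, three blank lines, one <p> line
workdir=$(mktemp -d)
printf '<li>x</li>\n<li>x</li>\n\n\n\n<p>end</p>\n' > "$workdir/sample.html"

# cat -s: squeeze each run of empty lines down to one
cat -s "$workdir/sample.html" > "$workdir/squeezed.html"

# uniq: collapse every run of identical adjacent lines
uniq "$workdir/sample.html" > "$workdir/uniq.html"

wc -l < "$workdir/sample.html"     # 6 lines in the original
wc -l < "$workdir/squeezed.html"   # 4 lines: both <li> lines are kept
wc -l < "$workdir/uniq.html"       # 3 lines: the repeated <li> line is collapsed too
```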

More information on using curl is available in this previous post of mine: The Joy of curl

You can also get some quick help on options for using curl with the curl --help command:

$ curl --help
Usage: curl [options...] <url>
 -d, --data <data>          HTTP POST data
 -f, --fail                 Fail fast with no output on HTTP errors
 -h, --help <category>      Get help for commands
 -i, --include              Include protocol response headers in the output
 -o, --output <file>        Write to file instead of stdout
 -O, --remote-name          Write output to a file named as the remote file
 -s, --silent               Silent mode
 -T, --upload-file <file>   Transfer local FILE to destination
 -u, --user <user:password> Server user and password
 -A, --user-agent <name>    Send User-Agent <name> to server
 -v, --verbose              Make the operation more talkative
 -V, --version              Show version number and quit

This is not the full help, this menu is stripped into categories.
Use "--help category" to get an overview of all categories.
For all options use the manual or "--help all".
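Several of these options combine well when troubleshooting. As a rough sketch, assuming a bash shell with awk available, the hypothetical show_headers helper below uses -s and -i to fetch a response and prints only the header block that precedes the first blank line.

```shell
# show_headers URL
# Prints the response status line and headers, without the page body.
# Hypothetical helper built on curl's -s and -i options.
show_headers() {
    local url=$1
    curl -s -i "$url" | awk '
        { sub(/\r$/, "") }     # strip the carriage return that ends each header line
        $0 == "" { exit }      # the first empty line marks the end of the headers
        { print }'
}

# Example use:
#   show_headers https://example.com/
```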

Using wget

The wget command makes it easy to download a web site recursively. While the site used in the command below is a single-page web site, it provides a quick example of how this command works.

$ wget -r http://example.com/
--2023-09-19 13:07:12--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘example.com/index.html’

example.com/index.html        100%[=================================================>]   1.23K  --.-KB/s    in 0s

2023-09-19 13:07:12 (56.1 MB/s) - ‘example.com/index.html’ saved [1256/1256]

FINISHED --2023-09-19 13:07:12--
Total wall clock time: 0.1s
Downloaded: 1 files, 1.2K in 0s (56.1 MB/s)

The downloaded content will include a directory with the name of the URL (example.com) and containing its contents – in this case a single file.

$ ls example.com
index.html
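$ head example.com/index.html
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {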

If you run the command below (no recursion) multiple times, numbered copies of the file will build up, because wget does not overwrite an existing file by default.

$ wget http://example.com/
$ ls -l index.html*
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.1
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.2
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.3
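If only the latest copy is wanted, wget's -O option writes to a named file instead of creating numbered copies. Because -O opens the target file before the transfer starts, the sketch below downloads to a temporary file first and renames it on success. It assumes a bash shell, and fetch_latest is a hypothetical helper name.

```shell
# fetch_latest URL FILE
# Downloads URL and replaces FILE only if the download succeeds.
# Hypothetical helper built on wget's -q and -O options.
fetch_latest() {
    local url=$1 outfile=$2 tmp
    tmp=$(mktemp) || return 1
    # -q: quiet; -O: write the download to the named file
    if wget -q -O "$tmp" "$url"; then
        mv "$tmp" "$outfile"
    else
        rm -f "$tmp"
        echo "download failed: $url" >&2
        return 1
    fi
}

# Example use (running it repeatedly leaves a single index.html):
#   fetch_latest http://example.com/ index.html
```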

The no-parent option

The --no-parent option ensures that the command never ascends to the parent directory when retrieving content recursively, so only files at or below the given directory are downloaded.

$ wget --no-parent -r https://uushenandoah.org/how-to-become-a-member/
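Recursive downloads can grow quickly, so it may help to bound them. The sketch below, assuming a bash shell, wraps the same -r and --no-parent options with wget's --level (maximum recursion depth) and --wait (seconds to pause between requests). The fetch_subtree name and the default depth of 2 are illustrative choices, not recommendations from the article.

```shell
# fetch_subtree URL [DEPTH]
# Recursively downloads the content below URL, never ascending to the
# parent directory, to at most DEPTH levels (default 2), pausing one
# second between requests. Hypothetical helper.
fetch_subtree() {
    local url=$1 depth=${2:-2}
    wget -r --no-parent --level="$depth" --wait=1 "$url"
}

# Example use:
#   fetch_subtree https://uushenandoah.org/how-to-become-a-member/ 1
```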

Wrap-up

Both curl and wget are extremely useful commands for downloading and troubleshooting web content. Check out the man pages for information on the many options available.
