What is Wget? (Part-2)


In my last article, I introduced you to the internet download tool Wget. We have already covered the installation procedure and a few basic commands.

Today, we are going to explore more custom uses of this simple yet powerful tool.

The approach is simple: for each use case, we’ll look at the command that gets it done and the tweaks you can make to suit your requirements.

So let’s get started…

Say you have a file.txt containing any number of links, one per line, and you want to download them all. Use

wget --input-file=file.txt

Here the “--input-file” option tells wget to download every link listed in file.txt.
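
As a rough sketch of that workflow, the snippet below writes a small link list and wraps the download in a function. The file name urls.txt, the example.com URLs and the helper name download_list are placeholders made up for illustration, and the download itself is left as a commented usage line so nothing is fetched until you call it.

```shell
#!/bin/sh
# Build a link list with one URL per line (placeholder URLs).
workdir=$(mktemp -d)
list="$workdir/urls.txt"
printf '%s\n' \
  'http://example.com/a.zip' \
  'http://example.com/b.zip' \
  'http://example.com/c.zip' > "$list"

# Hypothetical helper: fetch every URL in the list ($1) into a directory ($2).
download_list() {
  wget --input-file="$1" --directory-prefix="$2"
}

# Show how many links are queued; tr strips padding some wc versions add.
echo "URLs queued: $(wc -l < "$list" | tr -d ' ')"

# To start the download:
#   download_list "$list" "$workdir/downloads"
```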

If you are a web developer, you might want to look at a website’s HTML, CSS, or JS to figure out some cool new UI implementations. Wget comes in handy here: it can download and store a webpage’s HTML, CSS, JS and other assets such as images locally, which is enough to view the page offline.

To do this, use:

wget --page-requisites --span-hosts --convert-links --adjust-extension http://sitename.com/dir/file

Options and their uses:
--page-requisites: Causes Wget to download all the files that are necessary to properly display a given HTML page. This includes things such as inlined images, sounds, and referenced stylesheets.
--span-hosts: Enables spanning across hosts, so that assets served from other domains are fetched too.
--convert-links: After the download is complete, rewrites the links in the downloaded page so that they point to the local copies and the page works offline.
--adjust-extension: Takes care of file extensions. Adds a .html or .css suffix to downloaded HTML and CSS files whose names do not already end in one.
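
Each of these long options also has a one-letter form, which is handy once you know what they do. The sketch below only prints the two spellings side by side so they can be compared or copied; the variable names are made up for illustration and nothing is downloaded.

```shell
#!/bin/sh
# Placeholder URL taken from the example above.
url='http://sitename.com/dir/file'

# Long and short spellings of the same four options:
#   -p = --page-requisites   -H = --span-hosts
#   -k = --convert-links     -E = --adjust-extension
long_cmd="wget --page-requisites --span-hosts --convert-links --adjust-extension $url"
short_cmd="wget -p -H -k -E $url"

# Print both commands; this script does not run wget itself.
printf '%s\n' "$long_cmd" "$short_cmd"
```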

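The command above can be extended to mirror an entire website. Bloggers may find this useful for keeping a static offline copy of their published pages in case the hosting or domain service messes up the online blog. Keep in mind that a mirror fetched this way contains only the rendered pages and their assets, not the WordPress PHP files or database. To mirror a website, use

wget --execute robots=off --recursive --no-parent --continue --no-clobber http://sitename.com

Here “--execute robots=off” tells wget to ignore the site’s robots.txt, and “--no-clobber” skips files that already exist locally.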

Let’s download all the mp3 files within a subdirectory of a website

wget --level=1 --recursive --no-parent --accept mp3,MP3 http://sitename.com/mp3/

This downloads all the mp3 files in the /mp3/ directory. The “--level” option sets the recursion depth to 1, and both mp3 and MP3 are listed because suffix matching is case-sensitive by default.

The same command can be adapted for images:

wget --directory-prefix=file/images --no-directories --recursive --no-clobber --accept jpg,gif,png,jpeg http://sitename/images/

Here “--accept” tells wget to keep only files whose names end in jpg, gif, png or jpeg.
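
To get a feel for which files such an accept list would keep, here is a rough approximation of the suffix check written as a shell function. The helper would_accept and the sample file names are invented for illustration. Wget’s real matching also understands wildcard patterns, which this sketch does not attempt.

```shell
#!/bin/sh
# Rough approximation of suffix matching for --accept jpg,gif,png,jpeg.
# The comparison is case-sensitive, as in wget's default behaviour.
would_accept() {
  case "$1" in
    *jpg|*gif|*png|*jpeg) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder file names to try the filter on.
for name in logo.png photo.jpeg banner.JPG index.html; do
  if would_accept "$name"; then
    echo "keep $name"
  else
    echo "skip $name"
  fi
done
```

The upper-case banner.JPG is skipped here, which is why the mp3 example lists both mp3 and MP3.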

The following command downloads files from a website that checks the User-Agent and the HTTP Referer

wget --referer=http://google.com --user-agent="Mozilla/5.0 Firefox/4.0.1" http://sitename.com

To download files from a password-protected site, use

wget --http-user=labnol --http-password=hello123 http://sitename.com/secret/file.zip

Some other options that might come in handy:

Options and their uses:
--wait=10: Waits 10 seconds between downloads. Use it when you don’t want to consume all of the bandwidth. It is good practice not to put a lot of load on a website.
--random-wait: Use this in combination with the one above to vary the delay randomly (between 0.5 and 1.5 times the --wait value) instead of waiting exactly 10 seconds.
--domains=xyz.com,docs.abc.com,files.abc.com: Specifies a comma-separated list of domains to fetch from.
--limit-rate=200k: Limits the download speed to 200 KB/s (kilobytes per second, not kilobits).
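
As a minimal sketch of how these fit together, the snippet below works out what a 200k limit means in bytes and bits per second, then wraps the throttling options in a function. It assumes wget’s k suffix means 1024 bytes. The helper name polite_fetch and the /docs/ path in the usage line are made up for illustration.

```shell
#!/bin/sh
# --limit-rate=200k is 200 kilobytes per second (assuming k = 1024 bytes).
rate_k=200
bytes_per_sec=$((rate_k * 1024))
bits_per_sec=$((bytes_per_sec * 8))
echo "limit: $bytes_per_sec bytes/s, about $bits_per_sec bits/s"

# Hypothetical helper: recursive download that pauses between requests
# and caps the transfer rate, using the options from the table above.
polite_fetch() {
  wget --recursive --no-parent \
       --wait=10 --random-wait \
       --limit-rate="${rate_k}k" \
       "$1"
}

# Usage (not run here):
#   polite_fetch http://sitename.com/docs/
```

So a 200k cap works out to roughly 1.6 megabits per second, about eight times what “200 Kbps” would suggest.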

By now, you should be comfortable using the [options] field to suit your requirements. You can combine options and tweak the commands above to get the result you want. The options listed here will cover most of your needs, but if you get stuck, you can always refer to the wget manual at https://www.gnu.org/software/wget/ or ask in the comments section below.

Image Source: lintut.com
