Installing Wget
As explained in my last post, wget is a small tool used to download files and mirror web pages, among a lot of other things. It runs from the command line.
To be able to use wget from any directory, you first need to place it in one of the folders on the system path, where Windows looks for programs no matter which directory you are in.
To find out which folders are on the system path:
open Command Prompt, and
type path.
You will see the various folders on the path, separated by semicolons.
Copy wget.exe into one of these folders; on Windows it is most often the system32 folder or the Windows folder (usually C:\Windows\System32 or C:\Windows). On Windows Vista and later, copying files into these folders requires administrator rights.
You can test whether wget is working by typing
wget -h and pressing the Enter key in Command Prompt.
If it is correctly installed, it displays its help text, listing the available options.
NB: you can use Windows Explorer or Command Prompt to copy wget.exe to the system path folder.
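For readers working in a Unix-like shell instead of Command Prompt (an assumption; the steps above are for Windows), the same PATH lookup can be sketched as below. Unix PATH entries are separated by colons rather than semicolons, and find_in_path is only a hypothetical helper name. In practice the built-in command -v wget does this lookup for you; the loop just makes the mechanism visible.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory on $PATH that contains an
# executable file with the given name. Unix PATH entries are separated by ":".
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Usage: find_in_path wget || echo "wget is not on the PATH"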
It is always advisable to create a working directory where you want your downloads to be placed. You can create it in Windows Explorer, or in Command Prompt using the md command.
After creating the working folder,
navigate to it in Command Prompt (with the cd command); this is where the downloaded files or websites will be saved.
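The same working-folder setup in a Unix-like shell (assumed here, since the post targets Command Prompt) can be sketched as one small function: mkdir -p plays the role of md and cd moves into the folder. The helper name enter_workdir and any folder name you pass it are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a working folder for downloads (like "md")
# and change into it (like "cd"). mkdir -p does nothing if it already exists.
enter_workdir() {
    mkdir -p "$1" && cd "$1"
}
```

Usage: enter_workdir "$HOME/downloads". It is written as a shell function rather than a separate script because a cd inside a script would not change the directory of the shell you are typing in.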
To download a file,
open Command Prompt, and navigate to the folder where you want the file to be downloaded and kept.
Type wget, followed by -t and the number of retries, then the URL of the file, and press the Enter key. If you want the download to run in the background, also add the -b option; wget then writes its progress to a file named wget-log. (A trailing ampersand, as used in Unix shells, does not do this in Windows Command Prompt.)
i.e.
wget -t 30 <URL of the file to be downloaded>
e.g.
wget -b -t 25 http://www.runtime.org/raid_recovery_for_windows.cab
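As a minimal sketch for a Unix-like shell (an assumption, not part of the Windows steps above), the download command can be wrapped in a function that passes the retry count with -t and reports whether wget succeeded. The name download_file is hypothetical; 20 is used as its default because that is also wget's own default number of tries.

```shell
#!/bin/sh
# Hypothetical helper: download one file into the current folder with up to
# $2 tries (default 20), and report success or wget's exit code.
download_file() {
    url=$1
    tries=${2:-20}
    if wget -t "$tries" "$url"; then
        echo "downloaded: $url"
    else
        status=$?
        # wget exits non-zero on failure, e.g. 4 = network failure,
        # 8 = the server returned an error response.
        echo "download failed (wget exit code $status): $url" >&2
        return "$status"
    fi
}
```

Usage: download_file http://www.runtime.org/raid_recovery_for_windows.cab 25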
To mirror a complete website,
open Command Prompt,
navigate to the folder where you want the website to be downloaded and kept, and
type wget -m followed by the URL of the website,
then press Enter.
e.g.
wget -m http://www.runtime.org
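To mirror a website and also relink the downloaded files locally,
i.e. convert the links in the downloaded pages so that they point to the local copies and the mirror can be browsed offline,
type wget -mk followed by the URL of the website to be mirrored (-m already makes the download recursive; -k converts the links).
e.g.
wget -mk http://www.runtime.org
If you want wget to keep mirroring in the background, add the -b option, e.g. wget -b -mk http://www.runtime.org (in a Unix shell, a trailing ampersand works too).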
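Putting the mirroring options together, here is a sketch for a Unix-like shell (assumed) that mirrors a site with converted links and then prints the folder the mirror lands in. By default, a recursive wget download is stored in a folder named after the host, e.g. www.runtime.org. The helper name mirror_site is hypothetical, and the host extraction assumes a plain URL without a port number or user name.

```shell
#!/bin/sh
# Hypothetical helper: mirror a site (-m) with links converted for offline
# viewing (-k), then print the host-named folder wget saves it under.
# Assumes a plain URL such as http://www.runtime.org or http://host/path/.
mirror_site() {
    url=$1
    # Strip the scheme ("http://") and everything from the first "/" on.
    host=$(printf '%s\n' "$url" | sed -e 's|^[A-Za-z]*://||' -e 's|/.*$||')
    wget -m -k "$url"
    status=$?
    # wget may exit non-zero (e.g. 8) if some pages returned errors,
    # even when most of the site was saved, so print the folder anyway.
    printf 'mirror saved under: %s\n' "$host"
    return "$status"
}
```

Usage: mirror_site http://www.runtime.org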
You can get wget from the GNU website, Softpedia, CNET Download.com, or simply by performing a Google search.
Good luck and happy downloading!
Sunday, September 18, 2011
Saturday, September 17, 2011
Wget tutorial: how to download files and web pages, and mirror websites
Downloading files with wget
Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
If, say, you want to download a URL like www.softpedia.com,
you can just type:
wget http://www.softpedia.com
(this saves the site's home page as index.html).
If you want to download a file from Softpedia, say Avira Antivirus,
just type wget, followed by the link location of the file, and hit Enter, e.g.
wget http://www.softpedia.com/avira_antivirus_personal.exe
But what if the connection is slow and the file is large? The connection will probably fail, perhaps more than once, before the whole file is retrieved. In this case, wget keeps trying to get the file until it either gets all of it or exceeds the default number of retries (20). It is easy to raise the number of tries to 50 to ensure that the whole file arrives safely:
wget --tries=50 http://www.softpedia.com/avira_antivirus_personal.exe
We can also leave wget to work in the background and write its progress to a log file. Typing --tries every time is tedious, so we use its short form, -t:
wget -t 50 -o log http://www.softpedia.com/avira_antivirus_personal.exe &
This tells wget to try getting the file up to 50 times. In a Unix shell, the ampersand at the end makes the shell run wget in the background (in Windows Command Prompt, use wget's -b option instead), while -o log makes wget write all of its messages, including progress and errors, to a file named log.
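The ampersand only means "run in the background" in a Unix shell. A sketch of that pattern, assuming a POSIX shell such as bash: $! holds the process ID of the background wget, and wait later collects its exit status. The function name background_download is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (Unix-like shell assumed): run wget in the background
# with up to 50 tries (-t 50) and its messages written to a file named "log"
# (-o log), then wait for it and report the result.
background_download() {
    url=$1
    wget -t 50 -o log "$url" &
    pid=$!   # process ID of the background wget
    echo "wget running in background, pid $pid"
    # Other commands could run here while wget keeps downloading.
    if wait "$pid"; then
        echo "finished: $url"
    else
        echo "wget failed, see the file named log" >&2
        return 1
    fi
}
```

Usage: background_download http://www.softpedia.com/avira_antivirus_personal.exe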
FTP downloads work the same way: type wget, optionally followed by -t and a number of retries if you don't want the default, then the file's address, e.g.
wget ftp://gnjilux.srk.fer.hr/welcome.msg
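If you specify a directory, wget will retrieve the directory listing, parse it, convert it to HTML and save it as index.html.
E.g.
wget ftp://ftp.gnu.org/pub/gnu/
You can then view the listing in a browser, for example the text-mode browser links:
links index.html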
You can also put all the URLs you want to download in a file, one URL per line, and save it. Then use the -i switch to pass that file to wget, e.g.
wget -i file
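A sketch of the -i workflow in a Unix-like shell (assumed): write the URLs to a plain text file, one per line, then pass that file to wget. The helper name download_list and the file name urls.txt used below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the given URLs to a list file, one per line,
# then let wget read its download list from that file with -i.
download_list() {
    list=$1
    shift
    printf '%s\n' "$@" > "$list"
    wget -i "$list"
}
```

Usage: download_list urls.txt http://www.gnu.org/ http://www.softpedia.com/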
To mirror a website, simply type:
wget -m http://your_website_url
To create a three-levels-deep mirror image of the GNU website, with the same directory structure as the original, with only one try per document (-t1), and saving the log of the activities to gnulog, type:
wget -r -l3 -t1 http://www.gnu.org/ -o gnulog
To download a three-levels-deep copy of the GNU website as above, but convert the links in the downloaded files to point to local files so that you can view the documents offline:
wget --convert-links -r -l3 http://www.gnu.org/ -o gnulog
or, using the short options -m and -k (note that -m also turns on time-stamping):
wget -mk -l3 http://www.gnu.org/ -o gnulog
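The depth-limited recursive download above can be kept as a reusable shell function. This is a sketch assuming a Unix-like shell; the name mirror_levels is hypothetical. It combines -r with -l for the depth, -t 1 for one try per document, -k to convert links, and -o for the log file, with defaults matching the three-level gnulog example.

```shell
#!/bin/sh
# Hypothetical helper: recursive download limited to $2 levels (default 3),
# one try per document, links converted for offline viewing, and all of
# wget's messages written to the log file $3 (default "gnulog").
mirror_levels() {
    url=$1
    depth=${2:-3}
    logfile=${3:-gnulog}
    wget -r -l "$depth" -t 1 -k -o "$logfile" "$url"
}
```

Usage: mirror_levels http://www.gnu.org/ 3 gnulog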
To retrieve only one HTML page, but make sure that all the elements needed for the page to be displayed, such as inline images and external style sheets, are also downloaded, and that the downloaded page refers to these local copies, type:
wget -p --convert-links http://www.server.com/dir/page.html
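If you need several single pages with their requisites, a loop does the job. A sketch, assuming a Unix-like shell; save_pages is a hypothetical name. A page that fails is reported and the loop moves on to the next one.

```shell
#!/bin/sh
# Hypothetical helper: save each given page with its page requisites (-p:
# inline images, style sheets, ...) and links converted to the local copies
# (-k). Returns 1 if any page failed, 0 otherwise.
save_pages() {
    failed=0
    for page in "$@"; do
        if ! wget -p -k "$page"; then
            echo "failed: $page" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: save_pages http://www.server.com/dir/page.html http://www.server.com/dir/other.html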
To mirror a complete website and relink the downloaded files for local viewing, for example www.blogger.com, type:
wget -mk http://www.blogger.com/
More support is available on the GNU website; the GNU Project maintains wget:
http://www.gnu.org