How to convert multiple websites into PDFs with Wget
There are many online tools, browser extensions and desktop plugins that turn websites into PDFs. If you use these tools often, you may run into situations where you need to convert multiple links in one go. Converting each link by hand is tedious and wastes time.
You can automate the task with a simple command-line utility called Wget. This article shows how to use Wget, together with a few scripts or applications, to save multiple web pages as PDF files.
How to use Wget to convert multiple websites into PDF
- Why choose Wget?
- Install Wget
- On macOS
- On Windows 10
- Save the link in a text file
- Set up a directory
- Details of the Wget commands
- Put the commands into use
- Convert HTML to PDF
- Windows 10
- macOS
Why choose Wget?
Wget is a free software package for downloading files from the web. It is also a very good tool for mirroring entire websites to your computer. Here is why Wget is a good choice:
- Wget is non-interactive, which means it can work in the background and transfer data without requiring your presence.
- Wget understands the directory structure of websites. It can follow links in HTML pages to create a local version of the remote site. In the process, it respects the site's robots.txt file.
- Wget works even over slow or unstable network connections. It keeps retrying until the whole file has been retrieved, and if the server supports resuming, it continues the download from where it left off.
- Wget supports proxy servers, which can reduce network load, speed up retrieval, and provide access from behind firewalls.
Install Wget
On macOS
The fastest way to install Wget is through Homebrew. Homebrew is a package manager for macOS, which installs useful Unix applications and utilities. Refer to the article: How to install and use wget on Mac for more details. Then type:
brew install wget
Homebrew shows its progress in real time as it installs Wget and any tools Wget needs to run on your Mac. If you already have Homebrew installed, be sure to run brew upgrade to get the latest version of this utility.
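To confirm that the install worked, you can check whether the shell is able to find the program. The snippet below is only a sketch of one way to do that in Terminal; the helper name have_tool is made up for this example and is not part of Homebrew or Wget.

```shell
# Hypothetical helper: succeeds if the named program is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

if have_tool wget; then
  echo "wget found at: $(command -v wget)"
else
  echo "wget not found; try: brew install wget"
fi
```

If the second message appears right after a successful install, opening a new Terminal window so the PATH is reloaded may help.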
On Windows 10
There are multiple versions of Wget available for Windows 10. Go to Eternally Bored to download the latest 64-bit build. Place the executable file in a folder (for example, wget) and copy that folder to the C: drive.
Now, add the Wget path to the system environment variable so that you can run the tool from any directory. Navigate to Control Panel > System and click Advanced System Settings. In the window that opens, click Environment Variables.
Select Path in System Variables and click Edit. Then click the New button located in the upper right corner of the window. Enter C:\wget and click OK.
Open Command Prompt and type wget -h to check that everything works. In PowerShell, type wget.exe -h to load the Wget help menu.
Save the link in a text file
When you are dealing with many links, pasting each one by hand is tedious. Thankfully, there are browser extensions that can help you with this task.
- Link Klipper: Extracts all links on a web page as a CSV or TXT file. You can even drag a rectangular area on the page to copy links selectively. The files are saved to the Downloads folder.
- Snap Links Plus: This utility lets you capture elements on a page and act on them. Hold down the right mouse button and drag a rectangular selection around the links. Press the Ctrl key and copy the links to the clipboard.
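Exported link lists often contain blank lines, duplicates, or non-web links such as mailto: addresses. The following is a minimal sketch of tidying such a file with standard Unix tools before handing it to Wget. It assumes a plain TXT export with one link per line; the function name clean_links and the file names are placeholders.

```shell
# Hypothetical helper: keep only http(s) URLs, one per line, without duplicates.
clean_links() {
  # tr strips carriage returns from files saved with Windows line endings;
  # grep keeps lines that start with http:// or https://;
  # awk drops repeated lines while preserving the original order.
  tr -d '\r' < "$1" | grep -E '^https?://' | awk '!seen[$0]++'
}

# Example usage (file names are placeholders):
#   clean_links raw-links.txt > links.txt
```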
Set up a directory
Wget works like a web crawler by extracting website assets from HTML files, including JavaScript files, logos, fonts, image thumbnails and CSS. Wget also tries to create a directory structure like the remote server. Create a separate directory for Wget downloads to save web pages and also to avoid clutter.
On Mac Terminal or in the Windows Command Prompt, type:
mkdir Wgetdown
This creates a new folder in your Home folder. You can name it whatever you like. Next, type:
cd Wgetdown
This changes the current working directory to Wgetdown.
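If you repeat this setup often, the two steps can be combined. This is a small sketch for a Unix-style shell such as the Mac Terminal, not something Wget requires; the function name enter_download_dir is invented for the example.

```shell
# Hypothetical helper: create the download folder if needed, then enter it.
enter_download_dir() {
  # -p makes mkdir succeed even when the folder already exists.
  mkdir -p "$1" && cd "$1"
}

# Example usage: enter_download_dir "$HOME/Wgetdown"
```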
Details of the Wget commands
After creating the directory, we will use the actual Wget command:
wget -E -H -k -K -p -i [path to your text file]
Wget uses GNU getopt to process command-line arguments. Each option has two forms, one long and one short. Long options are easier to remember but take longer to type. You can also mix the two styles. Let's look at these options in detail:
- -E (--adjust-extension): If a file of type 'application/xhtml+xml' or 'text/html' is downloaded and the URL does not end in .html, this option appends .html to the local file name.
- -H (--span-hosts): When retrieving links, you usually do not want to pull in loads of unnecessary data, so by default Wget stays on the host you started from. This option turns on host spanning, which lets Wget fetch from any server that a link refers to, for example when a page's images are served from another host.
- -p (--page-requisites): This option downloads all the files needed to display the HTML page properly, including sounds, referenced stylesheets, and inline images.
- -k (--convert-links): This option converts the links in the document to make them suitable for offline viewing. This includes links to style sheets, hyperlinks to non-HTML content, embedded images, and so on.
- -K (--backup-converted): When converting a file, this option backs up the original version with an .orig suffix.
- -i (--input-file): Reads URLs from a local or external file.
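For readability in scripts, the same command can be written with the long option names listed above. The wrapper below is only an illustrative sketch; the function name save_pages and the file name links.txt are placeholders.

```shell
# Hypothetical wrapper: the same command written with long option names.
save_pages() {
  wget --adjust-extension \
       --span-hosts \
       --convert-links \
       --backup-converted \
       --page-requisites \
       --input-file="$1"
}

# Example usage (links.txt is a placeholder path):
#   save_pages links.txt
```

The long form takes more typing, but each flag documents itself when you come back to the script later.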
Put the commands into use
To show these commands in practice, consider a website called Writing Workflows (link: https://processedword.net/writing-workflows/index.html#). This guide includes a table of contents with links to individual chapters. The goal is to create a separate PDF of each of those chapters.
Step 1: Open Terminal and create a new folder, as discussed above.
Step 2: Use the Link Klipper extension to save the links as a text file. Save the file to the Downloads folder.
Step 3: With Wgetdown as the current directory, enter:
wget -E -H -k -K -p -i /Users/rahulsaigal/Downloads/links.txt
Step 4: Press Enter. Wait for the process to complete.
Step 5: Navigate to the Wgetdown folder. You will see a processedword.net directory, named after the main domain, that contains all of the site's assets along with chapter1.html.
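Before converting anything, it can help to see exactly which pages were saved. Here is a rough sketch that lists them from the command line; the function name list_pages is hypothetical, and the comment about .orig files assumes the -K option was used as in the command above.

```shell
# Hypothetical helper: list downloaded HTML pages under a folder, sorted.
list_pages() {
  # -name '*.html' skips the .orig backups that -K leaves next to each page.
  find "${1:-.}" -type f -name '*.html' | sort
}

# Example usage, from inside the Wgetdown folder:
#   list_pages .
```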
Convert HTML to PDF
Converting a web page to PDF is quite simple, but making the PDF look like the original site is a daunting task. The quality of the result depends on:
- The system fonts available.
- The integrity of the HTML markup.
- Support for essential features such as floated tables or images.
- Table of contents options.
Windows 10
PrinceXML is a fast application for converting HTML files to PDF. It lets you typeset, format, and print HTML content with configurable layouts, and it supports web standards. It comes with many useful fonts and also lets you customize the PDF output. The application is free for non-commercial use only.
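Prince can also be run from the command line, where prince input.html -o output.pdf writes the named PDF. The loop below is a sketch of batch conversion under several assumptions that are not from this article: the prince command is on your PATH, and you are working in a Unix-style shell (on Windows, something like Git Bash or WSL). The function name prince_batch is made up.

```shell
# Hypothetical helper: convert each HTML file to a PDF with the same base name.
# Assumes the `prince` command from Prince is installed and on the PATH.
prince_batch() {
  for page in "$@"; do
    prince "$page" -o "${page%.*}.pdf"
  done
}

# Example usage: prince_batch chapter1.html chapter2.html
```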
macOS
On a Mac, you can create an Automator service to convert a batch of HTML files to PDF. Open Automator and create a Quick Action document. Set the service options to receive files or folders from Finder. Next, drag in the Run Shell Script action and set the Pass input option to "as arguments". Then, paste this script into the body:
# Convert each file passed in by Automator to a PDF next to the original.
for theFileToProcess in "$@"
do
  cupsfilter "$theFileToProcess" > "${theFileToProcess%.*}.pdf"
done
Save the file as HTML2PDF.
Now, select all the HTML files in the Finder. Right-click and select Services > HTML2PDF. Wait a moment for all the files to be converted.
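The script names each PDF with the shell expansion ${theFileToProcess%.*}, which removes the final extension from the input path before .pdf is added. The snippet below isolates that step so you can see what it produces; pdf_name is a hypothetical helper and the paths are examples only.

```shell
# Hypothetical helper: show the PDF path the script derives from an input path.
pdf_name() {
  # ${1%.*} removes the shortest trailing ".something", i.e. the extension.
  printf '%s\n' "${1%.*}.pdf"
}

pdf_name "chapter1.html"                  # prints chapter1.pdf
pdf_name "/Users/me/Wgetdown/index.html"  # prints /Users/me/Wgetdown/index.pdf
```

Because only the last extension is removed, each PDF lands next to its HTML file with the same base name.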
At first glance, the steps involved in converting many web pages into PDFs seem complicated. But once you understand the procedure, it will save you time in the long run. You don't need to pay for an expensive web subscription or PDF converter.
If you're looking to turn a web page into a PDF, read the article: Save the entire site's content as a PDF for more details.
Hope you are successful.