How to convert multiple websites into PDFs with Wget
There are many online tools, browser extensions, and desktop plugins that turn websites into PDFs. If you use these tools often, you may need to convert many links in one go, and doing it one link at a time is tedious and wastes time.
You can automate this task with a simple command-line utility called Wget. This article shows how to combine Wget with a few scripts or applications to save multiple web pages as PDF files.
How to use Wget to convert multiple websites into PDF
- Why choose Wget?
- Install Wget
- On macOS
- On Windows 10
- Save the link in a text file
- Set up a directory
- Details of the Wget commands
- Put the commands into use
- Convert HTML to PDF
- Windows 10
- macOS
Why choose Wget?
Wget is a free software package for downloading files from the web, and it is also an excellent tool for mirroring entire websites to your computer. Here is why you should choose Wget:
- Wget is non-interactive, which means it can work in the background and transfer data without requiring your presence.
- Wget understands the directory structure of websites. It can follow links in HTML pages to create a local version of the remote site, and while doing so it respects the site's robots.txt file.
- Wget works even over slow or unstable network connections. If a download fails because of a network problem, Wget keeps retrying until the whole file has been retrieved, and if the server supports it, Wget continues from where the download left off.
- Wget supports proxy servers, which can reduce network load, speed up access, and give you access from behind a firewall.
Install Wget
On macOS
The fastest way to install Wget is through Homebrew, a package manager for macOS that installs useful Unix applications and utilities. For more details, refer to the article: How to install and use wget on Mac. Once Homebrew is set up, type:
brew install wget
Homebrew then installs Wget together with any dependencies it needs to run on your Mac. If Wget was already installed through Homebrew, run brew upgrade wget to get the latest version of this utility.
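If you set up several Macs, or repeat the setup later, it helps to check whether Wget is already available before calling Homebrew. The sketch below does that. It assumes Homebrew is installed and brew is on your PATH; ensure_wget is a hypothetical name, not part of Wget or Homebrew.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install Wget with Homebrew only when it is missing.
# Assumes Homebrew is already installed and `brew` is on your PATH.
ensure_wget() {
  if command -v wget >/dev/null 2>&1; then
    # Already installed: print the version line as a quick sanity check.
    wget --version | head -n 1
  else
    brew install wget
  fi
}
```

Paste it into Terminal (or source it from a file) and run ensure_wget.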
On Windows 10
There are multiple builds of Wget available for Windows 10. Go to Eternally Bored to download the latest 64-bit build. Place the executable in a folder named wget and copy that folder to the C: drive.
Next, add the Wget folder to the system Path environment variable so you can run the tool from any directory. Navigate to Control Panel > System and click Advanced system settings. In the window that opens, click Environment Variables.
Select Path under System variables and click Edit. Then click the New button in the upper-right corner of the window, enter C:\wget, and click OK.
Open Command Prompt and type wget -h to check that everything works. In PowerShell, type wget.exe -h instead, because in Windows PowerShell plain wget is an alias for Invoke-WebRequest. Either command displays the Wget help menu.
Save the link in a text file
When you are dealing with many links, pasting each one by hand is tedious. Fortunately, there are browser extensions that can help.
- Link Klipper: Extracts all links on a web page as a CSV or TXT file. You can also drag a rectangle over part of the page to copy only the links inside it. The files are saved to the Downloads folder.
- Snap Links Plus: Lets you select many elements on a page and act on them at once. Hold down the right mouse button and drag a rectangular selection around the links, then press the Ctrl key and copy the links to the clipboard.
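An exported link list may contain lines that are not URLs (a header row, for example), blank lines, or the same chapter linked twice, and files saved on Windows can carry carriage returns that end up inside the URLs. The sketch below prepares a clean input file for wget -i. It assumes the export has one URL per line, as a TXT export would; clean_links is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an exported links file into a clean input file
# for `wget -i`. Assumes one URL per line.
clean_links() {
  input="$1"
  output="$2"
  # 1. strip Windows carriage returns
  # 2. keep only lines that start with http:// or https://
  # 3. drop duplicate URLs while preserving the original order
  tr -d '\r' < "$input" \
    | grep -E '^https?://' \
    | awk '!seen[$0]++' > "$output"
}
```

For example, clean_links ~/Downloads/links.txt ~/Downloads/links-clean.txt, then pass links-clean.txt to wget -i.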
Set up a directory
Wget works like a web crawler: it extracts website assets referenced in HTML files, including JavaScript files, logos, fonts, image thumbnails, and CSS. It also tries to recreate the directory structure of the remote server. To keep saved pages organized and avoid clutter, create a separate directory for Wget downloads.
In the Mac Terminal or the Windows Command Prompt, type:
mkdir Wgetdown
This creates a new folder named Wgetdown in the current directory, which is your home folder when you first open a terminal window. You can give it any name you like. Next, type:
cd Wgetdown
cd stands for "change directory": this command makes Wgetdown the current working directory.
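In the macOS Terminal, plain mkdir stops with an error if the folder already exists, which matters when you repeat the process later. The sketch below uses mkdir -p, which succeeds either way, and then changes into the folder. The -p flag is specific to Unix-style shells such as the macOS Terminal. The default path $HOME/Wgetdown and the function name enter_download_dir are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Create the download folder if needed and make it the working directory.
# The default location ($HOME/Wgetdown) is an assumption; pass any path you prefer.
enter_download_dir() {
  dir="${1:-$HOME/Wgetdown}"
  # -p: no error if the folder already exists; parent folders are created too
  mkdir -p "$dir" && cd "$dir"
}
```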
Details of the Wget commands
With the directory in place, run the actual Wget command:
wget -E -H -k -K -p -i [path to your text file]
Wget uses GNU getopt to handle command-line arguments. Most options come in two forms, a long one and a short one. Long options are easier to remember but take longer to type, and you can mix both forms in one command. Here is what each option does:
- -E (--adjust-extension): If a file of type application/xhtml+xml or text/html is downloaded and its URL does not end in .html (or .htm), this option appends the .html suffix to the local file name.
- -H (--span-hosts): When retrieving links, you usually don't want to pull in loads of unnecessary data, so by default Wget does not follow links to other hosts. This option enables host spanning, which lets Wget visit any host referenced by a link. For example, it allows images served from another host to be downloaded.
- -p (--page-requisites): This option downloads all the files needed to display the HTML page properly, such as inline images, sounds, and referenced stylesheets.
- -k (--convert-links): This option converts the links in the downloaded documents so they work for offline viewing. This covers links to stylesheets, hyperlinks to non-HTML content, embedded images, and so on.
- -K (--backup-converted): When converting a file, this option backs up the original version with a .orig suffix.
- -i (--input-file): Reads URLs from a local or external file.
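Short options are quick to type at the prompt, but in a saved script the long forms are easier to read later. The sketch below runs the same download as the command above, spelled out with the long option names from this list, and first checks that the links file is readable. The function name mirror_pages is hypothetical.

```shell
#!/usr/bin/env bash
# Same download as `wget -E -H -k -K -p -i <file>`, written with long options.
# mirror_pages is a hypothetical helper name.
mirror_pages() {
  list="$1"
  if [ ! -r "$list" ]; then
    echo "mirror_pages: cannot read '$list'" >&2
    return 1
  fi
  wget --adjust-extension \
       --span-hosts \
       --convert-links \
       --backup-converted \
       --page-requisites \
       --input-file="$list"
}
```

Run it from inside the download folder, for example mirror_pages ~/Downloads/links.txt.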
Put the commands into use
To see these commands in practice, consider a website called Writing Workflows (link: https://processedword.net/writing-workflows/index.html# ). This guide has a table of contents that links to its individual chapters, and the goal is to create a separate PDF of each chapter.
Step 1: Open Terminal and create a new folder, as described above.
Step 2: Use the Link Klipper extension to save the links as a text file. Save the file to the Downloads folder.
Step 3: From inside the Wgetdown folder, enter:
wget -E -H -k -K -p -i /Users/rahulsaigal/Downloads/links.txt
Step 4: Press Enter and wait for the process to complete.
Step 5: Open the Wgetdown folder. You will see a processedword.net directory for the main domain, containing all of the site's assets along with chapter1.html.
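Because Wget mirrors the remote directory structure, the chapter pages can end up several folders deep. A find command shows exactly which pages were saved and gives you a file list for the conversion step that follows. This is a minimal sketch: list_pages is a hypothetical name, and the default folder assumes Wgetdown sits in your home folder.

```shell
#!/usr/bin/env bash
# List the HTML pages Wget saved, sorted by path.
# Backups made by -K end in .orig (e.g. chapter1.html.orig), so the
# name patterns below already leave them out.
list_pages() {
  dir="${1:-$HOME/Wgetdown}"
  find "$dir" -type f \( -name '*.html' -o -name '*.htm' \) | sort
}
```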
Convert HTML to PDF
Converting a website into a PDF is fairly simple, but making the PDF look like the original site is much harder. The result depends on:
- The system fonts available.
- How well-formed the HTML markup is.
- Support for features such as floating tables or images.
- Table of contents options.
Windows 10
PrinceXML is a fast application for converting HTML files to PDF. It lets you typeset, format, and print HTML content with configurable layouts, and it supports web standards. It ships with many useful fonts and lets you customize the PDF output. The application is free for non-commercial use only.
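Prince is driven from the command line (basic usage is prince input.html -o output.pdf), so a batch of downloaded pages can be converted in a loop. The sketch below assumes the prince executable is on your PATH and that you run it from a Bash shell such as Git Bash or WSL on Windows 10; the function name prince_batch is hypothetical. Each PDF is written next to its source file.

```shell
#!/usr/bin/env bash
# Convert each HTML file given as an argument into a PDF with the same base name.
# Assumes the Prince command-line tool (`prince`) is installed and on your PATH.
prince_batch() {
  for f in "$@"; do
    # e.g. chapter1.html -> chapter1.pdf
    prince "$f" -o "${f%.*}.pdf" || echo "prince_batch: failed on $f" >&2
  done
}
```

From the folder that contains the chapter files, run prince_batch *.html.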
macOS
On a Mac, you can create an Automator service to convert a batch of HTML files to PDF. Open Automator and create a Quick Action document. Set the workflow to receive files or folders in Finder. Next, drag in the Run Shell Script action and set Pass input to as arguments. Then paste this script into the body:
# Convert each selected file to a PDF with the same base name
for theFileToProcess in "$@"
do
    cupsfilter "$theFileToProcess" > "${theFileToProcess%.*}.pdf"
done
Save the Quick Action as HTML2PDF.
Now select all the HTML files in Finder, right-click, and choose Services > HTML2PDF. Wait a moment while all the files are converted.
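The Quick Action needs you to select files in Finder, but Wget saves pages inside nested folders such as processedword.net. As an alternative, the Terminal sketch below finds every .html file under a folder and runs the same cupsfilter conversion on each one. It assumes macOS, where cupsfilter is part of the built-in CUPS printing system; the function name cups_batch is hypothetical.

```shell
#!/usr/bin/env bash
# Terminal alternative to the Automator Quick Action: convert every .html file
# under a folder to PDF with cupsfilter. cups_batch is a hypothetical name.
cups_batch() {
  dir="${1:-.}"
  # IFS= and -r keep file names with spaces and backslashes intact
  # (names containing newlines are not handled).
  find "$dir" -type f -name '*.html' | while IFS= read -r f; do
    if ! cupsfilter "$f" > "${f%.*}.pdf"; then
      rm -f "${f%.*}.pdf"   # do not leave a broken PDF behind
      echo "cups_batch: failed on $f" >&2
    fi
  done
}
```

For example, cups_batch ~/Wgetdown converts every downloaded page in one go.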
At first glance, converting many websites into PDFs looks complicated. But once you understand the steps, the process will save you time in the long run, and you don't need to pay for an expensive web service subscription or PDF converter.
For more details on turning web pages into PDFs, read the article: Save the entire site's content as a PDF.
Hope you succeed!