How to mine and fetch data using Rust
When an API can't give you the data you need, you can always dig into the HTML, and Rust can help you fetch and parse it.
Web scraping, or web data mining, is a popular, fast and effective technique for gathering large amounts of data from web pages. When no API is available, web scraping may be the best approach.
Rust's speed and memory safety make it well suited to building web scrapers. Rust has many powerful parsing and data extraction libraries, and its robust error handling helps make web crawling efficient and reliable.
Web Data Mining in Rust
Many popular libraries support web data mining in Rust, including reqwest, scraper, select, and html5ever. Many Rust developers combine reqwest and scraper for web scraping.
The reqwest library lets you make HTTP requests to web servers. It is built on the hyper crate and provides a high-level API for standard HTTP features.
Scraper is a powerful library that parses HTML documents and extracts data from them using CSS selectors.
After creating a new Rust project with the cargo new command, add the reqwest and scraper crates to the dependencies section of the Cargo.toml file:
[dependencies]
reqwest = { version = "0.11", features = ["blocking"] }
scraper = "0.12.0"
You will use reqwest to send HTTP requests and scraper to parse the HTML.
Retrieve web pages using Reqwest
Before you can extract specific data, you need to fetch the website's content.
You can send a GET request with reqwest's get function and read the page's HTML source by calling the text method on the response:
use reqwest::blocking::get;

// Fetch the Hacker News front page and return its HTML as a String.
fn retrieve_html() -> String {
    let response = get("https://news.ycombinator.com").unwrap().text().unwrap();
    response
}
The get function sends a GET request to the page, and the text method returns the response body (the page's HTML) as a String.
Parsing HTML with Scraper
The retrieve_html function returns the page's HTML as a string. You will need to parse it to extract the data you want.
Scraper provides the Html and Selector types for working with HTML. Html parses the document, while Selector picks out specific elements in it using a CSS selector.
Here's how to print all the story titles on the page:
use scraper::{Html, Selector};

fn main() {
    let response = reqwest::blocking::get("https://news.ycombinator.com/")
        .unwrap()
        .text()
        .unwrap();

    // Parse the HTML document
    let doc_body = Html::parse_document(&response);

    // Select the elements with the titleline class
    let title_selector = Selector::parse(".titleline").unwrap();

    for title in doc_body.select(&title_selector) {
        let titles = title.text().collect::<Vec<_>>();
        println!("{}", titles[0]);
    }
}
Html::parse_document parses the HTML content, and Selector::parse creates a selector from the given CSS selector (here, the titleline class); doc_body.select then returns every element that matches it.
The for loop iterates over these elements and prints the first block of text from each element.
Here are the results:
Select attributes with Scraper
To read an attribute's value, select the elements you need and call the attr method on each element's value:
use reqwest::blocking::get;
use scraper::{Html, Selector};

fn main() {
    let response = get("https://news.ycombinator.com").unwrap().text().unwrap();
    let html_doc = Html::parse_document(&response);

    // Build both selectors once, outside the loop
    let class_selector = Selector::parse(".titleline").unwrap();
    let link_selector = Selector::parse("a").unwrap();

    for element in html_doc.select(&class_selector) {
        // Find the <a> tags inside each .titleline element
        for link in element.select(&link_selector) {
            // Print the href attribute if the tag has one
            if let Some(href) = link.value().attr("href") {
                println!("{}", href);
            }
        }
    }
}
After selecting the elements with the titleline class, the outer for loop iterates over them. Inside the loop, the code selects the a tags within each element and reads their href attribute with the attr method.
The main function prints these links with the following result:
That's how to mine and fetch web data using Rust. We hope you find this article useful.