What is DarkBERT? Can AI help fight cyber threats?
The popularity of large language models (LLMs) is skyrocketing, with new ones constantly appearing. Models like ChatGPT are often trained on a variety of Internet sources, including articles, websites, books, and social media.
A team of Korean researchers developed DarkBERT, an LLM trained on datasets sourced exclusively from the dark web. Their aim is to create an AI engine that outperforms existing language models and helps threat researchers, law enforcement and cybersecurity professionals fight cyberthreats.
What is DarkBERT?
DarkBERT is an encoder model based on the RoBERTa architecture. The model was trained on millions of dark web pages, including data from hacking forums, phishing sites, and other online sources associated with illegal activities.
The term "dark web" refers to a hidden part of the Internet that is not accessible through standard web browsers. This section of the Internet is notorious for harboring anonymous websites and illegal markets, such as the trade in stolen data, drugs and weapons.
To train DarkBERT, the researchers accessed the dark web through the Tor network and collected raw data. They filtered this data using techniques like deduplication, category balancing, and preprocessing to build a refined dark web corpus. RoBERTa was then further pretrained on this corpus for about 15 days to produce DarkBERT.
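The filtering steps named above can be sketched roughly as follows. This is a minimal illustration and not the researchers' actual pipeline: the page records, the `category` labels, the hash-based exact-match deduplication, and the per-category cap are all assumptions made for the example.

```python
import hashlib
import random
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(pages):
    """Keep the first page seen for each distinct normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, max_per_category, seed=0):
    """Cap each category at max_per_category pages, sampled at random."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    balanced = []
    for category in sorted(by_category):
        group = by_category[category]
        if len(group) > max_per_category:
            group = rng.sample(group, max_per_category)
        balanced.extend(group)
    return balanced


# Hypothetical sample records, invented for illustration only.
pages = [
    {"text": "Fresh  dumps for sale", "category": "market"},
    {"text": "fresh dumps for sale", "category": "market"},
    {"text": "New exploit thread", "category": "forum"},
    {"text": "Another market listing", "category": "market"},
    {"text": "Third market listing", "category": "market"},
]

corpus = balance_categories(deduplicate(pages), max_per_category=2)
```

Hashing normalized text only catches exact or near-exact copies. A real corpus would likely also need near-duplicate detection, which this sketch does not attempt.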
Applications of DarkBERT in Cybersecurity
Because it was trained on dark web text, DarkBERT is better attuned to the language of cybercriminals than general-purpose models and performs well at detecting specific potential threats. It can analyze dark web content and flag cybersecurity threats such as data leaks and ransomware, making it a potentially useful tool against cyber threats.
According to the paper published on arxiv.org, the researchers evaluated DarkBERT against two well-known NLP models, BERT and RoBERTa, comparing their performance across three cybersecurity use cases.
1. Monitoring dark web forums for potentially harmful topics
Monitoring dark web forums, which are often used to exchange illegal information, is important for identifying potentially dangerous topics. However, manually reviewing these can be time-consuming, making process automation beneficial for security professionals.
The researchers focused on potentially harmful activity in hacking forums, using annotation guidelines for noteworthy threads, such as those sharing confidential data or distributing malware or critical vulnerabilities.
In this evaluation, DarkBERT outperformed the other language models in precision, recall, and F1 score, making it the stronger choice for identifying noteworthy threads on the dark web.
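For readers unfamiliar with these metrics, the sketch below shows how precision, recall, and F1 are computed for a binary "noteworthy or not" classifier. The labels and predictions are invented for illustration and are not results from the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = noteworthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical annotations and model predictions for six forum threads.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision measures how many flagged threads were truly noteworthy, recall measures how many noteworthy threads were caught, and F1 is their harmonic mean.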
2. Detecting sites that host confidential information
Hackers and ransomware groups use the dark web to run leak sites, where they publish confidential data stolen from organizations that refuse to comply with ransom demands. Other cybercriminals simply upload leaked sensitive data, like passwords and financial information, to the dark web with the intention of selling it.
In their study, the researchers collected data from notorious ransomware groups and analyzed ransomware leak websites that publish private data of organizations. DarkBERT outperforms other language models in identifying and classifying such sites, demonstrating its understanding of the language used in underground hacking forums on the dark web.
3. Identifying keywords related to threats on the dark web
DarkBERT uses the fill-mask task, a capability inherent to the BERT family of language models, to pinpoint keywords associated with illegal activities, including drug sales on the dark web.
When the word "MDMA" was hidden in a drug page, DarkBERT generated drug-related words, while other models suggested generic words and terms unrelated to drugs, such as different professions.
DarkBERT's ability to identify keywords associated with illegal activities can be valuable in tracking and addressing emerging cyber threats.
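The masking step in this kind of experiment can be sketched as below. The helper names, the sample sentence, and the candidate words are all hypothetical. `<mask>` is the mask token used by RoBERTa-style tokenizers. The masked string would then be passed to a fill-mask model, which this sketch does not run.

```python
import re

ROBERTA_MASK = "<mask>"  # mask token used by RoBERTa-style tokenizers


def mask_keyword(text, keyword, mask_token=ROBERTA_MASK):
    """Replace the first whole-word occurrence of keyword with the mask token."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # A function replacement avoids any escape handling in the mask token.
    masked, count = pattern.subn(lambda match: mask_token, text, count=1)
    if count == 0:
        raise ValueError(f"keyword {keyword!r} not found in text")
    return masked


def related_fraction(candidates, domain_terms):
    """Share of predicted fill words that appear in a domain term list."""
    if not candidates:
        return 0.0
    terms = {term.lower() for term in domain_terms}
    hits = sum(1 for word in candidates if word.lower() in terms)
    return hits / len(candidates)


# Hypothetical listing text, invented for illustration.
masked = mask_keyword("Pure MDMA crystals, discreet shipping", "MDMA")

# Invented placeholder predictions, NOT real model output.
candidates = ["ecstasy", "pills", "crystal", "singer"]
domain_terms = {"ecstasy", "pills", "crystal", "lsd"}
score = related_fraction(candidates, domain_terms)
```

Comparing `related_fraction` across models gives a rough sense of how domain-aware each model's suggestions are. The domain term list is an assumption and would need to be curated by analysts.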
Can the public access DarkBERT?
DarkBERT is not currently available to the public, but interested researchers can submit a request to use it for academic purposes.