AI tool detects LLM-generated text in research manuscripts and peer review reports

An analysis of tens of thousands of research papers has shown a significant increase in the presence of text generated using artificial intelligence (AI) over the past few years, according to an academic publisher.

The American Association for Cancer Research (AACR) found that 23% of abstracts and 5% of peer review reports submitted to its journals in 2024 contained text that was likely generated by large language models (LLMs). The publisher also found that fewer than 25% of authors disclosed their use of AI in preparing their manuscripts, even though disclosure is required at submission.


To screen manuscripts for signs of AI use, AACR used an AI tool developed by New York City-based Pangram Labs. When applied to 46,500 abstracts, 46,021 methods sections, and 29,544 peer review comments submitted to 10 AACR journals between 2021 and 2024, the tool detected an increase in suspected AI-generated text in submissions and review reports following the public release of OpenAI's ChatGPT chatbot in November 2022.

'We were shocked to see the results from Pangram,' said Daniel Evanko, AACR's director of journal operations and systems, who presented the findings at the 10th International Congress on Peer Review and Scientific Publishing in Chicago, Illinois, on September 3.

The analysis found that detections of AI-generated text in peer review reports fell by about 50% at the end of 2023, after AACR banned peer reviewers from using LLMs. By early 2024, however, detections had doubled, and they continued to rise.

'It's worrying to see people increasingly using LLMs for peer review even though we have banned it,' Evanko said. 'Our goal is certainly to start screening all incoming manuscripts and all incoming peer review comments,' he added.

The tool 'seems to work very well,' said Adam Day, founder of Clear Skies, a London-based research integrity company. However, 'there may be bias that we're not seeing in terms of false positive rates, and we should be aware of that,' he added.

99.85% accuracy

Pangram was trained on 28 million human-written documents published before 2021, including 3 million scientific papers, as well as on 'AI mirrors': LLM-generated text that replicates human-written passages in length, style, and tone.
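
For illustration, the short Python sketch below shows one way such an 'AI mirror' could be produced: an LLM is asked to re-express a human-written passage at roughly the same length and in the same register, yielding a matched human/AI pair for training a detector. The prompt wording, model name and OpenAI client call are assumptions made for the sketch, not a description of Pangram's actual pipeline.

    # Illustrative only: building an "AI mirror" of a human-written passage.
    # The model name and prompt are assumptions, not Pangram's actual setup.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def make_ai_mirror(human_text: str, model: str = "gpt-4o-mini") -> str:
        """Ask an LLM to re-express a passage at similar length, style and tone."""
        prompt = (
            "Rewrite the following scientific passage in your own words, keeping "
            "roughly the same length, structure and level of formality:\n\n"
            + human_text
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Each human passage and its mirror supply a matched pair of training
    # examples: (human_text, label 0) and (make_ai_mirror(human_text), label 1).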

Max Spero, CEO of Pangram Labs, said adding active learning to Pangram was 'one of the breakthroughs' that allowed it to reduce its false positive rate, the proportion of text that is incorrectly flagged as having been written by AI. He and his team continually retrained the tool, 'reducing our false positive rate to about one in 10,000,' he said.
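
As a rough illustration of the idea, the sketch below shows a common form of active learning for a text detector: human-written passages that the current model wrongly flags as AI-generated are folded back into the training set with the correct label, and the model is retrained. The scikit-learn classifier and features are placeholders, not Pangram's architecture.

    # Rough sketch of active learning against false positives; the TF-IDF plus
    # logistic regression classifier is a placeholder, not Pangram's model.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def retrain_with_false_positives(train_texts, train_labels, flagged_human_texts):
        """Fold human-written passages the detector wrongly flagged back into training."""
        texts = list(train_texts) + list(flagged_human_texts)
        labels = list(train_labels) + [0] * len(flagged_human_texts)  # 0 = human
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(texts, labels)
        return model

    # Repeating this cycle (run the detector on fresh human-written text, collect
    # anything it wrongly flags, retrain) steadily pushes the false positive rate down.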

In a preprint published last year, Spero and his colleagues showed that Pangram was 99.85% accurate, with an error rate 38 times lower than other existing AI detection tools.

When the AI detection tool was tested on manuscripts submitted before ChatGPT's release in November 2022, it flagged only seven abstracts, and no methods sections or peer review reports, as containing text that was likely AI-generated. 'From there, detections just increased linearly and at what we thought was a very high rate,' Evanko said.

The tool can also distinguish between different LLMs, including ChatGPT, DeepSeek, LLaMa and Claude models. 'We can only do this because we created our entire training dataset ourselves, so we know exactly where the source is, we know which model the training data comes from,' Spero explained.
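
The mechanism is straightforward to picture: if every generated training example records which LLM produced it, a detector can be trained as a multi-class classifier over sources rather than a binary human-versus-AI one. The sketch below, with placeholder features and labels, illustrates the idea only; it is not Pangram's method.

    # Placeholder illustration of source attribution via per-model training labels.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_attributor(texts, sources):
        """sources[i] names the generator of texts[i], e.g. 'human', 'chatgpt',
        'deepseek', 'llama' or 'claude', known because the dataset was built in-house."""
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(texts, sources)  # LogisticRegression handles multi-class labels
        return model

    # Calling model.predict() on a suspect paragraph then returns the most likely source.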

Pangram's current model cannot distinguish between paragraphs that were entirely generated by AI and those that were written by humans but edited with AI.

Language Support

AACR used Pangram to analyze submissions in 2024 including 11,959 abstracts, 11,875 methods sections, and 7,211 peer review reports.

Their analysis found that authors at institutions in countries where English is not the primary language were more than twice as likely to use LLMs as those in countries where it is.

'I was really shocked to see the high level of usage in the methods section,' Evanko said. 'Asking an LLM to improve the language of the methods section can cause errors, because those details need to be precise in how you do something, and if you rephrase something, it may no longer be precise,' he added.
