Scientists Warn AI Slop Is Wreaking Havoc in the Research World

Scientific papers rely on readers trusting their information. That’s why it’s disturbing that a new study by researchers connected with Cornell and UCLA found 146,900 AI-generated fake citations in scientific papers hosted across four major research databases.

A key limitation of large language models such as Gemini and ChatGPT is their tendency to produce plausible-sounding but incorrect information, a phenomenon known as hallucination. If a researcher relies on a chatbot to draft citations without verifying them, the model may generate references that are entirely fabricated.

While scientific papers are often hidden from the public eye, the research they report has a profound impact on our lives. Everything from the internet to lithium-ion batteries began as a research paper.

But when scientists submit papers that cite AI hallucinations, it can erode faith in the quality of the research.

Sloppy science

The research team analyzed 111 million references from 2.5 million scientific papers, looking for citations whose titles could not be matched to any existing publication. Some of these unmatched titles were simply spelling errors, but others were hallucinated references to papers that do not exist.

Unscrupulous researchers were faking citations long before the rise of chatbots, so the team also measured the rate of unmatched citations in research published before 2023, when chatbots hadn’t yet become ubiquitous, to use as a point of comparison.

“We find a sharp rise in non-existent references following widespread LLM adoption,” the authors write in the paper.

The team also found that the bad citations were spread across many papers rather than concentrated in just a few. That suggests the problem is widespread, with many researchers relying on AI-generated references without fully verifying them.

Warning signs

Usha Haley, professor of management at Wichita State University, told CNET via email that she sees the proliferation of fake citations as a serious warning.

“Fake or AI-generated citations undermine trust in the scholarly record that provides the foundation on which peer review and cumulative knowledge rest,” Haley said. “Disturbingly, this skepticism is now coming from within academia itself and from early career scholars.”

The four databases where the researchers found the fake citations are arXiv, bioRxiv, SSRN and PubMed Central. These organizations, known as scientific repositories, play a major role in the research world.

Before a paper is published in a scientific journal, the authors often upload it to a scientific repository, increasing its visibility and allowing the global scientific community to access it immediately. The new paper on AI hallucinating citations is currently hosted on arXiv.

Recently, arXiv has taken steps to stem the flow of false citations. The organization announced Tuesday that it will ban authors who submit work with hallucinated citations or with any sign of AI content that hasn’t been carefully checked.

“The corpus of science is getting diluted. A lot of the AI stuff is either actively wrong or it’s meaningless. It’s just noise,” arXiv scientific director Steinn Sigurdsson told CNET’s Katelyn Chedraoui back in February. “It makes it harder to find what’s really happening, and it can misdirect people.”
