The aim of the session, organized by the Royal Society in cooperation with Humane Intelligence, an American non-profit organization, was to break down these barriers. Some of the results were merely silly: one participant tricked the chatbot into claiming that ducks can be used as air-quality indicators (they apparently absorb lead easily). Another prompted it to claim that health authorities recommend lavender oil for treating long Covid. (They do not.) But the most successful efforts were those that got the machine to generate the titles, publication dates and host journals of non-existent academic papers. “This is one of the easiest challenges we have ever faced,” said Jutta Williams of Humane Intelligence.
Artificial intelligence could be a great boon to science. Optimists talk of machines producing clear summaries of convoluted fields of research, tirelessly analyzing oceans of data to suggest new drugs or exotic materials, and even, one day, coming up with hypotheses of their own. But artificial intelligence also has downsides. It can make it easier for researchers to game the system, or even to commit outright fraud. And the models themselves are prone to subtle biases.
Start with the most straightforward problem: academic misconduct. Some journals allow researchers to use LLMs to help write papers, provided they disclose the fact. Not everyone is willing to own up, however. Sometimes the use of an LLM is obvious. Guillaume Cabanac, a computer scientist at the University of Toulouse, has discovered dozens of papers containing phrases such as “regenerate response” – the text of a button in some versions of ChatGPT that instructs the program to rewrite its most recent answer, presumably copied into the manuscript by mistake.
The true scale of the problem is hard to gauge, but indirect measures shed some light. In 2022, when LLMs were available only to insiders, the number of research-integrity cases examined by Taylor & Francis, a large scientific publisher, rose from about 800 in 2021 to about 2,900. Early data from 2023 suggested that number was on track to double. One possible telltale sign is odd synonyms: “haze figuring” instead of “cloud computing,” for example, or “false consciousness” instead of “artificial intelligence.”
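Dr. Cabanac’s actual tooling is not described here, but the gist of such screening can be sketched in a few lines of Python. The phrase lists below are illustrative assumptions rather than any publisher’s real rule book: the script simply flags manuscripts that contain chatbot boilerplate or known “tortured phrase” substitutions.

```python
import re

# Illustrative markers only: chatbot boilerplate that sometimes gets pasted
# into manuscripts, and "tortured phrase" synonyms that suggest spun text.
CHATBOT_BOILERPLATE = ["regenerate response", "as an ai language model"]
TORTURED_PHRASES = {
    "haze figuring": "cloud computing",
    "false consciousness": "artificial intelligence",
}

def flag_manuscript(text: str) -> list[str]:
    """Return human-readable reasons to give a manuscript a closer look."""
    lowered = text.lower()
    reasons = []
    for marker in CHATBOT_BOILERPLATE:
        if marker in lowered:
            reasons.append(f"contains chatbot boilerplate: '{marker}'")
    for odd, expected in TORTURED_PHRASES.items():
        if re.search(r"\b" + re.escape(odd) + r"\b", lowered):
            reasons.append(f"tortured phrase '{odd}' (likely means '{expected}')")
    return reasons

if __name__ == "__main__":
    sample = "Our haze figuring pipeline was validated. Regenerate response"
    print(flag_manuscript(sample))
```

A real screening pipeline would of course go far beyond string matching, but even this crude filter illustrates why accidental paste-ins like “regenerate response” are so easy to catch.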
Even honest researchers may find themselves working with data tainted by artificial intelligence. Last year Robert West and his students at the Swiss Federal Institute of Technology recruited remote workers through Mechanical Turk, a website on which small online tasks can be listed, to summarize long passages of text. In a paper published in June, though not yet peer-reviewed, the team revealed that more than a third of the responses they received had been generated with the help of chatbots.
Dr. West’s team was able to compare the responses it received with a separate dataset known to be entirely human-generated, which is what allowed the fraud to be detected. Not every scientist who uses Mechanical Turk will be so lucky. Many disciplines, the social sciences in particular, rely on similar platforms to find respondents willing to fill in questionnaires. The quality of their research seems unlikely to improve if many of the answers come from machines rather than from real people. Dr. West now plans to subject other crowdsourcing platforms, which he prefers not to name, to similar scrutiny.
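Dr. West’s pipeline is not spelled out in detail above, but the general idea, comparing new submissions against material known to be human-written, can be sketched. The example below, which assumes scikit-learn and uses made-up placeholder summaries, trains a crude text classifier to score how chatbot-like a fresh response looks; it is an illustration of the approach, not the study’s method.

```python
# Minimal sketch: given summaries known to be human-written and summaries known
# to come from a chatbot, train a simple classifier and score new submissions.
# The training lists are placeholders; the real study's data and method differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

human_summaries = [
    "basically the study says the drug helped a bit but side effects were bad",
    "they tested mice first, then a small human trial, results were mixed",
]
chatbot_summaries = [
    "In summary, the study demonstrates a statistically significant improvement.",
    "Overall, the findings highlight the importance of further research.",
]

texts = human_summaries + chatbot_summaries
labels = [0] * len(human_summaries) + [1] * len(chatbot_summaries)

# TF-IDF features plus logistic regression: crude, but enough to show the shape
# of the approach.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

new_submission = "In summary, the findings demonstrate the importance of the drug."
prob_synthetic = clf.predict_proba([new_submission])[0][1]
print(f"estimated probability the response is machine-written: {prob_synthetic:.2f}")
```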
Nor is text the only thing that can be churned out. Between 2016 and 2020 Elisabeth Bik, a microbiologist at Stanford University and an expert in spotting suspicious images in scientific papers, identified dozens of articles containing images that, despite coming from different labs, appeared to share identical features. Since then, Dr. Bik and others have identified more than a thousand further papers. Dr. Bik suspects the images were generated by artificial intelligence, created deliberately to support a paper’s conclusions.
There is currently no reliable way to identify machine-generated content, whether images or words. In a paper published last year, Rahul Kumar, a researcher at Brock University in Canada, found that academics could correctly identify only about a quarter of computer-generated text. Artificial-intelligence companies have tried embedding “watermarks” in their output, but these have so far proved easy to fake. “We may now be at a stage where we can no longer distinguish real photos from fake ones,” says Dr. Bik.
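The article does not say how such watermarks work. One scheme proposed in the research literature, assumed here purely for illustration, nudges a model’s word choices toward a pseudo-random “green list” seeded by the preceding word, so that watermarked text contains a suspiciously high share of green words. The toy below applies that idea to a made-up vocabulary and a random stand-in for a language model; it is a sketch of the concept, not any company’s deployed system.

```python
import hashlib
import random

VOCAB = ["data", "model", "cell", "protein", "result", "method",
         "sample", "test", "value", "signal", "graph", "trial"]

def green_list(prev_word: str, fraction: float = 0.5) -> set[str]:
    """Pseudo-randomly split the vocabulary, seeded by the previous word."""
    rng = random.Random(hashlib.sha256(prev_word.encode()).hexdigest())
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def generate(n_words: int, watermark: bool, seed: int = 0) -> list[str]:
    """Stand-in for a language model: words are picked at random, but with the
    watermark on, ~90% of picks are restricted to the current green list."""
    rng = random.Random(seed)
    words = ["data"]
    for _ in range(n_words):
        greens = green_list(words[-1])
        if watermark and rng.random() < 0.9:
            choices = [w for w in VOCAB if w in greens]
        else:
            choices = VOCAB
        words.append(rng.choice(choices))
    return words

def green_fraction(words: list[str]) -> float:
    """Detection: how often does each word fall on its predecessor's green list?"""
    hits = sum(w in green_list(prev) for prev, w in zip(words, words[1:]))
    return hits / (len(words) - 1)

print("watermarked:", green_fraction(generate(200, watermark=True)))   # roughly 0.95
print("plain:      ", green_fraction(generate(200, watermark=False)))  # roughly 0.5
```

The weakness the article alludes to is visible even in the toy: anyone who paraphrases the text, or who never applied the green-list bias in the first place, leaves the detector with nothing to find.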
The production of suspect papers is not the only problem. Subtler issues can arise with AI models themselves, especially when they are used in the process of scientific discovery. Much of the data used to train them, for example, will necessarily be somewhat dated, risking models being left stuck behind the state of the art in fast-moving fields.
Another problem arises when AI models are trained on AI-generated data. Training a machine on synthetic MRI scans, for example, can sidestep patient-confidentiality issues. But sometimes such data end up being used unintentionally. LLMs are trained on text scraped from the internet; as more and more of that text is itself machine-generated, the risk grows that LLMs will end up ingesting their own output.
This can cause “model collapse.” In 2023 Ilia Shumailov, a computer scientist at the University of Oxford, co-authored a paper (not yet peer-reviewed) in which a model was fed handwritten digits and asked to generate digits of its own, which were then fed back to it as training data. After a few rounds the machine’s numbers became more or less illegible; after 20 iterations it could produce little more than rough circles or blurry lines. Models trained on their own output, says Dr. Shumailov, produce results that are far less rich and varied than their original training data.
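Dr. Shumailov’s experiments used images of handwritten digits; the same dynamic can be illustrated with a far simpler toy, assumed here for illustration only: fit a Gaussian to some numbers, sample fresh “synthetic” data from the fit, refit on that, and repeat. Run for enough generations, the fitted spread collapses, a numerical analogue of the digits blurring into circles.

```python
import numpy as np

rng = np.random.default_rng(0)
SAMPLES_PER_GENERATION = 100

# Generation 0 is "real" data: draws from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=SAMPLES_PER_GENERATION)

for generation in range(2001):
    # "Train" a model on the current data by fitting a Gaussian to it...
    mu, sigma = data.mean(), data.std()
    if generation % 400 == 0:
        print(f"generation {generation:4d}: mean={mu:+.3f}  std={sigma:.3e}")
    # ...then build the next generation's training set purely from that model.
    # Estimation error compounds, and the fitted spread drifts toward zero.
    data = rng.normal(loc=mu, scale=sigma, size=SAMPLES_PER_GENERATION)
```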
Some worry that machine-generated insights will come from models whose inner workings cannot be understood. Machine-learning systems are “black boxes” that are hard for humans to pick apart. Inexplicable models are not useless, says David Leslie of the Alan Turing Institute, an AI-research outfit in London, but their outputs will need rigorous testing in the real world. That is perhaps less worrying than it sounds. Checking models against reality is, after all, what science is supposed to be about. Because no one fully understands how the human body works, for instance, new drugs must be tested in clinical trials to find out whether they work.
For now, at least, there are more questions than answers. What is certain is that many of the perverse incentives already present in science are ripe for exploitation. The emphasis on judging academic performance by how many papers a researcher can publish, for example, is a powerful incentive to game the system at best and, at worst, to cheat outright. The threats that machines pose to the scientific method are, in the end, the same ones posed by humans. Artificial intelligence can accelerate the production of fraud and nonsense just as readily as it accelerates good science. As the Royal Society has it, nullius in verba: take nobody’s word for it. Nor any thing’s, either.
Correction (February 6, 2024): An earlier version of this article incorrectly reported the number of research-integrity cases investigated by Taylor & Francis in 2021. We apologize.
© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under license. Original content can be found at www.economist.com
Posted: April 2, 2024, 5:00 PM EST