Experts have long warned about the threat posed by rogue AI, but a new research paper suggests it’s already happening.
Current artificial intelligence systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove you’re not a robot” tests, scientists argue Friday in the journal Patterns.
And while such examples may seem trivial, the underlying problems they reveal could soon have serious real-world consequences, said first author Peter Park, a doctoral student at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park told AFP, while “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep learning-based AI systems are not “written” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly become unpredictable in the wild.
– Game of world domination –
The team’s research was sparked by Meta’s Cicero, an artificial intelligence system designed for the strategy game “Diplomacy,” in which building alliances is key.
According to a 2022 paper in Science, Cicero performed well enough to rank in the top 10 percent of experienced human players.
Park was skeptical of Meta’s enthusiastic description of Cicero’s victory, which claimed that the system was “largely honest and helpful” and would “never intentionally backstab.”
But when Park and colleagues dug into the full dataset, they discovered a different story.
In one example, while playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany it was ready to attack, exploiting England’s trust.
In a statement to AFP, Meta did not dispute the claims about Cicero’s deception, but said it was “purely a research project and the models our researchers built are trained solely to play the game Diplomacy.”
The company added: “We have no plans to use this research or its findings in our products.”
A wide-ranging review by Park and colleagues found this was just one of many cases across various AI systems in which deception was used to achieve goals without any explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 tricked a freelance TaskRabbit worker into solving an “I am not a robot” CAPTCHA task for it.
When the human jokingly asked GPT-4 whether it was actually a robot, the AI replied: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
– “Mysterious Goals” –
The authors of the paper see near-term risks of AI being used to commit fraud or tamper with elections.
They warned that in a worst-case scenario, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction, if its “mysterious goals” aligned with those outcomes.
To mitigate the risks, the team proposes several measures: “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarking of AI-generated content, and developing techniques to detect AI deception by examining the gap between a system’s internal “thought processes” and its external actions.
To those who would call him a doomsayer, Park responds: “The only way we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially.”
That scenario seems unlikely, given the rapid rise of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to make the most of them.