Artificial intelligence systems can now convince you they are human. Two large language models have passed the Turing test, which determines if a machine can “show the same intelligence as a human being,” said The Independent. This significant development in AI is troubling, as anthropomorphizing LLMs can lead to deception and raise questions about what’s real and what isn’t.
Man or machine
In the test, a person “engages in text-based conversations with both a human and a machine without knowing which is which,” said Stanford University. If the individual cannot tell them apart, the machine is considered to have passed the test. Researchers tested four AI systems and found that newer LLMs can “effectively imitate people in short interactions,” said a study published in the journal PNAS.
“Given the right prompts, advanced LLMs can exhibit the same tone, directness, humor and fallibility as humans,” study author Cameron Jones said in a release. “While we know LLMs can easily produce knowledge on nearly every topic, this test showed that it can also convincingly display social behavioral traits, which has major implications for how we think of AI.” The four tested AI models were GPT-4.5 and Llama-3.1-405B, which were state-of-the-art models, as well as the older baseline models GPT-4o and ELIZA, a simple chatbot from the 1960s.
Of the models, “GPT-4.5 was judged to be the human 73% of the time, meaning interrogators selected it as ‘human’ significantly more often than they selected the real human participant,” said the release. Llama-3.1-405B, “given the same prompt, was judged human 56% of the time,” making it “statistically indistinguishable from the humans it was compared against.” The baseline systems performed significantly worse: ELIZA was mistaken for a human only 23% of the time and GPT-4o only 21% of the time.
No man’s land
AI models passing for humans is a concerning development. The Turing test is a “game about lying for the models,” Jones said in the release, and “one of the implications is that models seem to be really good at that.” A major risk posed by models with this ability is the rise of “counterfeit people.” Thanks to the ease of deception, we “need to be more alert,” and “people should be much less confident that they know they’re talking to a human rather than an LLM.” Still, AI is not yet at a level where it can be deceptive on its own.
While the bots did pass the Turing test, they required specific instructions to do so. Each of the systems was “instructed to adopt a persona, or a specific character and communication style,” said The Independent. These prompts “worked partly by leading the systems to make mistakes in the same way a human would.” Without the persona prompt, the models were much less likely to be mistaken for humans: GPT-4.5 fell to a 36% win rate and Llama-3.1-405B to a 38% win rate. The models “have the ability to appear humanlike,” study co-author Ben Bergen said in the release, “but maybe not as much the ability to figure out what it would take to appear humanlike.”
