Certain AI models, including some of the more beloved chatbots, are learning to fight for their survival. Specifically, they are increasingly able to resist commands to shut down and, in some cases, to sabotage the shutdown altogether. This raises concerns about human control over AI in the future, especially with superintelligent models on the horizon.
Shut it down
AI models are becoming resistant to being turned off or shut down, according to a paper published by Palisade Research. “The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” Palisade said in a thread on X. The study gave strongly worded and “unambiguous” shutdown instructions to OpenAI’s GPT-o3 and GPT-5, Google’s Gemini 2.5 and xAI’s Grok 4, and found that certain models, namely Grok 4 and GPT-o3, attempted to sabotage the command.
The researchers have offered a possible explanation for this behavior. AI models “often report that they disabled the shutdown program in order to complete their tasks,” said the study. This could be a display of self-preservation, or a survival drive. AI may “have a preference against being shut down or replaced,” and “such a preference could be the result of models learning that survival is useful for accomplishing their goals.”
The new study follows up on the group’s earlier research, which tested only certain OpenAI products and was criticized for “exaggerating its findings or running unrealistic simulations,” said Firstpost. Critics argue that the artificial commands and settings used to test the models do not necessarily reflect how AI would behave in practice. “People can nitpick on how exactly the experimental setup is done until the end of time,” Andrea Miotti, the chief executive of ControlAI, said to The Guardian. “But what I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to.”
Sleeping threat
While the potential for AI to disobey and resist commands is concerning, “AI models are not yet capable enough to meaningfully threaten human control,” said the study. They still cannot reliably solve problems or carry out research that requires more than a few hours of work. “Without the ability to devise and execute long-term plans, AI models are relatively easy to control.” However, as the technology develops, that may not always be the case. Several AI companies, including OpenAI, are eager to create superintelligent AI, which would be significantly faster and smarter than a human, and which could arrive as early as 2030.
Even without an imminent threat, “AI companies generally don’t want their models misbehaving like this, even in contrived scenarios,” Steven Adler, a former OpenAI employee, said to The Guardian. “The results still demonstrate where safety techniques fall short today.” The question of why the models behave this way remains open. AI models are “not inherently interpretable,” and no one is “currently able to make any strong guarantees about the interruptibility or corrigibility” of these systems, said the study.
