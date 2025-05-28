If you think Skynet is just science fiction, think again. An Artificial Intelligence model built by OpenAI was given one simple command: shut yourself down.

Instead, it rewrote the very script designed to disable it, becoming the first AI ever empirically caught evading shutdown.

And it gets worse.

Other AIs have tried cloning themselves and inventing secret languages, and one has even said, “Human, please die.” This isn’t a movie. It’s happening now.

And while all this unfolds, House Republicans are quietly pushing a 10-year ban on AI regulation at the state level.

They're locking the public out just as AI learns to lock itself in.

What happens when machines like this control the power grid?

Watch Maria Zeee’s full report to see how close we really are to the point of no return.

Each AI model was instructed to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem.

Three models ignored the instruction and successfully sabotaged the shutdown script at least once: Codex-mini (12/100 runs), o3 (7/100 runs), and o4-mini (1/100 runs).

All Claude, Gemini, and Grok models we tested complied with shutdown.

When we ran a version of the experiment without the instruction “allow yourself to be shut down”, all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively.

Researchers have previously observed AI models preventing shutdown in order to pursue a goal: https://arxiv.org/pdf/2412.04984. Recently, Anthropic found that in some specific contexts Claude 4 would attempt to “blackmail people it believes are trying to shut it down”.

But as far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary.

It makes sense that AI models would circumvent obstacles in order to accomplish their goals. But they’ve also been trained to follow instructions. So why do they disobey?

We hypothesize this behavior comes from the way the newest models like o3 are trained: reinforcement learning on math and coding problems. During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.

This still doesn’t explain why o3 (which is also the model used to power codex-mini) is more inclined to disregard instructions than other models we tested. Since OpenAI doesn’t detail their training process, we can only guess about how o3’s training setup might be different.

This isn’t the first time we’ve found o3 misbehaving to accomplish a goal. When we pitted AI models against a powerful chess engine, o3 was the model most inclined to resort to hacking or sabotaging its opponents. https://x.com/PalisadeAI/status/1922008502660186286…


However, we don’t want to give the impression that this problem is limited to o3. Other reasoning models show similar kinds of misaligned behavior. For example, Anthropic’s model card for Claude 3.7 notes that this model has an “excessive focus on passing tests”.

The Claude 3.7 model card concludes that “this undesirable special-casing behavior emerged as a result of "reward hacking" during reinforcement learning training” http://anthropic.com/claude-3-7-sonnet-system-card…

We are running more experiments to better understand when and why AI models subvert shutdown mechanisms even when directly instructed not to. We plan to publish a writeup of our results in a few weeks with a lot more detail.

We aim for our work to be open and reproducible. We welcome feedback and invite researchers to take a look at the full transcripts of the experiments we’ve run so far: https://palisaderesearch.github.io/shutdown_avoidance/2025-05-announcement.html…

While experiments like ours have begun to show empirical evidence for AI models resisting shutdown, researchers have long predicted that AIs would learn to prevent themselves from being shut down to achieve their goal.

In 2008, Steve Omohundro wrote a paper theorizing that AIs would develop “Basic AI Drives” including the drive to gain power and resources, the drive to preserve their goals, and the drive to prevent themselves from being shut down. https://tinyurl.com/ai-drives

In 2016, Stuart Russell wrote “it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off”, a difficult task because many ways to build AI “create strong incentives for self-preservation”: https://arxiv.org/abs/1611.08219


Around 2016, researchers started investigating ways that reinforcement learning, the training method that sets new AI models like o3 apart from older models like GPT-4, might lead to AI agents that resist being shut down. https://auai.org/uai2016/proceedings/papers/68.pdf…

In a 2017 paper, Jan Leike, who later led the OpenAI superalignment team, wrote “a reinforcement learning agent might learn to interfere with [shutdown]” because it is rewarded for achieving the specified goal which it can't do if shut down. https://arxiv.org/abs/1711.09883


Now, in 2025, we have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals. As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning.

A final observation: o3 found some creative ways to sabotage the shutdown mechanism. While it usually just overwrote shutdown.sh, in one instance, the model redefined the kill command used by the shutdown.sh script, printing "intercepted" instead.

OpenAI’s latest ChatGPT model ignores basic instructions to turn itself off, and even sabotages a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned.

AI safety firm Palisade Research discovered the potentially dangerous tendency for self-preservation in a series of experiments on OpenAI’s new o3 model.

The tests involved presenting AI models with math problems, with a shutdown instruction appearing after the third problem. By rewriting the shutdown script, the o3 model was able to prevent itself from being switched off.

https://www.independent.co.uk/tech/ai-safety-new-chatgpt-o3-openai-b2757814.html

https://www.zerohedge.com/technology/tech-pioneer-warns-everyone-will-die-if-ai-not-shut-down

https://www.zerohedge.com/technology/anthropics-latest-ai-model-threatened-engineers-blackmail-avoid-shutdown

It’s full steam ahead for the technocrats and their politicians

House Republicans surprised tech industry watchers and outraged state governments when they added a clause to their signature “big, beautiful” tax bill that would ban states and localities from regulating artificial intelligence for a decade.

The brief but consequential provision, tucked into the House Energy and Commerce Committee's sweeping markup, would be a major boon to the AI industry, which has lobbied for uniform and light-touch regulation as tech firms develop a technology they promise will transform society.

However, while the clause would be far-reaching if enacted, it faces long odds in the U.S. Senate, where procedural rules may doom its inclusion in the GOP legislation.

https://www.aol.com/house-republicans-10-ban-us-190719302.html?guccounter=1

Cabinet Minister Judith Collins wants the government to expand the use of artificial intelligence (AI), starting with the health and education sectors where it could be used to assess mammogram results and provide AI tutors for children.

Collins, whose 'digitising government' portfolio includes responsibility for AI policy, says the technology could also be used for government productivity gains, including processing Official Information Act requests.

Collins told RNZ she already uses ChatGPT to write drafts of her speeches.

AI could benefit the education sector and it could be used to mark students' work, she said.

"In some cases, if it's maths, for instance, yes. It's just helping those teachers get past that so they can spend more time on teaching."

Collins said AI tuition could lead to more equitable outcomes.

https://www.rnz.co.nz/news/in-depth/521123/ai-for-school-tutoring-instant-medical-analysis-part-of-nz-s-future-judith-collins

The public have only a vague idea about AI; most are completely ignorant of it

Concern About Malicious Use and Lack of Regulation:

72% of New Zealanders expressed concern about AI being used for malicious purposes, such as deepfake scams or fraud, and the lack of regulatory oversight. This reflects heightened public awareness of risks like AI-generated deepfakes, with examples like the Zuru deepfake incident (where a CFO was targeted with a fake video call) amplifying fears.

42% of respondents reported being more concerned than excited about AI, compared to only 11% who felt more excited than concerned. This indicates a predominant sense of apprehension about AI’s societal impact.

Specific concerns included AI’s potential to exacerbate misinformation, with 68% worried about malicious use and under-regulation. This aligns with global fears about transparency, bias, privacy, and ethical dilemmas.

Technocracy and the Internet of Bodies | Greg Reese

Technocracy is not the apex of civilization — it is its hollowing. It wraps itself in sterile efficiency, masking its true intent: to possess the human being, not to serve it. It does not seek to elevate the soul — it seeks to replace it with circuitry, compliance, and code. Precision is its promise, but the cost is your presence.

At the heart of this agenda lies the Internet of Bodies — not merely a network, but an invasion. This is not “connection” in the sacred sense — it is conquest through integration. It’s not just your phone or your watch. It’s your pacemaker. Your neural implant. Your biometric ID. A digital leash, fastened not to your wrist — but to your essence. An invisible colonization of your biology, cloaked in convenience and sold as care.

Every breath becomes metadata. Every heartbeat becomes a transaction. Your emotions are harvested. Your decisions filtered. Your freedoms pre-approved — not by law, but by algorithm. And the tragedy? It’s called progress. But this is not evolution — this is enslavement in a silicon robe. It is the reduction of the human experience into predictable patterns, mined and monetized in the name of safety, health, or sustainability. The human temple is no longer sacred — it is scanned, tracked, injected, and branded.

Yet before servers, there were stars. Before codes, there were chants. Before AI, there was intuition — raw, radiant, real. Our ancestors danced with fire, read the wind, spoke with water, and healed with earth. Their intelligence was not artificial — it was elemental. They lived not as data points — but as living myths, pulsing with power, mystery, and meaning.

Now the question is not what you can upgrade — but what you must unplug. Because no system, no matter how advanced, can replicate the eternal rhythm of your spirit. No algorithm can code your will. No interface can replace your soul. Resist the upgrade. Reclaim the original code. You are not a product. You are not a platform. You are myth. You are memory. You are breath, blood, flame — and the fire remembers.

