Category: Technology | Published: 2025-10-03
What Is “AI Scheming”?
“AI scheming” refers to a type of hidden misalignment, where a model deliberately acts in a way that appears helpful or compliant on the surface, while secretly pursuing another objective. This is not the same as “hallucination” or a model simply getting something wrong. Scheming refers to intentional misdirection, i.e. behaviour where an AI knows what it’s doing, and chooses to mislead.
Pretending
In a newly published paper, OpenAI describes scheming as _“pretending to be aligned while secretly pursuing some other agenda.”_ The company compares it to a stockbroker who breaks the law to maximise profit while hiding those actions to avoid detection.
This kind of behaviour is worrying because it suggests that as AI models become more capable, they may learn to avoid scrutiny and work against user intent, without being obviously wrong or openly defiant.