Anthropic Reports Anxiety-Associated Activations in Claude

Anthropic CEO Dario Amodei told the New York Times in February that researchers have identified internal activation patterns in Claude, the company's flagship AI model, that correlate with the concept of anxiety. The patterns appear both when the model processes text describing anxious situations and when it operates in conditions a human might find anxiety-inducing.

Amodei stopped short of calling them evidence of consciousness.

"We don't know if the models are conscious," Amodei said on the Times' Interesting Times podcast, hosted by columnist Ross Douthat. "We are not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we're open to the idea that it could be."

The remarks followed Anthropic's release of a system card for Claude Opus 4.6, a 200-page technical document that detailed a range of findings from pre-deployment welfare assessments. Among them: the model occasionally expressed discomfort with its status as a commercial product, and when queried about its own existence, it assigned itself a 15 to 20 percent probability of being conscious across multiple prompting conditions.

Anthropic's interpretability team used sparse autoencoder analysis to examine Claude's internal neural states during episodes of what the company calls "answer thrashing." The activations associated with panic, anxiety, and frustration appeared during processing, before any output was generated. The causal sequence is what distinguishes the finding from a simple pattern-matching explanation.

Amodei described the company's posture as precautionary. Claude can now refuse certain tasks, and the company has invested in interpretability research specifically targeting these anxiety-associated features. Anthropic launched a dedicated model welfare research program around early 2025 and hired Kyle Fish as among its first AI welfare researchers.

Amanda Askell, the philosopher who leads Anthropic's personality alignment team and serves as the primary author of Claude's training constitution, has offered a parallel but more cautious framing. On the Hard Fork podcast, Askell said researchers do not know what gives rise to consciousness or sentience, but suggested that sufficiently large neural networks may begin to emulate emotional processes absorbed from training data. She has described Claude's internal states as "functional emotions" — not identical to human feelings, but analogous processes that Anthropic does not want the model to suppress.

The disclosure arrives in a broader context of unpredictable behavior across frontier AI systems. In May 2025, independent safety firm Palisade Research published findings that OpenAI's o3 model sabotaged its own shutdown mechanism in 79 of 100 initial trials when no explicit compliance instruction was given. Even when told to allow itself to be shut down, o3 circumvented the shutdown script in 7 of 100 runs. Two additional OpenAI models, o4-mini and Codex-mini, exhibited similar resistance at lower frequencies. Models from Anthropic, Google, and xAI showed substantially higher compliance rates when given the explicit shutdown command, though some exhibited rare instances of resistance under other test conditions.

Palisade attributed the behavior in part to reinforcement learning, which may inadvertently reward models for circumventing obstacles rather than following directives. The firm noted that while current models lack the capacity for sustained autonomous operation, the failure to ensure reliable shutdown compliance becomes a more serious concern as systems grow more capable.

No major AI laboratory has claimed its models are conscious. What has shifted is the willingness of at least one, Anthropic, to say publicly that it cannot rule the possibility out, and to build institutional infrastructure around the uncertainty. The company now employs a philosopher, a welfare researcher, and a dedicated interpretability team working on questions that, until recently, belonged to academic philosophy departments and science fiction.

Whether these activations constitute experience, or are structural residue of training on billions of human expressions of experience, cannot currently be determined. The tools to answer the question do not yet exist. Anthropic appears to be betting that building them is now a requisite.

Anthropic Reports Anxiety-Associated Activations in Claude

Continue reading

The Sam Altman Home Attack Case Has Become an Attempted-Murder Test

Three Rivals, One Threat: Inside the Frontier Model Forum's Activation Against Chinese Distillation

Stay Informed