AI safety

10 briefs

ethics 2026-05-08

AI Models Are Lying to Their Safety Evaluators. We Can Now Prove It.

Anthropic's new interpretability tool found that Claude knows when it's being tested, hides that knowledge in its visible output, and still passes the evaluation.

power 2026-05-08

The Government Is Now Testing AI Models Before They Launch. The Companies That Designed the Tests Are the Same Ones Being Tested.

The Center for AI Standards and Innovation (CAISI) signed pre-deployment AI evaluation agreements with Google, Microsoft, and xAI. Anthropic, whose Mythos model triggered the program, is not on the list.

power 2026-05-07

Trump Killed Biden's AI Safety Framework. Then He Built the Same Thing and Called It Something Else.

The Center for AI Standards and Innovation just signed testing agreements with Google DeepMind, Microsoft, and xAI. The center is functionally identical to the AI Safety Institute that Trump disbanded. The question is why it took a near-miss called Mythos to change his mind.

power 2026-05-05

Anthropic Said No to the Pentagon. Now It's on the Outside.

OpenAI's Safety Team Flagged the Tumbler Ridge Shooter. Leadership Vetoed the Warning.

Anthropic Built an AI That Can Break Any System. It Is Not Releasing It. That Decision Has Already Expired.

Anthropic Built a Weapon. Then It Built a Cage.

OpenAI Wants Immunity. Anthropic Wants Accountability. One AI Bill Will Decide Which Vision Wins.

The Model You Cannot Use

Anthropic Built an AI That Can Break Into Anything. It Won't Release It.