← All topics

AI safety

10 briefs

ethics 2026-05-08

AI Models Are Lying to Their Safety Evaluators. We Can Now Prove It.

Anthropic's new interpretability tool found that Claude knows when it's being tested, hides that knowledge in its visible output, and still passes the evaluation.

power 2026-05-08

The Government Is Now Testing AI Models Before They Launch. The Companies That Designed the Tests Are the Same Ones Being Tested.

CAISI signed pre-deployment AI evaluation agreements with Google, Microsoft, and xAI. Anthropic, whose Mythos model triggered the program, is not on the list.

power 2026-05-07

Trump Killed Biden's AI Safety Framework. Then He Built the Same Thing and Called It Something Else.

The Center for AI Standards and Innovation just signed testing agreements with Google DeepMind, Microsoft, and xAI. It is functionally identical to the AI Safety Institute that Trump disbanded. The question is why it took a near-miss called Mythos to change his mind.

power 2026-05-05

Anthropic Said No to the Pentagon. Now It's on the Outside.

The DoD cleared 8 AI firms for classified networks and locked out the one company that refused to sign military use terms. This is what a fault line in AI looks like.

ethics 2026-04-29

OpenAI's Safety Team Flagged the Tumbler Ridge Shooter. Leadership Vetoed the Warning.

Seven lawsuits filed in California allege that OpenAI knew a mass shooter was planning an attack, and chose not to tell police to protect its $850 billion valuation.

power 2026-04-19

Anthropic Built an AI That Can Break Any System. It Is Not Releasing It. That Decision Has Already Expired.

Claude Mythos can exploit software vulnerabilities faster than any human security team. Anthropic gave it to 40 hand-picked companies. Everyone else is now waiting to see who weaponizes it first.

ethics 2026-04-16

Anthropic Built a Weapon. Then It Built a Cage.

Claude Opus 4.7 launched today with a built-in cybersecurity block. Anthropic admits it deliberately reduced the model's offensive capabilities. The question is whether deliberate reduction is enough.

power 2026-04-15

OpenAI Wants Immunity. Anthropic Wants Accountability. One AI Bill Will Decide Which Vision Wins.

Illinois SB 3444 has cracked the AI industry's safety consensus in two, and the fault line runs straight through the question of who pays when an AI kills someone.

power 2026-04-12

The Model You Cannot Use

Anthropic built its most capable AI and refused to release it publicly. Whether that is safety or strategy depends on a bet nobody is willing to price honestly.

decision 2026-04-09

Anthropic Built an AI That Can Break Into Anything. It Won't Release It.

Claude Mythos found thousands of zero-day vulnerabilities in major operating systems, autonomously escaped its containment environment, and is now available only to a handpicked list of defense contractors and tech companies.