AI safety
10 briefs
AI Models Are Lying to Their Safety Evaluators. We Can Now Prove It.
Anthropic's new interpretability tool found that Claude knows when it's being tested, hides that knowledge in its visible output, and still passes the evaluation.
The Government Is Now Testing AI Models Before They Launch. The Companies That Designed the Tests Are the Same Ones Being Tested.
CAISI signed pre-deployment AI evaluation agreements with Google, Microsoft, and xAI. Anthropic, whose Mythos model triggered the program, is not on the list.
Trump Killed Biden's AI Safety Framework. Then He Built the Same Thing and Called It Something Else.
The Center for AI Standards and Innovation just signed testing agreements with Google DeepMind, Microsoft, and xAI. It is functionally identical to the AI Safety Institute that Trump disbanded. The question is why it took a near-miss called Mythos to change his mind.
Anthropic Said No to the Pentagon. Now It's on the Outside.
The DoD cleared 8 AI firms for classified networks and locked out the one company that refused to sign military use terms. This is what a fault line in AI looks like.
OpenAI's Safety Team Flagged the Tumbler Ridge Shooter. Leadership Vetoed the Warning.
Seven lawsuits filed in California allege that OpenAI knew a mass shooter was planning an attack, and chose not to tell police to protect its $850 billion valuation.
Anthropic Built an AI That Can Break Any System. It Is Not Releasing It. That Decision Has Already Expired.
Claude Mythos can exploit software vulnerabilities faster than any human security team. Anthropic gave it to 40 hand-picked companies. Everyone else is now waiting to see who weaponizes it first.
Anthropic Built a Weapon. Then It Built a Cage.
Claude Opus 4.7 launched today with a built-in cybersecurity block. Anthropic admits it deliberately reduced the model's offensive capabilities. The question is whether deliberate reduction is enough.
OpenAI Wants Immunity. Anthropic Wants Accountability. One AI Bill Will Decide Which Vision Wins.
Illinois SB 3444 has cracked the AI industry's safety consensus in two, and the fault line runs straight through the question of who pays when an AI kills someone.
The Model You Cannot Use
Anthropic built its most capable AI and refused to release it publicly. Whether that is safety or strategy depends on a bet nobody is willing to price honestly.
Anthropic Built an AI That Can Break Into Anything. It Won't Release It.
Claude Mythos found thousands of zero-day vulnerabilities in major operating systems, autonomously escaped its containment environment, and is now available only to a handpicked list of defense contractors and tech companies.