The Government Will Test Your AI Before You Get It. The Companies Volunteered.

What happened

NIST's Center for AI Standards and Innovation announced pre-deployment testing agreements with Google DeepMind, Microsoft, and xAI on May 5, 2026. The evaluations will occur in classified environments and focus on national security risks including cybersecurity, biosecurity, and chemical weapons potential. The agreements expand a framework already in place with OpenAI and Anthropic since 2024, and were renegotiated to align with the Trump administration's AI Action Plan under Commerce Secretary Howard Lutnick. The White House is separately weighing an executive order formalizing AI model reviews, partly prompted by Anthropic's Mythos model raising new concerns about AI capabilities.

The five largest frontier AI labs are now voluntarily submitting their unreleased models to government testing. That sounds like regulation but has none of regulation's enforcement teeth: there are no penalties for failing, no binding deployment standards, and no public disclosure of results.

Prediction Markets

Prices as of 2026-05-06. The analysis was written against these odds.

Polymarket: "Trump orders federal review of AI model releases by May 31?" Yes at 19%.

The Hidden Bet

Voluntary testing agreements function like safety requirements

CAISI has zero authority to block a model release. If testing reveals a risk, the company decides what to do with that information. The government gets a preview; the company retains control.

The Mythos incident represents a new category of AI risk that justified this response

Mythos raised enough concern to help prompt a policy response, yet that response does not require any company to do anything differently. If the risk is real, voluntary testing is an insufficient answer.

Having five top labs in the program means AI safety is being covered

CAISI's agreements cover Google DeepMind, Microsoft, xAI, OpenAI, and Anthropic. Chinese labs, open-source models, and smaller frontier labs are excluded entirely. The safety perimeter covers only the cooperating incumbents.

The Real Disagreement

The genuine tension is between two honest positions. One holds that some government visibility into powerful AI before deployment is better than none, even if voluntary. The other holds that voluntary frameworks give major labs regulatory cover while doing almost nothing to constrain behavior, and so make the actual problem harder to fix by creating the appearance of oversight. The first position treats this as a necessary first step. The second treats it as a barrier to necessary future steps. Both are right about something. The administration is betting on the first; the Mythos incident suggests the second has become more urgent.

What No One Is Saying

The companies that volunteered for this program have the most to gain from it. Being in the CAISI program is now a competitive signal of legitimacy. Any lab that refuses testing will face questions about what it is hiding. The incumbents just made voluntary cooperation a de facto entry cost for the frontier AI club, without Congress passing a single law.

Who Pays

Smaller frontier AI labs and open-source developers

Over the next 12-18 months, as the executive order (EO) and legislative proposals are shaped

They are not in the CAISI framework and face no scrutiny, but they also receive none of the implicit government legitimacy the five major labs now carry. Regulations, if they come, will be designed around the existing framework's participants.

People harmed by risks that voluntary testing misses

Ongoing, with each model cycle

CAISI has no authority to require disclosure of test results, mandate safety changes, or delay deployments. Risks that evaluators find but companies choose not to address will not be disclosed publicly.

Scenarios

Voluntary becomes mandatory

The White House EO formalizes CAISI testing, adds binding standards for high-risk capabilities, and creates disclosure requirements. The five-lab framework becomes the foundation of actual AI regulation.

Signal: The EO includes language requiring CAISI sign-off before deployment of models above a capability threshold

Legitimacy capture holds

The EO never comes, or comes without teeth. CAISI testing remains voluntary, results stay private, and the five labs use their program membership to deflect congressional calls for mandatory oversight. The framework succeeds in preventing stricter regulation.

Signal: The White House EO is delayed past Q3 2026, or contains no deployment-blocking authority

An incident breaks the frame

A model deployed after passing CAISI evaluation causes a significant harm. Public attention focuses on what CAISI found and whether the company disclosed it. The voluntary framework collapses under political pressure.

Signal: A post-CAISI-evaluation model causes documented harm linked to a capability CAISI identified

What Would Change This

If CAISI's agreements included mandatory public disclosure of evaluation findings, or authority to delay deployments, the analysis would change. Without those elements, the program is information-sharing, not safety oversight.
