The Government Will Test Your AI Before You Get It. The Companies Volunteered.
What happened
NIST's Center for AI Standards and Innovation announced pre-deployment testing agreements with Google DeepMind, Microsoft, and xAI on May 5, 2026. The evaluations will occur in classified environments and focus on national security risks including cybersecurity, biosecurity, and chemical weapons potential. The agreements expand a framework already in place with OpenAI and Anthropic since 2024, and were renegotiated to align with the Trump administration's AI Action Plan under Commerce Secretary Howard Lutnick. The White House is separately weighing an executive order formalizing AI model reviews, partly prompted by Anthropic's Mythos model raising new concerns about AI capabilities.
The five largest frontier AI labs are now voluntarily submitting their unreleased models to government testing. That sounds like regulation but has none of regulation's enforcement teeth: there are no penalties for failing, no binding deployment standards, and no public disclosure of results.
Prediction Markets
Prices as of 2026-05-06 — the analysis was written against these odds
The Hidden Bet
Voluntary testing agreements function like safety requirements
CAISI has zero authority to block a model release. If testing reveals a risk, the company decides what to do with that information. The government gets a preview; the company retains control.
The Mythos incident represents a new category of AI risk that justified this response
Mythos was dangerous enough to prompt an emergency policy response, but the response does not require any company to do anything different. If the risk is real, voluntary testing is an insufficient answer.
Having five top labs in the program means AI safety is being covered
CAISI's agreements cover Google, Microsoft, xAI, OpenAI, and Anthropic. Chinese labs, open-source models, and smaller frontier labs are excluded entirely. The safety perimeter only covers the cooperating incumbents.
The Real Disagreement
The genuine tension is between two honest positions: that some government visibility into powerful AI before deployment is better than none, even if voluntary; versus that voluntary frameworks give major labs regulatory cover while doing almost nothing to constrain behavior, and therefore make the actual problem harder to fix by creating the appearance of oversight. The first position treats this as a necessary first step. The second treats it as a barrier to necessary future steps. Both are right about something. The administration is betting on the first, while what the Mythos incident suggests is that the second has become more urgent.
What No One Is Saying
The companies that volunteered for this program have the most to gain from it. Being in the CAISI program is now a competitive signal of legitimacy. Any lab that refuses testing will face questions about what it is hiding. The incumbents just made voluntary cooperation a de facto entry cost for the frontier AI club, without Congress passing a single law.
Who Pays
Smaller frontier AI labs and open-source developers
Over the next 12-18 months as the EO and legislative proposals are shaped
They are not in the CAISI framework and face no scrutiny, but they also receive none of the implicit government legitimacy the five major labs now carry. Regulations, if they come, will be designed around the existing framework's participants.
People harmed by risks that voluntary testing misses
Ongoing, with each model cycle
CAISI has no authority to require disclosure of test results, mandate safety changes, or delay deployments. Risks that evaluators find but companies choose not to address will not be disclosed publicly.
Scenarios
Voluntary becomes mandatory
The White House EO formalizes CAISI testing, adds binding standards for high-risk capabilities, and creates disclosure requirements. The five-lab framework becomes the foundation of actual AI regulation.
Signal The EO includes language requiring CAISI sign-off before deployment of models above a capability threshold
Legitimacy capture holds
The EO never comes, or comes without teeth. CAISI testing remains voluntary, results stay private, and the five labs use their program membership to deflect congressional calls for mandatory oversight. The framework succeeds in preventing stricter regulation.
Signal The White House EO is delayed past Q3 2026, or contains no deployment-blocking authority
An incident breaks the frame
A model deployed after passing CAISI evaluation causes a significant harm. Public attention focuses on what CAISI found and whether the company disclosed it. The voluntary framework collapses under political pressure.
Signal A post-CAISI-evaluation model causes documented harm linked to a capability CAISI identified
What Would Change This
If CAISI's agreements included mandatory public disclosure of evaluation findings, or authority to delay deployments, the analysis changes. Without those elements, the program is information-sharing, not safety oversight.
Related
The Government Is Now Testing AI Models Before They Launch. The Companies That Designed the Tests Are the Same Ones Being Tested.
powerTrump Killed Biden's AI Safety Framework. Then He Built the Same Thing and Called It Something Else.
powerOpenAI Buys Its Way Into Europe While Anthropic Waits
powerThe White House Wants an FDA for AI. The Problem Is That Anthropic Wrote the Prescription.