Anthropic Relaxes Core AI Safety Framework Amid Pentagon Pressure and Regulatory Challenges
Anthropic, the artificial intelligence company behind the Claude family of AI models, is fundamentally altering its approach to safety governance. The firm is moving away from its previous self-imposed, binding guardrails that constrained AI model development, adopting instead a nonbinding safety framework that it acknowledges will evolve over time.
Pentagon Pressure and CEO Ultimatum
The timing of this policy shift coincides with significant pressure from the United States Department of Defense. Anthropic announced these changes on the same day that CEO Dario Amodei was summoned to meet with Pentagon leadership, specifically Defense Secretary Pete Hegseth.
Reports indicate that Hegseth presented Amodei with a stark ultimatum: either roll back the company's stringent AI safeguards or risk losing a substantial $200 million Pentagon contract. The Defense Department has further threatened to place Anthropic on what effectively constitutes a government blacklist, a move that would severely impact the company's ability to secure federal contracts and partnerships.
While Anthropic has not explicitly linked its policy change to this high-stakes meeting, the timing suggests a connection between Pentagon pressure and the company's strategic pivot toward more flexible safety protocols.
New Responsible Scaling Policy Framework
In a detailed blog post explaining the transition, Anthropic cited "an anti-regulatory political climate" as a significant factor influencing its decision. The company has now implemented version 3.0 of its Responsible Scaling Policy (RSP), a voluntary framework designed to mitigate catastrophic risks from increasingly capable AI systems.
This new policy represents a substantial departure from previous approaches. Rather than maintaining rigid, binding commitments, Anthropic is adopting a more adaptable structure that separates the company's internal safety plans from its broader recommendations for the entire AI industry.
The RSP was originally modeled after the U.S. government's biosafety level (BSL) standards, creating AI Safety Levels (ASLs) that corresponded to specific capability thresholds and required safeguards. Under the previous system, if a model exceeded certain capability levels—such as demonstrating biological science capabilities that could assist in creating dangerous weapons—Anthropic would implement stricter safeguards against misuse and model weight theft.
Assessment of Previous Safety Approach
Anthropic's evaluation of its two-and-a-half-year experience with the original RSP reveals mixed results. On the positive side, the policy successfully incentivized the development of stronger safeguards, particularly for ASL-3 deployment standards focused on chemical and biological weapons risks. The company developed sophisticated input and output classifiers to block concerning content and implemented these safeguards for relevant models starting in May 2025.
The RSP also encouraged other major AI companies, including OpenAI and Google DeepMind, to adopt similar frameworks, creating what Anthropic hoped would be a "race to the top" in AI safety standards. These voluntary principles have begun informing early AI policy development in jurisdictions including California, New York, and the European Union.
However, significant challenges emerged. The company found that using RSP thresholds to build consensus about AI risks proved more difficult than anticipated. Capability levels often fell into a "zone of ambiguity" where it was unclear whether models had definitively passed established thresholds, making it challenging to build compelling cases for multilateral action across the industry.
Biological risks provide a clear example of this ambiguity. While current models demonstrate sufficient biological knowledge to pass most quick tests, these assessments alone cannot definitively establish whether risks are high or low. Additional evidence from extensive wet-lab trials has produced ambiguous results, complicated by the rapid pace of AI advancement that often renders studies obsolete by completion.
Structural Challenges and Revised Approach
Anthropic identified three primary structural challenges that necessitated policy revision: the ambiguity surrounding capability thresholds, an increasingly anti-regulatory political climate, and requirements at higher RSP levels that may prove impossible to implement without collective action.
The company acknowledged that robust mitigations outlined for higher ASL levels might be "currently not possible" without assistance from national security communities, as indicated in a RAND report on model weight security.
Rather than defining higher-level safeguards in ways that would be easy to achieve but undermine the RSP's intended purpose, Anthropic has chosen to transparently acknowledge these challenges and restructure its approach before reaching these capability levels.
Key Elements of the Updated Policy
The revised Responsible Scaling Policy introduces three significant innovations:
- Separated Company and Industry Recommendations: The RSP now clearly distinguishes between mitigations Anthropic plans to pursue regardless of industry adoption and more ambitious recommendations that would adequately manage risks if implemented across the entire AI sector.
- Frontier Safety Roadmap: This new requirement mandates the development and publication of concrete plans for risk mitigations across the Security, Alignment, Safeguards, and Policy domains. These public goals, while nonbinding, create accountability through transparent progress grading. Example goals include:
  - launching "moonshot R&D" projects for unprecedented information security;
  - developing advanced red-teaming methods that surpass current bug bounty programs;
  - implementing systematic measures to ensure AI behavior alignment;
  - establishing comprehensive records of development activity; and
  - publishing policy roadmaps for regulatory frameworks that scale with increasing risk.
- Risk Reports and External Review: Anthropic will now publish detailed Risk Reports every 3-6 months, providing comprehensive safety profiles of its models, including capabilities, threat models, active mitigations, and overall risk assessments. In certain circumstances, these reports will undergo external review by independent experts with deep AI safety knowledge, minimal conflicts of interest, and access to minimally-redacted information.
Broader Regulatory Context and Future Outlook
Despite these policy adjustments, Anthropic maintains its commitment to advocating for effective government engagement on AI safety. The company stated: "We remain convinced that effective government engagement on AI safety is both necessary and achievable, and we aim to continue advancing a conversation grounded in evidence, national security interests, economic competitiveness, and public trust."
However, the company acknowledged that meaningful federal action on AI safety has progressed slowly despite rapid technological advances. The policy environment has increasingly prioritized AI competitiveness and economic growth, while safety-oriented discussions have struggled to gain traction at the national level.
Anthropic's policy shift represents a pragmatic adaptation to current political and regulatory realities while maintaining its foundational commitment to responsible AI development. The company hopes that by transparently documenting gaps between its current safety measures and more ambitious industry-wide recommendations, it can contribute to public awareness and eventual policy improvements.
This strategic pivot occurs against a backdrop of increasing scrutiny of AI companies' relationships with government agencies and growing concerns about the balance between innovation acceleration and risk mitigation in artificial intelligence development.
