Threat Model

Primary adversaries, targets, and attack vectors for SL5 systems

The primary adversaries for SL5 systems are the top-priority operations of the world's most cyber-capable institutions: operations comparable to 1,000 individuals whose expertise is years ahead of the public state of the art, sustained over multiple years with budgets of up to $1 billion, and backed by state-level infrastructure and access developed over decades.

A primary use case for SL5 is an AI lab approaching fully automated AI R&D. OpenAI has already announced that it expects to have automated researchers by 2028 [29]. Anthropic's CEO has similarly noted that AI "may be only 1–2 years away from a point where the current generation of AI autonomously builds the next" [30]. At this stage, the economic value of frontier AI models derives increasingly from conducting internal automated R&D and accelerating scientific advances, more than from serving the model to customers. Models capable of automating AI research could enable rapid recursive improvement in AI capabilities, with enormous economic value and significant geopolitical implications. Nation-state adversaries therefore have strong incentives to acquire such capabilities or to prevent rivals from doing so.

The targets of SL5 protection are critical assets held by frontier AI labs. These critical assets include: “covered models,” meaning frontier AI models that pass capability thresholds designated by the organization as requiring SL5 protection; AI research and software that could enable adversaries to develop comparable capabilities; model inputs, which could be used to poison or backdoor a model; and model outputs, which could be used to distill or reverse engineer one. The security objectives are the confidentiality, integrity, and availability of these critical assets.

The standard also addresses risks from misaligned or compromised AI models that may attempt to sabotage research or exfiltrate themselves; these constitute a distinct form of insider threat. Mitigations for this threat class overlap substantially with defenses against nation-state adversaries. This revision does not include mitigations specific to misaligned AI, though future revisions will likely address this threat class more directly.

Attack vectors include insider threats, supply chain compromise, physical intrusion, network exploitation, side-channel attacks, and adversarial inputs designed to compromise AI systems.