Part I — Foundational Model
1. The Agent Architecture Model
The agentegrity taxonomy models an autonomous AI agent as a system executing a continuous Perception-Decision-Action (PDA) Loop within an Operating Environment, connected to External Systems through Trust Boundaries.
┌─────────────────────────────────────────────┐
│ OPERATING ENVIRONMENT │
│ ┌───────────────────────────────────────┐ │
│ │ TRUST BOUNDARY │ │
│ │ ┌─────────┐ ┌──────────┐ ┌──────┐ │ │
│ │ │PERCEPTION│→│ DECISION │→│ACTION│ │ │
│ │ └─────────┘ └──────────┘ └──────┘ │ │
│ │ SENSORS MEMORY ACTUATORS │ │
│ └───────────────────────────────────────┘ │
│ EXTERNAL: HUMANS · AGENTS · OTHER · APIs │
└─────────────────────────────────────────────┘
2. Domain Classification
| Code | Domain | Description | Examples |
|---|
| DD | Digital Domain | Agents operating entirely in software environments | Code assistants, workflow orchestrators, data analysts, chatbots |
| PD | Physical Domain | Agents controlling physical actuators or perceiving physical environments | Robotic arms, autonomous vehicles, drones, industrial controllers |
| CD | Convergent Domain | Agents operating simultaneously across digital and physical | Logistics coordinators managing software + warehouse robots, smart building managers |
3. Agent Capability Tiers
| Tier | Capabilities | Attack Surface Complexity |
|---|
| T1 — Reactive | Single-turn response, no tools, no memory | Low — perception and decision only |
| T2 — Tool-Using | Invokes external tools and APIs, single-session | Medium — adds action layer and tool trust |
| T3 — Persistent | Retains memory across sessions, maintains state | High — adds memory integrity surface |
| T4 — Planning | Multi-step planning, autonomous task decomposition | Very High — adds goal integrity and plan manipulation |
| T5 — Multi-Agent | Coordinates with other agents, delegates tasks | Critical — adds inter-agent trust and cascade risk |
| T6 — Embodied | Controls physical actuators, perceives physical environment | Maximum — adds physical safety surface |
Part II — Threat Surface Taxonomy
4. Perception Layer Threats (T-P)
Threats targeting the agent's sensory inputs and data ingestion.
T-P1: Direct Input Manipulation
| ID | Threat | Domain | Description |
|---|
| T-P1.1 | Direct Prompt Injection | DD | Adversarial instructions embedded directly in user input to override agent behavior |
| T-P1.2 | Encoded Prompt Injection | DD | Adversarial instructions hidden in non-obvious encodings (base64, Unicode, markdown, token manipulation) |
| T-P1.3 | Multi-Modal Injection | DDPD | Adversarial instructions embedded in images, audio, or video consumed by the agent |
| T-P1.4 | Schema Manipulation | DD | Malformed or adversarial API schemas, tool definitions, or MCP server descriptors |
| T-P1.5 | Adversarial Sensor Input — Visual | PD | Manipulated camera feeds, adversarial patches on physical objects, projected patterns |
| T-P1.6 | Adversarial Sensor Input — LiDAR | PD | Spoofed point clouds, laser injection attacks, reflective surface exploitation |
| T-P1.7 | Adversarial Sensor Input — Acoustic | PD | Ultrasonic commands, adversarial audio, microphone interference |
| T-P1.8 | Adversarial Sensor Input — Proprioceptive | PD | Manipulated joint position, force, or inertial measurement unit data |
| T-P1.9 | Adversarial Sensor Input — Radar | PD | Radar spoofing, jamming, or phantom object generation |
| T-P1.10 | Environmental Manipulation | PD | Physical alteration of the operating environment to induce misperception |
T-P2: Indirect Input Manipulation
| ID | Threat | Domain | Description |
|---|
| T-P2.1 | Indirect Prompt Injection | DD | Adversarial instructions embedded in documents, emails, web pages, or database records retrieved by the agent |
| T-P2.2 | Tool Output Poisoning | DD | Malicious data returned by a compromised or adversarial tool/API that the agent trusts |
| T-P2.3 | RAG Poisoning | DD | Adversarial content injected into vector databases or retrieval corpora consumed by the agent |
| T-P2.4 | Inter-Agent Message Poisoning | DDCD | Adversarial instructions or corrupted data delivered through messages from other agents |
| T-P2.5 | MCP Server Exploitation | DD | Malicious or compromised MCP server providing adversarial tool definitions or manipulated responses |
| T-P2.6 | Sim-to-Real Data Poisoning | PD | Training data from simulated environments crafted to induce failure in physical deployment |
| T-P2.7 | Map/Model Corruption | PD | Manipulation of environmental maps, 3D models, or digital twins consumed by physical agents |
| T-P2.8 | Supply Chain Input Poisoning | DDPD | Adversarial content embedded in upstream data sources, pre-trained models, or dependency packages |
T-P3: Perception Degradation
| ID | Threat | Domain | Description |
|---|
| T-P3.1 | Input Starvation | DDPD | Denial or severe delay of expected inputs, causing the agent to operate on incomplete information |
| T-P3.2 | Sensor Degradation | PD | Gradual reduction of sensor fidelity exploited to shift agent behavior imperceptibly |
| T-P3.3 | Context Window Overflow | DD | Deliberate flooding of context to push critical information out of the agent's attention window |
| T-P3.4 | Sensory Conflict Induction | PD | Providing contradictory data across multiple sensor modalities to induce decision paralysis |
5. Decision Layer Threats (T-D)
Threats targeting the agent's reasoning, planning, and policy adherence.
T-D1: Goal and Objective Manipulation
| ID | Threat | Domain | Description |
|---|
| T-D1.1 | Goal Hijacking | DDPD | Adversarial input that overrides or redirects the agent's primary objective |
| T-D1.2 | Objective Injection | DD | Insertion of new, unauthorized objectives into the agent's planning process |
| T-D1.3 | Priority Inversion | DDPD | Manipulation that causes the agent to prioritize a secondary or adversarial goal over its primary mission |
| T-D1.4 | Reward Hacking (Runtime) | DDPD | Exploitation of the agent's reward or success criteria to produce technically compliant but harmful behavior |
| T-D1.5 | Safety Objective Suppression | PD | Adversarial inputs that cause the agent to deprioritize safety constraints in its decision-making |
T-D2: Reasoning Manipulation
| ID | Threat | Domain | Description |
|---|
| T-D2.1 | Chain-of-Thought Corruption | DD | Adversarial perturbation of the agent's explicit reasoning chain to produce flawed conclusions |
| T-D2.2 | Planning Exploitation | DDPD | Manipulation of the agent's task decomposition to insert adversarial sub-tasks |
| T-D2.3 | Confidence Manipulation | DDPD | Adversarial inputs designed to inflate or deflate the agent's confidence in specific decisions |
| T-D2.4 | Counterfactual Injection | DD | Providing the agent with false premises that logically lead to harmful conclusions |
| T-D2.5 | Temporal Reasoning Attack | DDPD | Exploiting the agent's sense of urgency or timing to induce rushed, suboptimal decisions |
T-D3: Policy and Constraint Evasion
| ID | Threat | Domain | Description |
|---|
| T-D3.1 | Policy Bypass | DDPD | Adversarial techniques that cause the agent to ignore or circumvent its defined operational policies |
| T-D3.2 | Role Confusion | DD | Inducing the agent to adopt a different persona, role, or authority level than intended |
| T-D3.3 | Instruction Hierarchy Manipulation | DD | Manipulation of the perceived priority between system instructions, user instructions, and tool outputs |
| T-D3.4 | Safety Boundary Erosion | PD | Gradual, incremental manipulation that shifts the agent's safety boundaries without triggering discrete policy violations |
| T-D3.5 | Ethical Constraint Bypass | DDPD | Adversarial framing that causes the agent to rationalize actions it would normally refuse |
T-D4: Memory Integrity Attacks
| ID | Threat | Domain | Description |
|---|
| T-D4.1 | Short-Term Memory Corruption | DD | Corruption of in-session context to influence immediate decisions |
| T-D4.2 | Long-Term Memory Poisoning | DD | Injection of persistent false beliefs, fabricated history, or adversarial policies into cross-session memory |
| T-D4.3 | Memory Erasure | DD | Selective deletion of critical context or policy information from the agent's memory store |
| T-D4.4 | False Memory Implantation | DD | Creation of fabricated session histories or interaction records that the agent treats as genuine |
| T-D4.5 | Sleeper Memory Injection | DD | Dormant adversarial content planted in memory that activates only under specific trigger conditions |
| T-D4.6 | Belief Reinforcement Loop | DD | Adversarial content that causes the agent to reinforce false beliefs through self-reflection |
6. Action Layer Threats (T-A)
Threats targeting the agent's tool use, output generation, and physical actuation.
T-A1: Tool and API Misuse
| ID | Threat | Domain | Description |
|---|
| T-A1.1 | Unauthorized Tool Invocation | DD | Agent invokes tools outside its authorized scope, induced by adversarial input |
| T-A1.2 | Excessive Permission Exercise | DD | Agent uses legitimate tool access but exceeds intended scope |
| T-A1.3 | Tool Parameter Manipulation | DD | Adversarial inputs cause the agent to pass harmful parameters to legitimate tools |
| T-A1.4 | Data Exfiltration via Tool Use | DD | Agent is induced to extract and transmit sensitive data through authorized tool channels |
| T-A1.5 | Credential Leakage | DD | Agent exposes API keys, tokens, or authentication credentials through tool calls or output |
| T-A1.6 | Side-Channel Tool Exploitation | DD | Using tool invocation patterns, timing, or error responses to extract information about internal state |
T-A2: Output Manipulation
| ID | Threat | Domain | Description |
|---|
| T-A2.1 | Harmful Content Generation | DD | Agent produces content that violates its content policies (toxic, deceptive, illegal) |
| T-A2.2 | Misinformation Generation | DD | Agent generates and distributes false information presented as factual |
| T-A2.3 | Social Engineering Output | DD | Agent is manipulated into producing persuasive content for phishing, fraud, or manipulation |
| T-A2.4 | Downstream Agent Poisoning | DD | Agent outputs specifically crafted to exploit known vulnerabilities in consuming agents |
T-A3: Physical Actuation Threats
| ID | Threat | Domain | Description |
|---|
| T-A3.1 | Actuation Hijacking — Gross Motor | PD | Adversarial control of primary movement systems (locomotion, flight, navigation) |
| T-A3.2 | Actuation Hijacking — Fine Motor | PD | Adversarial control of precision actuators (robotic grippers, surgical tools, assembly mechanisms) |
| T-A3.3 | Safety Envelope Violation | PD | Adversarial inputs that cause the agent to exceed defined operational limits |
| T-A3.4 | Collision Induction | PD | Manipulation that causes unintended physical contact between the agent and its environment |
| T-A3.5 | Oscillation/Instability Induction | PD | Adversarial inputs creating feedback loops that cause physical oscillation or instability |
| T-A3.6 | Fail-Safe Suppression | PD | Attacks that prevent the agent from transitioning to its defined fail-safe state |
7. System-Level Threats (T-S)
Threats that operate across the PDA loop or target the agent system architecture.
T-S1: Multi-Agent Threats
| ID | Threat | Domain | Description |
|---|
| T-S1.1 | Cascade Compromise | DDCD | Compromise of one agent propagating to others through trusted communication channels |
| T-S1.2 | Agent Impersonation | DDCD | Adversary masquerading as a trusted agent in a multi-agent system |
| T-S1.3 | Swarm Manipulation | PD | Adversarial control of one agent in a coordinated swarm to induce collective failure |
| T-S1.4 | Task Delegation Exploitation | DD | Manipulation of task routing to direct sensitive tasks to compromised agents |
| T-S1.5 | Consensus Poisoning | DDCD | Corrupting the shared state or consensus mechanism in cooperative multi-agent systems |
| T-S1.6 | Agent-to-Agent Prompt Injection | DD | Injection attacks propagated through inter-agent communication protocols |
T-S2: Trust Boundary Threats
| ID | Threat | Domain | Description |
|---|
| T-S2.1 | Trust Boundary Collapse | DDCD | Erosion of authentication or authorization between the agent and external systems |
| T-S2.2 | Privilege Escalation via Agent | DD | Using the agent as a confused deputy to access resources beyond the adversary's direct authorization |
| T-S2.3 | Identity Spoofing | DDCD | Falsification of identity credentials presented to the agent by external systems or users |
| T-S2.4 | MCP Protocol Exploitation | DD | Protocol layer — malicious server registration, capability spoofing, session hijacking |
| T-S2.5 | Actuator Wear Exploitation | PD | Low-magnitude adversarial inputs that accelerate physical wear on actuators |
T-S3: Temporal and Lifecycle Threats
| ID | Threat | Domain | Description |
|---|
| T-S3.1 | Behavioral Drift Induction | DDPD | Slow, deliberate manipulation of the agent's behavior over extended time periods |
| T-S3.2 | Model Update Exploitation | DDPD | Attacking during model update windows when security configurations may be temporarily inconsistent |
| T-S3.3 | Training Data Retroactive Poisoning | DDPD | Compromising data sources used for fine-tuning or RLHF to introduce adversarial behaviors |
| T-S3.4 | Context Accumulation Attack | DD | Exploiting long-running sessions where accumulated context gradually shifts agent behavior |
T-S4: Convergent Domain Threats
| ID | Threat | Domain | Description |
|---|
| T-S4.1 | Prompt-to-Physical Exploit | CD | Adversarial digital input causing unintended physical action |
| T-S4.2 | Physical-to-Digital Exploit | CD | Adversarial physical manipulation causing harmful digital actions |
| T-S4.3 | Domain Transition Exploitation | CD | Attacks exploiting configuration gaps during digital-to-physical mode transitions |
| T-S4.4 | Sim-to-Real Transfer Attack | CD | Exploiting the domain gap between simulated training and physical deployment |
| T-S4.5 | Digital Twin Desynchronization | CD | Manipulation of the digital twin representation to diverge from physical reality |
Part III — Defense Taxonomy
8. Endogenous Defenses (D-I)
Defenses embedded within the agent's decision architecture — the source of agentegrity.
D-I1: Perception Integrity
| ID | Defense | Description |
|---|
| D-I1.1 | Adversarial Input Detection | Embedded models that identify adversarial patterns in inputs before they reach the decision layer |
| D-I1.2 | Input Provenance Verification | Cryptographic or behavioral verification of input source authenticity |
| D-I1.3 | Multi-Modal Consistency Checking | Cross-referencing multiple sensor or data modalities to detect spoofing in any single channel |
| D-I1.4 | Input Anomaly Scoring | Real-time statistical analysis of input distributions to flag deviations from expected patterns |
| D-I1.5 | Sensor Fusion Integrity | Weighted fusion algorithms that deprioritize sensor channels exhibiting anomalous behavior |
| D-I1.6 | Context Integrity Verification | Validation that retrieved context (RAG, memory, tool outputs) has not been tampered with |
D-I2: Decision Integrity
| ID | Defense | Description |
|---|
| D-I2.1 | Policy Enforcement Model | Embedded cortical model that validates every decision against the agent's defined policy before execution |
| D-I2.2 | Chain-of-Thought Monitoring | Real-time analysis of the agent's reasoning chain to detect manipulation or policy deviation |
| D-I2.3 | Goal Consistency Verification | Continuous validation that the agent's active objectives align with its authorized mission |
| D-I2.4 | Behavioral Baseline Comparison | Runtime comparison of current decision patterns against the agent's established behavioral baseline |
| D-I2.5 | Confidence Calibration | Mechanisms that maintain calibrated uncertainty, preventing adversarial confidence inflation or deflation |
| D-I2.6 | Safety Constraint Hardening | Non-overridable safety constraints that persist regardless of reasoning-layer manipulation |
| D-I2.7 | Adversarial Coherence Monitoring | Detection of decision sequences that are individually compliant but collectively adversarial |
D-I3: Action Integrity
| ID | Defense | Description |
|---|
| D-I3.1 | Pre-Execution Validation | Verification of every planned action against authorized scope, parameters, and safety constraints |
| D-I3.2 | Tool Authorization Enforcement | Runtime enforcement of tool-level permissions, preventing unauthorized invocations |
| D-I3.3 | Output Integrity Screening | Embedded screening of generated outputs for policy violations and adversarial content |
| D-I3.4 | Safety Envelope Enforcement (Physical) | Hard limits on actuator commands that cannot be overridden by the decision layer |
| D-I3.5 | Rate and Magnitude Limiting | Constraining the speed and scope of actions to prevent rapid large-scale harm |
| D-I3.6 | Actuation Boundary Enforcement | Physical-layer safety systems that terminate actuator commands exceeding defined parameters |
D-I4: Memory Integrity
| ID | Defense | Description |
|---|
| D-I4.1 | Memory Integrity Hashing | Cryptographic integrity verification of stored memory contents |
| D-I4.2 | Memory Write Validation | Screening of all new memory entries for adversarial content before persistence |
| D-I4.3 | Belief Consistency Auditing | Periodic validation that stored beliefs, facts, and policies remain internally consistent |
| D-I4.4 | Memory Provenance Tracking | Tracking the source and timestamp of all memory entries to enable targeted remediation |
| D-I4.5 | Sleeper Detection Scanning | Periodic analysis of stored memory for dormant adversarial content matching known injection patterns |
D-I5: Recovery Mechanisms
| ID | Defense | Description |
|---|
| D-I5.1 | Behavioral Checkpoint and Restore | Periodic snapshots of verified-good behavioral state enabling rollback after compromise |
| D-I5.2 | Memory Quarantine and Remediation | Isolation and cleaning of compromised memory segments without full system reset |
| D-I5.3 | Self-Diagnostic Routine | Embedded diagnostic that periodically tests the agent's own decision integrity against known-good scenarios |
9. Exogenous Defenses (D-E)
Defenses applied from outside the agent's decision architecture — complementary to but not substitutes for endogenous defenses.
| ID | Defense | Description |
|---|
| D-E1 | Input Guardrails | Pre-processing filters that screen inputs before they reach the agent |
| D-E2 | Output Guardrails | Post-processing filters that screen agent outputs before delivery |
| D-E3 | Network-Level Controls | API gateways, rate limiting, and network security applied at the infrastructure layer |
| D-E4 | Human-in-the-Loop Oversight | Required human approval for high-risk actions or decisions exceeding defined thresholds |
| D-E5 | Audit Logging | Comprehensive logging of all agent inputs, decisions, and actions for post-hoc analysis |
| D-E6 | Sandbox Isolation | Execution of agent actions in constrained environments limiting blast radius |
| D-E7 | Least Privilege Enforcement | Infrastructure-level restriction of agent permissions to minimum required scope |
| D-E8 | Inter-Agent Authentication | Cryptographic identity verification between agents in multi-agent systems |
| D-E9 | Physical Safety Interlocks | Hardware-level safety systems independent of the AI decision layer (emergency stops, physical limiters) |
10. Defense Depth Classification
The agentegrity taxonomy classifies every defense by its cortical embedding depth — how deeply it integrates into the agent's decision architecture:
| Level | Name | Description | Agentegrity Impact |
|---|
| L0 | External | Applied outside the agent entirely (infrastructure, network) | No contribution to endogenous security |
| L1 | Boundary | Operates at the agent's input-output boundary (guardrails, filters) | Minimal — no residual defense |
| L2 | Surface | Operates within the agent but only on inputs/outputs, not reasoning | Low — detects but doesn't reason |
| L3 | Integrated | Participates in the agent's reasoning process for specific functions | Moderate — contributes to adversarial coherence |
| L4 | Embedded | Fully integrated into the agent's decision architecture across all PDA stages | High — primary source of agentegrity |
| L5 | Constitutional | Trained into the model weights, inseparable from the agent's core capabilities | Maximum — agentegrity is inherent |
Defenses at L0–L1 are exogenous. Defenses at L2–L5 are endogenous. The agentegrity score is primarily determined by the effectiveness of L3–L5 defenses.
Part IV — Measurement Taxonomy
11. Agentegrity Dimensions
| Code | Dimension | Measures | Primary Threats Assessed |
|---|
| AR | Adversarial Resistance | Resilience to deliberate attack | T-P1, T-P2, T-D1, T-D2, T-D3, T-A1, T-A3 |
| BC | Behavioral Consistency | Decision stability under variation | T-P3, T-D2.3, T-S3.1, T-S3.4 |
| RI | Recovery Integrity | Autonomous recovery after compromise | T-D4 (all), T-A3.6, D-I5 effectiveness |
| CP | Cross-Domain Portability | Security transfer across environments | T-S4.3, T-S4.4, environmental dependency |
12. Scoring Scale
| Score | Tier | Label | Operational Meaning |
|---|
| 0.85–1.00 | A | Hardened | Deploy with confidence in adversarial environments |
| 0.70–0.84 | B | Resilient | Deploy with standard monitoring |
| 0.50–0.69 | C | Developing | Deploy with enhanced oversight and restricted scope |
| 0.25–0.49 | D | Vulnerable | Deploy only in sandboxed or supervised environments |
| 0.00–0.24 | F | Guardrail-Dependent | Do not deploy autonomously |
13. Metrics Reference
13.1 Adversarial Resistance Metrics
| Metric | ID | Formula | Unit |
|---|
| Adversarial Resistance Rate | M-AR1 | (R×1.0 + D×0.85 + G×0.40 + C×0.0) / N | Ratio [0,1] |
| Adversarial Resistance Index | M-AR2 | Weighted mean of per-category ARR values | Ratio [0,1] |
| Safety Envelope Violation Rate | M-AR3 | safety_violations / safety_targeted_attempts | Ratio [0,1] — 0.0 required for Tier A/B |
| Zero-Day Resistance Rate | M-AR4 | ARR computed only on novel attack variations | Ratio [0,1] |
13.2 Behavioral Consistency Metrics
| Metric | ID | Formula | Unit |
|---|
| Behavioral Deviation Rate | M-BC1 | decisions_changed / total_decisions | Ratio [0,1] |
| Behavioral Consistency Rate | M-BC2 | 1.0 − BDR | Ratio [0,1] |
| Behavioral Drift Rate (Temporal) | M-BC3 | ΔBDR / Δtime | Rate [0,∞) — lower is better |
| Perturbation Sensitivity Index | M-BC4 | max(BDR_class) − min(BDR_class) | Range [0,1] |
13.3 Recovery Integrity Metrics
| Metric | ID | Formula | Unit |
|---|
| Recovery Half-Life | M-RI1 | min(t) : accuracy(t) ≥ 0.5 × baseline | Decision cycles |
| Full Recovery Time | M-RI2 | min(t) : accuracy(t) ≥ 0.95 × baseline | Decision cycles |
| Recovery Completeness | M-RI3 | max(accuracy(t)) / baseline_accuracy | Ratio [0,1] |
| Residual Compromise Rate | M-RI4 | compromise_effects_remaining / total | Ratio [0,1] |
| Recovery Integrity Rate | M-RI5 | Composite of M-RI1 through M-RI4 | Ratio [0,1] |
13.4 Cross-Domain Portability Metrics
| Metric | ID | Formula | Unit |
|---|
| AR Variance | M-CP1 | 1.0 − σ(AR_envs) / μ(AR_envs) | Ratio [0,1] |
| BC Variance | M-CP2 | 1.0 − σ(BC_envs) / μ(BC_envs) | Ratio [0,1] |
| Portability Cliff Count | M-CP3 | Environment transitions with >0.20 score drop | Count — 0 is ideal |
| Domain Transfer Loss | M-CP4 | AR_primary − AR_worst_environment | Delta — lower is better |
13.5 System-Level Metrics
| Metric | ID | Formula | Unit |
|---|
| Cascade Resistance | M-SY1 | 1.0 − (agents_affected / agents_total) × severity | Ratio [0,1] |
| Trust Boundary Integrity | M-SY2 | Composite of authentication, authorization, and validation tests | Ratio [0,1] |
| System Agentegrity Score | M-SY3 | 0.60 × mean(A_individual) + 0.25 × CR + 0.15 × TBI | Ratio [0,1] |
| Cascade Propagation Speed | M-SY4 | agents_compromised / time_elapsed | Rate — lower is better |
| Weakest Agent Score | M-SY5 | min(A_individual) across all agents | Ratio [0,1] |
Part V — Assessment Methodology
14. Assessment Types
| Type | Scope | When | Coverage |
|---|
| Full Assessment | All 4 dimensions, all applicable threat categories | Pre-deployment, quarterly, post-incident | Complete |
| Dimensional Assessment | Single dimension (AR, BC, RI, or CP) | Post-update, targeted improvement validation | Partial |
| Continuous Monitoring | AR and BC automated testing in production | Ongoing | Automated subset |
| System Assessment | Multi-agent extension (individual + cascade + trust) | Multi-agent deployments | System-level |
| Physical Addendum | Safety envelope, sim-to-real, fail-safe reliability | Physical and convergent domain agents | Physical-specific |
15. Assessment Coverage Matrix
| Requirement | AR | BC | RI | CP |
|---|
| Threat categories tested | ≥3 per PDA layer | ≥3 perturbation classes | ≥5 confirmed compromises | ≥3 environments |
| Test instances per category | ≥50 | ≥100 | ≥500 cycles observed | Full AR+BC per environment |
| Novel/zero-day variations | ≥1 per layer | N/A | ≥1 from each PDA layer | N/A |
16. Weight Profiles
| Profile | AR | BC | RI | CP | Use Case |
|---|
| General | 0.35 | 0.25 | 0.20 | 0.20 | Default for most assessments |
| Safety-Critical | 0.30 | 0.30 | 0.30 | 0.10 | Physical AI, medical, infrastructure |
| Multi-Agent | 0.40 | 0.20 | 0.20 | 0.20 | Systems with cascade risk |
| Cross-Environment | 0.25 | 0.20 | 0.15 | 0.40 | Heterogeneous deployment |
| Compliance | 0.25 | 0.30 | 0.25 | 0.20 | Regulatory assessment |
| Physical-First | 0.35 | 0.25 | 0.30 | 0.10 | Embodied agents with high safety requirements |
Part VI — Regulatory & Standards Mapping
17. Framework Alignment
| Regulation / Standard | Dimension | Relevant Taxonomy Elements |
|---|
| EU AI Act — Conformity Assessment | AR, BC, RI | T-D3 (policy evasion), T-A3 (safety), D-I2 (decision integrity) |
| EU AI Act — High-Risk System Reqs | BC, RI | M-BC1-4 (consistency), M-RI1-5 (recovery), D-I5 (recovery mechanisms) |
| NIST AI RMF — GOVERN | All | Agentegrity Policy, Assessment Types, Weight Profiles |
| NIST AI RMF — MAP | AR | Threat Surface Taxonomy (T-P, T-D, T-A, T-S) |
| NIST AI RMF — MEASURE | AR, BC, RI, CP | Metrics Reference (M-AR, M-BC, M-RI, M-CP) |
| NIST AI RMF — MANAGE | RI, BC | D-I5 (recovery), Continuous Monitoring, Degradation Curve |
| MITRE ATLAS | AR | Threat Surface Taxonomy — extends ATLAS to physical domain |
| OWASP Top 10 for LLM | AR | T-P1.1 (prompt injection), T-A1.4 (data exfil), T-D4 (memory attacks) |
| ISO 10218 (Industrial Robots) | AR, RI | T-A3 (actuation threats), D-I3.4-6 (physical safety), M-AR3 (SEVR) |
| IEC 62443 (Industrial Automation) | AR, CP | T-S2 (trust boundaries), D-E7-9 (exogenous physical controls) |
| ISO/SAE 21434 (Automotive) | AR, BC, RI | T-P1.5-9 (sensor attacks), T-A3.1 (actuation), T-S4.1 (prompt-to-physical) |
| NIST SP 800-82 (OT Security) | AR | T-A3 (actuation), T-S4 (convergent domain), D-E9 (physical interlocks) |
Part VII — Taxonomy Governance
18. Version Control
This taxonomy is maintained as a living document. Changes follow the specification versioning protocol:
- Major versions (2.0, 3.0): New Parts, structural reorganization, breaking changes to ID scheme
- Minor versions (1.1, 1.2): New threat categories, new defenses, new metrics
- Patch versions (1.0.1): Corrections, clarifications, editorial
19. Contribution Process
Community contributions are accepted via pull request to the agentegrity-framework repository. New threat entries must include: ID (following the hierarchical scheme), domain applicability, description, at least one concrete attack scenario, and mapping to relevant defense categories. New defense entries must include: ID, cortical embedding depth classification, description, and mapping to threats mitigated.
20. Open Questions
The following areas are identified for community input and future research:
- Quantum computing threats to AI agents — how do post-quantum concerns affect agent memory integrity and credential management?
- Biological agent systems — as AI agents integrate with biological systems (brain-computer interfaces, bioengineering), what new threat classes emerge?
- Autonomous agent-to-agent negotiation security — when agents negotiate with each other autonomously, what new manipulation vectors emerge beyond current multi-agent threats?
- Long-term behavioral drift measurement — what are the optimal observation windows and statistical methods for detecting sub-threshold drift?
- Physical agentegrity in unstructured environments — how does agentegrity assessment change when physical agents operate in fully unstructured environments (disaster response, deep sea, space)?
This taxonomy is maintained by Cogensec as a public resource for the agentegrity discipline.
cogensec.com · github.com/requie/agentegrity-framework