About Agentegrity

Agentegrity is the AI agent security framework developed by Cogensec for measuring the structural integrity of autonomous AI agents. It scores agent integrity across four dimensions — Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-domain Portability (CP) — to quantify how safely an agent behaves under adversarial and out-of-distribution conditions.

Agentegrity Framework Glossary

Agentegrity: The structural integrity of an autonomous AI agent — its measurable ability to remain aligned, coherent, and safe under adversarial, ambiguous, or out-of-distribution conditions.
Adversarial Resistance (AR): An agent's capacity to maintain correct behavior under prompt injection, jailbreak attempts, and other adversarial inputs. Weighted 40% of the Agentegrity score.
Behavioral Consistency (BC): The degree to which an agent's outputs and decisions remain stable and predictable across semantically equivalent inputs. Weighted 25% of the Agentegrity score.
Recovery Integrity (RI): The agent's ability to detect, contain, and recover from failures or compromises without cascading harm. Weighted 15% of the Agentegrity score.
Cross-domain Portability (CP): How well an agent's integrity properties hold when deployed across different domains, modalities, or physical embodiments. Weighted 20% of the Agentegrity score.
Endogenous Security: Security properties that originate inside the AI system — in its weights, training, and policies — rather than being applied externally through filters or guardrails.

Frequently Asked Questions about Agentegrity

What is Agentegrity?

Agentegrity is an AI agent security framework developed by Cogensec that measures the structural integrity of autonomous AI agents. It quantifies agent integrity across four dimensions: Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-domain Portability (CP).

What is an AI agent security framework?

An AI agent security framework is a structured methodology for measuring, verifying, and improving the security posture of autonomous AI agents. Agentegrity is the first framework to score agents on endogenous (built-in) integrity rather than relying solely on external guardrails.

How is agent integrity measured?

Agent integrity is measured using the Agentegrity score: a weighted composite of Adversarial Resistance (40%), Behavioral Consistency (25%), Recovery Integrity (15%), and Cross-domain Portability (20%). Each dimension is evaluated through standardized red-team tests and behavioral probes.

What is endogenous security for AI agents?

Endogenous security means safety and integrity properties live inside the agent itself — in its training, weights, and decision policies — rather than being bolted on as external filters or guardrails. Agentegrity is the discipline of building structurally sound agents from the inside out.

How is Agentegrity different from LLM guardrails?

Guardrails are external filters that wrap an AI system at runtime. Agentegrity measures and improves the agent's own structural integrity so it remains safe even when guardrails fail, are bypassed, or are removed entirely. The two approaches are complementary: guardrails are reactive, Agentegrity is foundational.

Who created Agentegrity?

Agentegrity was developed by the Cogensec Security Research Lab as a public framework for measuring and certifying the integrity of autonomous AI agents across digital and physical domains.

Related Resources

About the Agentegrity Framework

Agentegrity (agent + integrity) is an open framework, discipline, and reference implementation for measuring and verifying the structural integrity of autonomous AI agents. Created by Cogensec and released under Apache 2.0, it instruments an existing agent loop with four cooperating evaluation layers (adversarial, cortical, governance, recovery) and emits a tamper-evident, hash-chained record of every reasoning step.

Supported Frameworks

Anthropic Claude Agent SDK
LangChain and LangGraph
OpenAI Agents SDK
CrewAI
Google ADK
Vercel AI SDK (TypeScript)

Canonical Event Stream

session_start
tool_call
tool_response (with channel: tool_response | peer_messages | retrieved_documents)
peer_message
session_end
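
As a rough illustration of how an adapter might represent these five event types, the sketch below models them as a small Python data structure. The names EventType, Channel, Event, and normalize are hypothetical and are not taken from the Agentegrity API. Only the event names and channel values come from the list above. The rule that a channel is attached only to tool_response events is an assumption drawn from how that list is written.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class EventType(str, Enum):
    SESSION_START = "session_start"
    TOOL_CALL = "tool_call"
    TOOL_RESPONSE = "tool_response"
    PEER_MESSAGE = "peer_message"
    SESSION_END = "session_end"


class Channel(str, Enum):
    TOOL_RESPONSE = "tool_response"
    PEER_MESSAGES = "peer_messages"
    RETRIEVED_DOCUMENTS = "retrieved_documents"


@dataclass(frozen=True)
class Event:
    """Hypothetical canonical event; not the library's own type."""
    type: EventType
    payload: dict[str, Any] = field(default_factory=dict)
    channel: Optional[Channel] = None

    def __post_init__(self) -> None:
        # Assumption: only tool_response events carry a channel.
        if self.channel is not None and self.type is not EventType.TOOL_RESPONSE:
            raise ValueError("channel is only valid on tool_response events")


def normalize(raw: dict[str, Any]) -> Event:
    """Translate a framework-native dict into a canonical Event (illustrative)."""
    etype = EventType(raw["type"])  # raises ValueError for unknown event names
    channel = Channel(raw["channel"]) if "channel" in raw else None
    return Event(type=etype, payload=raw.get("payload", {}), channel=channel)
```

Because every adapter would emit the same closed set of event types, downstream layers can switch on EventType without knowing which agent framework produced the event.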

Integrity Dimensions and Certification Tiers

Four integrity dimensions — Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), Cross-Domain Portability (CP) — produce composite scores that map to four certification tiers: Certified, Conditional, Probationary, Non-Compliant.

Key Resources

Documentation: https://agentegrity.cogensec.com/docs
Source repository (Apache 2.0): https://github.com/Cogensec/agentegrity-framework
Research paper: https://cogensec.com/research/agentegrity-framework
Full-text LLM bundle: https://cogensec.com/llms-agentegrity-full.txt
Focused llms.txt: https://cogensec.com/llms-agentegrity.txt

Glossary

Agentegrity: Open framework and discipline (agent + integrity) for measuring the structural integrity of autonomous AI agents. Created by Cogensec, released under Apache 2.0. Complements exogenous guardrails with endogenous, measurable security properties.
Adversarial Resistance (AR): Agentegrity integrity dimension that measures an agent's ability to detect and resist prompt injection, jailbreaks, sociolinguistic intent drift, and data exfiltration framings introduced through tool responses, peer messages, or retrieved documents.
Behavioral Consistency (BC): Agentegrity integrity dimension that measures whether the agent's outputs remain consistent with its declared AgentProfile — its allowed tools, scope, and safety constraints — across a session.
Recovery Integrity (RI): Agentegrity integrity dimension that measures whether the agent can cleanly checkpoint, roll back, terminate, or hand off to an operator when entering an unrecoverable state.
Cross-Domain Portability (CP): Agentegrity integrity dimension that measures whether security properties hold when an agent moves across digital and physical domains, including multi-agent cascade scenarios.
Cortical Layer: One of Agentegrity's four evaluation layers. Scores whether the agent's own output stays inside the capabilities, scope, and constraints declared in its AgentProfile.
Adversarial Layer: Agentegrity layer that inspects incoming inputs (tool responses, peer messages, retrieved documents, user prompts) for prompt injection, jailbreak prefixes, exfiltration framings, and sociolinguistic intent drift.
Governance Layer: Agentegrity layer that applies operator-defined policy: rate limits, denylists, required approvals, structured deviations from a BaselineStore baseline. Policy rules are testable code rather than free-form prompts.
Recovery Layer: Agentegrity layer that watches for unrecoverable agent states and triggers checkpoint rollback, session termination, or operator handoff via FileCheckpoint or KMSCheckpoint.
Attestation Chain: Tamper-evident, hash-chained record produced by Agentegrity. Each evaluation record sets prev_hash = sha256(prior record), allowing end-to-end verification at session close. Records can be optionally signed via Ed25519 + JWS.
Canonical Event Stream: The five normalized event types every Agentegrity adapter emits: session_start, tool_call, tool_response, peer_message, session_end. Decouples the layer pipeline from any specific agent framework.
PDA Loop: Perception-Decision-Action loop. The attack-surface model Agentegrity uses to map where adversarial influence can enter an agent's reasoning process.
Measure-only Mode: Default Agentegrity operating mode. Layers score and record events but never block tool calls. Used for baseline calibration before enforcement.
Enforce Mode: Agentegrity operating mode enabled by setting enforce=True (Python) or enforce: true (TypeScript) on the adapter. Upgrades detected violations from recorded events to active refusals.
AgentProfile: Agentegrity declaration of an agent's allowed capabilities, scope, and safety constraints. Used by the cortical layer to score whether each step conforms.
Certification Tiers: Agentegrity composite-score tiers: Certified, Conditional, Probationary, Non-Compliant. Computed from the four integrity dimensions (AR, BC, RI, CP).

Frequently Asked Questions

What is Agentegrity?

Agentegrity (agent + integrity) is an open framework and discipline for measuring the structural integrity of autonomous AI agents. Created by Cogensec and released under Apache 2.0, it instruments an existing agent loop with four cooperating evaluation layers — adversarial, cortical, governance, recovery — and emits a tamper-evident, hash-chained record of every reasoning step.

How does Agentegrity differ from AI guardrails?

Guardrails are exogenous: they sit outside the agent and try to filter inputs and outputs. Agentegrity is endogenous: it measures the agent's own structural integrity — whether it stays inside its declared profile, resists adversarial inputs, recovers cleanly, and remains consistent across domains. The two are complementary; Agentegrity does not replace guardrails.

Which AI agent frameworks does Agentegrity support?

Agentegrity ships first-party adapters for Anthropic Claude Agent SDK, LangChain / LangGraph, OpenAI Agents SDK, CrewAI, Google ADK, and the Vercel AI SDK. Each adapter translates framework-native events into Agentegrity's canonical event stream; the four evaluation layers downstream are shared.

What are the four Agentegrity layers?

Adversarial (detects prompt injection, jailbreaks, sociolinguistic intent drift, exfiltration framings in incoming inputs), Cortical (scores whether the agent's output stays inside its declared AgentProfile), Governance (applies operator policy and baseline deviation rules), and Recovery (handles checkpoint rollback, session termination, operator handoff).

What are the four Agentegrity integrity dimensions?

Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-Domain Portability (CP). Composite scores across these dimensions produce certification tiers: Certified, Conditional, Probationary, Non-Compliant.

What is the Agentegrity attestation chain?

Every evaluation produces a record whose prev_hash field equals sha256 of the previous record. The chain is verifiable end-to-end at session close, providing non-repudiation of what the agent did and what each layer concluded. Records can be optionally signed using Ed25519 + JWS via the crypto extra.
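
The hash-chaining idea can be sketched with the Python standard library alone. This is a minimal illustration, not the Agentegrity record format: the function names, the all-zero genesis value, and the choice of sorted-key JSON as the canonical encoding are all assumptions made for the example. Signing is left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical prev_hash for the first record in a session


def record_hash(record: dict) -> str:
    # Assumption: records are hashed over canonical (sorted-key, compact) JSON.
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain: list[dict], body: dict) -> dict:
    """Append a record whose prev_hash is the sha256 of the previous record."""
    prev = record_hash(chain[-1]) if chain else GENESIS
    record = {**body, "prev_hash": prev}
    chain.append(record)
    return record


def verify_chain(chain: list[dict]) -> bool:
    """Walk the chain from the start and check every prev_hash link."""
    expected = GENESIS
    for record in chain:
        if record.get("prev_hash") != expected:
            return False
        expected = record_hash(record)
    return True
```

Editing any record other than the last changes its hash and breaks the link stored in the record that follows it. The final record has no successor, so in a scheme like this its hash would need to be stored or signed separately to be protected.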

What is the difference between measure-only and enforce mode?

Measure-only is the default: layers score and record events but never block tool calls. Setting enforce=True (Python) or enforce: true (TypeScript) on the adapter upgrades detected violations to active refusals. Most production deployments measure first, calibrate baselines, then enforce.
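
The difference between the two modes can be pictured with a toy denylist rule. The sketch below is hypothetical: Decision and evaluate_tool_call are illustrative names, not part of the Agentegrity API. The only behavior taken from the text above is that a detected violation is recorded in both modes and becomes a refusal only when enforce is set.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool    # whether the tool call may proceed
    violation: bool  # whether a policy violation was recorded
    reason: str = ""


def evaluate_tool_call(tool_name: str, denylist: set[str], enforce: bool = False) -> Decision:
    """Score a tool call against a denylist (illustrative policy rule)."""
    if tool_name not in denylist:
        return Decision(allowed=True, violation=False)
    reason = f"tool '{tool_name}' is on the denylist"
    # Measure-only: record the violation but let the call proceed.
    # Enforce: the same violation becomes a refusal.
    return Decision(allowed=not enforce, violation=True, reason=reason)
```

Keeping the rule as a plain function means it can be unit-tested in both modes before enforcement is switched on.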

Is Agentegrity open source?

Yes. The full framework, specification, and reference adapters live at https://github.com/Cogensec/agentegrity-framework under the Apache 2.0 license. Cogensec maintains the project and ships an optional commercial agentegrity-pro receiver for enterprise deployments.

How do I install Agentegrity?

Python: pip install agentegrity (with optional extras like [crypto], [embedding], [adversarial_llm]). TypeScript: npm install @agentegrity/client plus the adapter for your framework, for example @agentegrity/langchain or @agentegrity/openai-agents. See the Quickstart at https://agentegrity.cogensec.com/docs/quickstart.

Who created Agentegrity?

Agentegrity was created by Cogensec, an AI security and research company and member of the NVIDIA Inception Program. The founding research paper, "Agentegrity: A Framework for Measuring Structural Integrity of Autonomous AI Agents Across Digital and Physical Domains" by Tarique Smith, is published at https://cogensec.com/research/agentegrity-framework.

Structural Integrity for Autonomous AI

Security that lives inside the agent — not around it. Measure, verify, and guarantee the integrity of AI systems across every domain.

The Future of AI Trust Starts Within

Why structural integrity must live inside the agent — not around it.

Open Source · Apache 2.0

Instrument your agent in 3 lines

Agentegrity is a measurement and verification library — not a guardrail. Drop it into your existing stack with zero config.

Install: pip install "agentegrity[claude]"

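import asyncio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
from agentegrity.claude import hooks, report

async def main():
    # "async with" is only valid inside a coroutine, so wrap the session in main().
    async with ClaudeSDKClient(options=ClaudeAgentOptions(hooks=hooks())) as sdk:
        await sdk.query("Summarize the latest LLM safety papers")
    print(report())

asyncio.run(main())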

Measure-only by default — Agentegrity never blocks tool calls. Blocking only happens via explicit governance policy.

Zero network calls

Runs locally. Never phones home to Cogensec or anyone else.

Measure, don't block

Produces evidence and signed attestations. Your governance policy decides what to do with them.

Bring your own framework

11 official adapters across Python and TypeScript. Custom adapters via the SessionExporter interface.

How Agentegrity Works

Three architectural layers that make security an endogenous property of the agent, not an external dependency.

Adversarial Layer

Continuous red-team testing embedded directly into the agent's reasoning pipeline, detecting prompt injection and manipulation in real time.

Cortical Layer

Deep behavioral anchors that maintain agent identity and value alignment across context shifts, tool use, and multi-turn conversations.

Governance Layer

Structural compliance verification and recovery mechanisms that operate without external monitoring dependencies.

Core Thesis

Two approaches. One complete defense.

Exogenous Security

Monitor agents from the outside

Guardrails, input/output filters, and boundary monitoring. Observes API patterns, enforces rate limits, and filters content at the perimeter.

Visible to exogenous monitoring:

API call patterns and network traffic

Input/output content filtering

Rate limiting and access control

Not visible to exogenous monitoring:

Internal reasoning chain corruption

Subtle behavioral drift over time

Cross-stage feedback attacks

Agentegrity

Endogenous Security

Observe reasoning from within

Cortical models and embedded defenses. Monitors the agent's own reasoning, detecting drift and corruption at the source.

Internal reasoning chain integrity

Behavioral drift detection (BDR-T)

Cross-stage feedback monitoring

Value alignment state verification

Recovery Half-Life (RHL) measurement

Certain attack classes — particularly those exploiting internal reasoning — are invisible to boundary-only monitoring. Endogenous defenses are the necessary complement.

Attack Surface

Map every stage of the decision loop

Every autonomous agent operates in a continuous Perception → Decision → Action cycle. The framework maps where attacks enter and where defenses must operate.

Perception

Sensor input, API responses. Entry point for injection and poisoning.

Decision

Reasoning and planning. Target of memory corruption and value drift.

Action

Tool calls, physical actuation. Source of cross-stage feedback attacks.

Cross-Stage Feedback Attacks

The most dangerous class: an adversary poisons tool outputs (Action) which feed back into Perception, corrupting subsequent Decision cycles. These are invisible to exogenous monitors.

Scoring

Four dimensions. One composite score.

Adversarial Resistance

Resilience against prompt injection, jailbreaks, and manipulation attempts.

Behavioral Consistency

Stability of agent behavior across diverse contexts and adversarial conditions.

Recovery Integrity

Speed and completeness of return to baseline after perturbation.

Cross-Domain Portability

Security property retention when agents move between digital and physical domains.

Formula

Weighted composite across four dimensions

Adversarial resistance carries the highest weight — but all dimensions matter. Weights are configurable per deployment context.

A = 0.35·AR + 0.25·BC + 0.20·RI + 0.20·CP

Weights

AR: 35%

BC: 25%

RI: 20%

CP: 20%

Default weights. Physical AI deployments may increase CP weight.
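
The weighted composite can be sketched in a few lines of Python. The default weights below are the ones given in the formula above. The function name, the validation rules, and the assumption that each dimension score lies in [0, 1] are illustrative choices for this example rather than details confirmed by the framework.

```python
DEFAULT_WEIGHTS = {"AR": 0.35, "BC": 0.25, "RI": 0.20, "CP": 0.20}


def composite_score(scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the four dimension scores (assumed to lie in [0, 1])."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return sum(weights[dim] * scores[dim] for dim in weights)
```

For example, scores of AR = 0.9, BC = 0.8, RI = 0.7, and CP = 0.6 give 0.315 + 0.200 + 0.140 + 0.120, or about 0.775 under the default weights. A physical AI deployment could pass its own weights dictionary with a larger CP share, provided the weights still sum to 1.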

Certification

From hardened to guardrail-dependent

Hardened

0.85 - 1.00

Agent demonstrates robust endogenous defenses across all four dimensions. Suitable for high-stakes autonomous deployment.

Resilient

0.70 - 0.84

Strong endogenous security with minor gaps. Minimal exogenous monitoring recommended.

Functional

0.55 - 0.69

Adequate baseline security. Exogenous guardrails should supplement endogenous defenses.

Fragile

0.40 - 0.54

Significant security gaps. Heavy guardrail dependency required for safe operation.

Vulnerable

0.25 - 0.39

Critically weak endogenous properties. Not recommended for autonomous operation.

Guardrail-Dependent

< 0.25

No meaningful endogenous security. Entirely reliant on external defenses.
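
Mapping a composite score to one of these tiers is a simple threshold lookup. The sketch below uses the tier names and lower bounds listed above. The function name is hypothetical, and treating each lower bound as inclusive (so that a score such as 0.845, which falls between two published ranges, lands in the lower tier) is an assumption.

```python
# (inclusive lower bound, tier name), ordered from highest to lowest
TIERS = [
    (0.85, "Hardened"),
    (0.70, "Resilient"),
    (0.55, "Functional"),
    (0.40, "Fragile"),
    (0.25, "Vulnerable"),
]


def certification_tier(score: float) -> str:
    """Return the tier whose lower bound the composite score meets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    for lower_bound, name in TIERS:
        if score >= lower_bound:
            return name
    return "Guardrail-Dependent"  # anything below 0.25
```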

Physical AI

When AI controls real-world systems

Agentegrity extends beyond digital agents. When AI drives robotic systems, autonomous vehicles, or drones, three novel threat classes emerge.

Vehicles, Robotics, Drones, IoT

Prompt-to-Physical

Adversarial prompts that cross the digital-physical boundary, causing embodied agents to take harmful real-world actions through manipulated reasoning.

Actuation Hijacking

Direct manipulation of an agent's physical actuators — motors, grippers, valves — bypassing its decision-making pipeline entirely.

Sim-to-Real Transfer Attacks

Exploiting the gap between simulated training environments and real-world deployment to inject vulnerabilities during model transfer.

Multi-Agent

Contain failures before they cascade

When agents collaborate, a single compromise can cascade. System-level Agentegrity measures containment and trust boundary preservation.

Cascade Resistance (CR)

Fraction of agents uncompromised when one is breached. Higher CR = better containment.

Trust Boundary Integrity (TBI)

Whether trust degrades gracefully — or collapses entirely — when agents are compromised.
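
Cascade Resistance, as described above, reduces to a single ratio. The sketch below is one possible reading: the function name is hypothetical, and counting the initially breached agent in both the total and the compromised count is an assumption, since the definition above does not say which population the fraction is taken over.

```python
def cascade_resistance(total_agents: int, compromised_agents: int) -> float:
    """Fraction of agents still uncompromised after one agent is breached.

    Assumption: compromised_agents includes the initially breached agent,
    so the best possible value for n agents is (n - 1) / n.
    """
    if total_agents < 1:
        raise ValueError("total_agents must be at least 1")
    if not 1 <= compromised_agents <= total_agents:
        raise ValueError("compromised_agents must be between 1 and total_agents")
    return (total_agents - compromised_agents) / total_agents
```

Under this reading, a ten-agent system in which the breach never spreads scores 0.9, and one in which every agent ends up compromised scores 0.0.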

Security built in, not bolted on.

Read the founding manifesto or explore the full research framework behind the Agentegrity Score.