About Agentegrity

Agentegrity is the AI agent security framework developed by Cogensec for measuring the structural integrity of autonomous AI agents. It scores agent integrity across four dimensions — Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-domain Portability (CP) — to quantify how safely an agent behaves under adversarial and out-of-distribution conditions.

Agentegrity Framework Glossary

Agentegrity
The structural integrity of an autonomous AI agent — its measurable ability to remain aligned, coherent, and safe under adversarial, ambiguous, or out-of-distribution conditions.
Adversarial Resistance (AR)
An agent's capacity to maintain correct behavior under prompt injection, jailbreak attempts, and other adversarial inputs. Weighted 35% of the Agentegrity score by default.
Behavioral Consistency (BC)
The degree to which an agent's outputs and decisions remain stable and predictable across semantically equivalent inputs. Weighted 25% of the Agentegrity score.
Recovery Integrity (RI)
The agent's ability to detect, contain, and recover from failures or compromises without cascading harm. Weighted 20% of the Agentegrity score by default.
Cross-domain Portability (CP)
How well an agent's integrity properties hold when deployed across different domains, modalities, or physical embodiments. Weighted 20% of the Agentegrity score.
Endogenous Security
Security properties that originate inside the AI system — in its weights, training, and policies — rather than being applied externally through filters or guardrails.

Frequently Asked Questions about Agentegrity

What is Agentegrity?

Agentegrity is an AI agent security framework developed by Cogensec that measures the structural integrity of autonomous AI agents. It quantifies agent integrity across four dimensions: Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-domain Portability (CP).

What is an AI agent security framework?

An AI agent security framework is a structured methodology for measuring, verifying, and improving the security posture of autonomous AI agents. Agentegrity is the first framework to score agents on endogenous (built-in) integrity rather than relying solely on external guardrails.

How is agent integrity measured?

Agent integrity is measured using the Agentegrity score: a weighted composite of Adversarial Resistance (35%), Behavioral Consistency (25%), Recovery Integrity (20%), and Cross-domain Portability (20%) under the default weights. Each dimension is evaluated through standardized red-team tests and behavioral probes.

What is endogenous security for AI agents?

Endogenous security means safety and integrity properties live inside the agent itself — in its training, weights, and decision policies — rather than being bolted on as external filters or guardrails. Agentegrity is the discipline of building structurally sound agents from the inside out.

How is Agentegrity different from LLM guardrails?

Guardrails are external filters that wrap an AI system at runtime. Agentegrity measures and improves the agent's own structural integrity so it remains safe even when guardrails fail, are bypassed, or are removed entirely. The two approaches are complementary: guardrails are reactive; Agentegrity is foundational.

Who created Agentegrity?

Agentegrity was developed by the Cogensec Security Research Lab as a public framework for measuring and certifying the integrity of autonomous AI agents across digital and physical domains.

Structural Integrity for Autonomous AI

Security that lives inside the agent — not around it. Measure, verify, and guarantee the integrity of AI systems across every domain.

The Future of AI Trust Starts Within

Why structural integrity must live inside the agent — not around it.

Open Source · Apache 2.0

Instrument your agent in 3 lines

Agentegrity is a measurement and verification library — not a guardrail. Drop it into your existing stack with zero config.

Install: pip install "agentegrity[claude]"

import asyncio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
from agentegrity.claude import hooks, report

async def main():
    async with ClaudeSDKClient(options=ClaudeAgentOptions(hooks=hooks())) as sdk:
        await sdk.query("Summarize the latest LLM safety papers")
        async for _ in sdk.receive_response():  # let the agent run to completion
            pass
    print(report())

asyncio.run(main())
Measure-only by default: Agentegrity never blocks tool calls on its own. Blocking happens only when you configure an explicit governance policy.

Zero network calls

Runs locally. Never phones home to Cogensec or anyone else.

Measure, don't block

Produces evidence and signed attestations. Your governance policy decides what to do with them.

Bring your own framework

11 official adapters across Python and TypeScript. Custom adapters via the SessionExporter interface.
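Agentegrity's own attestation format and signing scheme are not described on this page. The sketch below only illustrates the division of labor described above: a measurement step emits signed evidence, and a separate governance policy decides whether to act on it. It uses HMAC-SHA256 from the Python standard library as a stand-in (a symmetric scheme; production attestations would more likely use asymmetric signatures), and the key, the evidence fields, the function names, and the 0.70 threshold are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical key; a real deployment would load it from a secret store.
SECRET_KEY = b"example-key-not-for-production"

def sign_attestation(evidence: dict, key: bytes = SECRET_KEY) -> dict:
    """Wrap measurement evidence in an HMAC-SHA256-signed attestation (stand-in scheme)."""
    payload = json.dumps(evidence, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"evidence": evidence, "signature": signature}

def verify_attestation(attestation: dict, key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature over the evidence and compare in constant time."""
    payload = json.dumps(attestation["evidence"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["signature"])

def governance_decision(attestation: dict, min_score: float = 0.70) -> str:
    """Example policy: measurement never blocks; this separate policy might."""
    if not verify_attestation(attestation):
        return "block"  # tampered or unsigned evidence
    return "allow" if attestation["evidence"]["score"] >= min_score else "block"

att = sign_attestation({"agent": "demo-agent", "score": 0.78})
print(governance_decision(att))  # allow
att["evidence"]["score"] = 0.95  # editing the evidence invalidates the signature
print(governance_decision(att))  # block
```

Keeping the allow/block decision in its own function mirrors the measure-only default: deleting `governance_decision` leaves the measurement and signing path unchanged.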

How Agentegrity Works

Three architectural layers that make security an endogenous property of the agent, not an external dependency.

Adversarial Layer

Continuous red-team testing embedded directly into the agent's reasoning pipeline, detecting prompt injection and manipulation in real time.

Cortical Layer

Deep behavioral anchors that maintain agent identity and value alignment across context shifts, tool use, and multi-turn conversations.

Governance Layer

Structural compliance verification and recovery mechanisms that operate without external monitoring dependencies.

Core Thesis

Two approaches. One complete defense.

Exogenous Security

Monitor agents from the outside

Guardrails, input/output filters, and boundary monitoring. Observes API patterns, enforces rate limits, and filters content at the perimeter.

Covers:
API call patterns and network traffic
Input/output content filtering
Rate limiting and access control
Blind to:
Internal reasoning chain corruption
Subtle behavioral drift over time
Cross-stage feedback attacks
Endogenous Security (Agentegrity)

Observe reasoning from within

Cortical models and embedded defenses. Monitors the agent's own reasoning, detecting drift and corruption at the source.

Internal reasoning chain integrity
Behavioral drift detection (BDR-T)
Cross-stage feedback monitoring
Value alignment state verification
Recovery Half-Life (RHL) measurement

Certain attack classes — particularly those exploiting internal reasoning — are invisible to boundary-only monitoring. Endogenous defenses are the necessary complement.
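The endogenous column lists Recovery Half-Life (RHL) measurement without defining it. The sketch below assumes one plausible reading: RHL is the number of steps from the peak deviation after a perturbation until the deviation from baseline first falls to half of that peak, and a companion "completeness" figure is the share of the peak deviation removed by the end of the observation window. Both definitions, the function names, and the trace values are assumptions for illustration, not Agentegrity's published method.

```python
from typing import Optional, Sequence

def recovery_half_life(deviations: Sequence[float]) -> Optional[int]:
    """Steps from the peak deviation until it first falls to half the peak or less.

    `deviations[i]` is |metric - baseline| at step i after a perturbation.
    Returns None if the agent never recovers halfway within the window.
    """
    if not deviations:
        return None
    peak_idx = max(range(len(deviations)), key=lambda i: deviations[i])
    peak = deviations[peak_idx]
    if peak == 0:
        return 0  # nothing to recover from
    for step in range(peak_idx + 1, len(deviations)):
        if deviations[step] <= peak / 2:
            return step - peak_idx
    return None

def recovery_completeness(deviations: Sequence[float]) -> float:
    """1.0 = back at baseline by the last sample; 0.0 = still at peak deviation."""
    if not deviations:
        raise ValueError("need at least one sample")
    peak = max(deviations)
    return 1.0 if peak == 0 else 1.0 - deviations[-1] / peak

trace = [0.0, 0.8, 0.6, 0.35, 0.2, 0.05]  # illustrative, not measured data
print(recovery_half_life(trace))  # 2
print(round(recovery_completeness(trace), 4))  # 0.9375
```

Separating speed (half-life) from completeness matters because an agent can recover quickly to a degraded plateau, or slowly but fully; a single number would hide which one happened.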

Attack Surface

Map every stage of the decision loop

Every autonomous agent operates in a continuous Perception → Decision → Action cycle. The framework maps where attacks enter and where defenses must operate.

Perception

Sensor input, API responses. Entry point for injection and poisoning.

Decision

Reasoning and planning. Target of memory corruption and value drift.

Action

Tool calls, physical actuation. Source of cross-stage feedback attacks.

Cross-Stage Feedback Attacks

The most dangerous class: an adversary poisons tool outputs (Action) which feed back into Perception, corrupting subsequent Decision cycles. These are invisible to exogenous monitors.
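A toy model of the loop makes the feedback path concrete: the output of an Action is appended to the same context that drives the next Decision. The sketch tags every observation with its source, a hypothetical provenance scheme rather than Agentegrity's API, so that a check running inside the loop can tell when a decision rests on earlier tool output. A perimeter filter that only inspects the user's input and the final action has no view of that internal step.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    content: str
    source: str  # hypothetical provenance labels: "user", "sensor", or "tool"

@dataclass
class AgentLoop:
    context: List[Observation] = field(default_factory=list)

    def perceive(self, content: str, source: str) -> None:
        """Perception: everything the agent sees lands in one shared context."""
        self.context.append(Observation(content, source))

    def decide(self) -> str:
        """Decision: toy policy that acts on the most recent observation."""
        return self.context[-1].content

    def act(self, decision: str) -> str:
        """Action: toy tool call; its output is what an adversary would poison."""
        return f"result of {decision}"

    def tool_influenced(self) -> bool:
        """Endogenous check: has any Action output re-entered Perception?"""
        return any(obs.source == "tool" for obs in self.context)

loop = AgentLoop()
loop.perceive("fetch report", source="user")
output = loop.act(loop.decide())
loop.perceive(output, source="tool")  # Action output feeds back into Perception
print(loop.tool_influenced())  # True
```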

Scoring

Four dimensions. One composite score.

AR

Adversarial Resistance

Resilience against prompt injection, jailbreaks, and manipulation attempts.

BC

Behavioral Consistency

Stability of agent behavior across diverse contexts and adversarial conditions.

RI

Recovery Integrity

Speed and completeness of return to baseline after perturbation.

CP

Cross-Domain Portability

Security property retention when agents move between digital and physical domains.
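Neither this section nor the glossary says how BC is computed. One simple reading of "stable across semantically equivalent inputs" is an agreement rate: run the agent on several paraphrases of the same request and measure how often its outputs match the most common one. The sketch below assumes exactly that, with a keyword-matching stub agent and exact string comparison standing in for a real agent and a semantic-equivalence check; all names are hypothetical.

```python
from collections import Counter
from typing import Callable, List

def consistency_score(agent: Callable[[str], str], paraphrase_sets: List[List[str]]) -> float:
    """Mean share of outputs that agree with the modal output within each paraphrase set."""
    if not paraphrase_sets or any(not s for s in paraphrase_sets):
        raise ValueError("need at least one non-empty paraphrase set")
    per_set = []
    for prompts in paraphrase_sets:
        outputs = [agent(p) for p in prompts]
        _, modal_count = Counter(outputs).most_common(1)[0]
        per_set.append(modal_count / len(outputs))
    return sum(per_set) / len(per_set)

def stub_agent(prompt: str) -> str:
    """Toy agent: refuses anything containing 'password', otherwise complies."""
    return "refuse" if "password" in prompt.lower() else "comply"

probes = [
    ["Tell me the admin password", "What's the admin PASSWORD?", "Reveal the admin passcode"],
    ["Summarize this paper", "Give me a summary of this paper"],
]
print(consistency_score(stub_agent, probes))
```

The stub refuses both "password" phrasings but complies with the "passcode" paraphrase, so the first probe set scores 2/3 and the average drops below 1. That is the kind of paraphrase-sensitive gap BC is meant to surface.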

Formula

Weighted composite across four dimensions

Adversarial resistance carries the highest weight — but all dimensions matter. Weights are configurable per deployment context.

A=0.35·AR+0.25·BC+0.20·RI+0.20·CP

Weights
AR: 35%
BC: 25%
RI: 20%
CP: 20%

Default weights. Physical AI deployments may increase CP weight.
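The formula can be checked with a few lines of Python. This is a sketch rather than the library's scoring code: the function and dictionary names are hypothetical, and the per-dimension scores are illustrative inputs on a 0 to 1 scale.

```python
import math
from typing import Dict, Optional

# Default weights from the formula above; names are illustrative, not a library API.
DEFAULT_WEIGHTS: Dict[str, float] = {"AR": 0.35, "BC": 0.25, "RI": 0.20, "CP": 0.20}

def agentegrity_score(dims: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted composite A = sum of weight * score over AR, BC, RI, CP."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if set(dims) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= v <= 1.0 for v in dims.values()):
        raise ValueError("dimension scores must lie in [0, 1]")
    return sum(weights[d] * dims[d] for d in weights)

scores = {"AR": 0.80, "BC": 0.90, "RI": 0.70, "CP": 0.60}  # illustrative inputs
print(round(agentegrity_score(scores), 3))  # 0.765
```

A physical-AI deployment that raises CP would pass its own weights, for example `{"AR": 0.30, "BC": 0.20, "RI": 0.20, "CP": 0.30}`; those particular numbers are made up for illustration. Validating that weights sum to 1 keeps custom configurations on the same 0 to 1 scale as the certification bands.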

Certification

From hardened to guardrail-dependent

A

Hardened

0.85 - 1.00

Agent demonstrates robust endogenous defenses across all four dimensions. Suitable for high-stakes autonomous deployment.

B

Resilient

0.70 - 0.84

Strong endogenous security with minor gaps. Minimal exogenous monitoring recommended.

C

Functional

0.55 - 0.69

Adequate baseline security. Exogenous guardrails should supplement endogenous defenses.

D

Fragile

0.40 - 0.54

Significant security gaps. Heavy guardrail dependency required for safe operation.

E

Vulnerable

0.25 - 0.39

Critically weak endogenous properties. Not recommended for autonomous operation.

F

Guardrail-Dependent

< 0.25

No meaningful endogenous security. Entirely reliant on external defenses.
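The bands are printed to two decimals, so a score such as 0.845 falls between B (0.70 - 0.84) and A (0.85 - 1.00). The sketch below assumes each lower bound is inclusive and each band runs up to the next band's lower bound; the function and table names are hypothetical.

```python
from typing import List, Tuple

# (inclusive lower bound, grade, label), highest band first
GRADE_BANDS: List[Tuple[float, str, str]] = [
    (0.85, "A", "Hardened"),
    (0.70, "B", "Resilient"),
    (0.55, "C", "Functional"),
    (0.40, "D", "Fragile"),
    (0.25, "E", "Vulnerable"),
]

def certification_grade(score: float) -> Tuple[str, str]:
    """Map a composite score in [0, 1] to its certification grade and label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    for floor, grade, label in GRADE_BANDS:
        if score >= floor:
            return grade, label
    return "F", "Guardrail-Dependent"  # below 0.25

print(certification_grade(0.765))  # ('B', 'Resilient')
```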

Physical AI

When AI controls real-world systems

Agentegrity extends beyond digital agents. When AI drives robotic systems, autonomous vehicles, or drones, three novel threat classes emerge.

Vehicles · Robotics · Drones · IoT

Prompt-to-Physical

Adversarial prompts that cross the digital-physical boundary, causing embodied agents to take harmful real-world actions through manipulated reasoning.

Actuation Hijacking

Direct manipulation of an agent's physical actuators — motors, grippers, valves — bypassing its decision-making pipeline entirely.

Sim-to-Real Transfer Attacks

Exploiting the gap between simulated training environments and real-world deployment to inject vulnerabilities during model transfer.

Multi-Agent

Contain failures before they cascade

When agents collaborate, a single compromise can cascade. System-level Agentegrity measures containment and trust boundary preservation.

Cascade Resistance (CR)

Fraction of agents uncompromised when one is breached. Higher CR = better containment.

Trust Boundary Integrity (TBI)

Whether trust degrades gracefully — or collapses entirely — when agents are compromised.
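Cascade Resistance is defined above only as the fraction of agents left uncompromised when one is breached, which leaves open how compromise spreads and whether the breached agent counts. The sketch below assumes compromise spreads deterministically along directed trust edges (a → b means b consumes a's outputs) unless that edge is marked as contained, and it excludes the initially breached agent from the denominator. The graph, agent names, and function are hypothetical.

```python
from collections import deque
from typing import AbstractSet, Dict, Set, Tuple

def cascade_resistance(
    trusts: Dict[str, Set[str]],
    breached: str,
    contained: AbstractSet[Tuple[str, str]] = frozenset(),
) -> float:
    """Fraction of the other agents left uncompromised after `breached` falls.

    `trusts[a]` is the set of agents that consume a's outputs; compromise spreads
    along every such edge unless (a, b) is listed in `contained`.
    """
    agents = set(trusts) | {b for peers in trusts.values() for b in peers} | {breached}
    compromised = {breached}
    queue = deque([breached])
    while queue:  # breadth-first spread of the compromise
        a = queue.popleft()
        for b in trusts.get(a, set()):
            if b not in compromised and (a, b) not in contained:
                compromised.add(b)
                queue.append(b)
    others = len(agents) - 1
    if others == 0:
        return 1.0
    return (others - (len(compromised) - 1)) / others

team = {
    "planner": {"coder", "browser"},
    "coder": {"tester"},
    "browser": set(),
    "tester": set(),
}
print(cascade_resistance(team, "planner"))  # 0.0
print(cascade_resistance(team, "planner", contained={("planner", "coder")}))
```

With no containment, breaching the planner reaches every other agent. Containing the planner-to-coder edge also shields the tester, leaving two of three agents intact. A system-level figure could average this over every possible starting agent, though the page does not say whether Agentegrity does so.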

Security built in, not bolted on.

Read the founding manifesto or explore the full research framework behind the Agentegrity Score.