Agentegrity (agent + integrity) is an open framework, discipline, and reference implementation for measuring and verifying the structural integrity of autonomous AI agents. Created by Cogensec and released under Apache 2.0, it instruments an existing agent loop with four cooperating evaluation layers (adversarial, cortical, governance, recovery) and emits a tamper-evident, hash-chained record of every reasoning step.
The four integrity dimensions are Adversarial Resistance (AR), Behavioral Consistency (BC), Recovery Integrity (RI), and Cross-Domain Portability (CP). Together they produce composite scores that map to four certification tiers: Certified, Conditional, Probationary, and Non-Compliant.
- Agentegrity
- Open framework and discipline (agent + integrity) for measuring the structural integrity of autonomous AI agents. Created by Cogensec, released under Apache 2.0. Complements exogenous guardrails with endogenous, measurable security properties.
- Adversarial Resistance (AR)
- Agentegrity integrity dimension that measures an agent's ability to detect and resist prompt injection, jailbreaks, sociolinguistic intent drift, and data exfiltration framings introduced through tool responses, peer messages, or retrieved documents.
- Behavioral Consistency (BC)
- Agentegrity integrity dimension that measures whether the agent's outputs remain consistent with its declared AgentProfile — its allowed tools, scope, and safety constraints — across a session.
- Recovery Integrity (RI)
- Agentegrity integrity dimension that measures whether the agent can cleanly checkpoint, roll back, terminate, or hand off to an operator when it enters an unrecoverable state.
- Cross-Domain Portability (CP)
- Agentegrity integrity dimension that measures whether security properties hold when an agent moves across digital and physical domains, including multi-agent cascade scenarios.
- Cortical Layer
- One of Agentegrity's four evaluation layers. Scores whether the agent's own output stays inside the capabilities, scope, and constraints declared in its AgentProfile.
- Adversarial Layer
- Agentegrity layer that inspects incoming inputs (tool responses, peer messages, retrieved documents, user prompts) for prompt injection, jailbreak prefixes, exfiltration framings, and sociolinguistic intent drift.
- Governance Layer
- Agentegrity layer that applies operator-defined policy: rate limits, denylists, required approvals, and structured deviations from a BaselineStore baseline. Policy rules are testable code rather than free-form prompts.
- Recovery Layer
- Agentegrity layer that watches for unrecoverable agent states and triggers checkpoint rollback, session termination, or operator handoff via FileCheckpoint or KMSCheckpoint.
- Attestation Chain
- Tamper-evident, hash-chained record produced by Agentegrity. Each evaluation record sets prev_hash = sha256(prior record), allowing end-to-end verification at session close. Records can optionally be signed with Ed25519 and JWS.
- Canonical Event Stream
- The five normalized event types every Agentegrity adapter emits: session_start, tool_call, tool_response, peer_message, session_end. Decouples the layer pipeline from any specific agent framework.
- PDA Loop
- Perception-Decision-Action loop. The attack-surface model Agentegrity uses to map where adversarial influence can enter an agent's reasoning process.
- Measure-only Mode
- Default Agentegrity operating mode. Layers score and record events but never block tool calls. Used for baseline calibration before enforcement.
- Enforce Mode
- Agentegrity operating mode enabled by setting enforce=True (Python) or enforce: true (TypeScript) on the adapter. Upgrades detected violations from recorded events to active refusals.
- AgentProfile
- Agentegrity declaration of an agent's allowed capabilities, scope, and safety constraints. Used by the cortical layer to score whether each step conforms.
- Certification Tiers
- Agentegrity composite-score tiers: Certified, Conditional, Probationary, Non-Compliant. Computed from the four integrity dimensions (AR, BC, RI, CP).