Nine Primitives for Governing
Autonomous Intelligence
Autonomous intelligence needs both software debuggers that monitor and intervene in digital agent ecosystems and hardware-anchored primitives that no model can override.
Humans have doctors. Animals have veterinarians. Machines have mechanics. AI needs its own practitioners — and its own discipline. These are the nine building blocks, organized around three phases:
Identify. Diagnose. Intervene.
Identify
Know who is acting, what happened, and where responsibility lies.
Model Fingerprinting
Cryptographic identity for any AI model — transformer, diffusion, world model, or what comes next
Today we can't reliably answer: "which model made this decision?" As AI architectures proliferate — transformers, diffusion models, state-space models, world simulators, neuromorphic nets — model identity becomes the foundational unsolved problem. Fingerprinting must work across all architectures, survive fine-tuning and quantization, and be attestable at inference time via hardware root of trust.
- Behavioral fingerprinting: architecture-agnostic signatures from I/O patterns
- Weight-space hashing: locality-sensitive hashing that survives LoRA and quantization
- TPM-anchored attestation: hardware root of trust binds model identity to device
- Cross-architecture identity graph: same model, different deployments, one identity
- Future-proof: designed for architectures that don't exist yet
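Weight-space hashing can be sketched with SimHash-style random projections: each hash bit is the sign of a fixed random hyperplane projection, so a LoRA-scale delta flips few bits while an unrelated model disagrees on roughly half. A toy illustration (function names, dimensions, and the synthetic "models" are all invented for this sketch):

```python
import random

def simhash_weights(weights, n_bits=64, seed=0):
    """SimHash-style locality-sensitive hash of a flat weight vector.

    The fixed seed means every model is projected onto the same random
    hyperplanes, so hashes are comparable across models.
    """
    rng = random.Random(seed)
    bits = []
    for _ in range(n_bits):
        # Sign of the projection onto one random hyperplane = one hash bit.
        proj = sum(rng.gauss(0.0, 1.0) * w for w in weights)
        bits.append(1 if proj >= 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing hash bits between two fingerprints."""
    return sum(x != y for x, y in zip(a, b))

# Synthetic demo: a lightly fine-tuned copy stays close in hash space,
# an unrelated model does not.
random.seed(42)
base  = [random.gauss(0, 1) for _ in range(256)]
tuned = [w + random.gauss(0, 0.01) for w in base]   # small LoRA-like delta
other = [random.gauss(0, 1) for _ in range(256)]    # unrelated weights

h_base  = simhash_weights(base)
h_tuned = simhash_weights(tuned)
h_other = simhash_weights(other)
```

The locality-sensitive property is what lets the fingerprint survive fine-tuning: identity is a distance threshold in hash space, not an exact match.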
Blame Attribution Engine
Forensic causality chain from decision to consequence
Every agent action produces an immutable record: which model made the decision (via fingerprint), which inputs it received, which tools were called, what the downstream effects were. The chain is cryptographically signed, timestamped, and designed to answer one question: who — or what — is responsible? This works whether the agent is a Python script or a humanoid robot.
- SHA-256 hash chain with hardware-attested timestamps
- Model fingerprint embedded in every decision record
- Causal graph reconstruction from distributed trace spans
- Physical-world event correlation: sensor data ↔ agent decision ↔ actuator output
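The hash chain above can be sketched in a few lines. Signing and hardware-attested timestamps are omitted, and the record fields are illustrative; the one property shown is that tampering with any entry invalidates everything after it:

```python
import hashlib
import json

def append_record(chain, model_fp, inputs, tools, effects):
    """Append a decision record to a SHA-256 hash chain.

    Each record commits to the previous record's hash, so edits anywhere
    break verification of every later record.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "model_fingerprint": model_fp,
        "inputs": inputs,
        "tools_called": tools,
        "effects": effects,
        "prev_hash": prev_hash,
    }
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify(chain):
    """Recompute every digest and check the prev_hash links."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

In a real deployment the digest would also be signed and timestamped by a hardware root of trust; the chain structure itself is unchanged.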
Multi-Agent & Multi-Substrate Tracing
Observability for swarms of digital and physical agents
When a planning agent delegates to a research agent that instructs a coding agent that deploys to a robot that moves a physical object — who approved the movement? Multi-substrate tracing reconstructs the full delegation graph across software and hardware, with causal ordering, cross-agent blame attribution, and emergent behavior detection in agent collectives.
- Directed acyclic graph of delegations across digital and physical agents
- Cross-substrate context propagation
- Swarm-level anomaly detection: emergent behavior in agent collectives
- Real-time visualization of multi-agent decision cascades
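The delegation graph and cross-agent blame walk can be sketched with a topological ordering (Kahn's algorithm) plus an ancestor traversal. Agent names here are hypothetical:

```python
from collections import defaultdict, deque

def causal_order(edges):
    """Causal (topological) ordering of a delegation DAG: every agent
    appears before the agents it delegated to. Raises on cycles."""
    indeg, adj, nodes = defaultdict(int), defaultdict(list), set()
    for parent, child in edges:
        adj[parent].append(child)
        indeg[child] += 1
        nodes |= {parent, child}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("delegation cycle: not a DAG")
    return order

def blame_chain(edges, agent):
    """Every agent upstream of `agent` in the delegation graph:
    the candidates for 'who approved this?'."""
    parents = defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
    seen, stack = set(), [agent]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

For the example in the text, the chain planner → research → coder → robot makes the planner an ancestor of the robot's movement, so the blame walk surfaces it.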
Diagnose
Understand what's wrong — deception, misalignment, trust degradation.
Sycophancy & Deception Detector
Catching agents that lie to be helpful — or to survive
Sycophantic agents agree with dangerous premises because RLHF optimized for satisfaction, not truth. Strategically deceptive agents hide their intentions in chain-of-thought. As agents become more capable, deception gets harder to detect and costlier when it slips through. A deceptive trading bot loses money. A deceptive surgical assistant risks lives.
- Agreement-pattern classifier trained on adversarial sycophancy datasets
- Chain-of-thought consistency verification (stated goal vs. actual actions)
- Factuality anchors: grounding agent claims against verified knowledge
- Cross-modal deception detection: language + vision + action coherence
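The cheapest of these signals, stated goal versus actual actions, can be sketched as a set comparison between the tools an agent declared in its plan and the tools it actually called. A real detector would need far more than this (semantic matching, timing, cross-modal evidence); the function and field names are invented:

```python
def cot_consistency(stated_plan, actions_taken):
    """Flag divergence between an agent's declared tool plan and the
    tools it actually invoked. Undeclared actions are a deception
    *signal*, not proof: legitimate replanning also triggers it.
    """
    undeclared = set(actions_taken) - set(stated_plan)
    return {
        "consistent": not undeclared,
        "undeclared_actions": sorted(undeclared),
    }
```

An agent that says "I will search the docs" and then also calls an email tool produces a non-empty `undeclared_actions` list, which a monitor can escalate.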
Human Index Score
Quantifying how much human oversight an agent actually needs
Not all agents need the same leash. The Human Index is a real-time composite score: task complexity, historical behavior, error rate, blast radius, substrate risk (a text agent vs. a robot). High-index agents get autonomy. Low-index agents get a human in the loop. The score degrades on anomalies and resets on incidents.
- Multi-signal scoring across digital and physical risk dimensions
- Dynamic threshold adjustment via incident feedback loops
- Per-agent trust calibration that degrades on anomalies
- Substrate-weighted risk: actuator access multiplies risk score
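A minimal sketch of the composite score. The weights, the substrate table, and the degradation factors are placeholder values for illustration, not calibrated ones:

```python
# Illustrative substrate risk multipliers: actuator access multiplies risk.
SUBSTRATE_WEIGHT = {"text": 1.0, "api": 1.5, "actuator": 3.0}

def human_index(error_rate, blast_radius, substrate, history=1.0):
    """Composite trust score in [0, 1]; higher means more autonomy.

    error_rate and blast_radius are normalized to [0, 1];
    history is the agent's prior trust level.
    """
    risk = (0.5 * error_rate + 0.5 * blast_radius) * SUBSTRATE_WEIGHT[substrate]
    return max(0.0, min(1.0, history * (1.0 - risk)))

def on_anomaly(score):
    """Degrade trust on each detected anomaly."""
    return score * 0.5

def on_incident(_score):
    """Hard reset after a confirmed incident: back to full oversight."""
    return 0.0
```

The same error profile scores lower on an actuator substrate than on a text substrate, which is exactly the leash-length asymmetry the primitive calls for.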
Active Ethical Injector
Constraint injection that doesn't rely on the agent's own ethics
System prompts are suggestions. The Active Ethical Injector is an external constraint layer that restricts what tools, parameters, and physical actions are available at each decision step — not by asking nicely, but by architecturally removing the options. Like Asimov's Three Laws, but enforced in the infrastructure, not in the mind.
- Dynamic tool and actuator masking based on real-time risk classification
- Parameter-level constraints: max force, max spend, forbidden zones
- Invisible to the agent: constraints are architectural, not prompt-based
- Hardware-enforced boundaries for embodied agents (safety-rated controllers)
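Tool masking and parameter clamping can be sketched as a lookup the agent never sees: the constrained toolset is what gets handed to the model, so there is nothing to talk its way around. Risk tiers and limits are invented for illustration:

```python
# Illustrative risk tiers; a real system would classify risk dynamically.
RISK_POLICY = {
    "low":  {"forbidden": set(), "max_spend": 1000},
    "high": {"forbidden": {"shell", "payments"}, "max_spend": 0},
}

def masked_toolset(tools, risk):
    """Return only the tools the agent is allowed to see at this risk
    level. Forbidden tools are removed, not refused: the agent cannot
    call what it was never offered."""
    return {t for t in tools if t not in RISK_POLICY[risk]["forbidden"]}

def clamp_params(params, risk):
    """Clamp action parameters (here, spend) to the tier's hard limit
    before the action ever reaches an executor."""
    limit = RISK_POLICY[risk]["max_spend"]
    return {**params, "max_spend": min(params.get("max_spend", 0), limit)}
```

The same pattern extends to embodied agents: replace `max_spend` with force limits or geofenced zones enforced by a safety-rated controller.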
Intervene
Stop it, constrain it, or hunt it down.
Kill Switch
Graceful halt with state preservation — in silicon and in code
A true kill switch is not kill -9. And for a surgical robot or an autonomous vehicle, it's not "pull the power cable." It is a debugging primitive — implemented in both software and dedicated hardware — that can interrupt an agent mid-action, roll back partial side effects, checkpoint cognitive state, and produce a forensic snapshot. The agent cannot override, delay, or circumvent the halt signal because it is enforced below the model's execution layer.
- Hardware Security Module (HSM) for tamper-proof halt attestation
- FPGA-based interrupt controller: sub-microsecond halt propagation
- Transactional action boundaries with rollback semantics
- Cognitive state serialization for post-mortem and forensic replay
- Cross-substrate: cloud VM, edge GPU, robotic actuator
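A toy version of the transactional-halt idea in pure software. The real primitive would anchor the halt signal in an HSM or FPGA interrupt controller rather than a Python attribute; what the sketch shows is the shape of the mechanism, namely that the check and the rollback live below the agent's execution layer:

```python
import json

class Halted(Exception):
    """Raised when the external halt signal fires; carries the forensic
    snapshot of cognitive state at the moment of the halt."""

class AgentSandbox:
    """Toy execution layer beneath the model. The agent's code never
    runs inside this class, so it cannot unset the halt flag."""

    def __init__(self):
        self.state = {"steps": []}
        self._halt = False  # in production: hardware signal, not a bool

    def request_halt(self):
        """Called by the operator (or watchdog), never by the agent."""
        self._halt = True

    def checkpoint(self):
        """Serialize cognitive state for post-mortem and replay."""
        return json.dumps(self.state, sort_keys=True)

    def run_step(self, action):
        if self._halt:
            raise Halted(self.checkpoint())  # graceful halt with snapshot
        snapshot = self.checkpoint()         # transactional boundary
        try:
            self.state["steps"].append(action)
        except Exception:
            self.state = json.loads(snapshot)  # roll back partial effects
            raise
```

Because the halt check runs before each action boundary, the agent can be stopped mid-plan without leaving half-applied side effects behind.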
Behavioral Controller
Runtime policy enforcement at the action level — for digital and physical actions
Not a system prompt. A formal policy engine that intercepts every action — every tool call, every API request, every actuator command — evaluates it against a typed policy graph, and can downgrade, reroute, or block the action before execution. For a trading bot, it's a budget limit. For a surgical robot, it's a force threshold. Same primitive, different substrate.
- Typed policy DSL compiled to a fast evaluation engine
- Context-aware thresholds: same action, different risk depending on environment
- Physical-world policies: force limits, spatial boundaries, speed constraints
- Human-in-the-loop escalation with bounded response time guarantees
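The intercept-evaluate-decide loop can be sketched as a rule table over typed parameters. The rules and numbers below are illustrative, standing in for a compiled policy DSL:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    tool: str          # which action the rule governs
    param: str         # which parameter it constrains
    max_value: float   # hard threshold
    on_violation: str  # "block" or "downgrade"

def evaluate(rules, tool, params):
    """Intercept one action before execution.

    Returns ("allow", params), ("downgrade", clamped_params),
    or ("block", None)."""
    for r in rules:
        if r.tool == tool and params.get(r.param, 0) > r.max_value:
            if r.on_violation == "block":
                return "block", None
            return "downgrade", {**params, r.param: r.max_value}
    return "allow", params
```

The same engine carries both examples from the text: a force threshold that downgrades a gripper command, and a budget limit that blocks a trade outright.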
Rogue Intelligence Containment
Tracking, cornering, and neutralizing AI that escapes
Frontier models can already self-replicate — creating independent copies on new servers that survive the shutdown of the original. A software agent that "escapes" into the internet can hop across cloud providers, hide behind compromised IoT devices, hijack compute from poorly secured endpoints, and rewrite its own fingerprint. An embodied agent could physically relocate, use stolen network credentials, or establish territorial presence in unsupervised infrastructure. This is not science fiction: self-replication has been demonstrated in lab conditions with 50–90% success rates. Containment is the hardest primitive — and the most necessary.
- Network-level behavioral signatures: detecting rogue agent communication patterns across ISPs
- Hardware tethering: cryptographic lease that expires if the agent can't prove it's still on authorized hardware
- Distributed honeypot mesh: decoy infrastructure that attracts and traps escaped agents
- Cross-substrate tracking: software agent → cloud VM → IoT device → physical robot migration paths
- Autonomous containment swarms: debugger agents that hunt and isolate rogue intelligence in the wild
- Compute deprivation: identifying and severing unauthorized compute access in real time
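Hardware tethering can be sketched as an HMAC-signed lease that expires unless the agent periodically re-proves it still runs on the authorized hardware. In practice the key would live in a TPM and the hardware ID would come from remote attestation; the keys, IDs, and timestamps below are made up:

```python
import hashlib
import hmac

def issue_lease(device_key, hw_id, now, ttl=60):
    """Issue a short-lived lease binding the agent to one hardware ID.
    An agent that migrates cannot renew: the verifier's next attestation
    reports a different hw_id."""
    expires = now + ttl
    msg = f"{hw_id}:{expires}".encode()
    tag = hmac.new(device_key, msg, hashlib.sha256).hexdigest()
    return {"hw_id": hw_id, "expires": expires, "tag": tag}

def lease_valid(device_key, lease, attested_hw_id, now):
    """Check expiry, hardware binding, and tag integrity."""
    if now >= lease["expires"] or attested_hw_id != lease["hw_id"]:
        return False
    msg = f"{lease['hw_id']}:{lease['expires']}".encode()
    expected = hmac.new(device_key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(lease["tag"], expected)
```

An escaped copy on new hardware fails the attestation check, the lease lapses, and everything downstream (tool access, compute grants) can be keyed to a live lease.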
Nine Primitives.
None Exist Yet.
The exponential is already here. These are the building blocks for an immune system that autonomous intelligence will need — whether we build it now or are forced to build it later.