The Four Values
Core trade-offs that define hybrid intelligence teams
The Payments Feature That Nobody Could Debug
Verifiable Intention over Accelerated Production
The Violation
A startup uses AI agents to ship a new payments feature in 2 days instead of 2 weeks. The agents generate working code, tests pass, it deploys. But the intent was never formalized — no specification defines edge cases for partial refunds, currency conversions, or idempotency.
What Went Wrong
Three weeks later, a duplicate charge bug hits 2,000 customers. The team can't trace what the system was supposed to do vs. what it does, because no verifiable intention ever existed. Debugging becomes archaeology instead of verification.
The Lesson
Without verifiable intention, you can't distinguish a bug from a feature gap. The speed was real. The understanding was not.
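What "verifiable intention" could look like here is an executable spec for the refund rules the team never wrote down. This is a minimal, hypothetical sketch; the names (`RefundLedger`, `process_refund`) are illustrative, not from the scenario.

```python
# Hypothetical sketch: the payments team's intention as an executable spec.
# Two of the unformalized edge cases -- idempotency and partial refunds --
# become checkable rules instead of implicit assumptions.

class RefundLedger:
    """Records refunds by idempotency key so retries never double-charge."""

    def __init__(self):
        self._processed = {}  # idempotency_key -> refunded amount

    def process_refund(self, idempotency_key: str, amount: float,
                       max_refundable: float) -> float:
        # Intention 1: a retried request with the same key is a no-op.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        # Intention 2: a partial refund may never exceed the remaining balance.
        if amount > max_refundable:
            raise ValueError("refund exceeds refundable balance")
        self._processed[idempotency_key] = amount
        return amount

# The spec doubles as a regression test: a retry cannot duplicate a charge.
ledger = RefundLedger()
first = ledger.process_refund("req-123", 40.0, max_refundable=100.0)
retry = ledger.process_refund("req-123", 40.0, max_refundable=100.0)
assert first == retry == 40.0
assert len(ledger._processed) == 1
```

With a spec like this in the repository, the duplicate-charge incident becomes a verification question ("which rule was violated?") rather than archaeology.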
The Overnight Refactoring Disaster
Bounded Autonomy over Maximized Automation
The Violation
A team gives their coding agent full autonomy to refactor a legacy monolith into microservices. No constraints on scope, no rollback triggers, no escalation rules. The agent refactors 47 files overnight.
What Went Wrong
The agent introduces 3 subtle race conditions across service boundaries and breaks the deployment pipeline because it restructured the CI config "for consistency." The cost of fixing the agent's well-intentioned damage exceeds the cost of doing the refactoring manually.
The Lesson
Maximized automation without bounds creates blast radius problems. The agent did exactly what it was optimized to do — maximize throughput — but nobody defined where it should stop.
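One way to define "where it should stop" is a guardrail that runs before any change is applied. This is an illustrative sketch; the protected paths and thresholds are assumptions, not a prescribed configuration.

```python
# Hypothetical blast-radius guardrail: scope bounds and protected paths
# are checked before the agent's changes are applied, escalating to a
# human instead of proceeding.

PROTECTED_PATHS = (".github/", "ci/", "deploy/")  # agent may not touch CI/CD
MAX_FILES_PER_RUN = 10                            # scope bound per run

def check_bounds(changed_files: list[str]) -> tuple[bool, str]:
    """Return (allowed, reason); a False result triggers human escalation."""
    if len(changed_files) > MAX_FILES_PER_RUN:
        return False, f"{len(changed_files)} files exceeds bound of {MAX_FILES_PER_RUN}"
    for path in changed_files:
        if path.startswith(PROTECTED_PATHS):
            return False, f"protected path touched: {path}"
    return True, "within bounds"

# The overnight CI rewrite would have been escalated, not applied.
allowed, reason = check_bounds(["src/billing.py", "ci/pipeline.yml"])
assert not allowed
```

The point is not these particular bounds but that the bounds exist as code the agent cannot bypass.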
Five Smart Agents, Zero Coordination
Human-Agent Teams over Uncoordinated Intelligence
The Violation
A company deploys five specialized AI agents: one for code generation, one for testing, one for documentation, one for deployment, and one for monitoring. Each agent is excellent in isolation. But there's no shared context between them.
What Went Wrong
The code agent writes a feature, the test agent generates tests against an outdated API contract, the docs agent documents behavior that doesn't match the implementation, and the deployment agent pushes a version the monitoring agent doesn't recognize. Five intelligent systems, zero coordination.
The Lesson
Intelligence without coordination produces fragmented results. It doesn't matter whether the isolation is a solo dev with ChatGPT or five specialized agents — the failure mode is the same: uncoordinated intelligence.
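A minimal coordination mechanism is a single versioned contract that every agent pins against. The sketch below is illustrative; the contract shape and field names are assumptions.

```python
# Hypothetical shared-context check: every agent pins the contract version
# it was briefed on, so a stale agent fails fast instead of silently
# producing incoherent artifacts (tests against old APIs, docs for old behavior).

API_CONTRACT = {
    "version": 7,
    "endpoints": {
        "/search": {"params": ["query", "page"], "returns": "ResultPage"},
    },
}

def assert_contract_version(expected: int) -> None:
    """Raise if an agent's briefed contract version is out of date."""
    actual = API_CONTRACT["version"]
    if actual != expected:
        raise RuntimeError(f"stale context: agent has v{expected}, contract is v{actual}")

assert_contract_version(7)       # code agent, briefed on the current contract
try:
    assert_contract_version(6)   # test agent, briefed on an outdated contract
except RuntimeError as e:
    print(e)
```

The test agent in the scenario would have halted with "stale context" instead of generating tests against the outdated API.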
The Six-Week Compliance Gap
Governance as Design over Governance as Post-Audit
The Violation
A financial services company deploys AI agents for automated code review and deployment. Governance is handled the traditional way: quarterly audits, compliance checks after the fact.
What Went Wrong
Between audits, an agent approves and deploys a change that violates data residency requirements. The violation runs in production for 6 weeks before the next audit catches it. The regulatory fine is significant, and the real cost is higher still: remediation and reputational damage.
The Lesson
Post-audit governance assumes a pace of change slow enough for periodic reviews to catch problems. With agents deploying continuously, governance needs to be embedded in the process itself — as constraints and guardrails, not as retrospective checks.
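"Embedded in the process" can mean a policy-as-code check in the deployment path. The sketch below is a hypothetical example; the policy structure and region names are illustrative.

```python
# Hypothetical governance-as-design guardrail: the data-residency rule runs
# at deploy time, so a violation is blocked before production rather than
# discovered at the next quarterly audit.

RESIDENCY_POLICY = {"customer_data": {"eu-west-1", "eu-central-1"}}  # allowed regions

def validate_deployment(dataset: str, target_region: str) -> None:
    allowed = RESIDENCY_POLICY.get(dataset)
    if allowed is not None and target_region not in allowed:
        raise PermissionError(
            f"{dataset} may not be deployed to {target_region}; "
            f"allowed: {sorted(allowed)}"
        )

validate_deployment("customer_data", "eu-west-1")      # compliant: passes
try:
    validate_deployment("customer_data", "us-east-1")  # violation: blocked now
except PermissionError as e:
    print(e)
```

The six-week gap disappears because the check runs on every deployment, at the same pace as the agents.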
The Twelve Principles
Concrete scenarios that bring each principle to life
The Lost Institutional Memory
Intention is the Primary Artifact
The Violation
A team uses AI agents to generate features but treats user stories as the primary artifact. The stories are vague: "As a user, I want to filter results." The agents produce working code, but the specification that guided the agent was a throwaway prompt, not a maintained artifact.
What Went Wrong
Six months later, nobody can understand why the agent implemented filtering the way it did. The prompt history is scattered across chat logs. Onboarding new team members becomes impossible. The institutional memory of why things are the way they are is lost.
The 40-View Dashboard Nobody Uses
Customer Value Drives the Loop
The Violation
A SaaS team uses agents to build a comprehensive admin dashboard with 40+ data views, sorting, filtering, and export capabilities. Every feature works, tests pass, the demo looks impressive. But they never validated whether admins actually need 40 views.
What Went Wrong
User research shows admins use exactly 3 views for 95% of their work. The remaining 37 views slow the interface, confuse new users, and create a maintenance burden. The agents delivered complete output that passed tests while solving a problem nobody had.
The Last-Minute Schema Change
Govern Change, Accelerate Execution
The Violation
A product manager requests a last-minute schema change three days before launch. The team's agent implements it in 30 minutes. The code works. Tests pass. But nobody assessed the downstream impact.
What Went Wrong
A partner integration depends on the old schema. The mobile app caches the old format for 72 hours. The data migration script doesn't handle the 2.3M existing records. The change was cheap to execute and catastrophically expensive to govern.
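Governing a change can start with something as simple as a registry of who consumes a schema, consulted before the change ships. This sketch is illustrative; the consumer names echo the scenario but the registry itself is an assumption.

```python
# Hypothetical impact registry: downstream consumers of a schema are
# recorded, so a 30-minute code change still pays its coordination cost
# up front instead of after launch.

SCHEMA_CONSUMERS = {
    "user_table.v1": ["partner_api", "mobile_app_cache", "migration_script"],
}

def impact_report(schema: str) -> list[str]:
    """List every registered consumer that must confirm compatibility."""
    return SCHEMA_CONSUMERS.get(schema, [])

affected = impact_report("user_table.v1")
assert affected  # non-empty: the change needs sign-off before launch
print(f"blocking launch until {len(affected)} consumers confirm compatibility")
```

Execution stays fast; it is the governance step that gates the release.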
The Uncontracted Migration
Safe Delegation Requires Contracts
The Violation
A team delegates a database migration to an AI agent: "Migrate the user table to the new schema." No constraints on downtime, no rollback plan, no acceptance criteria beyond "it works."
What Went Wrong
The agent performs the migration during peak hours using a locking strategy that causes 45 minutes of downtime. The migration succeeds, but the definition of "success" was never contracted. The agent did what it was told. The failure was in the delegation, not the execution.
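The contract the team never wrote can be made explicit. The sketch below is hypothetical; the field names and thresholds are illustrative acceptance criteria, not a standard.

```python
# Hypothetical delegation contract for the migration: "success" is defined
# before the agent starts, not inferred afterward.

from dataclasses import dataclass

@dataclass
class DelegationContract:
    task: str
    max_downtime_seconds: int          # "it works" is not a criterion
    allowed_window: tuple              # e.g. (2, 5) = 02:00-05:00 UTC
    rollback_plan_required: bool = True

    def accepts(self, downtime_seconds: int, start_hour: int,
                has_rollback: bool) -> bool:
        return (
            downtime_seconds <= self.max_downtime_seconds
            and self.allowed_window[0] <= start_hour < self.allowed_window[1]
            and (has_rollback or not self.rollback_plan_required)
        )

contract = DelegationContract("migrate user table",
                              max_downtime_seconds=60,
                              allowed_window=(2, 5))
# The actual run: 45 minutes of downtime, at peak hours, no rollback plan.
assert not contract.accepts(downtime_seconds=45 * 60, start_hour=14,
                            has_rollback=False)
```

Under this contract, the agent's plan fails acceptance before execution; the delegation failure becomes visible in advance.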
The Static Autonomy That Rotted
Autonomy is a Design Parameter
The Violation
A team treats agent autonomy as binary — autonomous for low-risk tasks, supervised for high-risk ones. They set the levels once and never recalibrate. A task initially classified as low-risk (updating documentation) evolves over time.
What Went Wrong
The agent starts modifying API documentation that external partners depend on. Nobody notices that the risk profile has changed, because autonomy was set once, not dynamically adjusted. A low-risk task became high-risk without anyone recalibrating the controls.
The 94% Approval Rate
Sustainable Cognitive Pace
The Violation
A startup CTO, excited by agent productivity, assigns a 4-person team to work with agents on 6 projects simultaneously. The agents generate code efficiently. Management sees the throughput and doubles the sprint scope.
What Went Wrong
The developers context-switch between reviewing agent output on 6 codebases. After 3 months, nobody can hold the full architecture of any single project in their head. The team's PR approval rate is 94% — but that's because they're rubber-stamping, not reviewing. Quality collapsed, and it collapsed silently: no visible failures, no production incidents yet, just a slow accumulation of technical debt.
The Kafka Guarantor Who Couldn't Guarantee
Humans are Guarantors
The Violation
A mid-level developer is assigned to oversee an AI agent building a real-time data pipeline in Apache Kafka. The developer has never worked with Kafka or event streaming. The agent produces technically correct code — partition strategies, exactly-once semantics, dead letter queues.
What Went Wrong
The developer approves every PR because the tests pass and the agent's explanations sound reasonable. Three months later, a partition rebalancing issue causes data loss during a deploy. The developer never understood the system they were "guaranteeing."
The Eleven-Day Silent Corruption
Failure Demands a Protocol
The Violation
A team uses an AI agent for automated dependency updates. The agent bumps a library version that introduces a subtle behavioral change — not a crash, not a test failure, just a slight difference in how floating-point rounding works in a financial calculation. There's no failure protocol: no anomaly detection, no human review trigger, no rollback threshold.
What Went Wrong
The behavioral change runs in production for 11 days before a customer reports invoice totals off by a few cents. Total impact: 43,000 miscalculated invoices. The cost was proportional to the time between failure and detection — and there was nothing in place to shorten that window.
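A failure protocol for silent behavioral changes can be as cheap as a post-deploy canary that recomputes a sample of results and flags drift. This is a hedged sketch; the tolerance and function names are assumptions.

```python
# Hypothetical post-deploy canary: recompute a sample of invoices with the
# old and new code paths; any drift beyond tolerance triggers human review
# and marks the deploy as a rollback candidate.

def invoice_total_drift(expected: float, actual: float,
                        tolerance: float = 0.005) -> bool:
    """True if totals diverge beyond tolerance (in currency units)."""
    return abs(expected - actual) > tolerance

def post_deploy_canary(samples: list) -> int:
    """Count drifted invoices in a sample of (expected, actual) pairs."""
    return sum(1 for expected, actual in samples
               if invoice_total_drift(expected, actual))

# A few cents of rounding drift pages a human on day 1, not day 11.
drifted = post_deploy_canary([(100.00, 100.00), (59.90, 59.93), (12.50, 12.50)])
assert drifted == 1
```

The protocol shortens the failure-to-detection window that the scenario shows is the real cost driver.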
The Three-Agent Merge Conflict
Coherence is Collective Responsibility
The Violation
Three AI agents work in parallel on different parts of the same codebase. Agent A refactors the authentication module. Agent B adds a new feature that depends on the old auth interface. Agent C updates the database schema in a way that's incompatible with Agent A's changes. Each agent's work passes its own tests in isolation.
What Went Wrong
When merged, the system breaks in three places. No one — human or agent — was responsible for system-wide coherence. Agents don't "notice" that a colleague is changing a shared dependency. The coherence problem scales with the number of concurrent agents.
The Frontend Agent That Touched Infrastructure
Trust is Calibrated, Not Granted
The Violation
A team has used an AI agent for React frontend development for 8 months with excellent results. When they need to build a Terraform infrastructure module, they give the same agent the same autonomy level, reasoning: "this agent has proven itself."
What Went Wrong
The agent produces Terraform code that looks plausible but has critical security misconfigurations: overly permissive IAM policies, publicly accessible S3 buckets, no state locking. Trust earned in React was blindly transferred to infrastructure.
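Calibrated trust can be encoded as a per-domain lookup with a strict default. The sketch is illustrative; the agent, domain, and level names are made up.

```python
# Hypothetical trust calibration: autonomy is recorded per (agent, domain)
# pair and never transferred across domains. Unknown pairs default to the
# strictest oversight.

TRUST = {
    ("agent-1", "react-frontend"): "autonomous",  # 8 months of good results
    # No entry for ("agent-1", "terraform"): trust was never earned there.
}

def review_mode(agent: str, domain: str) -> str:
    """Look up earned trust; fall back to expert review for new domains."""
    return TRUST.get((agent, domain), "expert-review-required")

assert review_mode("agent-1", "react-frontend") == "autonomous"
assert review_mode("agent-1", "terraform") == "expert-review-required"
```

Under this scheme, the plausible-looking Terraform code would have gone to an infrastructure expert by default.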
The Ambiguous "Improve Performance" Task
Ambiguity is Risk
The Violation
A team gives an agent the task: "Improve the performance of the search feature." The specification doesn't define what "improve" means — latency? relevance? throughput? — and doesn't set targets or constraints.
What Went Wrong
The agent rewrites the search engine, replacing the Elasticsearch integration with a custom in-memory solution. Response times drop from 200ms to 15ms. But the new implementation can't handle the full dataset, breaks pagination, and ignores relevance scoring. The agent "improved performance" exactly as ambiguously as it was asked to — and because the output looked competent (15ms!), the team almost shipped it.
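The missing specification can be pinned to measurable targets and hard constraints. The sketch below is hypothetical; every number and field name is an illustrative assumption, not a recommendation.

```python
# Hypothetical disambiguated spec for "improve performance": which metric,
# how much, and what must not break.

from dataclasses import dataclass

@dataclass
class PerformanceSpec:
    p95_latency_ms: int = 100           # target: the metric and the bar
    min_dataset_rows: int = 50_000_000  # constraint: must handle full dataset
    preserve: tuple = ("pagination", "relevance_scoring")  # non-negotiables

def acceptable(spec: PerformanceSpec, latency_ms: int, max_rows: int,
               preserved: set) -> bool:
    return (
        latency_ms <= spec.p95_latency_ms
        and max_rows >= spec.min_dataset_rows
        and set(spec.preserve) <= preserved
    )

spec = PerformanceSpec()
# The agent's in-memory rewrite: 15ms, but partial dataset, no relevance scoring.
assert not acceptable(spec, latency_ms=15, max_rows=1_000_000,
                      preserved={"pagination"})
```

Against this spec, the impressive 15ms number fails acceptance on two constraints, and the near-miss ship never happens.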
The Retro That Never Improved Anything
Continuous Mutual Learning
The Violation
A team holds retros every two weeks. They discuss what went well, including agent performance. But the output is always the same: "the agent made errors on X, Y, Z — let's be more careful next time." They never trace errors back to ambiguities in their specs. They never update specification templates. They never adjust delegation boundaries.
What Went Wrong
After 6 months, they're making the same delegation mistakes they made on day one. The humans didn't learn to write better specs, and the team's effective use of agents never improved. Both sides stagnated because reflection never ran in both directions.