The Four Values
Core trade-offs that define hybrid intelligence teams
The Payments Feature That Nobody Could Debug
Verifiable Intention over Accelerated Production
The Violation
A startup uses AI agents to ship a new payments feature in 2 days instead of 2 weeks. The agents generate working code, tests pass, it deploys. But the intent was never formalized — no specification defines edge cases for partial refunds, currency conversions, or idempotency.
What Went Wrong
Three weeks later, a duplicate charge bug hits 2,000 customers. The team can't trace what the system was supposed to do vs. what it does, because no verifiable intention ever existed. Debugging becomes archaeology instead of verification.
The Lesson
Without verifiable intention, you can't distinguish a bug from a feature gap. The speed was real. The understanding was not.
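What "verifiable intention" could look like here is an executable spec for the refund rules the team never wrote down. This is a minimal, hypothetical sketch; the names (`RefundLedger`, `process_refund`) are illustrative, not from the scenario.

```python
# Hypothetical sketch: the payments team's intention as an executable spec.
# Two of the unformalized edge cases -- idempotency and partial refunds --
# become checkable rules instead of implicit assumptions.

class RefundLedger:
    """Records refunds by idempotency key so retries never double-charge."""

    def __init__(self):
        self._processed = {}  # idempotency_key -> refunded amount

    def process_refund(self, idempotency_key: str, amount: float,
                       max_refundable: float) -> float:
        # Intention 1: a retried request with the same key is a no-op.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        # Intention 2: a partial refund may never exceed the remaining balance.
        if amount > max_refundable:
            raise ValueError("refund exceeds refundable balance")
        self._processed[idempotency_key] = amount
        return amount

# The spec doubles as a regression test: a retry cannot duplicate a charge.
ledger = RefundLedger()
first = ledger.process_refund("req-123", 40.0, max_refundable=100.0)
retry = ledger.process_refund("req-123", 40.0, max_refundable=100.0)
assert first == retry == 40.0
assert len(ledger._processed) == 1
```

With a spec like this in the repository, the duplicate-charge incident becomes a verification question ("which rule was violated?") rather than archaeology.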
The Overnight Refactoring Disaster
Bounded Autonomy over Maximized Automation
The Violation
A team gives their coding agent full autonomy to refactor a legacy monolith into microservices. No constraints on scope, no rollback triggers, no escalation rules. The agent refactors 47 files overnight.
What Went Wrong
The agent introduces 3 subtle race conditions across service boundaries and breaks the deployment pipeline because it restructured the CI config "for consistency." The cost of fixing the agent's well-intentioned damage exceeds the cost of doing the refactoring manually.
The Lesson
Maximized automation without bounds creates blast radius problems. The agent did exactly what it was optimized to do — maximize throughput — but nobody defined where it should stop.
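One way to define "where it should stop" is a guardrail that runs before any change is applied. This is an illustrative sketch; the protected paths and thresholds are assumptions, not a prescribed configuration.

```python
# Hypothetical blast-radius guardrail: scope bounds and protected paths
# are checked before the agent's changes are applied, escalating to a
# human instead of proceeding.

PROTECTED_PATHS = (".github/", "ci/", "deploy/")  # agent may not touch CI/CD
MAX_FILES_PER_RUN = 10                            # scope bound per run

def check_bounds(changed_files: list[str]) -> tuple[bool, str]:
    """Return (allowed, reason); a False result triggers human escalation."""
    if len(changed_files) > MAX_FILES_PER_RUN:
        return False, f"{len(changed_files)} files exceeds bound of {MAX_FILES_PER_RUN}"
    for path in changed_files:
        if path.startswith(PROTECTED_PATHS):
            return False, f"protected path touched: {path}"
    return True, "within bounds"

# The overnight CI rewrite would have been escalated, not applied.
allowed, reason = check_bounds(["src/billing.py", "ci/pipeline.yml"])
assert not allowed
```

The point is not these particular bounds but that the bounds exist as code the agent cannot bypass.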
Five Smart Agents, Zero Coordination
Human-Agent Teams over Uncoordinated Intelligence
The Violation
A company deploys five specialized AI agents: one for code generation, one for testing, one for documentation, one for deployment, and one for monitoring. Each agent is excellent in isolation. But there's no shared context between them.
What Went Wrong
The code agent writes a feature, the test agent generates tests against an outdated API contract, the docs agent documents behavior that doesn't match the implementation, and the deployment agent pushes a version the monitoring agent doesn't recognize. Five intelligent systems, zero coordination.
The Lesson
Intelligence without coordination produces fragmented results. It doesn't matter whether the isolation is a solo dev with ChatGPT or five specialized agents — the failure mode is the same: uncoordinated intelligence.
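A minimal coordination mechanism is a single versioned contract that every agent pins against. The sketch below is illustrative; the contract shape and field names are assumptions.

```python
# Hypothetical shared-context check: every agent pins the contract version
# it was briefed on, so a stale agent fails fast instead of silently
# producing incoherent artifacts (tests against old APIs, docs for old behavior).

API_CONTRACT = {
    "version": 7,
    "endpoints": {
        "/search": {"params": ["query", "page"], "returns": "ResultPage"},
    },
}

def assert_contract_version(expected: int) -> None:
    """Raise if an agent's briefed contract version is out of date."""
    actual = API_CONTRACT["version"]
    if actual != expected:
        raise RuntimeError(f"stale context: agent has v{expected}, contract is v{actual}")

assert_contract_version(7)       # code agent, briefed on the current contract
try:
    assert_contract_version(6)   # test agent, briefed on an outdated contract
except RuntimeError as e:
    print(e)
```

The test agent in the scenario would have halted with "stale context" instead of generating tests against the outdated API.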
The Six-Week Compliance Gap
Governance as Design over Governance as Post-Audit
The Violation
A financial services company deploys AI agents for automated code review and deployment. Governance is handled the traditional way: quarterly audits, compliance checks after the fact.
What Went Wrong
Between audits, an agent approves and deploys a change that violates data residency requirements. The violation runs in production for 6 weeks before the next audit catches it. The regulatory fine is significant, and the real cost is higher still: remediation and reputational damage.
The Lesson
Post-audit governance assumes a pace of change slow enough for periodic reviews to catch problems. With agents deploying continuously, governance needs to be embedded in the process itself — as constraints and guardrails, not as retrospective checks.
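"Embedded in the process" can mean a policy-as-code check in the deployment path. The sketch below is a hypothetical example; the policy structure and region names are illustrative.

```python
# Hypothetical governance-as-design guardrail: the data-residency rule runs
# at deploy time, so a violation is blocked before production rather than
# discovered at the next quarterly audit.

RESIDENCY_POLICY = {"customer_data": {"eu-west-1", "eu-central-1"}}  # allowed regions

def validate_deployment(dataset: str, target_region: str) -> None:
    allowed = RESIDENCY_POLICY.get(dataset)
    if allowed is not None and target_region not in allowed:
        raise PermissionError(
            f"{dataset} may not be deployed to {target_region}; "
            f"allowed: {sorted(allowed)}"
        )

validate_deployment("customer_data", "eu-west-1")      # compliant: passes
try:
    validate_deployment("customer_data", "us-east-1")  # violation: blocked now
except PermissionError as e:
    print(e)
```

The six-week gap disappears because the check runs on every deployment, at the same pace as the agents.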
The Twelve Principles
Concrete scenarios that bring each principle to life
The Lost Institutional Memory
Intention is the Primary Artifact
The Violation
A team uses AI agents to generate features but treats user stories as the primary artifact. The stories are vague: "As a user, I want to filter results." The agents produce working code, but the specification that guided the agent was a throwaway prompt, not a maintained artifact.
What Went Wrong
Six months later, nobody can understand why the agent implemented filtering the way it did. The prompt history is scattered across chat logs. Onboarding new team members becomes impossible. The institutional memory of why things are the way they are is lost.
The 40-View Dashboard Nobody Uses
Customer Value Drives the Loop
The Violation
A SaaS team uses agents to build a comprehensive admin dashboard with 40+ data views, sorting, filtering, and export capabilities. Every feature works, tests pass, the demo looks impressive. But they never validated whether admins actually need 40 views.
What Went Wrong
User research shows admins use exactly 3 views for 95% of their work. The remaining 37 views slow the interface, confuse new users, and create a maintenance burden. The agents delivered complete output that passed tests while solving a problem nobody had.
The Last-Minute Schema Change
Govern Change, Accelerate Execution
The Violation
A product manager requests a last-minute schema change three days before launch. The team's agent implements it in 30 minutes. The code works. Tests pass. But nobody assessed the downstream impact.
What Went Wrong
A partner integration depends on the old schema. The mobile app caches the old format for 72 hours. The data migration script doesn't handle the 2.3M existing records. The change was cheap to execute and catastrophically expensive to govern.
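Governing a change can start with something as simple as a registry of who consumes a schema, consulted before the change ships. This sketch is illustrative; the consumer names echo the scenario but the registry itself is an assumption.

```python
# Hypothetical impact registry: downstream consumers of a schema are
# recorded, so a 30-minute code change still pays its coordination cost
# up front instead of after launch.

SCHEMA_CONSUMERS = {
    "user_table.v1": ["partner_api", "mobile_app_cache", "migration_script"],
}

def impact_report(schema: str) -> list[str]:
    """List every registered consumer that must confirm compatibility."""
    return SCHEMA_CONSUMERS.get(schema, [])

affected = impact_report("user_table.v1")
assert affected  # non-empty: the change needs sign-off before launch
print(f"blocking launch until {len(affected)} consumers confirm compatibility")
```

Execution stays fast; it is the governance step that gates the release.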
The Uncontracted Migration
Safe Delegation Requires Contracts
The Violation
A team delegates a database migration to an AI agent: "Migrate the user table to the new schema." No constraints on downtime, no rollback plan, no acceptance criteria beyond "it works."
What Went Wrong
The agent performs the migration during peak hours using a locking strategy that causes 45 minutes of downtime. The migration succeeds, but the definition of "success" was never contracted. The agent did what it was told. The failure was in the delegation, not the execution.
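The contract the team never wrote can be made explicit. The sketch below is hypothetical; the field names and thresholds are illustrative acceptance criteria, not a standard.

```python
# Hypothetical delegation contract for the migration: "success" is defined
# before the agent starts, not inferred afterward.

from dataclasses import dataclass

@dataclass
class DelegationContract:
    task: str
    max_downtime_seconds: int          # "it works" is not a criterion
    allowed_window: tuple              # e.g. (2, 5) = 02:00-05:00 UTC
    rollback_plan_required: bool = True

    def accepts(self, downtime_seconds: int, start_hour: int,
                has_rollback: bool) -> bool:
        return (
            downtime_seconds <= self.max_downtime_seconds
            and self.allowed_window[0] <= start_hour < self.allowed_window[1]
            and (has_rollback or not self.rollback_plan_required)
        )

contract = DelegationContract("migrate user table",
                              max_downtime_seconds=60,
                              allowed_window=(2, 5))
# The actual run: 45 minutes of downtime, at peak hours, no rollback plan.
assert not contract.accepts(downtime_seconds=45 * 60, start_hour=14,
                            has_rollback=False)
```

Under this contract, the agent's plan fails acceptance before execution; the delegation failure becomes visible in advance.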
The Static Autonomy That Rotted
Autonomy is a Design Parameter
The Violation
A team treats agent autonomy as binary — autonomous for low-risk tasks, supervised for high-risk ones. They set the levels once and never recalibrate. A task initially classified as low-risk (updating documentation) evolves over time.
What Went Wrong
The agent starts modifying API documentation that external partners depend on. Nobody notices that the risk profile has changed, because autonomy was set once, not dynamically adjusted. A low-risk task became high-risk without anyone recalibrating the controls.
The 94% Approval Rate
Sustainable Cognitive Pace
The Violation
A startup CTO, excited by agent productivity, assigns a 4-person team to work with agents on 6 projects simultaneously. The agents generate code efficiently. Management sees the throughput and doubles the sprint scope.
What Went Wrong
The developers context-switch between reviewing agent output on 6 codebases. After 3 months, nobody can hold the full architecture of any single project in their head. The team's PR approval rate is 94% — but that's because they're rubber-stamping, not reviewing. Quality collapsed, and it collapsed silently: no visible failures, no production incidents yet, just a slow accumulation of technical debt.
The Kafka Guarantor Who Couldn't Guarantee
Humans are Guarantors
The Violation
A mid-level developer is assigned to oversee an AI agent building a real-time data pipeline in Apache Kafka. The developer has never worked with Kafka or event streaming. The agent produces technically correct code — partition strategies, exactly-once semantics, dead letter queues.
What Went Wrong
The developer approves every PR because the tests pass and the agent's explanations sound reasonable. Three months later, a partition rebalancing issue causes data loss during a deploy. The developer never understood the system they were "guaranteeing."
The Eleven-Day Silent Corruption
Failure Demands a Protocol
The Violation
A team uses an AI agent for automated dependency updates. The agent bumps a library version that introduces a subtle behavioral change — not a crash, not a test failure, just a slight difference in how floating-point rounding works in a financial calculation. There's no failure protocol: no anomaly detection, no human review trigger, no rollback threshold.
What Went Wrong
The behavioral change runs in production for 11 days before a customer reports invoice totals off by a few cents. Total impact: 43,000 miscalculated invoices. The cost was proportional to the time between failure and detection — and there was nothing in place to shorten that window.
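A failure protocol for silent behavioral changes can be as cheap as a post-deploy canary that recomputes a sample of results and flags drift. This is a hedged sketch; the tolerance and function names are assumptions.

```python
# Hypothetical post-deploy canary: recompute a sample of invoices with the
# old and new code paths; any drift beyond tolerance triggers human review
# and marks the deploy as a rollback candidate.

def invoice_total_drift(expected: float, actual: float,
                        tolerance: float = 0.005) -> bool:
    """True if totals diverge beyond tolerance (in currency units)."""
    return abs(expected - actual) > tolerance

def post_deploy_canary(samples: list) -> int:
    """Count drifted invoices in a sample of (expected, actual) pairs."""
    return sum(1 for expected, actual in samples
               if invoice_total_drift(expected, actual))

# A few cents of rounding drift pages a human on day 1, not day 11.
drifted = post_deploy_canary([(100.00, 100.00), (59.90, 59.93), (12.50, 12.50)])
assert drifted == 1
```

The protocol shortens the failure-to-detection window that the scenario shows is the real cost driver.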
The Three-Agent Merge Conflict
Coherence is Collective Responsibility
The Violation
Three AI agents work in parallel on different parts of the same codebase. Agent A refactors the authentication module. Agent B adds a new feature that depends on the old auth interface. Agent C updates the database schema in a way that's incompatible with Agent A's changes. Each agent's work passes its own tests in isolation.
What Went Wrong
When merged, the system breaks in three places. No one — human or agent — was responsible for system-wide coherence. Agents don't "notice" that a colleague is changing a shared dependency. The coherence problem scales with the number of concurrent agents.
The Frontend Agent That Touched Infrastructure
Trust is Calibrated, Not Granted
The Violation
A team has used an AI agent for React frontend development for 8 months with excellent results. When they need to build a Terraform infrastructure module, they give the same agent the same autonomy level, reasoning: "this agent has proven itself."
What Went Wrong
The agent produces Terraform code that looks plausible but has critical security misconfigurations: overly permissive IAM policies, publicly accessible S3 buckets, no state locking. Trust earned in React was blindly transferred to infrastructure.
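Calibrated trust can be encoded as a per-domain lookup with a strict default. The sketch is illustrative; the agent, domain, and level names are made up.

```python
# Hypothetical trust calibration: autonomy is recorded per (agent, domain)
# pair and never transferred across domains. Unknown pairs default to the
# strictest oversight.

TRUST = {
    ("agent-1", "react-frontend"): "autonomous",  # 8 months of good results
    # No entry for ("agent-1", "terraform"): trust was never earned there.
}

def review_mode(agent: str, domain: str) -> str:
    """Look up earned trust; fall back to expert review for new domains."""
    return TRUST.get((agent, domain), "expert-review-required")

assert review_mode("agent-1", "react-frontend") == "autonomous"
assert review_mode("agent-1", "terraform") == "expert-review-required"
```

Under this scheme, the plausible-looking Terraform code would have gone to an infrastructure expert by default.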
The Ambiguous "Improve Performance" Task
Ambiguity is Risk
The Violation
A team gives an agent the task: "Improve the performance of the search feature." The specification doesn't define what "improve" means — latency? relevance? throughput? — and doesn't set targets or constraints.
What Went Wrong
The agent rewrites the search engine, replacing the Elasticsearch integration with a custom in-memory solution. Response times drop from 200ms to 15ms. But the new implementation can't handle the full dataset, breaks pagination, and ignores relevance scoring. The agent "improved performance" exactly as ambiguously as it was asked to — and because the output looked competent (15ms!), the team almost shipped it.
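The missing specification can be pinned to measurable targets and hard constraints. The sketch below is hypothetical; every number and field name is an illustrative assumption, not a recommendation.

```python
# Hypothetical disambiguated spec for "improve performance": which metric,
# how much, and what must not break.

from dataclasses import dataclass

@dataclass
class PerformanceSpec:
    p95_latency_ms: int = 100           # target: the metric and the bar
    min_dataset_rows: int = 50_000_000  # constraint: must handle full dataset
    preserve: tuple = ("pagination", "relevance_scoring")  # non-negotiables

def acceptable(spec: PerformanceSpec, latency_ms: int, max_rows: int,
               preserved: set) -> bool:
    return (
        latency_ms <= spec.p95_latency_ms
        and max_rows >= spec.min_dataset_rows
        and set(spec.preserve) <= preserved
    )

spec = PerformanceSpec()
# The agent's in-memory rewrite: 15ms, but partial dataset, no relevance scoring.
assert not acceptable(spec, latency_ms=15, max_rows=1_000_000,
                      preserved={"pagination"})
```

Against this spec, the impressive 15ms number fails acceptance on two constraints, and the near-miss ship never happens.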
The Retro That Never Improved Anything
Continuous Mutual Learning
The Violation
A team holds retros every two weeks. They discuss what went well, including agent performance. But the output is always the same: "the agent made errors on X, Y, Z — let's be more careful next time." They never trace errors back to ambiguities in their specs. They never update specification templates. They never adjust delegation boundaries.
What Went Wrong
After 6 months, they're making the same delegation mistakes they made on day one. The humans didn't learn to write better specs, and the team's effective use of agents never improved. Both sides stagnated because reflection never ran in both directions.