The Conductor Problem
March 2026 · Essay
You have a Code Reviewer, a Security Auditor, a QA Engineer, a Design Auditor, and an Incident Commander. A pull request lands. Who activates?
If it touches authentication code, the Security Auditor should go first. If it is a CSS-only change, the Design Auditor. If it modifies a database migration, maybe the Code Reviewer and Security Auditor in parallel. If it is a hotfix during an outage, the Incident Commander overrides everything.
This is the conductor problem: in a system with multiple specialized constructs, something has to decide which ones activate, in what order, with what priority. That something is the hardest part of multi-agent systems to get right.
Three Approaches
The industry is converging on three orchestration patterns, each with distinct trade-offs.
The human conductor. The simplest and most common approach today. The human types a slash command: /review, /qa, /plan-ceo-review. Garry Tan's gstack works this way — twelve constructs, each invoked explicitly. The human is the orchestrator. This works at small scale because the human has context that no system has: what matters right now, what the politics are, what shipped last week and broke.
The deterministic conductor. A pre-defined workflow that routes tasks to constructs based on rules. “When a PR is opened, run Code Reviewer. If it touches files in /auth, also run Security Auditor. If tests fail, run Debugger.” This is how CI/CD pipelines work. The routing is explicit, predictable, and auditable. But it is brittle — every new case needs a new rule. Microsoft's Semantic Kernel patterns and traditional workflow engines like Temporal take this approach.
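The rule quoted above can be written down in a few lines. This is a minimal sketch, assuming hypothetical construct names and a pull-request record that carries its changed file paths:

```python
def route(pr: dict) -> list[str]:
    """Deterministic routing: map a pull request to the constructs that run.

    Every case is an explicit rule -- predictable and auditable,
    but each new case needs a new line here.
    """
    constructs = ["code-reviewer"]  # always runs on any PR
    if any(path.startswith("auth/") for path in pr["files"]):
        constructs.append("security-auditor")
    if pr.get("tests_failed"):
        constructs.append("debugger")
    return constructs


# A PR touching auth code picks up the Security Auditor automatically.
route({"files": ["auth/login.py"], "tests_failed": False})
# → ["code-reviewer", "security-auditor"]
```

The brittleness is visible in the shape of the function: every `if` is a rule someone had to anticipate, and the cases nobody anticipated fall through silently.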
The AI conductor. A meta-agent that decides which specialist agents to activate based on the situation. Anthropic's multi-agent research system uses this pattern: a lead agent analyzes the task, develops a strategy, and spawns specialist subagents in parallel. The lead maintains oversight and synthesizes results. This is the most flexible approach but raises a recursive question: who writes the conductor's construct?
The Recursive Question
If the conductor is itself an AI agent, it needs a construct. That construct defines how it decides which other constructs to invoke. But who defines those decision rules? And how do you test them?
A conductor construct for an engineering team might say: “For any code change, always run Code Reviewer. If the change touches security-sensitive paths, add Security Auditor. If a production incident is active, escalate to Incident Commander and defer all non-critical reviews.” These are reasonable rules. But they encode assumptions about what matters, and those assumptions are themselves a form of expertise.
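The same rules, written as code, make the encoded assumptions easy to see. A sketch, with the list of security-sensitive paths and the construct names invented for illustration:

```python
def conduct(change: dict, incident_active: bool) -> list[str]:
    """The conductor rules quoted above, made executable:
    always review, add security on sensitive paths, and defer
    all non-critical reviews during an active incident.
    """
    if incident_active:
        # The incident override wins unconditionally -- an assumption
        # that incidents always outrank everything else.
        return ["incident-commander"]
    plan = ["code-reviewer"]
    sensitive = ("auth/", "crypto/", "payments/")  # assumed path list
    if any(path.startswith(sensitive) for path in change["files"]):
        plan.append("security-auditor")
    return plan
```

Even this tiny version embeds judgment calls: which paths count as sensitive, and whether an incident really should suspend every other review.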
The conductor is not a neutral router. It is an opinionated system that embodies an organization's priorities. “Security first” produces a different conductor than “ship fast.” “Customer experience above all” routes differently than “technical excellence above all.”
This makes the conductor construct the most important construct in any fleet — and the one most likely to be written poorly, because it requires not domain expertise but organizational judgment. Knowing when to prioritize security over speed. Knowing when a design review can wait and when it cannot. Knowing which exceptions to the process are acceptable and which are reckless.
Why This Is Hard
Multi-agent orchestration looks simple on a whiteboard. Draw boxes, draw arrows. But the hard problems are in the gaps:
Context handoff. When the Code Reviewer finishes and the Security Auditor starts, what does the Security Auditor know? Does it re-read the entire diff? Does it receive a summary from the Code Reviewer? Does it see the Code Reviewer's findings? Each choice has cost and quality implications. Anthropic found that their multi-agent system outperformed a single-agent approach by 90% on internal research evaluations — largely because distributing work across agents with separate context windows enabled parallel reasoning that one context window could not achieve.
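The three handoff choices can be made concrete as fields on a payload. A sketch, not any particular framework's API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Handoff:
    """What the next agent receives. Each field is one of the choices
    above, trading cost for fidelity:
    - full_diff: re-read everything (highest token cost, no information loss)
    - summary: compressed context (cheapest, lossy)
    - prior_findings: the previous agent's conclusions (cheap, but the
      next agent may anchor to them instead of looking fresh)
    """
    full_diff: Optional[str] = None
    summary: Optional[str] = None
    prior_findings: list[str] = field(default_factory=list)


# A summary-plus-findings handoff: the Security Auditor gets the gist
# and the Code Reviewer's conclusions, but not the raw diff.
h = Handoff(summary="renames auth helpers, no logic change",
            prior_findings=["abstraction looks premature"])
```

Which fields the conductor populates is itself a routing decision, and it is where much of the cost difference between handoff strategies lives.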
Conflict resolution. The Code Reviewer says “this abstraction is premature, inline it.” The Security Auditor says “this logic must be centralized for audit purposes.” They are both right. They are in direct conflict. Who wins? The conductor has to resolve this, and resolution requires understanding the relative weight of each concern in this specific context.
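The simplest possible resolution policy makes the difficulty obvious. A toy sketch, where the weights are exactly the part no one knows how to set:

```python
def resolve(findings: list[tuple[str, str, int]]) -> str:
    """Toy conflict resolver. Each finding is (construct, recommendation,
    weight), where the weight encodes how much this concern matters in
    this specific context. Taking the max is trivially easy; assigning
    the weights is the actual conductor problem.
    """
    construct, recommendation, _ = max(findings, key=lambda f: f[2])
    return f"{construct}: {recommendation}"


resolve([
    ("code-reviewer", "inline this abstraction", 2),
    ("security-auditor", "centralize for audit purposes", 5),
])
# → "security-auditor: centralize for audit purposes"
```

The numbers 2 and 5 are doing all the work, and nothing in the code says where they come from. That gap is the organizational judgment the essay describes.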
Termination. When is the process done? The QA Engineer finds a bug. The Debugger fixes it. The QA Engineer re-tests and finds a regression introduced by the fix. The Debugger fixes the regression. At what point does the conductor say “good enough, ship it”? Perfectionism in an automated system is a loop that never terminates.
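One blunt but effective fix is a round budget. A sketch, assuming hypothetical `run_qa` and `run_debugger` callables:

```python
def review_loop(run_qa, run_debugger, max_rounds: int = 3) -> str:
    """Force termination by capping QA/debug rounds. run_qa returns a
    list of bugs (empty means clean); run_debugger attempts fixes and
    may introduce regressions that the next QA round catches.
    """
    for _ in range(max_rounds):
        bugs = run_qa()
        if not bugs:
            return "ship"          # clean pass: good enough
        run_debugger(bugs)
    return "escalate-to-human"     # budget exhausted: a person decides


# Round 1 finds a bug, round 2 finds the regression the fix introduced,
# round 3 comes back clean -- the loop terminates with "ship".
rounds = iter([["bug-1"], ["regression-of-fix-1"], []])
review_loop(lambda: next(rounds), lambda bugs: None)  # → "ship"
```

The budget is a crude stand-in for "good enough," but it guarantees the property the paragraph asks for: the loop always ends, and the hard cases end with a human rather than an infinite QA/debug cycle.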
Cost. Every agent invocation costs tokens. Running five constructs on every pull request might cost $2-5 per PR. At 100 PRs per week, that is $200-500 per week in API costs. The conductor has to make economic decisions: is this PR important enough to warrant a full review? Or is it a README typo that needs only a cursory check?
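The economic gate can be sketched with the essay's own numbers: $2-5 across five constructs implies very roughly $0.75 per construct run. All figures and thresholds below are illustrative assumptions, not measurements:

```python
def review_depth(pr: dict, budget_per_pr: float = 3.00) -> str:
    """Hypothetical economic triage for a pull request.

    Docs-only changes get a cursory pass (cents, not dollars).
    Otherwise estimate cost from an assumed ~$0.75 per construct run
    and flag anything over budget for explicit approval.
    """
    docs_only = all(p.endswith((".md", ".txt")) for p in pr["files"])
    if docs_only:
        return "cursory"
    est_cost = 0.75 * pr["constructs"]   # assumed per-construct cost
    return "full" if est_cost <= budget_per_pr else "full-with-approval"


review_depth({"files": ["README.md"], "constructs": 5})  # → "cursory"
```

The point is not the specific dollar amounts but that the conductor needs some such function: a README typo and an auth refactor should not buy the same amount of review.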
The Emerging Answer
The most pragmatic approach emerging in 2026 is a hybrid: human intent, deterministic routing, AI judgment at the edges.
The human decides what to do: “review this PR,” “ship this feature,” “investigate this incident.” A deterministic workflow maps the intent to a sequence of constructs. And within each construct, the AI exercises judgment about how to execute.
The human is not choosing which construct to run. The human is choosing the goal, and the system figures out which constructs serve that goal. But the mapping from goal to constructs is explicit and auditable, not a black box.
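The explicit, auditable mapping is the key property of the hybrid. A sketch with invented intent names and construct lists:

```python
# Human intent -> construct sequence. The table is the workflow:
# anyone can read it, diff it, and review changes to it.
INTENT_MAP = {
    "review-pr":            ["code-reviewer", "qa-engineer"],
    "ship-feature":         ["code-reviewer", "qa-engineer", "design-auditor"],
    "investigate-incident": ["incident-commander"],
}


def plan(intent: str) -> list[str]:
    """Map a human-stated goal to its construct sequence.

    Unknown intents fail loudly instead of being guessed at by a
    black box -- the mapping stays explicit and auditable.
    """
    if intent not in INTENT_MAP:
        raise ValueError(f"no workflow defined for intent: {intent}")
    return list(INTENT_MAP[intent])
```

AI judgment then lives inside each construct (how deeply to review, what to test), not in the goal-to-construct mapping itself.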
This is how an orchestra works. The conductor does not play the instruments. The conductor does not decide what piece to perform — the program does. The conductor interprets: tempo, dynamics, balance. Faster here. Softer there. Bring out the cellos in this passage.
The AI conductor does the same thing with constructs. The program (workflow) determines the sequence. The conductor determines the interpretation — how much scrutiny, how much parallelism, how much to invest in this particular task. The instruments (constructs) do the work.
We do not have great conductor constructs yet. Most multi-agent systems in 2026 are either fully human-directed or fully deterministic. The art of writing a conductor construct — one that makes good organizational judgment calls about when to escalate, when to skip, when to run things in parallel, and when to stop — is a craft that barely exists.
It may be the most important construct anyone writes.
Related: The Soul Stack covers how individual constructs compose. The Constructs That Run a Company maps which constructs a company needs at each stage. When Constructs Disagree explores the conflict resolution problem in depth.