π€π€ Multi-Agent Orchestration with MCP: Spawn, Delegate, and Aggregate
One agent cannot do everything well β build an orchestrator that decomposes tasks, spawns specialist sub-agents, runs them in parallel, and synthesises their results into a single coherent answer
Hi π, I'm Tushar Patil. Currently I am working as Frontend Developer (Angular) and also have expertise with .Net Core and Framework.
This is Part 14 of the AI Engineering with TypeScript series.
Prerequisites: Part 3 β AI Agent Β· Part 8 β MCP Client SDK Β· Part 13 β Real-Time Agents
Stack: Node.js 20+ Β· TypeScript 5.x Β· @anthropic-ai/sdk Β· @modelcontextprotocol/sdk v1.x Β· Zod
πΊοΈ What we'll cover
Throughout this series the agent has always been a single Claude instance β one model, one tool loop, one answer. That works beautifully for focused tasks. But complex real-world requests often span multiple domains simultaneously:
"Research what our internal docs say about deployment, check the current weather in the target region, summarise the findings, and write the result back to the knowledge base."
A single agent tackles these sequentially. A multi-agent system runs them in parallel β three specialists working simultaneously, an orchestrator collecting their results and synthesising a final answer. For a four-step task this can reduce wall-clock time by 60β70%. π
In Part 14 we build exactly this. The key insight is that sub-agents are just MCP tool calls from the orchestrator's perspective. The orchestrator calls a spawn_agent tool, the tool runs a full agent loop internally, and returns a string result β just like any other tool. The orchestrator never needs to know about Claude API calls, streaming, or message history inside the sub-agent.
By the end you will have:
- π A
SubAgentRunnerβ a self-contained class that runs a full agent loop and returns a typed result - π§ A
spawn_agentMCP tool β the interface the orchestrator uses to delegate tasks - π€ An
OrchestratorAgentβ the top-level Claude instance that decomposes tasks, callsspawn_agentin parallel, and synthesises results - β‘ Parallel execution β multiple sub-agents running simultaneously with
Promise.allSettled - π‘οΈ Failure isolation β one sub-agent failing never crashes the orchestrator
- π Result aggregation β structured output from each sub-agent fed back to the orchestrator as context
- π§ͺ A test harness for multi-agent systems using
InMemoryTransport
π§ Part 1: The Architecture β Why This Pattern Works
The orchestrator pattern maps naturally onto MCP's tool abstraction:
User prompt
β
βΌ
Orchestrator (Claude instance)
β decides to call spawn_agent three times
β
ββββ spawn_agent({ role: "researcher", task: "find docs on deployment" })
β ββββ Sub-agent A runs independently
β ββββ calls search_knowledge_base
β ββββ returns findings
β
ββββ spawn_agent({ role: "weather-analyst", task: "check Mumbai weather" })
β ββββ Sub-agent B runs independently
β ββββ calls get_current_weather + get_forecast
β ββββ returns weather summary
β
ββββ spawn_agent({ role: "writer", task: "draft the final report" })
ββββ Sub-agent C runs after A and B complete
ββββ receives A and B results as context
ββββ calls index_document
ββββ returns confirmation
The orchestrator sees only tool results β clean strings. It has no visibility into how many API calls the sub-agent made, how long it took internally, or which tools it used. This separation means you can improve any sub-agent independently without touching the orchestrator. π―
π Part 2: The SubAgentRunner
The runner encapsulates a complete agent loop. It takes a task description, a set of available tools, and a system prompt defining the agent's role. It runs until the model stops requesting tool calls and returns the final text output:
// src/agents/sub-agent-runner.ts
import Anthropic from "@anthropic-ai/sdk";
import type { McpClientWrapper } from "@techtush/mcp-client";
export interface SubAgentConfig {
role: string;
systemPrompt: string;
task: string;
context?: string; // results from sibling agents to use as context
maxToolCalls?: number; // safety cap β prevents runaway loops
timeoutMs?: number;
}
export interface SubAgentResult {
role: string;
task: string;
output: string;
toolCallCount: number;
durationMs: number;
success: boolean;
error?: string;
}
const anthropic = new Anthropic();
export class SubAgentRunner {
constructor(private readonly client: McpClientWrapper) {}
async run(config: SubAgentConfig): Promise<SubAgentResult> {
const start = Date.now();
const maxToolCalls = config.maxToolCalls ?? 10;
const tools = this.client.getTools().map((t) => ({
name: t.name,
description: t.description ?? "",
input_schema: t.inputSchema as Anthropic.Tool["input_schema"],
}));
const userContent = config.context
? `Context from other agents:\n\({config.context}\n\nYour task: \){config.task}`
: config.task;
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: userContent },
];
let toolCallCount = 0;
try {
while (true) {
if (toolCallCount >= maxToolCalls) {
throw new Error(
`Sub-agent "\({config.role}" exceeded max tool calls (\){maxToolCalls})`
);
}
const response = await Promise.race([
anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 2048,
tools,
messages,
system: config.systemPrompt,
}),
this.timeout(config.timeoutMs ?? 60_000, config.role),
]);
messages.push({ role: "assistant", content: response.content });
if (response.stop_reason === "end_turn") {
const output = response.content
.filter((b): b is Anthropic.TextBlock => b.type === "text")
.map((b) => b.text)
.join("\n");
return {
role: config.role,
task: config.task,
output,
toolCallCount,
durationMs: Date.now() - start,
success: true,
};
}
if (response.stop_reason === "tool_use") {
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const block of response.content) {
if (block.type !== "tool_use") continue;
toolCallCount++;
const resultText = await this.callTool(block.name, block.input as Record<string, unknown>);
toolResults.push({
type: "tool_result",
tool_use_id: block.id,
content: resultText,
});
}
messages.push({ role: "user", content: toolResults });
}
}
} catch (err) {
return {
role: config.role,
task: config.task,
output: "",
toolCallCount,
durationMs: Date.now() - start,
success: false,
error: err instanceof Error ? err.message : String(err),
};
}
}
private async callTool(name: string, input: Record<string, unknown>): Promise<string> {
try {
const result = await this.client["raw"].callTool({ name, arguments: input });
const text = result.content
.filter((c) => c.type === "text")
.map((c) => (c as { type: "text"; text: string }).text)
.join("\n");
return result.isError ? `Error: ${text}` : text;
} catch (err) {
return `Tool \({name} failed: \){err instanceof Error ? err.message : String(err)}`;
}
}
private timeout(ms: number, role: string): Promise<never> {
return new Promise((_, reject) =>
setTimeout(() => reject(new Error(`Sub-agent "\({role}" timed out after \){ms}ms`)), ms)
);
}
}
Two safety mechanisms deserve attention. The maxToolCalls cap prevents a runaway sub-agent from calling tools in an infinite loop β you always want a hard ceiling. The Promise.race against a timeout means a hung sub-agent cannot block the orchestrator forever. Both are essential in a production multi-agent system. π‘οΈ
π§ Part 3: The spawn_agent MCP Tool
Now expose sub-agent execution as an MCP tool that the orchestrator can call:
// src/tools/spawn-agent.ts
import { z } from "zod";
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import type { McpClientWrapper } from "@techtush/mcp-client";
import { SubAgentRunner } from "../agents/sub-agent-runner.js";
const SPECIALIST_PROMPTS: Record<string, string> = {
researcher:
"You are a research specialist. Your only job is to search the knowledge base thoroughly and return a comprehensive summary of relevant findings. Always call search_knowledge_base at least once before answering.",
"weather-analyst":
"You are a weather analysis specialist. Check current conditions and forecast, then summarise weather impact on the user's plans. Always use both get_current_weather and get_forecast.",
writer:
"You are a technical writer. Given context from other agents, synthesise a clear, structured document and store it using index_document. Return a confirmation with the document summary.",
analyst:
"You are a data analyst. Given findings from other agents, identify patterns, risks, and recommendations. Return a structured analysis.",
};
export function registerSpawnAgentTool(server: McpServer, client: McpClientWrapper): void {
const runner = new SubAgentRunner(client);
server.tool(
"spawn_agent",
"Spawn a specialist sub-agent to handle a specific part of a complex task. The sub-agent runs independently and returns its result as a string. Use this to parallelise work across specialist roles.",
{
role: z
.enum(["researcher", "weather-analyst", "writer", "analyst"])
.describe("The specialist role for this sub-agent"),
task: z
.string()
.min(10)
.describe("Clear, specific task description for the sub-agent"),
context: z
.string()
.optional()
.describe("Optional context from other agents to pass to this sub-agent"),
timeout_seconds: z
.number()
.int()
.min(10)
.max(120)
.default(60)
.describe("Maximum seconds to wait for this sub-agent"),
},
async (args) => {
const systemPrompt = SPECIALIST_PROMPTS[args.role];
const result = await runner.run({
role: args.role,
systemPrompt,
task: args.task,
context: args.context,
timeoutMs: args.timeout_seconds * 1000,
});
if (!result.success) {
return {
isError: true,
content: [
{
type: "text",
text: `Sub-agent "\({args.role}" failed: \){result.error ?? "unknown error"}`,
},
],
};
}
const summary = [
`[Sub-agent: ${result.role}]`,
`Task: ${result.task}`,
`Duration: \({result.durationMs}ms (\){result.toolCallCount} tool calls)`,
``,
result.output,
].join("\n");
return {
content: [{ type: "text", text: summary }],
};
}
);
}
The tool returns a structured string that includes the sub-agent's role, task, duration, and output. When the orchestrator receives this, it has full context about which specialist produced which result β essential for synthesis. β
π€ Part 4: The Orchestrator Agent
The orchestrator is the top-level Claude instance. It receives the user's complex request, decides how to decompose it into specialist tasks, calls spawn_agent for each, and synthesises the collected results:
// src/agents/orchestrator.ts
import Anthropic from "@anthropic-ai/sdk";
import type { McpClientWrapper } from "@techtush/mcp-client";
const anthropic = new Anthropic();
const ORCHESTRATOR_SYSTEM = `You are an orchestration agent. Your job is to:
1. Decompose complex tasks into specialist sub-tasks
2. Call spawn_agent for each sub-task β run independent tasks IN PARALLEL by calling spawn_agent multiple times before reading results
3. Pass relevant results as context when one agent depends on another's output
4. Synthesise all results into a final coherent answer
Available specialist roles:
- researcher: searches the knowledge base
- weather-analyst: fetches and analyses weather data
- writer: synthesises findings into a document and indexes it
- analyst: analyses data and produces recommendations
Always explain your decomposition strategy before spawning agents.`;
export async function runOrchestrator(
userQuery: string,
client: McpClientWrapper,
onToken?: (text: string) => void
): Promise<string> {
const tools = client.getTools().map((t) => ({
name: t.name,
description: t.description ?? "",
input_schema: t.inputSchema as Anthropic.Tool["input_schema"],
}));
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: userQuery },
];
let finalAnswer = "";
while (true) {
// Use streaming so the user sees orchestrator reasoning in real time (Part 4 pattern)
const stream = await anthropic.messages.stream({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
tools,
messages,
system: ORCHESTRATOR_SYSTEM,
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta" &&
onToken
) {
onToken(event.delta.text);
}
}
const response = await stream.finalMessage();
messages.push({ role: "assistant", content: response.content });
if (response.stop_reason === "end_turn") {
finalAnswer = response.content
.filter((b): b is Anthropic.TextBlock => b.type === "text")
.map((b) => b.text)
.join("\n");
break;
}
if (response.stop_reason === "tool_use") {
// Collect ALL tool calls from this response
const toolBlocks = response.content.filter((b) => b.type === "tool_use");
// Run ALL tool calls in parallel β this is where the speedup comes from
const toolResults = await Promise.allSettled(
toolBlocks.map(async (block) => {
if (block.type !== "tool_use") return null;
const result = await client["raw"].callTool({
name: block.name,
arguments: block.input as Record<string, unknown>,
});
const text = result.content
.filter((c) => c.type === "text")
.map((c) => (c as { type: "text"; text: string }).text)
.join("\n");
return {
type: "tool_result" as const,
tool_use_id: block.id,
content: result.isError ? `Error: ${text}` : text,
};
})
);
// Assemble results β failed calls get an error message, not a throw
const resultBlocks: Anthropic.ToolResultBlockParam[] = toolResults
.map((r, i) => {
const block = toolBlocks[i];
if (block.type !== "tool_use") return null;
if (r.status === "fulfilled" && r.value) {
return r.value;
}
return {
type: "tool_result" as const,
tool_use_id: block.id,
content: `Tool call failed: ${r.status === "rejected" ? r.reason : "unknown error"}`,
};
})
.filter(Boolean) as Anthropic.ToolResultBlockParam[];
messages.push({ role: "user", content: resultBlocks });
}
}
return finalAnswer;
}
The critical line is await Promise.allSettled(toolBlocks.map(...)). When the orchestrator calls spawn_agent three times in one response, all three sub-agents run simultaneously. Promise.allSettled (not Promise.all) means one failing sub-agent never rejects the entire batch β you get partial results and the orchestrator synthesises what it has. π‘οΈ
β‘ Part 5: Parallel vs Sequential β The Real Performance Impact
Let's make the speedup concrete. Without parallelism:
Task A: research docs β 3.2s
Task B: check weather β 1.8s
Task C: write report (needs A+B) β 2.1s
βββββββββββββββββββββββββββββββββββββ
Sequential total β 7.1s
With parallel orchestration:
Task A + Task B run simultaneously β max(3.2, 1.8) = 3.2s
Task C runs after A+B complete β 2.1s
βββββββββββββββββββββββββββββββββββββ
Parallel total β 5.3s (25% faster)
For longer tasks or more sub-agents the gains compound. The orchestrator also avoids waiting for sequential reasoning β while sub-agent A is calling the vector DB, sub-agent B is hitting the weather API. The wall-clock time is bounded by the slowest parallel batch, not the sum of all steps. β‘
The orchestrator system prompt includes a critical instruction: "run independent tasks IN PARALLEL by calling spawn_agent multiple times before reading results." Without this nudge, Claude tends to call one tool, wait for the result, then decide to call the next β sequential by default. The explicit instruction shifts it toward batch spawning. π‘
π¬ Part 6: A Real Orchestration Run
Let's trace a real end-to-end run with this query:
"Research what our internal docs say about Redis session TTL,
check the current weather in Pune, and write a combined
summary document to the knowledge base."
Terminal output with streaming:
π€ Orchestrator: I'll decompose this into three specialist tasks and
run the research and weather analysis in parallel before writing.
π§ spawn_agent({ role: "researcher", task: "Find all information about Redis session TTL..." })
π§ spawn_agent({ role: "weather-analyst", task: "Get current weather in Pune, IN..." })
[both sub-agents running simultaneously...]
β
researcher (3241ms, 2 tool calls):
[Sub-agent: researcher]
Found: Sessions use 30-minute sliding TTL via Redis set_config...
β
weather-analyst (1893ms, 2 tool calls):
[Sub-agent: weather-analyst]
Pune: 31Β°C, partly cloudy. Weekend forecast: light rain Saturday...
π§ spawn_agent({
role: "writer",
task: "Write a combined summary...",
context: "[researcher output]\n[weather-analyst output]"
})
β
writer (2108ms, 1 tool call):
[Sub-agent: writer]
Document indexed as "session-ttl-pune-weather-summary.md" (312 words)
π€ Orchestrator: I've completed all three tasks. The researcher found
that Redis session TTL is set to 30 minutes with sliding expiry...
Meanwhile, Pune is 31Β°C and partly cloudy today...
A combined summary has been saved to the knowledge base. β
Wall-clock time: 7.2 seconds (research + weather parallel: 3.2s, then write: 2.1s, plus orchestrator reasoning: 1.9s). Sequentially this would have taken around 10 seconds. π
π‘οΈ Part 7: Failure Isolation in Practice
What happens when the weather API is down? Without isolation:
spawn_agent(researcher) β success
spawn_agent(weather-analyst) β throws β Promise.all rejects β everything fails
With Promise.allSettled:
spawn_agent(researcher) β success: findings[]
spawn_agent(weather-analyst) β failure: "OpenWeatherMap API timeout"
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Orchestrator receives both results (one error string, one success)
Orchestrator: "I couldn't retrieve weather data due to an API timeout.
Based on the research findings alone, here is what I can tell you..."
The user gets a partial but useful answer. The orchestrator explains what it could and could not retrieve. No crash, no silent failure, no generic error message. This is the correct behaviour for production AI systems. β
π§ͺ Part 8: Testing Multi-Agent Systems
Multi-agent tests can be expensive β each test run fires multiple Claude API calls. The strategy is to mock sub-agent results at the spawn_agent tool level, keeping the orchestrator logic testable without hitting the API:
// src/__tests__/orchestrator.test.ts
import { describe, it, expect, vi, beforeAll } from "vitest";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
// A mock spawn_agent tool that returns deterministic results
// without firing real sub-agent loops
async function setupMockOrchestrationServer(): Promise<Client> {
const server = new McpServer({ name: "mock-orchestration", version: "0.0.1" });
server.tool(
"spawn_agent",
"Spawn a specialist sub-agent",
{
role: { type: "string" },
task: { type: "string" },
context: { type: "string" },
},
async (args) => {
// Return deterministic mock results per role
const mockResults: Record<string, string> = {
researcher:
"[Sub-agent: researcher]\nFound: Redis TTL is 30 minutes with sliding expiry.",
"weather-analyst":
"[Sub-agent: weather-analyst]\nPune: 31C, partly cloudy.",
writer:
"[Sub-agent: writer]\nDocument indexed: summary.md",
analyst:
"[Sub-agent: analyst]\nAnalysis: conditions are favourable.",
};
const role = args.role as string;
return {
content: [{ type: "text", text: mockResults[role] ?? "No mock result for this role" }],
};
}
);
const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
await server.connect(serverTransport);
const client = new Client({ name: "test-client", version: "0.0.1" }, { capabilities: {} });
await client.connect(clientTransport);
return client;
}
describe("Orchestrator tool decomposition", () => {
it("calls spawn_agent for researcher and weather-analyst for compound queries", async () => {
const client = await setupMockOrchestrationServer();
const spawnCalls: string[] = [];
// Spy on callTool to record which roles are spawned
const originalCallTool = client.callTool.bind(client);
vi.spyOn(client, "callTool").mockImplementation(async (params) => {
if (params.name === "spawn_agent" && params.arguments?.role) {
spawnCalls.push(params.arguments.role as string);
}
return originalCallTool(params);
});
// Run a compound query through the orchestrator
await runOrchestrator(
"Research Redis TTL docs and check Pune weather",
client as unknown as McpClientWrapper
);
expect(spawnCalls).toContain("researcher");
expect(spawnCalls).toContain("weather-analyst");
await client.close();
});
});
InMemoryTransport again proves its value β real MCP protocol, real tool calls, zero network overhead. The test verifies that the orchestrator actually decomposes the task into the right specialist calls without burning API budget on every CI run. β
ποΈ Part 9: Wiring Into the Production Server
Add spawn_agent to your existing MCP server alongside the weather and knowledge tools:
// src/server.ts (updated from Part 5)
import { registerSpawnAgentTool } from "./tools/spawn-agent.js";
import { McpClientFactory } from "@techtush/mcp-client";
// The spawn_agent tool needs a client to call sub-agent tools with
// Create a self-referential client that connects back to the same server
const selfClient = await new McpClientFactory()
.named("orchestrator-internal", "1.0.0")
.withRetries(2)
.withTimeout(90_000)
.connectHttp({
url: `http://localhost:${process.env.PORT ?? 3000}`,
token: process.env.INTERNAL_SERVICE_TOKEN!,
});
// Register all tools
registerWeatherTools(server);
registerKnowledgeTools(server, sessionId);
registerSpawnAgentTool(server, selfClient); // π adds spawn_agent
A self-referential client β the MCP server connecting to itself β is the cleanest way to give sub-agents access to all registered tools without duplicating tool registration logic. Add INTERNAL_SERVICE_TOKEN to your env vars and include it in VALID_TOKENS. π
π‘ Key Takeaways
Sub-agents are just tool calls. The orchestrator does not know or care that spawn_agent internally runs a full Claude message loop. From the orchestrator's perspective it is identical to calling get_current_weather. This abstraction keeps the orchestrator clean and makes sub-agents independently testable.
Promise.allSettled over Promise.all. Always. One sub-agent failing should never prevent the orchestrator from receiving results from the others. Partial information is almost always better than a full crash.
Prompt the orchestrator to batch. Claude's default behaviour is sequential tool calls. The system prompt instruction to "run independent tasks IN PARALLEL" is what unlocks batch spawning. Without it, you get the sequential pattern and lose the performance benefit.
Cap tool calls and timeouts per sub-agent. Without caps, a misconfigured sub-agent can loop indefinitely or run for minutes. maxToolCalls = 10 and timeoutMs = 60000 are sensible defaults β tune based on your tools' typical latency.
Keep specialist prompts focused. A researcher that also tries to write documents is worse than a researcher that only searches and a writer that only writes. Narrow role definitions produce more reliable, more testable sub-agents.
π― Summary
In Part 14 you built a complete multi-agent orchestration system on MCP:
- π
SubAgentRunnerβ encapsulates a full agent loop withmaxToolCallsand timeout safety - π§
spawn_agentMCP tool β the clean interface the orchestrator uses to delegate specialist tasks - π€
OrchestratorAgentβ streaming Claude instance that decomposes, delegates, and synthesises - β‘
Promise.allSettledparallelism β all independent sub-agents run simultaneously - π‘οΈ Failure isolation β one failed sub-agent yields a partial answer, not a crash
- π§ͺ Mock orchestration tests β
InMemoryTransport+ deterministic mock tools, zero API cost in CI
This is the final building block of the series. Over 14 parts you went from understanding what MCP is to running a distributed, observable, multi-tenant, multi-agent AI system deployed to production. π
π Full Series Recap
Parts 1β2: MCP concepts, tools, resources, prompts, capability negotiation Parts 3β4: Agent loop, multi-step tool calls, streaming, interactive CLI Parts 5β6: HTTP transport, OAuth, Zod validation, Docker, multi-tenant sessions, Redis Part 7: Observability β pino, OpenTelemetry, Prometheus, Grafana Part 8: Reusable TypeScript client SDK, npm publish Part 9: Production deployment β Koyeb, Railway, Render, Fly.io, GitHub Actions CD Parts 10β11: RAG with pgvector and Qdrant, multi-tenant row-level security Part 12: Eval framework β faithfulness, relevance, precision, CI gate Part 13: Real-time event bus, SSE, live React dashboard Part 14: Multi-agent orchestration β spawn, delegate, aggregate π€π€