🏗️ Multi-Tenant MCP: Session Management, State Isolation, and Horizontal Scaling
Give every connected client its own isolated world, then scale your MCP server across multiple instances without dropping a session
Hi 👋, I'm Tushar Patil. I currently work as a frontend developer (Angular) and also have experience with .NET Core and .NET Framework.
This is Part 6 of the AI Engineering with TypeScript series.
Prerequisites: Part 1 · Part 2 · Part 3 · Part 4 · Part 5
Stack: Node.js 20+ · Express 5 · @modelcontextprotocol/sdk v1.x · ioredis · TypeScript 5.x
🗺️ What we'll cover
In Part 5 we shipped a production-ready MCP server over HTTP with auth, Zod validation, and Docker. But there was a silent assumption baked in: one server, one client at a time.
The moment two clients connect simultaneously, problems surface. They share the same in-memory state. A tool call from Client A can corrupt the context for Client B. And if you try to scale horizontally (running two server instances behind a load balancer), a client's second request might land on an instance that has no idea who it is.
Part 6 fixes all of this. We'll build:
- 🗂️ Session-scoped state: every client gets an isolated Map keyed by sessionId
- 🚪 Session ID negotiation: how MCP assigns and tracks session IDs over HTTP
- 🔒 Tenant isolation: tools can only touch the state of the caller's own session
- 📌 Sticky sessions: how to configure a load balancer so a client always reaches its instance
- 🗄️ Redis session store: share state across instances for true horizontal scaling
- ♻️ Session TTL and cleanup: garbage-collect idle sessions so memory doesn't leak
🧠 Part 1: The Session Problem, Explained
When you use StreamableHTTPServerTransport, the MCP SDK automatically generates a sessionId (a UUID) the first time a client connects and sends it back in the Mcp-Session-Id response header. The client must echo this header on every subsequent request so the server can route the request to the right session context.
Here is the flow:
Client MCP Server
| |
| POST /mcp (no session header) |
|-------------------------------->|
| | generates sessionId = "abc-123"
| Mcp-Session-Id: abc-123 |
|<--------------------------------|
| |
| POST /mcp |
| Mcp-Session-Id: abc-123 |
|-------------------------------->| looks up session "abc-123"
| tool results |
|<--------------------------------|
The SDK handles the header negotiation for you. Your job is to give each session its own isolated state store. ✅
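If you ever write a raw HTTP client without the SDK, the echo logic is only a few lines. A minimal sketch, assuming the /mcp endpoint and dev bearer token from Part 5:
// Minimal sketch of the client side of session negotiation (fetch-based, no SDK).
// Assumes the /mcp endpoint and dev token from Part 5 of this series.
let sessionId: string | undefined;

async function callMcp(body: unknown): Promise<Response> {
  const res = await fetch("http://localhost:3000/mcp", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
      // Echo the session ID on every request after the first
      ...(sessionId ? { "Mcp-Session-Id": sessionId } : {}),
    },
    body: JSON.stringify(body),
  });
  // Capture the ID the server assigned on first contact
  sessionId ??= res.headers.get("mcp-session-id") ?? undefined;
  return res;
}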
🗃️ Part 2: In-Process Session State
For a single-instance server, an in-process Map is the right starting point. It is fast, zero-dependency, and easy to reason about:
// src/session-store.ts
export interface SessionState {
tenantId: string;
createdAt: Date;
lastActiveAt: Date;
toolCallCount: number;
preferredUnits: "metric" | "imperial";
cachedWeather: Map<string, { data: unknown; expiresAt: number }>;
}
class InProcessSessionStore {
private sessions = new Map<string, SessionState>();
private readonly TTL_MS = 30 * 60 * 1000; // 30 minutes
create(sessionId: string, tenantId: string): SessionState {
const state: SessionState = {
tenantId,
createdAt: new Date(),
lastActiveAt: new Date(),
toolCallCount: 0,
preferredUnits: "metric",
cachedWeather: new Map(),
};
this.sessions.set(sessionId, state);
return state;
}
get(sessionId: string): SessionState | undefined {
const state = this.sessions.get(sessionId);
if (!state) return undefined;
// Evict if TTL exceeded
if (Date.now() - state.lastActiveAt.getTime() > this.TTL_MS) {
this.sessions.delete(sessionId);
return undefined;
}
state.lastActiveAt = new Date();
return state;
}
delete(sessionId: string) {
this.sessions.delete(sessionId);
}
// Call this on a periodic interval to clean up stale sessions
evictExpired() {
const now = Date.now();
for (const [id, state] of this.sessions) {
if (now - state.lastActiveAt.getTime() > this.TTL_MS) {
this.sessions.delete(id);
}
}
}
}
export const sessionStore = new InProcessSessionStore();
// Run cleanup every 5 minutes; unref() so the timer never blocks process shutdown
setInterval(() => sessionStore.evictExpired(), 5 * 60 * 1000).unref();
Three design decisions are worth noting here. The lastActiveAt timestamp is updated on every get(), so a session stays alive as long as the client is active. TTL is enforced in two places: get() checks it lazily in O(1) on each lookup, while the periodic evictExpired() sweep deletes stale entries even for sessions that are never looked up again, so memory cannot leak when clients disconnect without cleanup. And the cachedWeather map lives inside each session, so weather lookups for City A by Client A never collide with lookups for the same city by Client B. 🎯
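The tenantId field on SessionState is what makes the isolation enforceable rather than accidental. As a sketch, a guard like the hypothetical assertTenant below (not part of the store above) could run at the top of every tool handler, assuming the caller's tenant ID comes from the auth middleware in Part 5:
// Hypothetical guard: refuse to execute a tool if the authenticated tenant
// does not own the session it is trying to use
import { sessionStore, type SessionState } from "./session-store.js";

export function assertTenant(sessionId: string, callerTenantId: string): SessionState {
  const state = sessionStore.get(sessionId);
  if (!state) throw new Error(`Unknown or expired session: ${sessionId}`);
  if (state.tenantId !== callerTenantId) {
    // A stolen or mistyped session ID must never cross tenant boundaries
    throw new Error("Session does not belong to this tenant");
  }
  return state;
}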
🔌 Part 3: Wiring Sessions into the MCP Server
The MCP SDK exposes session lifecycle hooks. Here is how to create a session on first connect and look it up on subsequent requests:
// src/server.ts (updated from Part 5)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { randomUUID } from "node:crypto";
import { z } from "zod";
import { sessionStore } from "./session-store.js";
import { GetWeatherInput } from "./schemas.js";
import { fetchCurrentWeather } from "./weather.js";
import type { Request, Response } from "express";
// One McpServer per session, created on first connect
const activeMcpServers = new Map<string, McpServer>();
function createSessionServer(sessionId: string, tenantId: string): McpServer {
const state = sessionStore.create(sessionId, tenantId);
const server = new McpServer({ name: "weather-server", version: "1.0.0" });
server.tool(
"get_current_weather",
"Get current weather for a city",
GetWeatherInput.shape,
async (args) => {
const input = GetWeatherInput.parse(args);
      const cacheKey = `${input.city}-${input.units ?? state.preferredUnits}`;
// Check session-scoped cache first
const cached = state.cachedWeather.get(cacheKey);
if (cached && cached.expiresAt > Date.now()) {
return {
content: [{ type: "text", text: JSON.stringify(cached.data) }],
};
}
const data = await fetchCurrentWeather(
input.city,
input.country,
input.units ?? state.preferredUnits
);
// Store in session cache, expires in 5 minutes
state.cachedWeather.set(cacheKey, {
data,
expiresAt: Date.now() + 5 * 60 * 1000,
});
state.toolCallCount++;
return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
  server.tool(
    "set_preferred_units",
    "Set the preferred temperature unit for this session",
    {
      // The SDK expects a Zod shape here, not raw JSON Schema
      units: z
        .enum(["metric", "imperial"])
        .describe("Temperature unit preference for this session"),
    },
    async ({ units }) => {
      state.preferredUnits = units;
      return {
        content: [
          { type: "text", text: `Preferred units set to ${units} for this session.` },
        ],
      };
    }
  );
activeMcpServers.set(sessionId, server);
return server;
}
// One live transport per session, created on first connect and reused until
// the session ends. The transport negotiates the Mcp-Session-Id header.
const activeTransports = new Map<string, StreamableHTTPServerTransport>();

export async function handleMcpRequest(req: Request, res: Response) {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;
  const tenantId = (req as Request & { tenantId?: string }).tenantId ?? "anonymous";

  if (sessionId) {
    // Existing session: both the state and the live transport must still exist
    const state = sessionStore.get(sessionId);
    const transport = activeTransports.get(sessionId);
    if (!state || !transport) {
      res.status(404).json({ error: "session_not_found", sessionId });
      return;
    }
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // New session: pre-generate the ID so the session state, the McpServer,
  // and the transport all agree on it from the first request
  const newSessionId = randomUUID();
  const server = createSessionServer(newSessionId, tenantId);
  const transport = new StreamableHTTPServerTransport({
    // Called by the SDK during `initialize`; the transport sends the result
    // back in the Mcp-Session-Id response header
    sessionIdGenerator: () => newSessionId,
    onsessioninitialized: (id) => {
      activeTransports.set(id, transport);
    },
  });
  transport.onclose = () => {
    activeTransports.delete(newSessionId);
    activeMcpServers.delete(newSessionId);
  };
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
}
The key insight here is that McpServer holds your business logic and session state, while StreamableHTTPServerTransport handles the HTTP and JSON-RPC plumbing, including the Mcp-Session-Id negotiation. One server/transport pair lives for the lifetime of each session, and every request is routed to its pair by the session header. This separation is what keeps per-session isolation clean. ✅
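For completeness, here is roughly how the handler plugs into the Express app from Part 5. The bearerAuth middleware and the ./auth.js path are assumptions carried over from that part; the express.json() parser matters because handleRequest() is handed the parsed req.body:
// src/app.ts (wiring sketch; bearerAuth and ./auth.js are assumed from Part 5)
import express from "express";
import { handleMcpRequest } from "./server.js";
import { bearerAuth } from "./auth.js";

const app = express();
app.use(express.json()); // handleRequest() receives the parsed req.body
app.post("/mcp", bearerAuth, handleMcpRequest);
app.listen(3000, () => console.log("MCP server listening on :3000"));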
📌 Part 4: Sticky Sessions at the Load Balancer
If you run two instances of your server, you need to ensure that Client A always routes to Instance 1 (where its McpServer lives). This is called session affinity or sticky sessions.
With nginx, configure ip_hash or use the Mcp-Session-Id header as the hash key:
upstream mcp_servers {
hash $http_mcp_session_id consistent;
server mcp-instance-1:3000;
server mcp-instance-2:3000;
}
server {
listen 80;
location /mcp {
proxy_pass http://mcp_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 3600;
}
}
One caveat with header hashing: a session's first request carries no Mcp-Session-Id, so brand-new sessions all hash to the same upstream until they have an ID. With AWS ALB, create a target group with stickiness enabled using a custom cookie, and set the duration to match your session TTL (30 minutes in our case).
The downside of sticky sessions: if Instance 1 crashes, all its sessions are lost, and there is no failover. This is acceptable for ephemeral agent sessions but not for anything that holds critical state. For true resilience you need an external session store, which is exactly what Redis gives you. 🗄️
🗄️ Part 5: Redis Session Store for Horizontal Scaling
To share session state across instances (and survive instance restarts), move state into Redis. We serialize the parts of SessionState that can be JSON-stringified and keep the non-serializable parts (like cachedWeather) in-process with a short TTL:
# Install ioredis
npm install ioredis
// src/redis-session-store.ts
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const SESSION_TTL_SECONDS = 30 * 60; // 30 minutes
const KEY = (sessionId: string) => `mcp:session:${sessionId}`;
export interface PersistedSessionState {
tenantId: string;
createdAt: string;
lastActiveAt: string;
toolCallCount: number;
preferredUnits: "metric" | "imperial";
}
export async function createSession(
sessionId: string,
tenantId: string
): Promise<PersistedSessionState> {
const state: PersistedSessionState = {
tenantId,
createdAt: new Date().toISOString(),
lastActiveAt: new Date().toISOString(),
toolCallCount: 0,
preferredUnits: "metric",
};
await redis.set(KEY(sessionId), JSON.stringify(state), "EX", SESSION_TTL_SECONDS);
return state;
}
export async function getSession(
sessionId: string
): Promise<PersistedSessionState | null> {
const raw = await redis.get(KEY(sessionId));
if (!raw) return null;
const state: PersistedSessionState = JSON.parse(raw);
state.lastActiveAt = new Date().toISOString();
// Slide the TTL on every access
await redis.set(KEY(sessionId), JSON.stringify(state), "EX", SESSION_TTL_SECONDS);
return state;
}
export async function updateSession(
sessionId: string,
patch: Partial<PersistedSessionState>
): Promise<void> {
const state = await getSession(sessionId);
if (!state) return;
const updated = { ...state, ...patch };
await redis.set(KEY(sessionId), JSON.stringify(updated), "EX", SESSION_TTL_SECONDS);
}
export async function deleteSession(sessionId: string): Promise<void> {
await redis.del(KEY(sessionId));
}
With Redis, any instance can serve any client because session state lives outside the process. You can drop sticky sessions from the load balancer and let it round-robin freely. 🚀
The sliding TTL ("EX", SESSION_TTL_SECONDS on every write) means active sessions never expire. Only sessions that have been idle for 30 minutes get cleaned up; Redis does it automatically, no cron job needed. ♻️
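Tool handlers then do their per-call bookkeeping through this store instead of an in-process object. A minimal sketch using the getSession/updateSession helpers above; note that the read-modify-write on toolCallCount is not atomic across instances, so a real counter would live in a Redis hash updated with HINCRBY:
// Hypothetical helper called from tool handlers; uses the store above.
// The read-modify-write on toolCallCount is NOT atomic across instances;
// a Redis hash plus HINCRBY would make it so.
import { getSession, updateSession } from "./redis-session-store.js";

export async function recordToolCall(sessionId: string): Promise<void> {
  const state = await getSession(sessionId); // also slides the TTL
  if (!state) throw new Error(`Session expired: ${sessionId}`);
  await updateSession(sessionId, { toolCallCount: state.toolCallCount + 1 });
}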
🏋️ Part 6: Load Testing Your Sessions
Before deploying, verify that concurrent sessions stay isolated. Here is a simple script using autocannon:
npm install -D autocannon
// scripts/load-test.ts
import autocannon from "autocannon";
// Simulate 10 concurrent clients, each with their own session
const sessions: string[] = [];
// First, create 10 sessions
for (let i = 0; i < 10; i++) {
const res = await fetch("http://localhost:3000/mcp", {
method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP servers expect clients to accept both JSON and SSE
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26",
        capabilities: {},
        clientInfo: { name: "load-test", version: "0.0.1" },
      },
    }),
});
const sessionId = res.headers.get("mcp-session-id");
if (sessionId) sessions.push(sessionId);
}
console.log(`Created ${sessions.length} sessions`);
// Then hammer all 10 sessions concurrently
const instance = autocannon({
url: "http://localhost:3000/mcp",
connections: 10,
duration: 30,
requests: sessions.map((sessionId) => ({
method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
      "Mcp-Session-Id": sessionId,
    },
body: JSON.stringify({
jsonrpc: "2.0",
id: 2,
method: "tools/call",
params: { name: "get_current_weather", arguments: { city: "Pune" } },
}),
})),
});
autocannon.track(instance);
Run it and check that each session's toolCallCount increments independently: proof that state is isolated. 💪
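If your tool handlers persist their counters through the Redis store from Part 5 of this article, you can spot-check isolation after a run with a few lines of ioredis. KEYS is fine at this debug scale, though you would avoid it against a large production keyspace:
// scripts/check-isolation.ts: dump per-session counters after a load test.
// Assumes tool handlers persist toolCallCount via the Redis store above.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const keys = await redis.keys("mcp:session:*");
for (const key of keys) {
  const raw = await redis.get(key);
  if (!raw) continue;
  const { tenantId, toolCallCount } = JSON.parse(raw);
  console.log(`${key} tenant=${tenantId} toolCalls=${toolCallCount}`);
}
await redis.quit();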
🧹 Part 7: Session Cleanup on Disconnect
When a client sends a DELETE /mcp request (the MCP spec's way of saying "I'm done"), clean up immediately rather than waiting for TTL expiry:
// In your Express app (server.ts)
app.delete("/mcp", bearerAuth, async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;
  if (!sessionId) {
    res.status(400).json({ error: "missing_session_id" });
    return;
  }
  await deleteSession(sessionId);                 // drop the persisted Redis state
  await activeTransports.get(sessionId)?.close(); // close the live transport, if any
  activeTransports.delete(sessionId);
  activeMcpServers.delete(sessionId);
  sessionStore.delete(sessionId);                 // drop any in-process state too
  console.log(`Session ${sessionId} cleaned up on client disconnect`);
  res.status(204).send();
});
On the client side, always send this on shutdown:
// In your HTTP MCP client (from Part 5)
export async function closeSession(serverUrl: string, sessionId: string, token: string) {
await fetch(`${serverUrl}/mcp`, {
method: "DELETE",
headers: {
Authorization: `Bearer ${token}`,
"Mcp-Session-Id": sessionId,
},
});
}
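To make that call automatic, hook it into process shutdown. A small sketch, assuming the closeSession helper above:
// Sketch: always release the session on shutdown, assuming the closeSession()
// helper above and a sessionId captured at connect time
function registerSessionCleanup(serverUrl: string, sessionId: string, token: string) {
  const cleanup = async () => {
    await closeSession(serverUrl, sessionId, token).catch(() => {
      // Best effort: the server's TTL sweep will catch anything we miss
    });
    process.exit(0);
  };
  process.on("SIGINT", cleanup);
  process.on("SIGTERM", cleanup);
}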
Proactive cleanup keeps your Redis memory usage proportional to active connections rather than total historical connections. Always close what you open. 🚪
🏗️ Part 8: Final Architecture Overview
Here is the full picture of what you have built across Parts 5 and 6:
Internet
|
v
[ Nginx / ALB ] (TLS termination, optional sticky sessions)
| |
v v
[Instance 1] [Instance 2] (Node.js + Express + MCP SDK)
| \ / |
| --- |
v v
[ Redis ] (session state, TTL, pub/sub)
Per request inside each instance:
1. bearerAuth middleware validates token
2. Mcp-Session-Id header routes to correct McpServer
3. McpServer reads/writes PersistedSessionState from Redis
4. Tool executes with session-scoped context
5. Result streamed back via StreamableHTTPServerTransport
Each layer has a single responsibility. Nginx terminates TLS and (optionally) provides stickiness. Express handles HTTP routing and auth. The MCP SDK handles JSON-RPC framing. Redis holds durable session state. Your tools contain business logic and nothing else. 🎯
💡 Key Takeaways
Session IDs are the unit of isolation. Everything (state, cache, preferences, tool call history) scopes to a sessionId. Never let one session's data bleed into another.
In-process Map for single instances, Redis for multiple. Do not introduce Redis until you actually need horizontal scaling. The in-process store is simpler, faster, and easier to debug.
Sliding TTL beats fixed TTL. Reset the expiry on every access so active sessions stay alive indefinitely and idle sessions clean themselves up.
DELETE /mcp is your friend. Implement and call it. Proactive cleanup is better than leaking memory and Redis keys until TTL fires.
Sticky sessions reduce Redis reads, but are not required. If your tool calls are fast and Redis round-trips are cheap, you can skip stickiness entirely and let the load balancer round-robin freely.
🎯 Summary
In Part 6 you made the MCP server truly multi-tenant:
- 🗂️ Per-session McpServer instances with isolated state, cache, and preferences
- 🔒 Tenant isolation: one client's tool calls can never touch another's state
- 📌 Sticky session routing via nginx's hash directive or ALB stickiness
- 🗄️ Redis session store with sliding TTL for horizontal scale-out
- ♻️ Proactive cleanup via DELETE /mcp and periodic in-process eviction
- 🏋️ Load testing to verify isolation under concurrent connections
In Part 7 we'll go deeper on observability: structured logging, distributed tracing with OpenTelemetry, and a Prometheus metrics endpoint so you can actually see what your production MCP server is doing. 📊