🏗️ Multi-Tenant MCP: Session Management, State Isolation, and Horizontal Scaling
Give every connected client its own isolated world, then scale your MCP server across multiple instances without dropping a session
Hi 👋, I'm Tushar Patil. I currently work as a frontend developer (Angular) and also have experience with .NET Core and .NET Framework.
This is Part 6 of the AI Engineering with TypeScript series.
Prerequisites: Part 1 · Part 2 · Part 3 · Part 4 · Part 5
Stack: Node.js 20+ · Express 5 · @modelcontextprotocol/sdk v1.x · ioredis · TypeScript 5.x
🗺️ What we'll cover
In Part 5 we shipped a production-ready MCP server over HTTP with auth, Zod validation, and Docker. But there was a silent assumption baked in: one server, one client at a time.
The moment two clients connect simultaneously, problems surface. They share the same in-memory state. A tool call from Client A can corrupt the context for Client B. And if you try to scale horizontally (running two server instances behind a load balancer), a client's second request might land on an instance that has no idea who it is.
Part 6 fixes all of this. We'll build:
- 🗂️ Session-scoped state: every client gets an isolated Map keyed by sessionId
- 🚪 Session ID negotiation: how MCP assigns and tracks session IDs over HTTP
- 🔒 Tenant isolation: tools can only touch the state of the caller's own session
- 📌 Sticky sessions: how to configure a load balancer so a client always reaches its instance
- 🗄️ Redis session store: share state across instances for true horizontal scaling
- ♻️ Session TTL and cleanup: garbage-collect idle sessions so memory doesn't leak
🧠 Part 1: The Session Problem, Explained
When you use StreamableHTTPServerTransport, the MCP SDK automatically generates a sessionId (a UUID) the first time a client connects and sends it back in the Mcp-Session-Id response header. The client must echo this header on every subsequent request so the server can route the request to the right session context.
Here is the flow:
Client MCP Server
| |
| POST /mcp (no session header) |
|-------------------------------->|
| | generates sessionId = "abc-123"
| Mcp-Session-Id: abc-123 |
|<--------------------------------|
| |
| POST /mcp |
| Mcp-Session-Id: abc-123 |
|-------------------------------->| looks up session "abc-123"
| tool results |
|<--------------------------------|
The SDK handles the header negotiation for you. Your job is to give each session its own isolated state store. ✅
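If you ever write a raw HTTP client without the SDK, the echo logic is only a few lines. A minimal sketch, assuming the /mcp endpoint and dev bearer token from Part 5:
// Minimal sketch of the client side of session negotiation (fetch-based, no SDK).
// Assumes the /mcp endpoint and dev token from Part 5 of this series.
let sessionId: string | undefined;

async function callMcp(body: unknown): Promise<Response> {
  const res = await fetch("http://localhost:3000/mcp", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
      // Echo the session ID on every request after the first
      ...(sessionId ? { "Mcp-Session-Id": sessionId } : {}),
    },
    body: JSON.stringify(body),
  });
  // Capture the ID the server assigned on first contact
  sessionId ??= res.headers.get("mcp-session-id") ?? undefined;
  return res;
}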
🗃️ Part 2: In-Process Session State
For a single-instance server, an in-process Map is the right starting point. It is fast, zero-dependency, and easy to reason about:
// src/session-store.ts
export interface SessionState {
tenantId: string;
createdAt: Date;
lastActiveAt: Date;
toolCallCount: number;
preferredUnits: "metric" | "imperial";
cachedWeather: Map<string, { data: unknown; expiresAt: number }>;
}
class InProcessSessionStore {
private sessions = new Map<string, SessionState>();
private readonly TTL_MS = 30 * 60 * 1000; // 30 minutes
create(sessionId: string, tenantId: string): SessionState {
const state: SessionState = {
tenantId,
createdAt: new Date(),
lastActiveAt: new Date(),
toolCallCount: 0,
preferredUnits: "metric",
cachedWeather: new Map(),
};
this.sessions.set(sessionId, state);
return state;
}
get(sessionId: string): SessionState | undefined {
const state = this.sessions.get(sessionId);
if (!state) return undefined;
// Evict if TTL exceeded
if (Date.now() - state.lastActiveAt.getTime() > this.TTL_MS) {
this.sessions.delete(sessionId);
return undefined;
}
state.lastActiveAt = new Date();
return state;
}
delete(sessionId: string) {
this.sessions.delete(sessionId);
}
// Call this on a periodic interval to clean up stale sessions
evictExpired() {
const now = Date.now();
for (const [id, state] of this.sessions) {
if (now - state.lastActiveAt.getTime() > this.TTL_MS) {
this.sessions.delete(id);
}
}
}
}
export const sessionStore = new InProcessSessionStore();
// Run cleanup every 5 minutes; unref() so the timer never blocks process shutdown
setInterval(() => sessionStore.evictExpired(), 5 * 60 * 1000).unref();
Three design decisions are worth noting here. The lastActiveAt timestamp is updated on every get(), so a session stays alive as long as the client is active. TTL is enforced in two places: get() checks it lazily in O(1) on each lookup, while the periodic evictExpired() sweep deletes stale entries even for sessions that are never looked up again, so memory cannot leak when clients disconnect without cleanup. And the cachedWeather map lives inside each session, so weather lookups for City A by Client A never collide with lookups for the same city by Client B. 🎯
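The tenantId field on SessionState is what makes the isolation enforceable rather than accidental. As a sketch, a guard like the hypothetical assertTenant below (not part of the store above) could run at the top of every tool handler, assuming the caller's tenant ID comes from the auth middleware in Part 5:
// Hypothetical guard: refuse to execute a tool if the authenticated tenant
// does not own the session it is trying to use
import { sessionStore, type SessionState } from "./session-store.js";

export function assertTenant(sessionId: string, callerTenantId: string): SessionState {
  const state = sessionStore.get(sessionId);
  if (!state) throw new Error(`Unknown or expired session: ${sessionId}`);
  if (state.tenantId !== callerTenantId) {
    // A stolen or mistyped session ID must never cross tenant boundaries
    throw new Error("Session does not belong to this tenant");
  }
  return state;
}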
🔌 Part 3: Wiring Sessions into the MCP Server
The MCP SDK exposes session lifecycle hooks. Here is how to create a session on first connect and look it up on subsequent requests:
// src/server.ts (updated from Part 5)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { randomUUID } from "node:crypto";
import { z } from "zod";
import { sessionStore } from "./session-store.js";
import { GetWeatherInput } from "./schemas.js";
import { fetchCurrentWeather } from "./weather.js";
import type { Request, Response } from "express";
// One McpServer per session, created on first connect
const activeMcpServers = new Map<string, McpServer>();
function createSessionServer(sessionId: string, tenantId: string): McpServer {
const state = sessionStore.create(sessionId, tenantId);
const server = new McpServer({ name: "weather-server", version: "1.0.0" });
server.tool(
"get_current_weather",
"Get current weather for a city",
GetWeatherInput.shape,
async (args) => {
const input = GetWeatherInput.parse(args);
      const cacheKey = `${input.city}-${input.units ?? state.preferredUnits}`;
// Check session-scoped cache first
const cached = state.cachedWeather.get(cacheKey);
if (cached && cached.expiresAt > Date.now()) {
return {
content: [{ type: "text", text: JSON.stringify(cached.data) }],
};
}
const data = await fetchCurrentWeather(
input.city,
input.country,
input.units ?? state.preferredUnits
);
// Store in session cache, expires in 5 minutes
state.cachedWeather.set(cacheKey, {
data,
expiresAt: Date.now() + 5 * 60 * 1000,
});
state.toolCallCount++;
return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
  server.tool(
    "set_preferred_units",
    "Set the preferred temperature unit for this session",
    {
      // The SDK expects a Zod shape here, not raw JSON Schema
      units: z
        .enum(["metric", "imperial"])
        .describe("Temperature unit preference for this session"),
    },
    async ({ units }) => {
      state.preferredUnits = units;
      return {
        content: [
          { type: "text", text: `Preferred units set to ${units} for this session.` },
        ],
      };
    }
  );
activeMcpServers.set(sessionId, server);
return server;
}
// One live transport per session, created on first connect and reused until
// the session ends. The transport negotiates the Mcp-Session-Id header.
const activeTransports = new Map<string, StreamableHTTPServerTransport>();

export async function handleMcpRequest(req: Request, res: Response) {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;
  const tenantId = (req as Request & { tenantId?: string }).tenantId ?? "anonymous";

  if (sessionId) {
    // Existing session: both the state and the live transport must still exist
    const state = sessionStore.get(sessionId);
    const transport = activeTransports.get(sessionId);
    if (!state || !transport) {
      res.status(404).json({ error: "session_not_found", sessionId });
      return;
    }
    await transport.handleRequest(req, res, req.body);
    return;
  }

  // New session: pre-generate the ID so the session state, the McpServer,
  // and the transport all agree on it from the first request
  const newSessionId = randomUUID();
  const server = createSessionServer(newSessionId, tenantId);
  const transport = new StreamableHTTPServerTransport({
    // Called by the SDK during `initialize`; the transport sends the result
    // back in the Mcp-Session-Id response header
    sessionIdGenerator: () => newSessionId,
    onsessioninitialized: (id) => {
      activeTransports.set(id, transport);
    },
  });
  transport.onclose = () => {
    activeTransports.delete(newSessionId);
    activeMcpServers.delete(newSessionId);
  };
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
}
The key insight here is that McpServer holds your business logic and session state, while StreamableHTTPServerTransport handles the HTTP and JSON-RPC plumbing, including the Mcp-Session-Id negotiation. One server/transport pair lives for the lifetime of each session, and every request is routed to its pair by the session header. This separation is what keeps per-session isolation clean. ✅
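For completeness, here is roughly how the handler plugs into the Express app from Part 5. The bearerAuth middleware and the ./auth.js path are assumptions carried over from that part; the express.json() parser matters because handleRequest() is handed the parsed req.body:
// src/app.ts (wiring sketch; bearerAuth and ./auth.js are assumed from Part 5)
import express from "express";
import { handleMcpRequest } from "./server.js";
import { bearerAuth } from "./auth.js";

const app = express();
app.use(express.json()); // handleRequest() receives the parsed req.body
app.post("/mcp", bearerAuth, handleMcpRequest);
app.listen(3000, () => console.log("MCP server listening on :3000"));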
📌 Part 4: Sticky Sessions at the Load Balancer
If you run two instances of your server, you need to ensure that Client A always routes to Instance 1 (where its McpServer lives). This is called session affinity or sticky sessions.
With nginx, configure ip_hash or use the Mcp-Session-Id header as the hash key:
upstream mcp_servers {
hash $http_mcp_session_id consistent;
server mcp-instance-1:3000;
server mcp-instance-2:3000;
}
server {
listen 80;
location /mcp {
proxy_pass http://mcp_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 3600;
}
}
One caveat with header hashing: a session's first request carries no Mcp-Session-Id, so brand-new sessions all hash to the same upstream until they have an ID. With AWS ALB, create a target group with stickiness enabled using a custom cookie, and set the duration to match your session TTL (30 minutes in our case).
The downside of sticky sessions: if Instance 1 crashes, all its sessions are lost, and there is no failover. This is acceptable for ephemeral agent sessions but not for anything that holds critical state. For true resilience you need an external session store, which is exactly what Redis gives you. 🗄️
🗄️ Part 5: Redis Session Store for Horizontal Scaling
To share session state across instances (and survive instance restarts), move state into Redis. We serialize the parts of SessionState that can be JSON-stringified and keep the non-serializable parts (like cachedWeather) in-process with a short TTL:
# Install ioredis
npm install ioredis
// src/redis-session-store.ts
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const SESSION_TTL_SECONDS = 30 * 60; // 30 minutes
const KEY = (sessionId: string) => `mcp:session:${sessionId}`;
export interface PersistedSessionState {
tenantId: string;
createdAt: string;
lastActiveAt: string;
toolCallCount: number;
preferredUnits: "metric" | "imperial";
}
export async function createSession(
sessionId: string,
tenantId: string
): Promise<PersistedSessionState> {
const state: PersistedSessionState = {
tenantId,
createdAt: new Date().toISOString(),
lastActiveAt: new Date().toISOString(),
toolCallCount: 0,
preferredUnits: "metric",
};
await redis.set(KEY(sessionId), JSON.stringify(state), "EX", SESSION_TTL_SECONDS);
return state;
}
export async function getSession(
sessionId: string
): Promise<PersistedSessionState | null> {
const raw = await redis.get(KEY(sessionId));
if (!raw) return null;
const state: PersistedSessionState = JSON.parse(raw);
state.lastActiveAt = new Date().toISOString();
// Slide the TTL on every access
await redis.set(KEY(sessionId), JSON.stringify(state), "EX", SESSION_TTL_SECONDS);
return state;
}
export async function updateSession(
sessionId: string,
patch: Partial<PersistedSessionState>
): Promise<void> {
const state = await getSession(sessionId);
if (!state) return;
const updated = { ...state, ...patch };
await redis.set(KEY(sessionId), JSON.stringify(updated), "EX", SESSION_TTL_SECONDS);
}
export async function deleteSession(sessionId: string): Promise<void> {
await redis.del(KEY(sessionId));
}
With Redis, any instance can serve any client because session state lives outside the process. You can drop sticky sessions from the load balancer and let it round-robin freely. 🚀
The sliding TTL ("EX", SESSION_TTL_SECONDS on every write) means active sessions never expire. Only sessions that have been idle for 30 minutes get cleaned up; Redis does it automatically, no cron job needed. ♻️
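Tool handlers then do their per-call bookkeeping through this store instead of an in-process object. A minimal sketch using the getSession/updateSession helpers above; note that the read-modify-write on toolCallCount is not atomic across instances, so a real counter would live in a Redis hash updated with HINCRBY:
// Hypothetical helper called from tool handlers; uses the store above.
// The read-modify-write on toolCallCount is NOT atomic across instances;
// a Redis hash plus HINCRBY would make it so.
import { getSession, updateSession } from "./redis-session-store.js";

export async function recordToolCall(sessionId: string): Promise<void> {
  const state = await getSession(sessionId); // also slides the TTL
  if (!state) throw new Error(`Session expired: ${sessionId}`);
  await updateSession(sessionId, { toolCallCount: state.toolCallCount + 1 });
}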
🏋️ Part 6: Load Testing Your Sessions
Before deploying, verify that concurrent sessions stay isolated. Here is a simple script using autocannon:
npm install -D autocannon
// scripts/load-test.ts
import autocannon from "autocannon";
// Simulate 10 concurrent clients, each with their own session
const sessions: string[] = [];
// First, create 10 sessions
for (let i = 0; i < 10; i++) {
const res = await fetch("http://localhost:3000/mcp", {
method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP servers expect clients to accept both JSON and SSE
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26",
        capabilities: {},
        clientInfo: { name: "load-test", version: "0.0.1" },
      },
    }),
});
const sessionId = res.headers.get("mcp-session-id");
if (sessionId) sessions.push(sessionId);
}
console.log(`Created ${sessions.length} sessions`);
// Then hammer all 10 sessions concurrently
const instance = autocannon({
url: "http://localhost:3000/mcp",
connections: 10,
duration: 30,
requests: sessions.map((sessionId) => ({
method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: "Bearer dev-token",
      "Mcp-Session-Id": sessionId,
    },
body: JSON.stringify({
jsonrpc: "2.0",
id: 2,
method: "tools/call",
params: { name: "get_current_weather", arguments: { city: "Pune" } },
}),
})),
});
autocannon.track(instance);
Run it and check that each session's toolCallCount increments independently: proof that state is isolated. 💪
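If your tool handlers persist their counters through the Redis store from Part 5 of this article, you can spot-check isolation after a run with a few lines of ioredis. KEYS is fine at this debug scale, though you would avoid it against a large production keyspace:
// scripts/check-isolation.ts: dump per-session counters after a load test.
// Assumes tool handlers persist toolCallCount via the Redis store above.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const keys = await redis.keys("mcp:session:*");
for (const key of keys) {
  const raw = await redis.get(key);
  if (!raw) continue;
  const { tenantId, toolCallCount } = JSON.parse(raw);
  console.log(`${key} tenant=${tenantId} toolCalls=${toolCallCount}`);
}
await redis.quit();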
🧹 Part 7: Session Cleanup on Disconnect
When a client sends a DELETE /mcp request (the MCP spec's way of saying "I'm done"), clean up immediately rather than waiting for TTL expiry:
// In your Express app (server.ts)
app.delete("/mcp", bearerAuth, async (req, res) => {
  const sessionId = req.headers["mcp-session-id"] as string | undefined;
  if (!sessionId) {
    res.status(400).json({ error: "missing_session_id" });
    return;
  }
  await deleteSession(sessionId);                 // drop the persisted Redis state
  await activeTransports.get(sessionId)?.close(); // close the live transport, if any
  activeTransports.delete(sessionId);
  activeMcpServers.delete(sessionId);
  sessionStore.delete(sessionId);                 // drop any in-process state too
  console.log(`Session ${sessionId} cleaned up on client disconnect`);
  res.status(204).send();
});
On the client side, always send this on shutdown:
// In your HTTP MCP client (from Part 5)
export async function closeSession(serverUrl: string, sessionId: string, token: string) {
await fetch(`${serverUrl}/mcp`, {
method: "DELETE",
headers: {
Authorization: `Bearer ${token}`,
"Mcp-Session-Id": sessionId,
},
});
}
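To make that call automatic, hook it into process shutdown. A small sketch, assuming the closeSession helper above:
// Sketch: always release the session on shutdown, assuming the closeSession()
// helper above and a sessionId captured at connect time
function registerSessionCleanup(serverUrl: string, sessionId: string, token: string) {
  const cleanup = async () => {
    await closeSession(serverUrl, sessionId, token).catch(() => {
      // Best effort: the server's TTL sweep will catch anything we miss
    });
    process.exit(0);
  };
  process.on("SIGINT", cleanup);
  process.on("SIGTERM", cleanup);
}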
Proactive cleanup keeps your Redis memory usage proportional to active connections rather than total historical connections. Always close what you open. 🚪
🏗️ Part 8: Final Architecture Overview
Here is the full picture of what you have built across Parts 5 and 6:
Internet
|
v
[ Nginx / ALB ] (TLS termination, optional sticky sessions)
| |
v v
[Instance 1] [Instance 2] (Node.js + Express + MCP SDK)
| \ / |
| --- |
v v
[ Redis ] (session state, TTL, pub/sub)
Per request inside each instance:
1. bearerAuth middleware validates token
2. Mcp-Session-Id header routes to correct McpServer
3. McpServer reads/writes PersistedSessionState from Redis
4. Tool executes with session-scoped context
5. Result streamed back via StreamableHTTPServerTransport
Each layer has a single responsibility. Nginx terminates TLS and (optionally) provides stickiness. Express handles HTTP routing and auth. The MCP SDK handles JSON-RPC framing. Redis holds durable session state. Your tools contain business logic and nothing else. 🎯
💡 Key Takeaways
Session IDs are the unit of isolation. Everything (state, cache, preferences, tool call history) scopes to a sessionId. Never let one session's data bleed into another.
In-process Map for single instances, Redis for multiple. Do not introduce Redis until you actually need horizontal scaling. The in-process store is simpler, faster, and easier to debug.
Sliding TTL beats fixed TTL. Reset the expiry on every access so active sessions stay alive indefinitely and idle sessions clean themselves up.
DELETE /mcp is your friend. Implement and call it. Proactive cleanup is better than leaking memory and Redis keys until TTL fires.
Sticky sessions reduce Redis reads, but are not required. If your tool calls are fast and Redis round-trips are cheap, you can skip stickiness entirely and let the load balancer round-robin freely.
🎯 Summary
In Part 6 you made the MCP server truly multi-tenant:
- 🗂️ Per-session McpServer instances with isolated state, cache, and preferences
- 🔒 Tenant isolation: one client's tool calls can never touch another's state
- 📌 Sticky session routing via nginx's hash directive or ALB stickiness
- 🗄️ Redis session store with sliding TTL for horizontal scale-out
- ♻️ Proactive cleanup via DELETE /mcp and periodic in-process eviction
- 🏋️ Load testing to verify isolation under concurrent connections
In Part 7 we'll go deeper on observability: structured logging, distributed tracing with OpenTelemetry, and a Prometheus metrics endpoint so you can actually see what your production MCP server is doing. 📊