π Multi-Tenant RAG: Row-Level Security in pgvector with MCP
When Tenant A searches your knowledge base, they must never see Tenant B's documents β enforce this at the database level, not in application code
Hi π, I'm Tushar Patil. Currently I am working as Frontend Developer (Angular) and also have expertise with .Net Core and Framework.
This is Part 11 of the AI Engineering with TypeScript series.
Prerequisites: Part 6 β Multi-Tenant Sessions Β· Part 10 β MCP + RAG
Stack: Node.js 20+ Β· TypeScript 5.x Β· PostgreSQL 16 Β· pgvector Β· @modelcontextprotocol/sdk Β· Zod
πΊοΈ What we'll cover
In Part 10 we built a knowledge-base MCP tool backed by pgvector. It works beautifully β for a single tenant. The moment two organisations share the same deployment, a critical problem emerges: every search_knowledge_base call searches the entire documents table. Tenant A's agent can retrieve Tenant B's confidential runbooks, pricing docs, and internal API specs.
The naive fix is to add a WHERE tenant_id = $tenantId clause inside your TypeScript search function. That works until someone makes a mistake β a missing parameter, a copy-paste bug, a new developer who does not know the convention. Application-layer access control is fragile.
The right fix is PostgreSQL Row-Level Security (RLS) β a database-level policy that makes it physically impossible for a query to return rows it is not authorised to see, regardless of what the application code does.
By the end you will have:
- ποΈ A
tenant_idcolumn on every documents table with RLS policies that auto-filter every query - π Per-tenant database roles so each connection can only ever see its own rows
- π Session-to-tenant binding that injects
tenant_idinto every MCP tool call from the session context built in Part 6 - π‘οΈ Qdrant collection-per-tenant isolation as the alternative for non-PostgreSQL stacks
- π§ͺ Tests that prove cross-tenant leakage is impossible β not just unlikely
- π₯ Multi-tenant ingestion pipeline that tags every chunk with its owning tenant at ingest time
π§ Part 1: Why Application-Layer Filtering Is Not Enough
Consider this search function from Part 10:
export async function similaritySearch(
queryEmbedding: number[],
topK: number,
filter?: { source?: string }
): Promise<SearchResult[]> {
const result = await pool.query(
`SELECT id, source, content, metadata,
1 - (embedding <=> $1) AS score
FROM documents
ORDER BY embedding <=> $1
LIMIT $2`,
[toSql(queryEmbedding), topK]
);
return result.rows;
}
There is no tenant filter here. Now imagine the MCP tool handler calls this with:
const results = await similaritySearch(embedding, 5, { tenantId: session.tenantId });
You added the filter to the function signature β good. But what about:
- A new developer who adds a second search function and forgets the filter
- A background job that re-indexes documents and accidentally calls the wrong function
- A test helper that bypasses the session context and hits the database directly
- A future refactor that restructures the filter parameter and silently drops the clause
Any of these ships a data breach. RLS removes the entire class of bugs. The database enforces the policy on every query, every connection, every code path, without exception. π‘οΈ
ποΈ Part 2: Schema Changes for Multi-Tenancy
First, add tenant_id to the documents table and enable RLS:
-- Add tenant column to the existing table
ALTER TABLE documents ADD COLUMN IF NOT EXISTS tenant_id TEXT NOT NULL DEFAULT '';
-- Backfill existing rows if you have them (assign to a default tenant)
UPDATE documents SET tenant_id = 'default' WHERE tenant_id = '';
-- Make tenant_id required going forward
ALTER TABLE documents ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE documents ALTER COLUMN tenant_id DROP DEFAULT;
-- Add index for fast tenant filtering (pgvector uses this alongside the vector index)
CREATE INDEX IF NOT EXISTS documents_tenant_idx ON documents (tenant_id);
-- Composite index for filtered vector search
CREATE INDEX IF NOT EXISTS documents_tenant_embedding_idx
ON documents (tenant_id, id);
-- Enable Row-Level Security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
FORCE ROW LEVEL SECURITY is important β without it, the table owner (your migration user) bypasses RLS entirely. With it, no role is exempt, including superusers unless they explicitly bypass it with SET row_security = off (which you should never do in application code). π
π Part 3: RLS Policies and Database Roles
Create one PostgreSQL role per tenant and write a policy that binds rows to roles:
-- Application role (used by your Node.js connection pool)
-- This role has NO direct table permissions β it works through RLS
CREATE ROLE mcp_app LOGIN PASSWORD 'change-me-in-prod';
-- Grant minimal permissions: connect, use schema, but NOT bypass RLS
GRANT CONNECT ON DATABASE rag_db TO mcp_app;
GRANT USAGE ON SCHEMA public TO mcp_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO mcp_app;
-- The RLS policy: a row is visible only when the current_setting matches tenant_id
CREATE POLICY tenant_isolation ON documents
AS PERMISSIVE
FOR ALL
TO mcp_app
USING (tenant_id = current_setting('app.tenant_id', TRUE))
WITH CHECK (tenant_id = current_setting('app.tenant_id', TRUE));
The current_setting('app.tenant_id', TRUE) call reads a session-local GUC (Grand Unified Configuration) variable. The TRUE flag means it returns NULL rather than throwing an error if the variable has not been set β we handle that case next.
The USING clause filters SELECT, UPDATE, DELETE. The WITH CHECK clause validates INSERT and UPDATE. Together they mean a connection that has set app.tenant_id = 'acme' can only read, write, and modify rows where tenant_id = 'acme'. Any other rows are invisible β they do not throw an error, they simply do not exist from that connection's perspective. π―
Add a safety net β block queries with no tenant set:
-- If app.tenant_id is not set, deny ALL access
CREATE POLICY deny_unset_tenant ON documents
AS RESTRICTIVE
FOR ALL
TO mcp_app
USING (current_setting('app.tenant_id', TRUE) IS NOT NULL
AND current_setting('app.tenant_id', TRUE) != '');
RESTRICTIVE policies are ANDed with permissive ones. This means even if the permissive policy would allow a row, the restrictive policy blocks it if app.tenant_id is empty. An unauthenticated connection sees zero rows. β
π Part 4: Injecting Tenant Context from MCP Sessions
Every MCP session in Part 6 carries a tenantId from the Bearer token's claims. Now wire that into every database connection:
// src/db.ts
import pg from "pg";
const pool = new pg.Pool({
connectionString: process.env.DATABASE_URL,
user: "mcp_app", // always use the RLS-enforced role
max: 20,
});
export async function withTenantContext<T>(
tenantId: string,
fn: (client: pg.PoolClient) => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
// Set the session-local GUC β this is connection-scoped and resets on release
await client.query(
"SELECT set_config('app.tenant_id', $1, TRUE)",
[tenantId]
);
return await fn(client);
} finally {
// Reset before returning to the pool so the next caller gets a clean connection
await client.query("SELECT set_config('app.tenant_id', '', TRUE)");
client.release();
}
}
The set_config(key, value, is_local) call with is_local = TRUE means the GUC value is local to the current transaction. When the transaction ends (or we manually reset it), the value reverts. This prevents a pooled connection from leaking Tenant A's context to Tenant B's next query. π
Now update the similarity search to use this wrapper:
// src/stores/pgvector-store.ts (updated)
import { withTenantContext } from "../db.js";
import { toSql } from "pgvector/pg";
export async function similaritySearch(
tenantId: string,
queryEmbedding: number[],
topK = 5,
sourceFilter?: string
): Promise<SearchResult[]> {
return withTenantContext(tenantId, async (client) => {
let whereClause = "";
const params: unknown[] = [toSql(queryEmbedding), topK];
if (sourceFilter) {
params.push(sourceFilter);
whereClause = `AND source = $${params.length}`;
}
// Note: NO explicit tenant_id filter here β RLS handles it automatically
const result = await client.query(
`SELECT id, source, content, metadata,
1 - (embedding <=> $1) AS score
FROM documents
WHERE 1=1 ${whereClause}
ORDER BY embedding <=> $1
LIMIT $2`,
params
);
return result.rows.map((row) => ({
id: row.id,
source: row.source,
content: row.content,
score: parseFloat(row.score),
metadata: row.metadata,
}));
});
}
export async function insertChunk(
tenantId: string,
chunk: DocumentChunk
): Promise<void> {
return withTenantContext(tenantId, async (client) => {
await client.query(
`INSERT INTO documents (tenant_id, source, chunk_index, content, embedding, metadata)
VALUES (\(1, \)2, \(3, \)4, \(5, \)6)`,
[
tenantId,
chunk.source,
chunk.chunkIndex,
chunk.content,
toSql(chunk.embedding),
JSON.stringify(chunk.metadata ?? {}),
]
);
});
}
There is no WHERE tenant_id = $tenantId in the SELECT. The RLS policy adds it automatically and unforgettably at the database engine level. You cannot accidentally omit it. π―
π§ Part 5: Wiring Tenant ID into MCP Tool Calls
The MCP server from Part 6 already has a session store that holds tenantId. Pass it through to every tool call:
// src/server.ts (updated from Part 10)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { getSession } from "./redis-session-store.js"; // from Part 6
import { embedTexts } from "./ingestion/embedder.js";
import { similaritySearch, listSources, insertChunk } from "./stores/pgvector-store.js";
import { z } from "zod";
export function createKnowledgeServer(sessionId: string): McpServer {
const server = new McpServer({ name: "knowledge-base-server", version: "1.0.0" });
// Helper: resolve tenantId from session at call time
async function getTenantId(): Promise<string> {
const session = await getSession(sessionId);
if (!session) throw new Error(`Session ${sessionId} not found or expired`);
return session.tenantId;
}
server.tool(
"search_knowledge_base",
"Search the internal knowledge base for information relevant to a query.",
{
query: z.string().min(3).describe("Natural language question or search phrase"),
top_k: z.number().int().min(1).max(10).default(4),
source_filter: z.string().optional(),
},
async (args) => {
const tenantId = await getTenantId();
const [queryEmbedding] = await embedTexts([args.query]);
const results = await similaritySearch(
tenantId, // RLS context injected here
queryEmbedding,
args.top_k,
args.source_filter
);
if (results.length === 0) {
return {
content: [{ type: "text", text: "No relevant documents found in your knowledge base." }],
};
}
const formatted = results
.map((r, i) => `[\({i + 1}] Source: \){r.source} (score: \({r.score.toFixed(3)})\n\){r.content}`)
.join("\n\n---\n\n");
return {
content: [{ type: "text", text: `Found \({results.length} relevant passages:\n\n\){formatted}` }],
};
}
);
server.tool(
"index_document",
"Add a new document to your knowledge base.",
{
source: z.string(),
content: z.string().min(10),
tags: z.array(z.string()).default([]),
},
async (args) => {
const tenantId = await getTenantId();
const { chunkText } = await import("./ingestion/chunker.js");
const chunks = chunkText(args.content);
const embeddings = await embedTexts(chunks.map((c) => c.content));
for (let i = 0; i < chunks.length; i++) {
await insertChunk(tenantId, {
source: args.source,
chunkIndex: i,
content: chunks[i].content,
embedding: embeddings[i],
metadata: { tags: args.tags },
});
}
return {
content: [{ type: "text", text: `Indexed \({chunks.length} chunks from "\){args.source}".` }],
};
}
);
server.resource(
"indexed-sources",
"knowledge://sources",
{ description: "Documents in your knowledge base" },
async () => {
const tenantId = await getTenantId();
const sources = await listSources(tenantId);
const text = sources.map((s) => `\({s.source} β \){s.chunkCount} chunks`).join("\n");
return {
contents: [{
uri: "knowledge://sources",
text: sources.length ? `Your indexed documents:\n\n${text}` : "No documents indexed yet.",
}],
};
}
);
return server;
}
Every tool and resource resolves tenantId from the session at call time β not at server creation time. This means if the session is revoked mid-conversation, the next tool call fails immediately rather than continuing with stale credentials. β
π¦ Part 6: Qdrant Alternative β Collection-Per-Tenant
If you are using Qdrant instead of pgvector, the isolation strategy is different. Qdrant does not have RLS, so you use one collection per tenant:
// src/stores/qdrant-store.ts (multi-tenant)
import { QdrantClient } from "@qdrant/js-client-rest";
const client = new QdrantClient({ url: process.env.QDRANT_URL ?? "http://localhost:6333" });
const VECTOR_SIZE = 1536;
function collectionName(tenantId: string): string {
// Sanitise tenantId to a safe collection name
return `docs_${tenantId.replace(/[^a-z0-9_]/gi, "_").toLowerCase()}`;
}
export async function ensureTenantCollection(tenantId: string): Promise<void> {
const name = collectionName(tenantId);
const existing = await client.getCollections();
const exists = existing.collections.some((c) => c.name === name);
if (!exists) {
await client.createCollection(name, {
vectors: { size: VECTOR_SIZE, distance: "Cosine" },
});
await client.createPayloadIndex(name, {
field_name: "source",
field_schema: "keyword",
});
}
}
export async function insertChunk(tenantId: string, chunk: DocumentChunk): Promise<void> {
await ensureTenantCollection(tenantId);
const name = collectionName(tenantId);
await client.upsert(name, {
points: [{
id: `\({chunk.source}-\){chunk.chunkIndex}`,
vector: chunk.embedding,
payload: {
source: chunk.source,
chunkIndex: chunk.chunkIndex,
content: chunk.content,
metadata: chunk.metadata ?? {},
},
}],
});
}
export async function similaritySearch(
tenantId: string,
queryEmbedding: number[],
topK = 5,
sourceFilter?: string
): Promise<SearchResult[]> {
await ensureTenantCollection(tenantId);
const name = collectionName(tenantId);
const qdrantFilter = sourceFilter
? { must: [{ key: "source", match: { value: sourceFilter } }] }
: undefined;
const results = await client.search(name, {
vector: queryEmbedding,
limit: topK,
filter: qdrantFilter,
with_payload: true,
});
return results.map((r) => ({
id: String(r.id),
source: r.payload?.source as string,
content: r.payload?.content as string,
score: r.score,
metadata: (r.payload?.metadata as Record<string, unknown>) ?? {},
}));
}
Each tenant gets a collection named docs_acme, docs_globex, etc. Cross-tenant access is structurally impossible β there is no shared collection to leak from. The trade-off is operational: 100 tenants means 100 collections to manage, monitor, and back up. For up to a few hundred tenants, this is perfectly fine. For thousands of tenants, pgvector with RLS scales more gracefully. π―
π§ͺ Part 7: Tests That Prove Isolation
Do not just test the happy path. Write tests that attempt cross-tenant access and assert it is blocked:
// src/__tests__/tenant-isolation.test.ts
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { withTenantContext } from "../db.js";
import { insertChunk, similaritySearch } from "../stores/pgvector-store.js";
import { embedTexts } from "../ingestion/embedder.js";
import pg from "pg";
const adminPool = new pg.Pool({ connectionString: process.env.DATABASE_URL_ADMIN });
beforeAll(async () => {
// Seed: insert one doc for tenant-alpha and one for tenant-beta
const [embA] = await embedTexts(["Alpha internal runbook: deploy with flyctl"]);
const [embB] = await embedTexts(["Beta internal runbook: deploy with Railway"]);
await insertChunk("tenant-alpha", {
source: "alpha-runbook.md",
chunkIndex: 0,
content: "Alpha internal runbook: deploy with flyctl",
embedding: embA,
});
await insertChunk("tenant-beta", {
source: "beta-runbook.md",
chunkIndex: 0,
content: "Beta internal runbook: deploy with Railway",
embedding: embB,
});
});
afterAll(async () => {
await adminPool.query("DELETE FROM documents WHERE tenant_id IN ('tenant-alpha', 'tenant-beta')");
await adminPool.end();
});
describe("RLS tenant isolation", () => {
it("tenant-alpha only sees its own documents", async () => {
const [queryEmb] = await embedTexts(["internal runbook"]);
const results = await similaritySearch("tenant-alpha", queryEmb, 10);
const sources = results.map((r) => r.source);
expect(sources).toContain("alpha-runbook.md");
expect(sources).not.toContain("beta-runbook.md"); // π must not appear
});
it("tenant-beta only sees its own documents", async () => {
const [queryEmb] = await embedTexts(["internal runbook"]);
const results = await similaritySearch("tenant-beta", queryEmb, 10);
const sources = results.map((r) => r.source);
expect(sources).toContain("beta-runbook.md");
expect(sources).not.toContain("alpha-runbook.md"); // π must not appear
});
it("a query with no tenant context returns zero rows", async () => {
// Directly query the pool with no app.tenant_id set β should return nothing
await withTenantContext("", async (client) => {
const result = await client.query("SELECT COUNT(*) FROM documents");
expect(parseInt(result.rows[0].count)).toBe(0);
});
});
it("cannot SELECT rows of another tenant even with direct SQL", async () => {
// Simulate a misconfigured query that forgets the tenant filter entirely
await withTenantContext("tenant-alpha", async (client) => {
const result = await client.query(
"SELECT * FROM documents WHERE source = 'beta-runbook.md'"
);
// RLS silently filters β 0 rows, no error
expect(result.rows).toHaveLength(0);
});
});
});
The last test is the most important one. It simulates the most common developer mistake β a SELECT that targets a specific source by name but forgets tenant scoping. With RLS the query returns zero rows silently. Without RLS it would return Tenant B's confidential document. π
Run:
npm test
β tenant-alpha only sees its own documents (38ms)
β tenant-beta only sees its own documents (31ms)
β a query with no tenant context returns zero rows (12ms)
β cannot SELECT rows of another tenant even with direct SQL (9ms)
Green across the board means your data isolation is enforced at the database level and verified in CI. β
π₯ Part 8: Multi-Tenant Ingestion Pipeline
The ingestion script from Part 10 needs a tenantId argument:
// src/ingestion/ingest.ts (updated)
import fs from "fs";
import path from "path";
import { chunkText } from "./chunker.js";
import { embedTexts } from "./embedder.js";
import { insertChunk } from "../stores/pgvector-store.js";
export async function ingestFile(
tenantId: string,
filePath: string,
tags: string[] = []
): Promise<void> {
const source = path.basename(filePath);
const raw = fs.readFileSync(filePath, "utf-8");
console.log(`π [\({tenantId}] Ingesting: \){source} (${raw.length} chars)`);
const chunks = chunkText(raw);
const embeddings = await embedTexts(chunks.map((c) => c.content));
for (let i = 0; i < chunks.length; i++) {
await insertChunk(tenantId, {
source,
chunkIndex: chunks[i].index,
content: chunks[i].content,
embedding: embeddings[i],
metadata: { tags, filePath },
});
}
console.log(` β
[\({tenantId}] \){chunks.length} chunks indexed\n`);
}
// CLI: node dist/ingestion/ingest.js <tenantId> <file1> <file2> ...
const [, , tenantId, ...files] = process.argv;
if (!tenantId || files.length === 0) {
console.error("Usage: ingest.js <tenantId> <file1> [file2...]");
process.exit(1);
}
for (const f of files) {
await ingestFile(tenantId, f);
}
Ingest documents per tenant:
# Ingest Acme Corp's documents
npx tsx src/ingestion/ingest.ts acme ./acme-docs/runbook.md ./acme-docs/api.md
# Ingest Globex Corp's documents
npx tsx src/ingestion/ingest.ts globex ./globex-docs/handbook.md
π [acme] Ingesting: runbook.md (18432 chars)
β
[acme] 42 chunks indexed
π [globex] Ingesting: handbook.md (31200 chars)
β
[globex] 71 chunks indexed
π‘ Part 9: RLS Performance β What to Know
RLS has negligible overhead for simple equality checks like tenant_id = current_setting(...). But there are a few things to keep in mind at scale:
Index coverage. Make sure the planner can use the tenant_id index alongside the vector index. Run EXPLAIN ANALYZE on a filtered vector search and confirm it says Index Scan using documents_tenant_idx rather than Seq Scan. If it falls back to a sequential scan, add a partial index:
CREATE INDEX documents_tenant_vector_idx
ON documents (tenant_id)
INCLUDE (embedding);
Connection pooling with PgBouncer. PgBouncer in transaction-pooling mode reuses connections between clients, which means set_config values could persist. Use set_config('app.tenant_id', '', TRUE) in the finally block (which we already do) and configure PgBouncer to reset session state between connections using server_reset_query = RESET ALL.
Partition large tables by tenant. If a single tenant generates millions of documents, consider partitioning the documents table by tenant_id. Each partition has its own index, dramatically reducing I/O for single-tenant queries:
CREATE TABLE documents (
id BIGSERIAL,
tenant_id TEXT NOT NULL,
-- ... other columns
) PARTITION BY LIST (tenant_id);
CREATE TABLE documents_acme PARTITION OF documents FOR VALUES IN ('acme');
CREATE TABLE documents_globex PARTITION OF documents FOR VALUES IN ('globex');
π― Summary
In Part 11 you made the RAG knowledge base production-safe for multi-tenant deployments:
- ποΈ RLS policies β database-enforced row filtering that application code cannot accidentally bypass
- π
set_configsession GUC β injectstenant_idper connection with automatic cleanup on pool release - π
FORCE ROW LEVEL SECURITYβ blocks even the table owner from bypassing the policy - π Session-to-tenant binding β resolves
tenantIdfrom the MCP session store on every tool call - π¦ Qdrant collection-per-tenant β structural isolation for non-PostgreSQL stacks
- π§ͺ Cross-tenant leakage tests β proves isolation is enforced, not just assumed
- π₯ Multi-tenant ingestion β every chunk is tagged with its owning tenant at ingest time
In Part 12 we will build on the full stack β agent, MCP server, RAG, sessions β and add evaluation and evals: a framework for measuring how accurately your RAG agent answers questions, tracking precision and recall over time, and catching regressions when you update your document corpus or swap embedding models. π