Skip to main content

Command Palette

Search for a command to run...

πŸ”’ Multi-Tenant RAG: Row-Level Security in pgvector with MCP

When Tenant A searches your knowledge base, they must never see Tenant B's documents β€” enforce this at the database level, not in application code

Updated
β€’16 min read
πŸ”’ Multi-Tenant RAG: Row-Level Security in pgvector with MCP
T

Hi πŸ‘‹, I'm Tushar Patil. Currently I am working as Frontend Developer (Angular) and also have expertise with .Net Core and Framework.


This is Part 11 of the AI Engineering with TypeScript series.

Prerequisites: Part 6 β€” Multi-Tenant Sessions Β· Part 10 β€” MCP + RAG

Stack: Node.js 20+ Β· TypeScript 5.x Β· PostgreSQL 16 Β· pgvector Β· @modelcontextprotocol/sdk Β· Zod


πŸ—ΊοΈ What we'll cover

In Part 10 we built a knowledge-base MCP tool backed by pgvector. It works beautifully β€” for a single tenant. The moment two organisations share the same deployment, a critical problem emerges: every search_knowledge_base call searches the entire documents table. Tenant A's agent can retrieve Tenant B's confidential runbooks, pricing docs, and internal API specs.

The naive fix is to add a WHERE tenant_id = $tenantId clause inside your TypeScript search function. That works until someone makes a mistake β€” a missing parameter, a copy-paste bug, a new developer who does not know the convention. Application-layer access control is fragile.

The right fix is PostgreSQL Row-Level Security (RLS) β€” a database-level policy that makes it physically impossible for a query to return rows it is not authorised to see, regardless of what the application code does.

By the end you will have:

  • πŸ—„οΈ A tenant_id column on every documents table with RLS policies that auto-filter every query
  • πŸ”‘ Per-tenant database roles so each connection can only ever see its own rows
  • πŸ”— Session-to-tenant binding that injects tenant_id into every MCP tool call from the session context built in Part 6
  • πŸ›‘οΈ Qdrant collection-per-tenant isolation as the alternative for non-PostgreSQL stacks
  • πŸ§ͺ Tests that prove cross-tenant leakage is impossible β€” not just unlikely
  • πŸ“₯ Multi-tenant ingestion pipeline that tags every chunk with its owning tenant at ingest time

🧠 Part 1: Why Application-Layer Filtering Is Not Enough

Consider this search function from Part 10:

export async function similaritySearch(
  queryEmbedding: number[],
  topK: number,
  filter?: { source?: string }
): Promise<SearchResult[]> {
  const result = await pool.query(
    `SELECT id, source, content, metadata,
            1 - (embedding <=> $1) AS score
     FROM documents
     ORDER BY embedding <=> $1
     LIMIT $2`,
    [toSql(queryEmbedding), topK]
  );
  return result.rows;
}

There is no tenant filter here. Now imagine the MCP tool handler calls this with:

const results = await similaritySearch(embedding, 5, { tenantId: session.tenantId });

You added the filter to the function signature β€” good. But what about:

  • A new developer who adds a second search function and forgets the filter
  • A background job that re-indexes documents and accidentally calls the wrong function
  • A test helper that bypasses the session context and hits the database directly
  • A future refactor that restructures the filter parameter and silently drops the clause

Any of these ships a data breach. RLS removes the entire class of bugs. The database enforces the policy on every query, every connection, every code path, without exception. πŸ›‘οΈ


πŸ—„οΈ Part 2: Schema Changes for Multi-Tenancy

First, add tenant_id to the documents table and enable RLS:

-- Add tenant column to the existing table
ALTER TABLE documents ADD COLUMN IF NOT EXISTS tenant_id TEXT NOT NULL DEFAULT '';

-- Backfill existing rows if you have them (assign to a default tenant)
UPDATE documents SET tenant_id = 'default' WHERE tenant_id = '';

-- Make tenant_id required going forward
ALTER TABLE documents ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE documents ALTER COLUMN tenant_id DROP DEFAULT;

-- Add index for fast tenant filtering (pgvector uses this alongside the vector index)
CREATE INDEX IF NOT EXISTS documents_tenant_idx ON documents (tenant_id);

-- Composite index for filtered vector search
CREATE INDEX IF NOT EXISTS documents_tenant_embedding_idx
  ON documents (tenant_id, id);

-- Enable Row-Level Security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

FORCE ROW LEVEL SECURITY is important β€” without it, the table owner (your migration user) bypasses RLS entirely. With it, no role is exempt, including superusers unless they explicitly bypass it with SET row_security = off (which you should never do in application code). πŸ”’


πŸ”‘ Part 3: RLS Policies and Database Roles

Create one PostgreSQL role per tenant and write a policy that binds rows to roles:

-- Application role (used by your Node.js connection pool)
-- This role has NO direct table permissions β€” it works through RLS
CREATE ROLE mcp_app LOGIN PASSWORD 'change-me-in-prod';

-- Grant minimal permissions: connect, use schema, but NOT bypass RLS
GRANT CONNECT ON DATABASE rag_db TO mcp_app;
GRANT USAGE ON SCHEMA public TO mcp_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO mcp_app;

-- The RLS policy: a row is visible only when the current_setting matches tenant_id
CREATE POLICY tenant_isolation ON documents
  AS PERMISSIVE
  FOR ALL
  TO mcp_app
  USING (tenant_id = current_setting('app.tenant_id', TRUE))
  WITH CHECK (tenant_id = current_setting('app.tenant_id', TRUE));

The current_setting('app.tenant_id', TRUE) call reads a session-local GUC (Grand Unified Configuration) variable. The TRUE flag means it returns NULL rather than throwing an error if the variable has not been set β€” we handle that case next.

The USING clause filters SELECT, UPDATE, DELETE. The WITH CHECK clause validates INSERT and UPDATE. Together they mean a connection that has set app.tenant_id = 'acme' can only read, write, and modify rows where tenant_id = 'acme'. Any other rows are invisible β€” they do not throw an error, they simply do not exist from that connection's perspective. 🎯

Add a safety net β€” block queries with no tenant set:

-- If app.tenant_id is not set, deny ALL access
CREATE POLICY deny_unset_tenant ON documents
  AS RESTRICTIVE
  FOR ALL
  TO mcp_app
  USING (current_setting('app.tenant_id', TRUE) IS NOT NULL
         AND current_setting('app.tenant_id', TRUE) != '');

RESTRICTIVE policies are ANDed with permissive ones. This means even if the permissive policy would allow a row, the restrictive policy blocks it if app.tenant_id is empty. An unauthenticated connection sees zero rows. βœ…


πŸ”— Part 4: Injecting Tenant Context from MCP Sessions

Every MCP session in Part 6 carries a tenantId from the Bearer token's claims. Now wire that into every database connection:

// src/db.ts
import pg from "pg";

const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL,
  user: "mcp_app",   // always use the RLS-enforced role
  max: 20,
});

export async function withTenantContext<T>(
  tenantId: string,
  fn: (client: pg.PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();

  try {
    // Set the session-local GUC β€” this is connection-scoped and resets on release
    await client.query(
      "SELECT set_config('app.tenant_id', $1, TRUE)",
      [tenantId]
    );

    return await fn(client);
  } finally {
    // Reset before returning to the pool so the next caller gets a clean connection
    await client.query("SELECT set_config('app.tenant_id', '', TRUE)");
    client.release();
  }
}

The set_config(key, value, is_local) call with is_local = TRUE means the GUC value is local to the current transaction. When the transaction ends (or we manually reset it), the value reverts. This prevents a pooled connection from leaking Tenant A's context to Tenant B's next query. πŸ”’

Now update the similarity search to use this wrapper:

// src/stores/pgvector-store.ts (updated)
import { withTenantContext } from "../db.js";
import { toSql } from "pgvector/pg";

export async function similaritySearch(
  tenantId: string,
  queryEmbedding: number[],
  topK = 5,
  sourceFilter?: string
): Promise<SearchResult[]> {
  return withTenantContext(tenantId, async (client) => {
    let whereClause = "";
    const params: unknown[] = [toSql(queryEmbedding), topK];

    if (sourceFilter) {
      params.push(sourceFilter);
      whereClause = `AND source = $${params.length}`;
    }

    // Note: NO explicit tenant_id filter here β€” RLS handles it automatically
    const result = await client.query(
      `SELECT id, source, content, metadata,
              1 - (embedding <=> $1) AS score
       FROM documents
       WHERE 1=1 ${whereClause}
       ORDER BY embedding <=> $1
       LIMIT $2`,
      params
    );

    return result.rows.map((row) => ({
      id: row.id,
      source: row.source,
      content: row.content,
      score: parseFloat(row.score),
      metadata: row.metadata,
    }));
  });
}

export async function insertChunk(
  tenantId: string,
  chunk: DocumentChunk
): Promise<void> {
  return withTenantContext(tenantId, async (client) => {
    await client.query(
      `INSERT INTO documents (tenant_id, source, chunk_index, content, embedding, metadata)
       VALUES (\(1, \)2, \(3, \)4, \(5, \)6)`,
      [
        tenantId,
        chunk.source,
        chunk.chunkIndex,
        chunk.content,
        toSql(chunk.embedding),
        JSON.stringify(chunk.metadata ?? {}),
      ]
    );
  });
}

There is no WHERE tenant_id = $tenantId in the SELECT. The RLS policy adds it automatically and unforgettably at the database engine level. You cannot accidentally omit it. 🎯


πŸ”§ Part 5: Wiring Tenant ID into MCP Tool Calls

The MCP server from Part 6 already has a session store that holds tenantId. Pass it through to every tool call:

// src/server.ts (updated from Part 10)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { getSession } from "./redis-session-store.js";   // from Part 6
import { embedTexts } from "./ingestion/embedder.js";
import { similaritySearch, listSources, insertChunk } from "./stores/pgvector-store.js";
import { z } from "zod";

export function createKnowledgeServer(sessionId: string): McpServer {
  const server = new McpServer({ name: "knowledge-base-server", version: "1.0.0" });

  // Helper: resolve tenantId from session at call time
  async function getTenantId(): Promise<string> {
    const session = await getSession(sessionId);
    if (!session) throw new Error(`Session ${sessionId} not found or expired`);
    return session.tenantId;
  }

  server.tool(
    "search_knowledge_base",
    "Search the internal knowledge base for information relevant to a query.",
    {
      query: z.string().min(3).describe("Natural language question or search phrase"),
      top_k: z.number().int().min(1).max(10).default(4),
      source_filter: z.string().optional(),
    },
    async (args) => {
      const tenantId = await getTenantId();
      const [queryEmbedding] = await embedTexts([args.query]);

      const results = await similaritySearch(
        tenantId,          // RLS context injected here
        queryEmbedding,
        args.top_k,
        args.source_filter
      );

      if (results.length === 0) {
        return {
          content: [{ type: "text", text: "No relevant documents found in your knowledge base." }],
        };
      }

      const formatted = results
        .map((r, i) => `[\({i + 1}] Source: \){r.source} (score: \({r.score.toFixed(3)})\n\){r.content}`)
        .join("\n\n---\n\n");

      return {
        content: [{ type: "text", text: `Found \({results.length} relevant passages:\n\n\){formatted}` }],
      };
    }
  );

  server.tool(
    "index_document",
    "Add a new document to your knowledge base.",
    {
      source: z.string(),
      content: z.string().min(10),
      tags: z.array(z.string()).default([]),
    },
    async (args) => {
      const tenantId = await getTenantId();
      const { chunkText } = await import("./ingestion/chunker.js");
      const chunks = chunkText(args.content);
      const embeddings = await embedTexts(chunks.map((c) => c.content));

      for (let i = 0; i < chunks.length; i++) {
        await insertChunk(tenantId, {
          source: args.source,
          chunkIndex: i,
          content: chunks[i].content,
          embedding: embeddings[i],
          metadata: { tags: args.tags },
        });
      }

      return {
        content: [{ type: "text", text: `Indexed \({chunks.length} chunks from "\){args.source}".` }],
      };
    }
  );

  server.resource(
    "indexed-sources",
    "knowledge://sources",
    { description: "Documents in your knowledge base" },
    async () => {
      const tenantId = await getTenantId();
      const sources = await listSources(tenantId);
      const text = sources.map((s) => `\({s.source} β€” \){s.chunkCount} chunks`).join("\n");
      return {
        contents: [{
          uri: "knowledge://sources",
          text: sources.length ? `Your indexed documents:\n\n${text}` : "No documents indexed yet.",
        }],
      };
    }
  );

  return server;
}

Every tool and resource resolves tenantId from the session at call time β€” not at server creation time. This means if the session is revoked mid-conversation, the next tool call fails immediately rather than continuing with stale credentials. βœ…


πŸ¦€ Part 6: Qdrant Alternative β€” Collection-Per-Tenant

If you are using Qdrant instead of pgvector, the isolation strategy is different. Qdrant does not have RLS, so you use one collection per tenant:

// src/stores/qdrant-store.ts (multi-tenant)
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: process.env.QDRANT_URL ?? "http://localhost:6333" });
const VECTOR_SIZE = 1536;

function collectionName(tenantId: string): string {
  // Sanitise tenantId to a safe collection name
  return `docs_${tenantId.replace(/[^a-z0-9_]/gi, "_").toLowerCase()}`;
}

export async function ensureTenantCollection(tenantId: string): Promise<void> {
  const name = collectionName(tenantId);
  const existing = await client.getCollections();
  const exists = existing.collections.some((c) => c.name === name);

  if (!exists) {
    await client.createCollection(name, {
      vectors: { size: VECTOR_SIZE, distance: "Cosine" },
    });
    await client.createPayloadIndex(name, {
      field_name: "source",
      field_schema: "keyword",
    });
  }
}

export async function insertChunk(tenantId: string, chunk: DocumentChunk): Promise<void> {
  await ensureTenantCollection(tenantId);
  const name = collectionName(tenantId);

  await client.upsert(name, {
    points: [{
      id: `\({chunk.source}-\){chunk.chunkIndex}`,
      vector: chunk.embedding,
      payload: {
        source: chunk.source,
        chunkIndex: chunk.chunkIndex,
        content: chunk.content,
        metadata: chunk.metadata ?? {},
      },
    }],
  });
}

export async function similaritySearch(
  tenantId: string,
  queryEmbedding: number[],
  topK = 5,
  sourceFilter?: string
): Promise<SearchResult[]> {
  await ensureTenantCollection(tenantId);
  const name = collectionName(tenantId);

  const qdrantFilter = sourceFilter
    ? { must: [{ key: "source", match: { value: sourceFilter } }] }
    : undefined;

  const results = await client.search(name, {
    vector: queryEmbedding,
    limit: topK,
    filter: qdrantFilter,
    with_payload: true,
  });

  return results.map((r) => ({
    id: String(r.id),
    source: r.payload?.source as string,
    content: r.payload?.content as string,
    score: r.score,
    metadata: (r.payload?.metadata as Record<string, unknown>) ?? {},
  }));
}

Each tenant gets a collection named docs_acme, docs_globex, etc. Cross-tenant access is structurally impossible β€” there is no shared collection to leak from. The trade-off is operational: 100 tenants means 100 collections to manage, monitor, and back up. For up to a few hundred tenants, this is perfectly fine. For thousands of tenants, pgvector with RLS scales more gracefully. 🎯


πŸ§ͺ Part 7: Tests That Prove Isolation

Do not just test the happy path. Write tests that attempt cross-tenant access and assert it is blocked:

// src/__tests__/tenant-isolation.test.ts
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { withTenantContext } from "../db.js";
import { insertChunk, similaritySearch } from "../stores/pgvector-store.js";
import { embedTexts } from "../ingestion/embedder.js";
import pg from "pg";

const adminPool = new pg.Pool({ connectionString: process.env.DATABASE_URL_ADMIN });

beforeAll(async () => {
  // Seed: insert one doc for tenant-alpha and one for tenant-beta
  const [embA] = await embedTexts(["Alpha internal runbook: deploy with flyctl"]);
  const [embB] = await embedTexts(["Beta internal runbook: deploy with Railway"]);

  await insertChunk("tenant-alpha", {
    source: "alpha-runbook.md",
    chunkIndex: 0,
    content: "Alpha internal runbook: deploy with flyctl",
    embedding: embA,
  });

  await insertChunk("tenant-beta", {
    source: "beta-runbook.md",
    chunkIndex: 0,
    content: "Beta internal runbook: deploy with Railway",
    embedding: embB,
  });
});

afterAll(async () => {
  await adminPool.query("DELETE FROM documents WHERE tenant_id IN ('tenant-alpha', 'tenant-beta')");
  await adminPool.end();
});

describe("RLS tenant isolation", () => {
  it("tenant-alpha only sees its own documents", async () => {
    const [queryEmb] = await embedTexts(["internal runbook"]);
    const results = await similaritySearch("tenant-alpha", queryEmb, 10);

    const sources = results.map((r) => r.source);
    expect(sources).toContain("alpha-runbook.md");
    expect(sources).not.toContain("beta-runbook.md");   // πŸ”’ must not appear
  });

  it("tenant-beta only sees its own documents", async () => {
    const [queryEmb] = await embedTexts(["internal runbook"]);
    const results = await similaritySearch("tenant-beta", queryEmb, 10);

    const sources = results.map((r) => r.source);
    expect(sources).toContain("beta-runbook.md");
    expect(sources).not.toContain("alpha-runbook.md");  // πŸ”’ must not appear
  });

  it("a query with no tenant context returns zero rows", async () => {
    // Directly query the pool with no app.tenant_id set β€” should return nothing
    await withTenantContext("", async (client) => {
      const result = await client.query("SELECT COUNT(*) FROM documents");
      expect(parseInt(result.rows[0].count)).toBe(0);
    });
  });

  it("cannot SELECT rows of another tenant even with direct SQL", async () => {
    // Simulate a misconfigured query that forgets the tenant filter entirely
    await withTenantContext("tenant-alpha", async (client) => {
      const result = await client.query(
        "SELECT * FROM documents WHERE source = 'beta-runbook.md'"
      );
      // RLS silently filters β€” 0 rows, no error
      expect(result.rows).toHaveLength(0);
    });
  });
});

The last test is the most important one. It simulates the most common developer mistake β€” a SELECT that targets a specific source by name but forgets tenant scoping. With RLS the query returns zero rows silently. Without RLS it would return Tenant B's confidential document. πŸ”’

Run:

npm test

βœ“ tenant-alpha only sees its own documents (38ms)
βœ“ tenant-beta only sees its own documents (31ms)
βœ“ a query with no tenant context returns zero rows (12ms)
βœ“ cannot SELECT rows of another tenant even with direct SQL (9ms)

Green across the board means your data isolation is enforced at the database level and verified in CI. βœ…


πŸ“₯ Part 8: Multi-Tenant Ingestion Pipeline

The ingestion script from Part 10 needs a tenantId argument:

// src/ingestion/ingest.ts (updated)
import fs from "fs";
import path from "path";
import { chunkText } from "./chunker.js";
import { embedTexts } from "./embedder.js";
import { insertChunk } from "../stores/pgvector-store.js";

export async function ingestFile(
  tenantId: string,
  filePath: string,
  tags: string[] = []
): Promise<void> {
  const source = path.basename(filePath);
  const raw = fs.readFileSync(filePath, "utf-8");

  console.log(`πŸ“„ [\({tenantId}] Ingesting: \){source} (${raw.length} chars)`);

  const chunks = chunkText(raw);
  const embeddings = await embedTexts(chunks.map((c) => c.content));

  for (let i = 0; i < chunks.length; i++) {
    await insertChunk(tenantId, {
      source,
      chunkIndex: chunks[i].index,
      content: chunks[i].content,
      embedding: embeddings[i],
      metadata: { tags, filePath },
    });
  }

  console.log(`  βœ… [\({tenantId}] \){chunks.length} chunks indexed\n`);
}

// CLI: node dist/ingestion/ingest.js <tenantId> <file1> <file2> ...
const [, , tenantId, ...files] = process.argv;

if (!tenantId || files.length === 0) {
  console.error("Usage: ingest.js <tenantId> <file1> [file2...]");
  process.exit(1);
}

for (const f of files) {
  await ingestFile(tenantId, f);
}

Ingest documents per tenant:

# Ingest Acme Corp's documents
npx tsx src/ingestion/ingest.ts acme ./acme-docs/runbook.md ./acme-docs/api.md

# Ingest Globex Corp's documents
npx tsx src/ingestion/ingest.ts globex ./globex-docs/handbook.md

πŸ“„ [acme] Ingesting: runbook.md (18432 chars)
  βœ… [acme] 42 chunks indexed

πŸ“„ [globex] Ingesting: handbook.md (31200 chars)
  βœ… [globex] 71 chunks indexed

πŸ’‘ Part 9: RLS Performance β€” What to Know

RLS has negligible overhead for simple equality checks like tenant_id = current_setting(...). But there are a few things to keep in mind at scale:

Index coverage. Make sure the planner can use the tenant_id index alongside the vector index. Run EXPLAIN ANALYZE on a filtered vector search and confirm it says Index Scan using documents_tenant_idx rather than Seq Scan. If it falls back to a sequential scan, add a partial index:

CREATE INDEX documents_tenant_vector_idx
  ON documents (tenant_id)
  INCLUDE (embedding);

Connection pooling with PgBouncer. PgBouncer in transaction-pooling mode reuses connections between clients, which means set_config values could persist. Use set_config('app.tenant_id', '', TRUE) in the finally block (which we already do) and configure PgBouncer to reset session state between connections using server_reset_query = RESET ALL.

Partition large tables by tenant. If a single tenant generates millions of documents, consider partitioning the documents table by tenant_id. Each partition has its own index, dramatically reducing I/O for single-tenant queries:

CREATE TABLE documents (
  id          BIGSERIAL,
  tenant_id   TEXT NOT NULL,
  -- ... other columns
) PARTITION BY LIST (tenant_id);

CREATE TABLE documents_acme PARTITION OF documents FOR VALUES IN ('acme');
CREATE TABLE documents_globex PARTITION OF documents FOR VALUES IN ('globex');

🎯 Summary

In Part 11 you made the RAG knowledge base production-safe for multi-tenant deployments:

  • πŸ—„οΈ RLS policies β€” database-enforced row filtering that application code cannot accidentally bypass
  • πŸ”‘ set_config session GUC β€” injects tenant_id per connection with automatic cleanup on pool release
  • πŸ”’ FORCE ROW LEVEL SECURITY β€” blocks even the table owner from bypassing the policy
  • πŸ”— Session-to-tenant binding β€” resolves tenantId from the MCP session store on every tool call
  • πŸ¦€ Qdrant collection-per-tenant β€” structural isolation for non-PostgreSQL stacks
  • πŸ§ͺ Cross-tenant leakage tests β€” proves isolation is enforced, not just assumed
  • πŸ“₯ Multi-tenant ingestion β€” every chunk is tagged with its owning tenant at ingest time

In Part 12 we will build on the full stack β€” agent, MCP server, RAG, sessions β€” and add evaluation and evals: a framework for measuring how accurately your RAG agent answers questions, tracking precision and recall over time, and catching regressions when you update your document corpus or swap embedding models. πŸ“


πŸ“š Further Reading

AI Engineering with TypeScript

Part 11 of 14

A comprehensive, code-first series on building production-grade AI systems with the Model Context Protocol (MCP) and TypeScript. From your first MCP server to multi-agent orchestration, RAG pipelines, observability, and global deployment β€” every post is packed with real, runnable code.

Up next

πŸ“ RAG Evaluation Framework for MCP Agents: Measuring What Actually Matters

Stop guessing whether your RAG agent is getting better or worse β€” build a systematic eval framework that scores faithfulness, relevance, and precision, then gate your CI pipeline on it