Knowledge & RAG

Ingestion Pipeline

Uploaded documents flow through a pipeline that extracts text, scrubs PII, chunks content, generates embeddings, and creates synthetic Q&A pairs.

Pipeline Stages

Upload → Parse → PII Scrub → Chunk → Embed → Synthetic Q&A → Store

1. Upload

Upload files via POST /api/v1/knowledge/upload (multipart), or create from a URL with POST /api/v1/knowledge/documents. The file lands in KB_UPLOAD_DIR and a Document record is created with status pending.

2. Queue

A BullMQ job is enqueued to the knowledge-ingest queue for async processing:

  • Concurrency: 1 (sequential)
  • Retries with exponential backoff
  • Graceful shutdown on SIGTERM

3. Text Extraction

Handled by src/knowledge/parsers.ts:

| Type  | Parser                  | Notes                 |
|-------|-------------------------|-----------------------|
| PDF   | pdf-parse               | Full text extraction  |
| DOCX  | mammoth                 | Converts to plain text |
| TXT   | Direct file read        | UTF-8                 |
| URL   | HTTP fetch + HTML strip | Removes tags          |
| Image | OpenAI Vision API       | OCR via gpt-4o        |
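The dispatch can be pictured as a lookup from MIME type to parser. This is an illustrative sketch, not the actual code in src/knowledge/parsers.ts; the function name and the fallback behavior are assumptions.

```typescript
// Hypothetical sketch: pick the parser named in the table above by MIME type.
function parserFor(mimeType: string): string {
  const table: Record<string, string> = {
    "application/pdf": "pdf-parse",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "mammoth",
    "text/plain": "direct-read",
    "text/html": "html-strip",
  };
  // Any image type goes through the Vision OCR path.
  if (mimeType.startsWith("image/")) return "openai-vision";
  const parser = table[mimeType];
  if (!parser) throw new Error(`Unsupported type: ${mimeType}`);
  return parser;
}

parserFor("application/pdf"); // → "pdf-parse"
parserFor("image/png");       // → "openai-vision"
```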

4. PII Scrubbing (One-Way)

Before chunking, all PII is irreversibly masked. Raw personal data never lands in the knowledge base.

Detected patterns:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • IP addresses
  • IBAN numbers
  • Monetary amounts

Matches are replaced with placeholders such as [REDACTED_EMAIL], [REDACTED_PHONE], etc.

See PII Scrubber for details.
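The masking step can be sketched as an ordered regex pass. These patterns are simplified illustrations, not the real scrubber's rules; note that ordering matters, since the looser phone pattern would otherwise swallow card numbers and IPs.

```typescript
// Minimal one-way masking sketch (illustrative patterns only).
// Cards and IPs are masked before the looser phone pattern.
const piiPatterns: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]"],
  [/\b(?:\d[ -]?){13,16}\b/g, "[REDACTED_CARD]"],
  [/\b\d{1,3}(?:\.\d{1,3}){3}\b/g, "[REDACTED_IP]"],
  [/\+?\d[\d\s().-]{8,}\d/g, "[REDACTED_PHONE]"],
];

function scrubPii(text: string): string {
  // Apply each pattern in order; replacement is irreversible by design.
  return piiPatterns.reduce((t, [re, mask]) => t.replace(re, mask), text);
}

scrubPii("call +380441234567 or email a@b.com from 10.0.0.1");
// → "call [REDACTED_PHONE] or email [REDACTED_EMAIL] from [REDACTED_IP]"
```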

5. Chunking

Text is split into chunks using the legal-document-aware chunker.

  • Default max chunk size: 2000 characters
  • Overlap between chunks for context preservation
  • Ukrainian legal structure detection (Розділ, Стаття, п.)
  • Section path tracking for hierarchical context

See Semantic Chunker for details.

6. Embedding

Each chunk is embedded using OpenAI text-embedding-3-small:

  • Vector dimension: 1536
  • Batch size: 32 chunks per API call
  • Stored in PostgreSQL via pgvector
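The batching above can be sketched with a small helper; the embeddings client call itself is omitted, only the grouping into batches of 32 is shown.

```typescript
// Group chunks into batches of 32 before each embeddings API call.
function toBatches<T>(items: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 70 chunks → batches of 32, 32, and 6.
toBatches(new Array(70).fill("chunk")).map(b => b.length); // → [32, 32, 6]
```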

7. Synthetic Q&A Generation

For each chunk, gpt-4o-mini generates 3 synthetic questions that the chunk could answer. These questions are:

  • embedded separately (1536-dim vectors);
  • stored in the ChunkQuestion table;
  • used as an extra retrieval path during RAG queries.

8. Completion

On success:

  • Document status updated to ready
  • Metadata updated with chunkCount, textLength
  • Temporary upload file deleted

On failure:

  • Document status updated to failed
  • Error stored in metadata.lastError
  • BullMQ retries if attempts remaining

Monitoring

Track ingestion status via:

  • GET /api/v1/knowledge/documents?status=processing — in-progress jobs
  • GET /api/v1/knowledge/documents?status=failed — failed documents
  • Langfuse traces for ingestion.synthetic-qa spans

RAG Pipeline

The RAG pipeline combines hybrid search (vector + keyword) with LLM generation to produce source-grounded replies. The shared agent-tasks worker calls it on every inbound message; the RAG draft endpoint calls it for manual tests.

Architecture

Query → Injection Guard → Embed → Hybrid Retrieval → Rank → Context Assembly → LLM → PII Restore

Implementation: src/knowledge/rag.ts (OpenAiRagPipeline)

Retrieval Strategy

Three retrieval paths are combined:

| Path             | Default Weight               | Candidates | Description                                  |
|------------------|------------------------------|------------|----------------------------------------------|
| Chunk vectors    | 85% of vector budget         | Top 18     | Cosine similarity on chunk embeddings        |
| Question vectors | 15% of vector budget         | Top 18     | Cosine similarity on synthetic Q&A embeddings |
| Keyword search   | 35% (default keyword weight) | Top 18     | PostgreSQL full-text search on chunk content |

By default the budget splits 65% to vector search and 35% to keyword search (configurable per knowledge base via the vector and keyword fields of config.ragWeights). Within the vector budget, 85% goes to chunk-content embeddings and 15% to synthetic Q&A embeddings.
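Multiplying the split out gives the effective per-path weights (values are the documented defaults):

```typescript
// Effective per-path weights under the default 65/35 split described above.
const weights = { vector: 0.65, keyword: 0.35 };

const effective = {
  chunkVectors: weights.vector * 0.85,    // 0.5525
  questionVectors: weights.vector * 0.15, // 0.0975
  keyword: weights.keyword,               // 0.35
};
// The three effective weights sum to 1.0.
```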

Scoring

Each candidate receives a hybrid score:

score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore

Scores are normalized per-path before combining.
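The formula and normalization can be sketched as follows. The candidate shape and max-normalization are assumptions for illustration; the actual OpenAiRagPipeline may normalize differently.

```typescript
interface Candidate { id: string; vecScore: number; qScore: number; keyScore: number; }

// Per-path max-normalization (assumed), so each path's best hit scores 1.0.
function normalize(xs: number[]): number[] {
  const max = Math.max(...xs, 1e-9);
  return xs.map(x => x / max);
}

function hybridScores(cands: Candidate[], vector = 0.65, keyword = 0.35): Map<string, number> {
  const vec = normalize(cands.map(c => c.vecScore));
  const q = normalize(cands.map(c => c.qScore));
  const key = normalize(cands.map(c => c.keyScore));
  const out = new Map<string, number>();
  cands.forEach((c, i) => {
    // score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore
    out.set(c.id, vector * 0.85 * vec[i] + vector * 0.15 * q[i] + keyword * key[i]);
  });
  return out;
}
```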

Top-K Selection

After ranking, the top 6 chunks are picked. A max context length cap keeps the prompt within LLM token limits.
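Selection can be sketched as a greedy walk down the ranked list. The top-K of 6 is the documented default; the character cap value here is an illustrative parameter, not the real limit.

```typescript
interface Ranked { content: string; score: number; }

// Greedily take up to topK chunks, skipping any that would blow the
// context-length budget (maxContextChars is an assumed, illustrative cap).
function selectChunks(ranked: Ranked[], topK = 6, maxContextChars = 8000): Ranked[] {
  const sorted = [...ranked].sort((a, b) => b.score - a.score);
  const picked: Ranked[] = [];
  let used = 0;
  for (const r of sorted) {
    if (picked.length === topK) break;
    if (used + r.content.length > maxContextChars) continue;
    picked.push(r);
    used += r.content.length;
  }
  return picked;
}
```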

Context Assembly

Selected chunks are formatted with:

  • Document title
  • Section path (from chunker metadata)
  • Chunk content

LLM Generation

| Parameter   | Value                           |
|-------------|---------------------------------|
| Model       | OPENAI_MODEL (default: gpt-4o)  |
| Temperature | 0.2                             |
| Max tokens  | 700                             |
| Streaming   | Supported                       |

The prompt includes:

  1. System prompt — from namespace persona configuration
  2. Employee profile — summary, current projects, preferences (if available)
  3. Conversation history — last 6 messages (normalized)
  4. RAG context — assembled chunks with source metadata
  5. User query

PII Handling

  1. PII in the user message is replaced with placeholders before the prompt goes to the LLM.
  2. Placeholders are stored encrypted in PiiRedactionMap.
  3. After generation, placeholders in the response are restored to the real values.

See PII Scrubber.
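Step 3 amounts to swapping placeholders back for the original values. In this sketch a plain Map stands in for the encrypted PiiRedactionMap storage:

```typescript
// Restore real values into the generated reply (illustrative sketch).
function restorePii(text: string, redactionMap: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of redactionMap) {
    out = out.split(placeholder).join(original); // literal, not regex, replace
  }
  return out;
}

const map = new Map([["[REDACTED_EMAIL]", "anna@example.com"]]);
restorePii("Reply sent to [REDACTED_EMAIL].", map);
// → "Reply sent to anna@example.com."
```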

Injection Protection

Both the user message and the RAG context chunks are checked for prompt-injection patterns before assembly. If detected:

  • the query is flagged (injectionDetected: true);
  • a reason is included (injectionReason);
  • the message routes to HITL regardless of trust status.

See Injection Guard.

Testing

Use POST /api/v1/rag/draft to test RAG queries without sending to any channel:

```bash
curl -X POST http://localhost:3000/api/v1/rag/draft \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{"departmentId": "<department-id>", "text": "What is the leave policy?"}'
```

Configuration per Knowledge Base

RAG weights can be tuned per knowledge base via config.ragWeights:

```json
{
  "ragWeights": {
    "vector": 0.65,
    "keyword": 0.35
  }
}
```

See RAG Weights.


Semantic Chunker

The chunker splits documents into context-preserving chunks and knows about legal document structure. It's tuned for Ukrainian legal texts.

Implementation

src/knowledge/chunker.ts — splitLegalTextIntoChunks()

The chunker recognizes Ukrainian legal document headings:

| Pattern                    | Example                      | Level     |
|----------------------------|------------------------------|-----------|
| Розділ N / Розділ I        | Розділ 3. Оплата             | Section   |
| Стаття N / Стаття N.N      | Стаття 15. Відповідальність  | Article   |
| Пункт N.N / п. N           | п. 5                         | Clause    |
| N.N. (standalone numbered) | 1.1. Загальні положення      | Numbered  |
| N)                         | 1) перший варіант            | List item |
| а) / б) (Cyrillic)         | а) перша умова               | Sub-item  |

Chunking Strategy

  1. Split by headings — legal headings are natural chunk boundaries.
  2. Section path tracking — each chunk carries a breadcrumb path (e.g. ["Розділ 3", "Стаття 15", "п. 2"]).
  3. Size enforcement — chunks above maxChunkSize (default 2000 chars) are split at paragraph or sentence boundaries.
  4. Overlap — configurable overlap between consecutive chunks preserves context across boundaries.
  5. Metadata — each chunk stores chunkIndex, startChar, endChar, sectionPath.
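Steps 1 and 2 can be sketched together: split on recognized headings and carry a breadcrumb path. This is a simplified illustration handling only three of the patterns; the real splitLegalTextIntoChunks() also enforces size, overlap, and the full metadata set.

```typescript
// Recognize a subset of the heading patterns from the table above.
const headingRe = /^(Розділ\s+\S+|Стаття\s+\S+|п\.\s*\S+)/;

interface Chunk { sectionPath: string[]; content: string; }

function splitByHeadings(text: string): Chunk[] {
  const levels = ["Розділ", "Стаття", "п."]; // hierarchy, outermost first
  const chunks: Chunk[] = [];
  let path: string[] = [];
  let buf: string[] = [];
  const flush = () => {
    const content = buf.join("\n").trim();
    if (content) chunks.push({ sectionPath: [...path], content });
    buf = [];
  };
  for (const line of text.split("\n")) {
    const m = line.match(headingRe);
    if (m) {
      flush(); // a heading closes the previous chunk
      const level = levels.findIndex(l => m[1].startsWith(l));
      // Truncate the path to this level, then append the new heading.
      path = [...path.slice(0, level), m[1].replace(/[.:]$/, "")];
    }
    buf.push(line);
  }
  flush();
  return chunks;
}
```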

Fallback

When no legal structure is detected, the chunker falls back to:

  1. paragraph split (double newline);
  2. sentence split (period + space);
  3. hard split at max size.
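The fallback cascade can be sketched recursively; the splitting heuristics here follow the three steps above but are a simplified stand-in for the real implementation.

```typescript
// Fallback cascade: paragraphs → sentence packing → hard cut at maxSize.
function fallbackSplit(text: string, maxSize = 2000): string[] {
  if (text.length <= maxSize) return [text];
  // 1. Paragraph split (double newline).
  const paras = text.split(/\n\s*\n/);
  if (paras.length > 1) return paras.flatMap(p => fallbackSplit(p, maxSize));
  // 2. Sentence split (period + whitespace), greedily packed up to maxSize.
  const sentences = text.split(/(?<=\.)\s+/);
  if (sentences.length > 1) {
    const out: string[] = [];
    let cur = "";
    for (const s of sentences) {
      if (cur && cur.length + s.length + 1 > maxSize) { out.push(cur); cur = s; }
      else cur = cur ? cur + " " + s : s;
    }
    if (cur) out.push(cur);
    // A single oversized sentence falls through to the hard split.
    return out.flatMap(c => (c.length > maxSize ? fallbackSplit(c, maxSize) : [c]));
  }
  // 3. Hard split at max size.
  const out: string[] = [];
  for (let i = 0; i < text.length; i += maxSize) out.push(text.slice(i, i + maxSize));
  return out;
}
```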

The chunker runs inside the ingestion pipeline after text extraction and PII scrubbing. It's not exposed via API.