Knowledge & RAG
Ingestion Pipeline
Uploaded documents flow through a pipeline that extracts text, scrubs PII, chunks content, generates embeddings, and creates synthetic Q&A pairs.
Pipeline Stages
Upload → Parse → PII Scrub → Chunk → Embed → Synthetic Q&A → Store
1. Upload
Upload files via POST /api/v1/knowledge/upload (multipart), or create from a URL with POST /api/v1/knowledge/documents. The file lands in KB_UPLOAD_DIR and a Document record is created with status pending.
2. Queue
A BullMQ job is enqueued to knowledge-ingest for async processing:
- Concurrency: 1 (sequential)
- Retries with exponential backoff
- Graceful shutdown on SIGTERM
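The queue settings above could be expressed roughly as follows. This is a sketch: the option names mirror BullMQ's job and worker options, but the concrete values for `attempts` and `delay` are assumptions, not taken from the codebase.

```typescript
// Illustrative settings for the knowledge-ingest queue. In the real
// worker these objects would be passed to bullmq's `new Queue(...)` and
// `new Worker(...)`; here they stand alone so the shape is visible.
const jobOptions = {
  attempts: 3, // assumed retry count
  backoff: { type: "exponential", delay: 5_000 }, // retry with exponential backoff
};

const workerOptions = {
  concurrency: 1, // process one document at a time (sequential)
};

// Graceful shutdown: on SIGTERM, stop taking new jobs and let the
// current one finish, e.g. `process.on("SIGTERM", () => worker.close())`.
```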
3. Text Extraction
Handled by src/knowledge/parsers.ts:
| Type | Parser | Notes |
|---|---|---|
| PDF | pdf-parse | Full text extraction |
| DOCX | mammoth | Converts to plain text |
| TXT | Direct file read | UTF-8 |
| URL | HTTP fetch + HTML strip | Removes tags |
| Image | OpenAI Vision API | OCR via gpt-4o |
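As a rough sketch of how the dispatch in parsers.ts might be organized (`parserFor` and `detectType` are hypothetical names, and the extension-to-type mapping is an illustration):

```typescript
type DocType = "pdf" | "docx" | "txt" | "url" | "image";

// Hypothetical dispatch table mirroring the parser matrix above.
const parserFor: Record<DocType, string> = {
  pdf: "pdf-parse",
  docx: "mammoth",
  txt: "direct-read (UTF-8)",
  url: "http-fetch + html-strip",
  image: "openai-vision (gpt-4o)",
};

function detectType(source: string): DocType {
  const ext = source.split(".").pop()?.toLowerCase() ?? "";
  if (ext === "pdf") return "pdf";
  if (ext === "docx") return "docx";
  if (ext === "txt") return "txt";
  if (["png", "jpg", "jpeg", "webp"].includes(ext)) return "image";
  return "url"; // no recognized extension: treat as a URL source
}
```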
4. PII Scrubbing (One-Way)
Before chunking, all PII is irreversibly masked. Raw personal data never lands in the knowledge base.
Detected patterns:
- Email addresses
- Phone numbers
- Credit card numbers
- IP addresses
- IBAN numbers
- Monetary amounts
Replaced with [REDACTED_EMAIL], [REDACTED_PHONE], etc.
See PII Scrubber for details.
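A minimal sketch of the one-way masking step. The patterns here are deliberately simplified illustrations, and the real scrubber also covers IBANs and monetary amounts; the point is that replacement is destructive, so raw values never reach storage.

```typescript
// Each detector replaces matches with a fixed placeholder, so the
// original value cannot be recovered from the scrubbed text.
// Patterns are simplified for illustration.
const detectors: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]"],
  [/\b(?:\d[ -]?){13,19}\b/g, "[REDACTED_CARD]"],
  [/\b\d{1,3}(?:\.\d{1,3}){3}\b/g, "[REDACTED_IP]"],
  [/\+?\d[\d\s().-]{8,}\d/g, "[REDACTED_PHONE]"],
];

function scrubPii(text: string): string {
  let out = text;
  for (const [re, mask] of detectors) out = out.replace(re, mask);
  return out;
}
```

Detector order matters: more specific patterns (cards, IPs) run before the broad phone pattern so they are not swallowed by it.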
5. Chunking
Text is split into chunks using the legal-document-aware chunker.
- Default max chunk size: 2000 characters
- Overlap between chunks for context preservation
- Ukrainian legal structure detection (Розділ, Стаття, п.)
- Section path tracking for hierarchical context
See Semantic Chunker for details.
6. Embedding
Each chunk is embedded using OpenAI text-embedding-3-small:
- Vector dimension: 1536
- Batch size: 32 chunks per API call
- Stored in PostgreSQL via pgvector
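The batching step can be sketched as a pure helper; N chunks cost ceil(N / 32) embedding calls. The OpenAI call itself is omitted here.

```typescript
const BATCH_SIZE = 32; // chunks per embeddings API call

// Split a list of chunks into batches of at most BATCH_SIZE, preserving
// order, so each batch maps to a single API request.
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```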
7. Synthetic Q&A Generation
For each chunk, gpt-4o-mini generates 3 synthetic questions that the chunk could answer. These questions are:
- embedded separately (1536-dim vectors);
- stored in the `ChunkQuestion` table;
- used as an extra retrieval path during RAG queries.
8. Completion
On success:
- Document status updated to `ready`
- Metadata updated with `chunkCount`, `textLength`
- Temporary upload file deleted
On failure:
- Document status updated to `failed`
- Error stored in `metadata.lastError`
- BullMQ retries if attempts remaining
Monitoring
Track ingestion status via:
- `GET /api/v1/knowledge/documents?status=processing` — in-progress jobs
- `GET /api/v1/knowledge/documents?status=failed` — failed documents
- Langfuse traces for `ingestion.synthetic-qa` spans
RAG Pipeline
The RAG pipeline combines hybrid search (vector + keyword) with LLM generation to produce source-grounded replies. The shared agent-tasks worker calls it on every inbound message; the RAG draft endpoint calls it for manual tests.
Architecture
Query → Injection Guard → Embed → Hybrid Retrieval → Rank → Context Assembly → LLM → PII Restore
Implementation: src/knowledge/rag.ts (OpenAiRagPipeline)
Retrieval Strategy
Hybrid Search
Three retrieval paths are combined:
| Path | Default Weight | Candidates | Description |
|---|---|---|---|
| Chunk vectors | 85% of vector budget | Top 18 | Cosine similarity on chunk embeddings |
| Question vectors | 15% of vector budget | Top 18 | Cosine similarity on synthetic Q&A embeddings |
| Keyword search | 35% (default keyword weight) | Top 18 | PostgreSQL full-text search on chunk content |
The default split is vector search 65% vs keyword search 35%, configurable per knowledge base via config.ragWeights (vector and keyword fields). Within the vector budget, 85% goes to chunk-content embeddings and 15% to synthetic Q&A embeddings.
Scoring
Each candidate receives a hybrid score:
score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore
Scores are normalized per-path before combining.
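Put together, the scoring could look like this sketch. Min-max normalization is an assumption about how "normalized per-path" is done; rag.ts may use a different scheme.

```typescript
type Candidate = { id: string; vecScore: number; qScore: number; keyScore: number };

// Min-max normalize one score field across all candidates (assumed
// normalization; the real pipeline may differ).
function normalize(values: number[]): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid division by zero
  return values.map((v) => (v - min) / range);
}

function hybridScores(cands: Candidate[], vector = 0.65, keyword = 0.35): number[] {
  const vec = normalize(cands.map((c) => c.vecScore));
  const q = normalize(cands.map((c) => c.qScore));
  const key = normalize(cands.map((c) => c.keyScore));
  // score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore
  return cands.map(
    (_, i) => vector * 0.85 * vec[i] + vector * 0.15 * q[i] + keyword * key[i],
  );
}
```

With the default weights, a candidate that tops every path scores 1.0 and one at the bottom of every path scores 0.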
Top-K Selection
After ranking, the top 6 chunks are picked. A max context length cap keeps the prompt within LLM token limits.
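A sketch of the selection step, assuming a character-based context cap; the 8000-character default here is illustrative, not a value from the source.

```typescript
// Take up to topK ranked chunks, stopping early once adding the next
// chunk would exceed the context budget.
function selectTopChunks(
  ranked: { content: string; score: number }[],
  topK = 6,
  maxContextChars = 8000, // assumed cap, not from the source
): { content: string; score: number }[] {
  const picked: { content: string; score: number }[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (picked.length >= topK) break;
    if (used + chunk.content.length > maxContextChars) break;
    picked.push(chunk);
    used += chunk.content.length;
  }
  return picked;
}
```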
Context Assembly
Selected chunks are formatted with:
- Document title
- Section path (from chunker metadata)
- Chunk content
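A sketch of the formatting, using a hypothetical bracketed-header template; the exact template in rag.ts may differ.

```typescript
type RetrievedChunk = { docTitle: string; sectionPath: string[]; content: string };

// Render one chunk as: [Title > Section > Subsection] followed by the text.
function formatChunk(c: RetrievedChunk): string {
  const section = c.sectionPath.length ? ` > ${c.sectionPath.join(" > ")}` : "";
  return `[${c.docTitle}${section}]\n${c.content}`;
}

// Join all selected chunks into the final context block.
function assembleContext(chunks: RetrievedChunk[]): string {
  return chunks.map(formatChunk).join("\n\n");
}
```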
LLM Generation
| Parameter | Value |
|---|---|
| Model | OPENAI_MODEL (default: gpt-4o) |
| Temperature | 0.2 |
| Max tokens | 700 |
| Streaming | Supported |
The prompt includes:
- System prompt — from namespace persona configuration
- Employee profile — summary, current projects, preferences (if available)
- Conversation history — last 6 messages (normalized)
- RAG context — assembled chunks with source metadata
- User query
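The prompt composition above can be sketched as follows; field names, section templates, and the decision to fold profile and context into the system message are illustrative assumptions.

```typescript
type Msg = { role: "system" | "user" | "assistant"; content: string };

function buildMessages(opts: {
  systemPrompt: string;      // namespace persona configuration
  employeeProfile?: string;  // summary, projects, preferences (if available)
  history: Msg[];            // already normalized
  ragContext: string;        // assembled chunks with source metadata
  userQuery: string;
}): Msg[] {
  const system = [
    opts.systemPrompt,
    opts.employeeProfile ? `Employee profile:\n${opts.employeeProfile}` : "",
    `Knowledge base context:\n${opts.ragContext}`,
  ]
    .filter(Boolean)
    .join("\n\n");
  return [
    { role: "system", content: system },
    ...opts.history.slice(-6), // last 6 messages of conversation history
    { role: "user", content: opts.userQuery },
  ];
}
```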
PII Handling
- PII in the user message is replaced with placeholders before the prompt goes to the LLM.
- Placeholders are stored encrypted in `PiiRedactionMap`.
- After generation, placeholders in the response are restored to the real values.
See PII Scrubber.
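The restore step might look like this sketch, assuming placeholders are numbered so each maps back to exactly one original value (the numbered format is an assumption; the earlier ingestion section shows only unnumbered placeholders).

```typescript
// Swap placeholders in the model's reply back to the real values from
// the (decrypted) redaction map. split/join avoids regex-escaping the
// bracketed placeholder strings.
function restorePii(reply: string, redactionMap: Record<string, string>): string {
  let out = reply;
  for (const [placeholder, original] of Object.entries(redactionMap)) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```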
Injection Protection
Both the user message and the RAG context chunks are checked for prompt-injection patterns before assembly. If detected:
- the query is flagged (`injectionDetected: true`);
- a reason is included (`injectionReason`);
- the message routes to HITL regardless of trust status.
See Injection Guard.
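A naive version of such a check is sketched below; the patterns are illustrative examples, not the guard's actual rule set.

```typescript
// Screen a piece of text (user message or retrieved chunk) for common
// prompt-injection phrasings. Returns the flag/reason shape described above.
const injectionPatterns: Array<[RegExp, string]> = [
  [/ignore (all )?(previous|prior) instructions/i, "instruction override attempt"],
  [/you are now\b/i, "role reassignment attempt"],
  [/reveal (the )?(system|hidden) prompt/i, "prompt exfiltration attempt"],
];

function checkInjection(text: string): { injectionDetected: boolean; injectionReason?: string } {
  for (const [re, reason] of injectionPatterns) {
    if (re.test(text)) return { injectionDetected: true, injectionReason: reason };
  }
  return { injectionDetected: false };
}
```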
Testing
Use POST /api/v1/rag/draft to test RAG queries without sending to any channel:
curl -X POST http://localhost:3000/api/v1/rag/draft \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-d '{"departmentId": "<department-id>", "text": "What is the leave policy?"}'
Configuration per Knowledge Base
RAG weights can be tuned per knowledge base via config.ragWeights:
{
"ragWeights": {
"vector": 0.65,
"keyword": 0.35
}
}
See RAG Weights.
Semantic Chunker
The chunker splits documents into context-preserving chunks and knows about legal document structure. It's tuned for Ukrainian legal texts.
Implementation
src/knowledge/chunker.ts — splitLegalTextIntoChunks()
Legal Structure Detection
The chunker recognizes Ukrainian legal document headings:
| Pattern | Example | Level |
|---|---|---|
| Розділ N / Розділ I | Розділ 3. Оплата | Section |
| Стаття N / Стаття N.N | Стаття 15. Відповідальність | Article |
| Пункт N.N / п. N | п. 5 | Clause |
| N.N. (standalone numbered) | 1.1. Загальні положення | Numbered |
| N) | 1) перший варіант | List item |
| а) / б) (Cyrillic) | а) перша умова | Sub-item |
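The detection can be sketched with one regex per level, ordered so more specific patterns win; these are simplified relative to chunker.ts.

```typescript
// Heading detectors for Ukrainian legal structure, one per level from
// the table above. First match wins.
const headingPatterns: Array<{ level: string; re: RegExp }> = [
  { level: "section", re: /^Розділ\s+(\d+|[IVXLC]+)\b/ },        // Розділ 3 / Розділ I
  { level: "article", re: /^Стаття\s+\d+(\.\d+)?\b/ },           // Стаття 15
  { level: "clause", re: /^(Пункт\s+\d+(\.\d+)?|п\.\s*\d+)/ },   // Пункт 5.1 / п. 5
  { level: "numbered", re: /^\d+(\.\d+)+\.\s/ },                 // 1.1. …
  { level: "list-item", re: /^\d+\)\s/ },                        // 1) …
  { level: "sub-item", re: /^[а-яіїєґ]\)\s/ },                   // а) …
];

function detectHeading(line: string): string | null {
  for (const { level, re } of headingPatterns) {
    if (re.test(line.trim())) return level;
  }
  return null;
}
```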
Chunking Strategy
- Split by headings — legal headings are natural chunk boundaries.
- Section path tracking — each chunk carries a breadcrumb path (e.g. `["Розділ 3", "Стаття 15", "п. 2"]`).
- Size enforcement — chunks above `maxChunkSize` (default 2000 chars) are split at paragraph or sentence boundaries.
- Overlap — configurable overlap between consecutive chunks preserves context across boundaries.
- Metadata — each chunk stores `chunkIndex`, `startChar`, `endChar`, `sectionPath`.
Fallback
When no legal structure is detected, the chunker falls back to:
- paragraph split (double newline);
- sentence split (period + space);
- hard split at max size.
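The fallback cascade can be sketched as a recursive splitter; this is an illustration of the three-step strategy, not the actual implementation.

```typescript
// Fallback splitting for text with no detectable legal structure:
// try paragraphs (double newline), then sentences (period + space),
// then a hard split at maxChunkSize. Recurses so an oversized
// paragraph still gets sentence-level treatment.
function fallbackSplit(text: string, maxChunkSize = 2000): string[] {
  if (text.length <= maxChunkSize) return [text];
  for (const re of [/\n{2,}/, /(?<=\.)\s+/]) {
    const parts = text.split(re).map((p) => p.trim()).filter(Boolean);
    if (parts.length > 1) {
      return parts.flatMap((p) => fallbackSplit(p, maxChunkSize));
    }
  }
  // No paragraph or sentence boundary found: hard split.
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChunkSize) {
    chunks.push(text.slice(i, i + maxChunkSize));
  }
  return chunks;
}
```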
The chunker runs inside the ingestion pipeline after text extraction and PII scrubbing. It's not exposed via API.