Knowledge & RAG

Ingestion Pipeline

Uploaded documents flow through a pipeline that extracts text, scrubs PII, chunks content, generates embeddings, and creates synthetic Q&A pairs.

Pipeline Stages

Upload → Parse → PII Scrub → Chunk → Embed → Synthetic Q&A → Store

1. Upload

Upload files via POST /api/v1/knowledge/upload (multipart), or create from a URL with POST /api/v1/knowledge/documents. The file lands in KB_UPLOAD_DIR and a Document record is created with status pending.

2. Queue

A BullMQ job is enqueued to the knowledge-ingest queue for async processing:

  • Concurrency: 1 (sequential)
  • Retries with exponential backoff
  • Graceful shutdown on SIGTERM

3. Text Extraction

Handled by src/knowledge/parsers.ts:

| Type  | Parser                  | Notes                 |
|-------|-------------------------|-----------------------|
| PDF   | pdf-parse               | Full text extraction  |
| DOCX  | mammoth                 | Converts to plain text |
| TXT   | Direct file read        | UTF-8                 |
| URL   | HTTP fetch + HTML strip | Removes tags          |
| Image | OpenAI Vision API       | OCR via gpt-4o        |
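The dispatch can be pictured as a lookup from MIME type to parser. This is an illustrative sketch, not the actual code in src/knowledge/parsers.ts; the function name and the fallback behavior are assumptions.

```typescript
// Hypothetical sketch: pick the parser named in the table above by MIME type.
function parserFor(mimeType: string): string {
  const table: Record<string, string> = {
    "application/pdf": "pdf-parse",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "mammoth",
    "text/plain": "direct-read",
    "text/html": "html-strip",
  };
  // Any image type goes through the Vision OCR path.
  if (mimeType.startsWith("image/")) return "openai-vision";
  const parser = table[mimeType];
  if (!parser) throw new Error(`Unsupported type: ${mimeType}`);
  return parser;
}

parserFor("application/pdf"); // → "pdf-parse"
parserFor("image/png");       // → "openai-vision"
```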

4. PII Scrubbing (One-Way)

Before chunking, all PII is irreversibly masked. Raw personal data never lands in the knowledge base.

Detected patterns:

  • Email addresses
  • Phone numbers
  • Credit card numbers
  • IP addresses
  • IBAN numbers
  • Monetary amounts

Matches are replaced with placeholders such as [REDACTED_EMAIL], [REDACTED_PHONE], etc.

See PII Scrubber for details.
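The masking step can be sketched as an ordered regex pass. These patterns are simplified illustrations, not the real scrubber's rules; note that ordering matters, since the looser phone pattern would otherwise swallow card numbers and IPs.

```typescript
// Minimal one-way masking sketch (illustrative patterns only).
// Cards and IPs are masked before the looser phone pattern.
const piiPatterns: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]"],
  [/\b(?:\d[ -]?){13,16}\b/g, "[REDACTED_CARD]"],
  [/\b\d{1,3}(?:\.\d{1,3}){3}\b/g, "[REDACTED_IP]"],
  [/\+?\d[\d\s().-]{8,}\d/g, "[REDACTED_PHONE]"],
];

function scrubPii(text: string): string {
  // Apply each pattern in order; replacement is irreversible by design.
  return piiPatterns.reduce((t, [re, mask]) => t.replace(re, mask), text);
}

scrubPii("call +380441234567 or email a@b.com from 10.0.0.1");
// → "call [REDACTED_PHONE] or email [REDACTED_EMAIL] from [REDACTED_IP]"
```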

5. Chunking

Text is split into chunks using the legal-document-aware chunker.

  • Default max chunk size: 2000 characters
  • Overlap between chunks for context preservation
  • Ukrainian legal structure detection (Розділ, Стаття, п.)
  • Section path tracking for hierarchical context

See Semantic Chunker for details.

6. Embedding

Each chunk is embedded using OpenAI text-embedding-3-small:

  • Vector dimension: 1536
  • Batch size: 32 chunks per API call
  • Stored in PostgreSQL via pgvector
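The batching above can be sketched with a small helper; the embeddings client call itself is omitted, only the grouping into batches of 32 is shown.

```typescript
// Group chunks into batches of 32 before each embeddings API call.
function toBatches<T>(items: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 70 chunks → batches of 32, 32, and 6.
toBatches(new Array(70).fill("chunk")).map(b => b.length); // → [32, 32, 6]
```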

7. Synthetic Q&A Generation

For each chunk, gpt-4o-mini generates 3 synthetic questions that the chunk could answer. These questions are:

  • embedded separately (1536-dim vectors);
  • stored in the ChunkQuestion table;
  • used as an extra retrieval path during RAG queries.

8. Completion

On success:

  • Document status updated to ready
  • Metadata updated with chunkCount, textLength
  • Temporary upload file deleted

On failure:

  • Document status updated to failed
  • Error stored in metadata.lastError
  • BullMQ retries if attempts remaining

Monitoring

Track ingestion status via:

  • GET /api/v1/knowledge/documents?status=processing — in-progress jobs
  • GET /api/v1/knowledge/documents?status=failed — failed documents
  • Langfuse traces for ingestion.synthetic-qa spans

RAG Pipeline

The RAG pipeline combines hybrid search (vector + keyword) with LLM generation to produce source-grounded replies. The shared agent-tasks worker calls it on every inbound message; the RAG draft endpoint calls it for manual tests.

Architecture

Query → Injection Guard → Embed → Hybrid Retrieval → Rank → Context Assembly → LLM → PII Restore

Implementation: src/knowledge/rag.ts (OpenAiRagPipeline)

Retrieval Strategy

Three retrieval paths are combined:

| Path             | Default Weight               | Candidates | Description                                  |
|------------------|------------------------------|------------|----------------------------------------------|
| Chunk vectors    | 85% of vector budget         | Top 18     | Cosine similarity on chunk embeddings        |
| Question vectors | 15% of vector budget         | Top 18     | Cosine similarity on synthetic Q&A embeddings |
| Keyword search   | 35% (default keyword weight) | Top 18     | PostgreSQL full-text search on chunk content |

By default the budget splits 65% to vector search and 35% to keyword search (configurable per knowledge base via the vector and keyword fields of config.ragWeights). Within the vector budget, 85% goes to chunk-content embeddings and 15% to synthetic Q&A embeddings.
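Multiplying the split out gives the effective per-path weights (values are the documented defaults):

```typescript
// Effective per-path weights under the default 65/35 split described above.
const weights = { vector: 0.65, keyword: 0.35 };

const effective = {
  chunkVectors: weights.vector * 0.85,    // 0.5525
  questionVectors: weights.vector * 0.15, // 0.0975
  keyword: weights.keyword,               // 0.35
};
// The three effective weights sum to 1.0.
```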

Scoring

Each candidate receives a hybrid score:

score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore

Scores are normalized per-path before combining.
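The formula and normalization can be sketched as follows. The candidate shape and max-normalization are assumptions for illustration; the actual OpenAiRagPipeline may normalize differently.

```typescript
interface Candidate { id: string; vecScore: number; qScore: number; keyScore: number; }

// Per-path max-normalization (assumed), so each path's best hit scores 1.0.
function normalize(xs: number[]): number[] {
  const max = Math.max(...xs, 1e-9);
  return xs.map(x => x / max);
}

function hybridScores(cands: Candidate[], vector = 0.65, keyword = 0.35): Map<string, number> {
  const vec = normalize(cands.map(c => c.vecScore));
  const q = normalize(cands.map(c => c.qScore));
  const key = normalize(cands.map(c => c.keyScore));
  const out = new Map<string, number>();
  cands.forEach((c, i) => {
    // score = (vector × 0.85) × vecScore + (vector × 0.15) × qScore + keyword × keyScore
    out.set(c.id, vector * 0.85 * vec[i] + vector * 0.15 * q[i] + keyword * key[i]);
  });
  return out;
}
```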

Top-K Selection

After ranking, the top 6 chunks are picked. A max context length cap keeps the prompt within LLM token limits.
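Selection can be sketched as a greedy walk down the ranked list. The top-K of 6 is the documented default; the character cap value here is an illustrative parameter, not the real limit.

```typescript
interface Ranked { content: string; score: number; }

// Greedily take up to topK chunks, skipping any that would blow the
// context-length budget (maxContextChars is an assumed, illustrative cap).
function selectChunks(ranked: Ranked[], topK = 6, maxContextChars = 8000): Ranked[] {
  const sorted = [...ranked].sort((a, b) => b.score - a.score);
  const picked: Ranked[] = [];
  let used = 0;
  for (const r of sorted) {
    if (picked.length === topK) break;
    if (used + r.content.length > maxContextChars) continue;
    picked.push(r);
    used += r.content.length;
  }
  return picked;
}
```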

Context Assembly

Selected chunks are formatted with:

  • Document title
  • Section path (from chunker metadata)
  • Chunk content

LLM Generation

| Parameter   | Value                           |
|-------------|---------------------------------|
| Model       | OPENAI_MODEL (default: gpt-4o)  |
| Temperature | 0.2                             |
| Max tokens  | 700                             |
| Streaming   | Supported                       |

The prompt includes:

  1. System prompt — from namespace persona configuration
  2. Employee profile — summary, current projects, preferences (if available)
  3. Conversation history — last 6 messages (normalized)
  4. RAG context — assembled chunks with source metadata
  5. User query

PII Handling

  1. PII in the user message is replaced with placeholders before the prompt goes to the LLM.
  2. Placeholders are stored encrypted in PiiRedactionMap.
  3. After generation, placeholders in the response are restored to the real values.

See PII Scrubber.
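Step 3 amounts to swapping placeholders back for the original values. In this sketch a plain Map stands in for the encrypted PiiRedactionMap storage:

```typescript
// Restore real values into the generated reply (illustrative sketch).
function restorePii(text: string, redactionMap: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of redactionMap) {
    out = out.split(placeholder).join(original); // literal, not regex, replace
  }
  return out;
}

const map = new Map([["[REDACTED_EMAIL]", "anna@example.com"]]);
restorePii("Reply sent to [REDACTED_EMAIL].", map);
// → "Reply sent to anna@example.com."
```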

Injection Protection

Both the user message and the RAG context chunks are checked for prompt-injection patterns before assembly. If detected:

  • the query is flagged (injectionDetected: true);
  • a reason is included (injectionReason);
  • the message routes to HITL regardless of trust status.

See Injection Guard.

Testing

Use POST /api/v1/rag/draft to test RAG queries without sending to any channel:

```bash
curl -X POST http://localhost:3000/api/v1/rag/draft \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{"departmentId": "<department-id>", "text": "What is the leave policy?"}'
```

Configuration per Knowledge Base

RAG weights can be tuned per knowledge base via config.ragWeights:

```json
{
  "ragWeights": {
    "vector": 0.65,
    "keyword": 0.35
  }
}
```

See RAG Weights.


Semantic Chunker

The chunker splits documents into context-preserving chunks and knows about legal document structure. It's tuned for Ukrainian legal texts.

Implementation

src/knowledge/chunker.ts — splitLegalTextIntoChunks()

The chunker recognizes Ukrainian legal document headings:

| Pattern                    | Example                      | Level     |
|----------------------------|------------------------------|-----------|
| Розділ N / Розділ I        | Розділ 3. Оплата             | Section   |
| Стаття N / Стаття N.N      | Стаття 15. Відповідальність  | Article   |
| Пункт N.N / п. N           | п. 5                         | Clause    |
| N.N. (standalone numbered) | 1.1. Загальні положення      | Numbered  |
| N)                         | 1) перший варіант            | List item |
| а) / б) (Cyrillic)         | а) перша умова               | Sub-item  |

Chunking Strategy

  1. Split by headings — legal headings are natural chunk boundaries.
  2. Section path tracking — each chunk carries a breadcrumb path (e.g. ["Розділ 3", "Стаття 15", "п. 2"]).
  3. Size enforcement — chunks above maxChunkSize (default 2000 chars) are split at paragraph or sentence boundaries.
  4. Overlap — configurable overlap between consecutive chunks preserves context across boundaries.
  5. Metadata — each chunk stores chunkIndex, startChar, endChar, sectionPath.
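Steps 1 and 2 can be sketched together: split on recognized headings and carry a breadcrumb path. This is a simplified illustration handling only three of the patterns; the real splitLegalTextIntoChunks() also enforces size, overlap, and the full metadata set.

```typescript
// Recognize a subset of the heading patterns from the table above.
const headingRe = /^(Розділ\s+\S+|Стаття\s+\S+|п\.\s*\S+)/;

interface Chunk { sectionPath: string[]; content: string; }

function splitByHeadings(text: string): Chunk[] {
  const levels = ["Розділ", "Стаття", "п."]; // hierarchy, outermost first
  const chunks: Chunk[] = [];
  let path: string[] = [];
  let buf: string[] = [];
  const flush = () => {
    const content = buf.join("\n").trim();
    if (content) chunks.push({ sectionPath: [...path], content });
    buf = [];
  };
  for (const line of text.split("\n")) {
    const m = line.match(headingRe);
    if (m) {
      flush(); // a heading closes the previous chunk
      const level = levels.findIndex(l => m[1].startsWith(l));
      // Truncate the path to this level, then append the new heading.
      path = [...path.slice(0, level), m[1].replace(/[.:]$/, "")];
    }
    buf.push(line);
  }
  flush();
  return chunks;
}
```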

Fallback

When no legal structure is detected, the chunker falls back to:

  1. paragraph split (double newline);
  2. sentence split (period + space);
  3. hard split at max size.
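The fallback cascade can be sketched recursively; the splitting heuristics here follow the three steps above but are a simplified stand-in for the real implementation.

```typescript
// Fallback cascade: paragraphs → sentence packing → hard cut at maxSize.
function fallbackSplit(text: string, maxSize = 2000): string[] {
  if (text.length <= maxSize) return [text];
  // 1. Paragraph split (double newline).
  const paras = text.split(/\n\s*\n/);
  if (paras.length > 1) return paras.flatMap(p => fallbackSplit(p, maxSize));
  // 2. Sentence split (period + whitespace), greedily packed up to maxSize.
  const sentences = text.split(/(?<=\.)\s+/);
  if (sentences.length > 1) {
    const out: string[] = [];
    let cur = "";
    for (const s of sentences) {
      if (cur && cur.length + s.length + 1 > maxSize) { out.push(cur); cur = s; }
      else cur = cur ? cur + " " + s : s;
    }
    if (cur) out.push(cur);
    // A single oversized sentence falls through to the hard split.
    return out.flatMap(c => (c.length > maxSize ? fallbackSplit(c, maxSize) : [c]));
  }
  // 3. Hard split at max size.
  const out: string[] = [];
  for (let i = 0; i < text.length; i += maxSize) out.push(text.slice(i, i + maxSize));
  return out;
}
```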

The chunker runs inside the ingestion pipeline after text extraction and PII scrubbing. It's not exposed via API.