
Cold Start

Import Contracts

Bulk import contracts and documents into a knowledge base.

Usage

npx tsx scripts/import-contracts.ts

What It Does

  1. Scans a directory for contract files (PDF, DOCX, TXT).
  2. Creates a knowledge base if none exists.
  3. Uploads each file with metadata.
  4. Enqueues ingestion jobs.
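
As a rough illustration of step 1, the sketch below filters a list of file names down to the supported contract formats. The names `SUPPORTED_EXTENSIONS`, `extensionOf`, and `selectContractFiles` are hypothetical, and case-insensitive matching is an assumption; the actual script may select files differently.

```typescript
// Extensions named in step 1 above. The constant name is hypothetical.
const SUPPORTED_EXTENSIONS = new Set([".pdf", ".docx", ".txt"]);

// Returns the lowercased extension including the dot, or "" if there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  // A leading dot (".gitignore") marks a hidden file, not an extension.
  return dot > 0 ? fileName.slice(dot).toLowerCase() : "";
}

// Keeps only the file names whose extension is supported.
// Assumption: matching is case-insensitive, so "lease.DOCX" is accepted.
function selectContractFiles(fileNames: string[]): string[] {
  return fileNames.filter((name) => SUPPORTED_EXTENSIONS.has(extensionOf(name)));
}

const found = selectContractFiles([
  "nda.pdf",
  "lease.DOCX",
  "notes.txt",
  "logo.png",
  ".gitignore",
]);
console.log(found);
```

In a real script the file names could come from Node's `fs.readdirSync` on the contracts directory.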

Workflow

import-contracts.ts → [ingestion pipeline processes files] → knowledge base ready

Monitor document status:

GET /api/v1/knowledge/documents?status=processing
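
A minimal polling sketch for the endpoint above follows. The base URL, the helper names, and above all the response shape (a plain JSON array of documents) are assumptions, since the response format is not described here. The HTTP call is passed in as `fetchJson` so the sketch does not assume any particular client or authentication scheme.

```typescript
// Hypothetical base URL. Point this at your own deployment.
const BASE_URL = "http://localhost:3000";

// Builds the status-filtered listing URL from the endpoint shown above.
function documentsUrl(status: string, baseUrl: string = BASE_URL): string {
  const url = new URL("/api/v1/knowledge/documents", baseUrl);
  url.searchParams.set("status", status);
  return url.toString();
}

// Assumption: the response body is a JSON array of documents.
// Adjust the parsing if the API wraps results in an envelope.
async function countProcessing(
  fetchJson: (url: string) => Promise<unknown>,
): Promise<number> {
  const body = await fetchJson(documentsUrl("processing"));
  if (!Array.isArray(body)) throw new Error("Unexpected response shape");
  return body.length;
}

// Polls until no documents are still processing, or until attempts run out.
// Returns true if ingestion finished within the allowed attempts.
async function waitForIngestion(
  fetchJson: (url: string) => Promise<unknown>,
  attempts = 30,
  delayMs = 2000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if ((await countProcessing(fetchJson)) === 0) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

On Node 18 or later, `fetchJson` could be as simple as `(url) => fetch(url).then((r) => r.json())`, plus whatever headers your deployment requires.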

Analyze Chats

Analyze chat exports to discover intent patterns for seeding the intent classifier.

Usage

npx tsx scripts/analyze-chats.ts

What It Does

  1. Parses WhatsApp or Telegram chat exports.
  2. Clusters similar messages by content.
  3. Finds recurring question patterns.
  4. Builds an intent taxonomy.
  5. Writes JSON that seed-intents.ts consumes.
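
The clustering method used in step 2 is not specified here. As one possible illustration, the sketch below groups messages greedily by Jaccard similarity over word tokens. The technique, the 0.5 threshold, and all function names are assumptions, not the script's actual algorithm.

```typescript
// Splits text into a set of lowercase ASCII word tokens.
// Non-ASCII letters are ignored in this simplified tokenizer.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const token of a) if (b.has(token)) shared++;
  return shared / (a.size + b.size - shared);
}

// Greedy clustering: each message joins the first cluster whose seed message
// is similar enough, otherwise it starts a new cluster.
function clusterMessages(messages: string[], threshold = 0.5): string[][] {
  const clusters: { seed: Set<string>; members: string[] }[] = [];
  for (const message of messages) {
    const tokens = tokenize(message);
    const match = clusters.find((c) => jaccard(c.seed, tokens) >= threshold);
    if (match) match.members.push(message);
    else clusters.push({ seed: tokens, members: [message] });
  }
  return clusters.map((c) => c.members);
}

const clusters = clusterMessages([
  "What is the leave policy?",
  "what is the leave policy",
  "How do I reset my password?",
]);
console.log(clusters.length); // 2
```

A production pipeline would more likely cluster on embeddings than on raw tokens; the token version only shows the grouping step itself.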

Workflow

First step in the cold-start pipeline:

analyze-chats.ts → [produces intent JSON] → seed-intents.ts → intent examples ready

Output

Each intent record has these fields:

  • intentName — discovered intent identifier
  • phrases — example phrases for each intent
  • count — frequency in the chat history
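
The record shape above could be typed and checked as in the following sketch. The names `IntentRecord`, `isIntentRecord`, and `parseIntentRecords` are hypothetical, and treating `count` as optional is an assumption, made because the seed-intents.ts input example omits it.

```typescript
// Shape of one record, based on the field list above.
// Assumption: count is optional, since the seeding input example omits it.
interface IntentRecord {
  intentName: string;
  phrases: string[];
  count?: number;
}

// Type guard that checks an unknown value against the record shape.
function isIntentRecord(value: unknown): value is IntentRecord {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.intentName === "string" &&
    Array.isArray(record.phrases) &&
    record.phrases.every((p) => typeof p === "string") &&
    (record.count === undefined || typeof record.count === "number")
  );
}

// Parses a JSON string and rejects anything that is not an array of records.
function parseIntentRecords(json: string): IntentRecord[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed) || !parsed.every(isIntentRecord)) {
    throw new Error("Expected a JSON array of intent records");
  }
  return parsed;
}

const records = parseIntentRecords(
  '[{"intentName":"leave_policy","phrases":["What is the leave policy?"],"count":12}]',
);
console.log(records[0].intentName); // leave_policy
```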

Seed Intents

Bulk-seed intent examples and their embeddings from the analysis output.

Usage

npx tsx scripts/seed-intents.ts --input ./analysis.json --namespace legal

Arguments

Flag         Required  Description
--input      yes       Path to JSON file with intent records
--namespace  yes       Target namespace name

Input Format

[
  {
    "intentName": "leave_policy",
    "phrases": ["How many vacation days do I have?", "What is the leave policy?"]
  }
]

What It Does

  1. Reads the JSON file with intent records.
  2. Deduplicates by (intentName, phrase) — skips existing examples.
  3. Generates embeddings for each phrase (batch size 32).
  4. Upserts IntentExample rows with embedding vectors.
  5. Shows a progress bar while running.
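
Steps 2 and 3 can be sketched as below: flatten the records into (intentName, phrase) pairs, skip pairs that already exist, then split the remainder into batches of 32 for embedding. All names are hypothetical, and how the script actually looks up existing examples is not described here, so `existingKeys` stands in for that lookup.

```typescript
interface IntentRecord {
  intentName: string;
  phrases: string[];
}

interface Example {
  intentName: string;
  phrase: string;
}

// Key that identifies an example. JSON.stringify avoids delimiter collisions.
const keyOf = (e: Example): string => JSON.stringify([e.intentName, e.phrase]);

// Returns only the examples that are neither stored already nor repeated
// within the input. existingKeys is a stand-in for a database lookup.
function newExamples(records: IntentRecord[], existingKeys: Set<string>): Example[] {
  const seen = new Set(existingKeys);
  const result: Example[] = [];
  for (const { intentName, phrases } of records) {
    for (const phrase of phrases) {
      const example: Example = { intentName, phrase };
      const key = keyOf(example);
      if (seen.has(key)) continue;
      seen.add(key);
      result.push(example);
    }
  }
  return result;
}

// Splits items into consecutive batches. 32 matches the batch size in step 3.
function toBatches<T>(items: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const existing = new Set([keyOf({ intentName: "leave_policy", phrase: "What is the leave policy?" })]);
const fresh = newExamples(
  [{ intentName: "leave_policy", phrases: ["How many vacation days do I have?", "What is the leave policy?"] }],
  existing,
);
console.log(fresh.length); // 1
```

Each batch would then go to the embedding model in a single call before the upsert in step 4.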

Workflow

Second step in the cold-start pipeline:

analyze-chats.ts → [produces intent JSON] → seed-intents.ts → intent classification ready

After seeding, the intent classifier uses these examples for vector-based classification of incoming messages.
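
One common form of vector-based classification is nearest-neighbor lookup by cosine similarity, sketched below with toy two-dimensional vectors. Whether the classifier uses exactly this method, a top-k vote, or a similarity threshold is not stated here, so treat the approach and the names `StoredExample`, `cosineSimilarity`, and `classify` as assumptions.

```typescript
// A seeded example reduced to the two fields this sketch needs.
interface StoredExample {
  intentName: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors. Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the intent of the most similar stored example, or null if none exist.
function classify(query: number[], examples: StoredExample[]): string | null {
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const example of examples) {
    const score = cosineSimilarity(query, example.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = example.intentName;
    }
  }
  return best;
}

// Toy embeddings. Real ones come from the embedding model used during seeding.
const seeded: StoredExample[] = [
  { intentName: "leave_policy", embedding: [1, 0] },
  { intentName: "password_reset", embedding: [0, 1] },
];
console.log(classify([0.9, 0.1], seeded)); // leave_policy
```

The incoming message must be embedded with the same model as the seeded phrases for the similarity scores to be meaningful.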