What You’ll Build

An intelligent agent that can:
  • Store documents in an encrypted vault with automatic OCR and embedding generation
  • Answer questions by retrieving relevant document chunks and synthesizing responses
  • Add knowledge dynamically as users provide new information
  • Cite sources with page numbers and document references

Why RAG?

Large Language Models are powerful, but out of the box they can only draw on their training data; they know nothing about your private documents. Retrieval-Augmented Generation (RAG) closes that gap by:
  1. Embedding your documents into a searchable vector space
  2. Retrieving relevant chunks when a user asks a question
  3. Augmenting the LLM’s context with those chunks
  4. Generating an accurate, grounded response
With Case.dev, you don’t need to manage embeddings, vector databases, or chunking strategies — Vaults handle all of this automatically.
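In miniature, the whole loop is three calls. The rest of this guide builds each one out; the client setup and VAULT_ID used here are covered below, so treat this as a sketch:
const question = 'When was the Smith v. Jones case filed?';

// 1. Retrieve: search the vault for relevant chunks
const { chunks } = await client.vault.search(VAULT_ID, { query: question, method: 'hybrid', topK: 5 });

// 2. Augment: inline the retrieved text into the prompt
const context = chunks.map(c => c.text).join('\n\n');

// 3. Generate: ask the model, grounded in that context
const completion = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` }]
});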

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   User Query    │ ──▶ │  Vault Search   │ ──▶ │   LLM Gateway   │
│                 │     │  (Semantic)     │     │   + Context     │
└─────────────────┘     └─────────────────┘     └─────────────────┘

┌─────────────────┐                                     ▼
│  New Document   │ ──▶ Vault Upload ──▶ Auto-embed ──▶ Ready
└─────────────────┘

Prerequisites

  • Case.dev API key (get one here)
  • Node.js 18+ or Python 3.9+
  • Vercel AI SDK (optional, for streaming UI)

Project Setup

Step 1: Install dependencies

npm install casedev ai zod

Step 2: Set up environment variables

Environment
CASEDEV_API_KEY=sk_case_your_api_key
# Add after Step 3, once you have a vault ID:
VAULT_ID=your_vault_id

Step 3: Create a vault for your knowledge base

import Casedev from 'casedev';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });

// Create a vault to store your knowledge base
const vault = await client.vault.create({
  name: 'Knowledge Base',
  description: 'Document intelligence agent knowledge store'
});

console.log(`Vault created: ${vault.id}`);
// Save this ID - you'll need it for queries
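
If you're scripting the setup, you could persist the ID right away so the later snippets can read it from process.env.VAULT_ID (a sketch using Node's fs; adjust to however you manage config):
import { appendFileSync } from 'node:fs';

// Append the new vault ID to .env for the rest of this guide
appendFileSync('.env', `\nVAULT_ID=${vault.id}\n`);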

Core Functions

1. Add Documents to Knowledge Base

When a user uploads a document or provides information, store it in the vault:
import Casedev from 'casedev';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });
const VAULT_ID = process.env.VAULT_ID;

/**
 * Add a document or text to the knowledge base
 */
async function addToKnowledgeBase(content: string, metadata?: Record<string, string>) {
  // Store the text as a plain-text file in the vault
  const filename = `knowledge-${Date.now()}.txt`;
  
  // Get upload URL
  const upload = await client.vault.upload(VAULT_ID, {
    filename,
    contentType: 'text/plain',
    metadata: {
      source: 'user-input',
      timestamp: new Date().toISOString(),
      ...metadata
    }
  });
  
  // Upload content
  await fetch(upload.uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': 'text/plain' },
    body: content
  });
  
  // Trigger ingestion (generates embeddings automatically)
  await client.vault.ingest(VAULT_ID, upload.objectId);
  
  return { objectId: upload.objectId, filename };
}

/**
 * Add a file (PDF, Word, image) to the knowledge base
 */
async function addFileToKnowledgeBase(
  file: Buffer, 
  filename: string, 
  contentType: string
) {
  const upload = await client.vault.upload(VAULT_ID, {
    filename,
    contentType,
    metadata: {
      source: 'file-upload',
      timestamp: new Date().toISOString()
    }
  });
  
  await fetch(upload.uploadUrl, {
    method: 'PUT',
    headers: { 'Content-Type': contentType },
    body: file
  });
  
  // Ingestion handles OCR (if needed) and embedding generation
  const job = await client.vault.ingest(VAULT_ID, upload.objectId);
  
  return { objectId: upload.objectId, jobId: job.id };
}
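
For example, to index a PDF from disk (contract.pdf is a stand-in for your own file):
import { readFile } from 'node:fs/promises';

const pdf = await readFile('./contract.pdf');
const { objectId, jobId } = await addFileToKnowledgeBase(pdf, 'contract.pdf', 'application/pdf');
console.log(`Uploaded ${objectId}; ingestion job ${jobId} will OCR and embed it`);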

2. Retrieve Relevant Information

Search the knowledge base for content relevant to a user’s question:
/**
 * Find relevant content from the knowledge base
 */
async function findRelevantContent(query: string, topK: number = 5) {
  const results = await client.vault.search(VAULT_ID, {
    query,
    method: 'hybrid',  // Combines semantic + keyword search
    topK
  });
  
  // Format results for LLM context
  return results.chunks.map(chunk => ({
    content: chunk.text,
    source: chunk.filename,
    page: chunk.page,
    score: chunk.hybridScore
  }));
}
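
A quick way to sanity-check retrieval before wiring it into the LLM:
const hits = await findRelevantContent('water damage on the ceiling', 3);
for (const hit of hits) {
  console.log(`[${hit.score.toFixed(2)}] ${hit.source} p.${hit.page}: ${hit.content.slice(0, 80)}...`);
}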

3. Generate Responses with Context

Use the LLM Gateway to generate responses grounded in your documents:
/**
 * Answer a question using the knowledge base
 */
async function answerQuestion(question: string) {
  // 1. Retrieve relevant context
  const relevantContent = await findRelevantContent(question);
  
  if (relevantContent.length === 0) {
    return {
      answer: "I don't have any information about that in my knowledge base.",
      sources: []
    };
  }
  
  // 2. Format context for the LLM
  const context = relevantContent
    .map((c, i) => `[${i + 1}] ${c.content} (Source: ${c.source}, Page ${c.page})`)
    .join('\n\n');
  
  // 3. Generate response with LLM
  const response = await client.llm.v1.chat.createCompletion({
    model: 'anthropic/claude-sonnet-4.5',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant that answers questions based on the provided context.
        
Rules:
- Only use information from the provided context
- Cite sources using [1], [2], etc.
- If the context doesn't contain relevant information, say so
- Be concise and accurate`
      },
      {
        role: 'user',
        content: `Context:\n${context}\n\nQuestion: ${question}`
      }
    ],
    temperature: 0.3,
    max_tokens: 1000
  });
  
  return {
    answer: response.choices[0].message.content,
    sources: relevantContent.map(c => ({
      filename: c.source,
      page: c.page,
      excerpt: c.content.substring(0, 200) + '...'
    }))
  };
}

Building the Agent with Tools

For a more sophisticated agent that can decide when to search vs. add knowledge, use tool calling:
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Define tools for the agent
const tools = {
  searchKnowledgeBase: {
    description: 'Search the knowledge base for information relevant to a question',
    parameters: z.object({
      query: z.string().describe('The search query')
    }),
    execute: async ({ query }) => {
      const results = await findRelevantContent(query);
      return results.length > 0 
        ? results 
        : 'No relevant information found in knowledge base.';
    }
  },
  
  addToKnowledgeBase: {
    description: 'Add new information to the knowledge base. Use this when the user provides facts or documents.',
    parameters: z.object({
      content: z.string().describe('The content to add'),
      topic: z.string().optional().describe('Topic or category')
    }),
    execute: async ({ content, topic }) => {
      const result = await addToKnowledgeBase(content, { topic });
      return `Added to knowledge base: ${result.filename}`;
    }
  }
};

/**
 * Run the agent with tool support
 */
async function runAgent(userMessage: string, conversationHistory: any[] = []) {
  const messages = [
    {
      role: 'system',
      content: `You are a helpful document intelligence assistant.

Your capabilities:
1. Search your knowledge base to answer questions
2. Add new information when users provide it

Always search the knowledge base before answering factual questions.
If you don't find relevant information, say so honestly.
When adding information, confirm what was added.`
    },
    ...conversationHistory,
    { role: 'user', content: userMessage }
  ];
  
  // First call - may request tool use
  let response = await client.llm.v1.chat.createCompletion({
    model: 'anthropic/claude-sonnet-4.5',
    messages,
    tools: Object.entries(tools).map(([name, tool]) => ({
      type: 'function',
      function: {
        name,
        description: tool.description,
        parameters: zodToJsonSchema(tool.parameters)  // the tools API expects JSON Schema, not a Zod object
      }
    })),
    tool_choice: 'auto'
  });
  
  // Handle tool calls
  while (response.choices[0].message.tool_calls) {
    const toolCalls = response.choices[0].message.tool_calls;
    
    // Execute each tool
    const toolResults = await Promise.all(
      toolCalls.map(async (call) => {
        const tool = tools[call.function.name];
        const args = JSON.parse(call.function.arguments);
        const result = await tool.execute(args);
        return {
          role: 'tool',
          tool_call_id: call.id,
          content: JSON.stringify(result)
        };
      })
    );
    
    // Continue conversation with tool results
    messages.push(response.choices[0].message);
    messages.push(...toolResults);
    
    response = await client.llm.v1.chat.createCompletion({
      model: 'anthropic/claude-sonnet-4.5',
      messages,
      tools: Object.entries(tools).map(([name, tool]) => ({
        type: 'function',
        function: {
          name,
          description: tool.description,
          parameters: zodToJsonSchema(tool.parameters)
        }
      }))
    });
  }
  
  return response.choices[0].message.content;
}
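
To try the agent interactively, here is a minimal REPL sketch (Node 18+, using node:readline/promises; runAgent returns only the final text, so the history keeps plain user/assistant turns):
import { createInterface } from 'node:readline/promises';

const rl = createInterface({ input: process.stdin, output: process.stdout });
const history: any[] = [];

while (true) {
  const userMessage = await rl.question('You: ');
  if (userMessage.trim() === 'exit') break;
  const reply = await runAgent(userMessage, history);
  history.push({ role: 'user', content: userMessage });
  history.push({ role: 'assistant', content: reply });
  console.log(`Agent: ${reply}`);
}
rl.close();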

Integration with Vercel AI SDK

For Next.js applications, integrate with the Vercel AI SDK for streaming responses:
TypeScript
// app/api/chat/route.ts
import { streamText, tool } from 'ai';
import { z } from 'zod';
import Casedev from 'casedev';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });
const VAULT_ID = process.env.VAULT_ID;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: 'anthropic/claude-sonnet-4.5',
    system: `You are a document intelligence assistant. 
Check your knowledge base before answering questions.
Only respond using information from tool calls.
If no relevant information is found, say "I don't have information about that."`,
    messages,
    maxSteps: 5,
    tools: {
      searchDocuments: tool({
        description: 'Search the document knowledge base',
        parameters: z.object({
          query: z.string().describe('The search query')
        }),
        execute: async ({ query }) => {
          const results = await client.vault.search(VAULT_ID, {
            query,
            method: 'hybrid',
            topK: 5
          });
          return results.chunks.map(c => ({
            text: c.text,
            source: c.filename,
            page: c.page
          }));
        }
      }),
      
      addDocument: tool({
        description: 'Add information to the knowledge base',
        parameters: z.object({
          content: z.string().describe('Content to add')
        }),
        execute: async ({ content }) => {
          const upload = await client.vault.upload(VAULT_ID, {
            filename: `note-${Date.now()}.txt`,
            contentType: 'text/plain'
          });
          await fetch(upload.uploadUrl, {
            method: 'PUT',
            body: content
          });
          await client.vault.ingest(VAULT_ID, upload.objectId);
          return 'Added to knowledge base successfully';
        }
      })
    }
  });

  return result.toDataStreamResponse();
}
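
On the client, a matching chat page might look like the sketch below (useChat from @ai-sdk/react pairs with toDataStreamResponse in AI SDK 4.x; details vary by SDK version):
TypeScript
// app/page.tsx
'use client';
import { useChat } from '@ai-sdk/react';

export default function Chat() {
  // useChat streams messages from the /api/chat route above
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(m => (
        <p key={m.id}><b>{m.role}:</b> {m.content}</p>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask about your documents..." />
      </form>
    </div>
  );
}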

Example Usage

TypeScript
// Add some knowledge
await addToKnowledgeBase(
  'The Smith v. Jones case was filed on March 15, 2024. The plaintiff alleges negligence in the maintenance of the property.',
  { topic: 'case-facts' }
);

await addToKnowledgeBase(
  'Deposition of John Smith on April 2, 2024: Witness stated he observed water damage on the ceiling two weeks before the incident.',
  { topic: 'depositions' }
);

// Ask questions
const result = await answerQuestion('When was the Smith v. Jones case filed?');
console.log(result.answer);
// "The Smith v. Jones case was filed on March 15, 2024 [1]."

const result2 = await answerQuestion('What did John Smith observe?');
console.log(result2.answer);
// "John Smith observed water damage on the ceiling two weeks before the incident [1]."

// Using the agent
const response = await runAgent('My favorite pizza topping is pepperoni. Remember that.');
console.log(response);
// "I've added that to my knowledge base. Your favorite pizza topping is pepperoni."

const response2 = await runAgent('What is my favorite pizza topping?');
console.log(response2);
// "According to my knowledge base, your favorite pizza topping is pepperoni."

Best Practices

Chunking is automatic. Case.dev Vaults automatically chunk documents into semantic segments optimized for retrieval. You don’t need to implement chunking yourself.

1. Use hybrid search

Combine semantic and keyword search for best results:
TypeScript
const results = await client.vault.search(VAULT_ID, {
  query: 'liability insurance coverage limits',
  method: 'hybrid',  // Best of both worlds
  topK: 10
});

2. Set appropriate temperature

Use low temperature for factual retrieval:
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [...],
  temperature: 0.2  // More deterministic for factual tasks
});

3. Structure your prompts

Be explicit about using only provided context:
TypeScript
const systemPrompt = `You are a legal research assistant.

Rules:
- ONLY use information from the provided context
- If information is not in the context, say "I don't have that information"
- Always cite sources using [1], [2], etc.
- Never make up or infer facts not explicitly stated`;

4. Handle no results gracefully

TypeScript
const results = await findRelevantContent(query);

if (results.length === 0 || results[0].score < 0.5) {
  return "I couldn't find relevant information in the knowledge base.";
}

Cost Estimate

Component               Cost
Document storage        $0.023/GB/month
OCR processing          $0.01/page
Embedding generation    Included with ingestion
Semantic search         $0.001/query
LLM (Claude Sonnet)     $3 / $15 per 1M input / output tokens

Example: A knowledge base with 1,000 pages, 100 queries/day ≈ $15/month.

Next Steps