What You’ll Build
An intelligent agent that can:
- Store documents in an encrypted vault with automatic OCR and embedding generation
- Answer questions by retrieving relevant document chunks and synthesizing responses
- Add knowledge dynamically as users provide new information
- Cite sources with page numbers and document references
Why RAG?
Large Language Models are powerful, but they can only reason over what was in their training data. RAG solves this by:
- Embedding your documents into a searchable vector space
- Retrieving relevant chunks when a user asks a question
- Augmenting the LLM’s context with those chunks
- Generating an accurate, grounded response
Architecture
Prerequisites
- Case.dev API key
- Node.js 18+ or Python 3.9+
- Vercel AI SDK (optional, for streaming UI)
Project Setup
Step 1: Install dependencies
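The original install commands were lost; here is a plausible reconstruction. The package names for the Vercel AI SDK (`ai`, `zod`) are real; the Case.dev SDK name is a guess, so check Case.dev's docs for the official package:

```shell
# Vercel AI SDK + schema validation (optional, for the streaming UI)
npm install ai zod
# Hypothetical Case.dev client — verify the real package name:
# npm install @case/sdk
```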
Step 2: Set up environment variables
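A minimal `.env` sketch — the variable name is an assumption; use whatever name the Case.dev docs specify:

```shell
# .env — placeholder value; never commit real keys
CASE_API_KEY=your-case-api-key
```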
Step 3: Create a vault for your knowledge base
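A sketch of vault creation. The base URL, endpoint, and field names below are assumptions, not the documented Case.dev API — consult the Vaults reference for the real surface:

```typescript
// Hypothetical Case.dev Vaults endpoint — names are assumptions.
const CASE_API = "https://api.case.dev/v1";

function createVaultRequest(name: string, apiKey: string) {
  return {
    url: `${CASE_API}/vaults`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Field names are guesses; OCR/embedding are described as automatic.
    body: JSON.stringify({ name, ocr: true, embeddings: true }),
  };
}

// const req = createVaultRequest("knowledge-base", process.env.CASE_API_KEY!);
// const vault = await fetch(req.url, req).then((r) => r.json());
```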
Core Functions
1. Add Documents to Knowledge Base
When a user uploads a document or provides information, store it in the vault:

2. Retrieve Relevant Information
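Picking up step 1 (adding documents): a minimal sketch of the ingestion call. The endpoint and body shape are assumptions, not the documented Case.dev API:

```typescript
// Hypothetical ingestion call — endpoint and field names are guesses.
function addDocumentRequest(
  vaultId: string,
  content: string,
  metadata: Record<string, string>,
  apiKey: string
) {
  return {
    url: `https://api.case.dev/v1/vaults/${vaultId}/documents`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // OCR and embedding generation happen server-side on ingestion.
    body: JSON.stringify({ content, metadata }),
  };
}

// const req = addDocumentRequest(vault.id, text, { source: "msa.pdf" }, key);
// await fetch(req.url, req);
```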
Search the knowledge base for content relevant to a user's question:

3. Generate Responses with Context
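For step 2 (retrieval), the search call might look like this — again, the endpoint and body shape are guesses at a plausible semantic-search API:

```typescript
// Hypothetical semantic-search call against a vault.
function searchRequest(vaultId: string, query: string, topK: number, apiKey: string) {
  return {
    url: `https://api.case.dev/v1/vaults/${vaultId}/search`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, top_k: topK }),
  };
}

// const req = searchRequest(vault.id, userQuestion, 5, key);
// const { chunks } = await fetch(req.url, req).then((r) => r.json());
```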
Use the LLM Gateway to generate responses grounded in your documents:

Building the Agent with Tools
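For step 3 (generation), the core logic is assembling messages that ground the model in retrieved chunks. The message construction below is runnable; the gateway call itself is sketched in comments because its real shape is not documented here:

```typescript
type RetrievedChunk = { text: string; source: string; page: number };

// Build chat messages that pin the model to the retrieved context,
// numbering chunks so the answer can cite them.
function groundedMessages(question: string, chunks: RetrievedChunk[]) {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}, p.${c.page}) ${c.text}`)
    .join("\n\n");
  return [
    {
      role: "system" as const,
      content: "Answer only from the provided context. Cite chunk numbers.",
    },
    { role: "user" as const, content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];
}

// Hypothetical gateway call (endpoint is a guess):
// await fetch("https://api.case.dev/v1/llm/chat", { method: "POST", /* ... */ });
```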
For a more sophisticated agent that can decide when to search vs. add knowledge, use tool calling:

Integration with Vercel AI SDK
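The tool-calling agent just described can be sketched dependency-free as a dispatch table: the model returns a tool name plus arguments, and the agent routes to the matching function. (With the Vercel AI SDK you would declare these via its `tool()` helper instead.) Tool names and stubs below are illustrative:

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Stubs — wire these to real vault search / ingestion calls.
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  search_knowledge: (args) => `search: ${String(args.query)}`,
  add_knowledge: (args) => `stored ${String(args.content).length} chars`,
};

// Route a model-issued tool call to the matching implementation.
function dispatch(call: ToolCall): string {
  const fn = tools[call.name];
  return fn ? fn(call.args) : `unknown tool: ${call.name}`;
}
```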
For Next.js applications, integrate with the Vercel AI SDK for streaming responses:
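A route-handler sketch. The `streamText` / `toDataStreamResponse` calls are real Vercel AI SDK APIs; the OpenAI-compatible gateway URL and model id are assumptions about Case.dev's LLM Gateway:

```typescript
// app/api/chat/route.ts — sketch; gateway URL and model id are guesses.
import { streamText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

const gateway = createOpenAI({
  baseURL: "https://api.case.dev/v1/llm", // hypothetical gateway endpoint
  apiKey: process.env.CASE_API_KEY,
});

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: gateway("claude-sonnet"), // model id is a guess
    system: "Answer only from retrieved document context.",
    messages,
  });
  return result.toDataStreamResponse();
}
```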
Example Usage
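Putting the pieces together with a stubbed retriever so the flow runs offline — the chunk data and helper names are illustrative:

```typescript
type Chunk = { text: string; source: string; page: number };

// Stub standing in for a real vault search call.
function searchStub(_query: string): Chunk[] {
  return [
    { text: "Termination requires 30 days' written notice.", source: "msa.pdf", page: 4 },
  ];
}

// Assemble a grounded prompt with numbered, citable chunks.
function buildPrompt(question: string, chunks: Chunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}, p.${c.page}) ${c.text}`)
    .join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}\nAnswer only from the context, citing like [1].`;
}

const prompt = buildPrompt("What is the notice period?", searchStub("notice period"));
// Pass `prompt` to the LLM Gateway for the final answer.
```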
Best Practices
Chunking is automatic. Case.dev Vaults split documents into semantic segments optimized for retrieval, so you don't need to implement chunking yourself.
1. Use hybrid search
Combine semantic and keyword search for best results:
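A sketch of what a hybrid-search request body might look like — the `mode` value and weight parameters are assumptions about the search API, not documented options:

```typescript
// Hypothetical hybrid-search options: semantic similarity + keyword (BM25-style).
function hybridSearchBody(query: string, topK = 8) {
  return {
    query,
    top_k: topK,
    mode: "hybrid" as const,
    // Weights are a guess at a plausible tuning knob:
    semantic_weight: 0.7,
    keyword_weight: 0.3,
  };
}
```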
2. Set appropriate temperature
Use low temperature for factual retrieval:
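For example (the model id is a guess; the principle is to keep randomness low so answers stay close to the retrieved text):

```typescript
const completionParams = {
  model: "claude-sonnet", // model id is a guess
  temperature: 0.1,       // low randomness for factual, grounded answers
  max_tokens: 1024,
};
```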
3. Structure your prompts
Be explicit about using only provided context:
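For instance, a system prompt along these lines (wording is illustrative):

```typescript
function systemPrompt(): string {
  return [
    "You are a research assistant.",
    "Answer ONLY from the provided context.",
    'If the context does not contain the answer, say "I don\'t know."',
    "Cite the source document and page number for every claim.",
  ].join("\n");
}
```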
4. Handle no results gracefully
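A simple guard: if retrieval returns nothing, respond with a fallback instead of letting the model guess. The fallback wording is illustrative:

```typescript
type VaultChunk = { text: string };

// Returns a fallback answer when there is nothing to ground on,
// or null to signal that generation should proceed with the chunks.
function answerOrFallback(chunks: VaultChunk[]): string | null {
  if (chunks.length === 0) {
    return "I couldn't find anything about that in the knowledge base.";
  }
  return null;
}
```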
Cost Estimate
| Component | Cost |
|---|---|
| Document storage | $0.023/GB/month |
| OCR processing | $0.01/page |
| Embedding generation | Included with ingestion |
| Semantic search | $0.001/query |
| LLM (Claude Sonnet) | $15 per 1M tokens |
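A worked example using the rates in the table above; the usage numbers (10 GB stored, 10,000 queries, 2M LLM tokens per month) are illustrative:

```typescript
// Recurring monthly cost from the table's rates.
function monthlyCost(gbStored: number, queries: number, llmTokensMillions: number): number {
  const storage = gbStored * 0.023;      // $0.023/GB/month
  const search = queries * 0.001;        // $0.001/query
  const llm = llmTokensMillions * 15;    // $15 per 1M tokens
  return storage + search + llm;
}

// One-time: OCR at $0.01/page (e.g., 1,000 pages = $10.00).
// monthlyCost(10, 10_000, 2) ≈ $40.23/month
```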
Next Steps
- Add file upload support for PDFs and images
- Implement conversation memory with Vercel AI SDK
- Add citations with page links to your responses
- Scale with batch processing for large document sets