Architecture
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Document   │ ──▶ │     OCR      │ ──▶ │  Embeddings  │ ──▶ │    Search    │
│    Upload    │     │  Processing  │     │  Generation  │     │    Index     │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
```
Prerequisites
- Case.dev API key
- Node.js 18+ or Python 3.9+
- Documents to process (PDFs, images, Word docs)
Step 1: Create a vault
```typescript
import Casedev from 'casedev';
import fs from 'fs';
import path from 'path';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });

async function createDiscoveryPipeline(matterId: string) {
  // 1. Create a vault for this matter
  const vault = await client.vault.create({
    name: `Matter ${matterId} - Discovery`,
    description: 'Documents received from opposing counsel'
  });

  console.log(`✅ Created vault: ${vault.id}`);
  return vault;
}
```
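The returned vault's `id` is what every later step operates on. The matter ID below is just an illustrative value:

```typescript
const vault = await createDiscoveryPipeline('2024-1234');
// vault.id is used for uploads, ingestion, and search in the steps below
```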
Step 2: Batch upload documents
```typescript
async function uploadDocuments(vaultId: string, documentsDir: string, matterId: string) {
  // Map file extensions to MIME types
  const contentTypes: Record<string, string> = {
    '.pdf': 'application/pdf',
    '.doc': 'application/msword',
    '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.tiff': 'image/tiff',
    '.txt': 'text/plain',
  };

  const files = fs.readdirSync(documentsDir);
  const results: { file: string; objectId: string }[] = [];

  for (const file of files) {
    const filePath = path.join(documentsDir, file);
    const stat = fs.statSync(filePath);
    if (!stat.isFile()) continue;

    const ext = path.extname(file).toLowerCase();
    const contentType = contentTypes[ext] || 'application/octet-stream';

    // Get a presigned upload URL
    const upload = await client.vault.upload(vaultId, {
      filename: file,
      contentType,
      metadata: {
        source: 'discovery',
        matter_id: matterId,
        original_path: filePath,
      }
    });

    // Upload the file to S3 via the presigned URL
    const fileBuffer = fs.readFileSync(filePath);
    const res = await fetch(upload.uploadUrl, {
      method: 'PUT',
      headers: { 'Content-Type': contentType },
      body: fileBuffer
    });
    if (!res.ok) throw new Error(`Upload failed for ${file}: ${res.status}`);

    console.log(`📄 Uploaded: ${file}`);
    results.push({ file, objectId: upload.objectId });
  }

  return results;
}
```
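With the vault from Step 1 in hand, a typical call looks like this (the directory path and matter ID are illustrative):

```typescript
const uploads = await uploadDocuments(vault.id, './discovery_dump', '2024-1234');
console.log(`Uploaded ${uploads.length} documents`);
```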
Step 3: Trigger ingestion
Ingestion runs OCR (if needed) and generates embeddings for search.
```typescript
async function ingestDocuments(vaultId: string, uploads: { file: string; objectId: string }[]) {
  const jobs: { file: string; jobId: string }[] = [];

  for (const { file, objectId } of uploads) {
    // Trigger ingestion (OCR + embeddings)
    const job = await client.vault.ingest(vaultId, objectId);
    jobs.push({ file, jobId: job.id });
    console.log(`🔄 Ingesting: ${file}`);
  }

  // Wait for each job to leave the queue, polling every 5 seconds
  for (const { file, jobId } of jobs) {
    let status = 'processing';
    while (status === 'processing' || status === 'pending') {
      await new Promise(r => setTimeout(r, 5000));
      const job = await client.vault.getIngestStatus(vaultId, jobId);
      status = job.status;
    }
    console.log(`✅ Ingested: ${file}`);
  }
}
```
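The loop above waits on jobs one at a time, which is simple but serializes the polling delay. If you'd rather poll every job concurrently, here is a minimal sketch using `Promise.all` on the same `getIngestStatus` call; treating `'failed'` as a terminal status is an assumption, so check your job objects for the actual values:

```typescript
async function waitForIngestion(vaultId: string, jobs: { file: string; jobId: string }[]) {
  await Promise.all(jobs.map(async ({ file, jobId }) => {
    let status = 'pending';
    while (status === 'pending' || status === 'processing') {
      await new Promise(r => setTimeout(r, 5000)); // same 5s poll interval as above
      status = (await client.vault.getIngestStatus(vaultId, jobId)).status;
    }
    // Assumption: 'failed' is the terminal error status; adjust to the actual API values
    console.log(`${status === 'failed' ? '❌ Failed' : '✅ Ingested'}: ${file}`);
  }));
}
```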
Step 4: Search your documents
```typescript
async function searchDiscovery(vaultId: string, query: string) {
  const results = await client.vault.search(vaultId, {
    query,
    method: 'hybrid', // Combines semantic + keyword matching
    topK: 10
  });

  console.log(`\n🔍 Results for: "${query}"\n`);
  for (const chunk of results.chunks) {
    console.log(`📄 ${chunk.filename} (page ${chunk.page})`);
    console.log(`   Score: ${chunk.hybridScore.toFixed(2)}`);
    console.log(`   "${chunk.text.substring(0, 200)}..."\n`);
  }

  return results;
}
```
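Hybrid search covers both semantic matches and exact keyword hits, so either style of query works; the queries below are illustrative:

```typescript
await searchDiscovery(vault.id, 'evidence of safety violations in 2023');
await searchDiscovery(vault.id, 'communications about equipment maintenance schedules');
```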
Complete example
```typescript
import Casedev from 'casedev';
import fs from 'fs';
import path from 'path';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });

async function main() {
  const matterId = '2024-1234';
  const documentsDir = './discovery_dump';

  // 1. Create vault
  const vault = await client.vault.create({
    name: `Matter ${matterId} - Discovery`,
    description: 'Documents from opposing counsel'
  });

  // 2. Upload and ingest all documents
  // (simplified: assumes every file is a PDF; see Step 2 for full content-type handling)
  const files = fs.readdirSync(documentsDir);
  for (const file of files) {
    const filePath = path.join(documentsDir, file);
    if (!fs.statSync(filePath).isFile()) continue;

    const upload = await client.vault.upload(vault.id, {
      filename: file,
      contentType: 'application/pdf'
    });
    await fetch(upload.uploadUrl, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/pdf' },
      body: fs.readFileSync(filePath)
    });
    await client.vault.ingest(vault.id, upload.objectId);
    console.log(`✅ ${file}`);
  }

  // 3. Search
  const results = await client.vault.search(vault.id, {
    query: 'evidence of safety violations in 2023',
    method: 'hybrid'
  });
  console.log(results.chunks);
}

main();
```
Production tip: For large document sets (1000+), use parallel uploads with a concurrency limit of 10-20 to maximize throughput while avoiding rate limits.
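Here is a minimal sketch of that pattern using a hand-rolled worker pool (no extra library). The concurrency default and the PDF-only content type are simplifying assumptions; the API calls are the same ones used in Step 2:

```typescript
async function uploadWithConcurrency(
  vaultId: string,
  documentsDir: string,
  matterId: string,
  concurrency = 10 // tune between 10 and 20 per the tip above
) {
  const files = fs.readdirSync(documentsDir)
    .filter(f => fs.statSync(path.join(documentsDir, f)).isFile());
  const results: { file: string; objectId: string }[] = [];
  let next = 0;

  // Each worker repeatedly claims the next unprocessed file until none remain
  async function worker() {
    while (next < files.length) {
      const file = files[next++];
      const filePath = path.join(documentsDir, file);
      const upload = await client.vault.upload(vaultId, {
        filename: file,
        contentType: 'application/pdf', // simplified; map extensions as in Step 2
        metadata: { source: 'discovery', matter_id: matterId }
      });
      const res = await fetch(upload.uploadUrl, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/pdf' },
        body: fs.readFileSync(filePath)
      });
      if (!res.ok) throw new Error(`Upload failed for ${file}: ${res.status}`);
      results.push({ file, objectId: upload.objectId });
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}
```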
Cost estimate
| Component | Cost |
|---|---|
| Storage | $0.023/GB/month |
| OCR | $0.01/page |
| Embeddings | $0.0001/1K tokens |
| Search | $0.001/query |
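As a rough worked example (the document counts are hypothetical): a 10 GB production of 50,000 pages averaging 500 tokens each runs about $0.23/month in storage (10 GB × $0.023), $500 in one-time OCR (50,000 pages × $0.01), and $2.50 in embeddings (25M tokens × $0.0001 per 1K), with each search adding $0.001.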