The problem: Opposing counsel sent you 500 pages of blurry photocopies. You need to search them, but they’re just images.

The solution: Run OCR to extract text, then search or analyze with AI.

1. Submit for OCR

import Casedev from 'casedev';

const client = new Casedev({ apiKey: process.env.CASEDEV_API_KEY });

// Process a document uploaded by your user
const job = await client.ocr.v1.process({
  document_url: documentUrl, // URL from your user's upload
  engine: 'doctr',  // Fast, good for printed text
  features: {
    embed: {}  // Generate searchable PDF
  }
});

console.log(`OCR job started: ${job.id}`);

2. Wait for completion

OCR runs asynchronously. Poll for status or use webhooks to notify your users:
// Poll for completion
let result = await client.ocr.v1.retrieve(job.id);

while (result.status === 'processing' || result.status === 'pending') {
  console.log(`Status: ${result.status} (${result.chunks_completed}/${result.chunk_count} chunks)`);
  await new Promise(r => setTimeout(r, 5000));
  result = await client.ocr.v1.retrieve(job.id);
}

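if (result.status === 'completed') {
  console.log(`✅ OCR complete! ${result.page_count} pages processed.`);
  console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
} else {
  // Any other final status means there is nothing to download
  throw new Error(`OCR job ${job.id} ended with status: ${result.status}`);
}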
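
The polling loop above never gives up if a job stalls. Here is a minimal sketch of a reusable helper with a deadline, assuming only the `pending` and `processing` status values shown above. The names `waitForOcr`, `intervalMs`, and `timeoutMs` are hypothetical and not part of the SDK:

```javascript
// Hypothetical helper: poll until the job leaves 'pending'/'processing'
// or the deadline passes. `retrieve` is any function that returns the job
// object, for example (id) => client.ocr.v1.retrieve(id).
async function waitForOcr(retrieve, jobId, { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await retrieve(jobId);
    if (result.status !== 'pending' && result.status !== 'processing') {
      return result; // Caller still checks whether status is 'completed'
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`OCR job ${jobId} still ${result.status} after ${timeoutMs} ms`);
    }
    // Sleep before the next status check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

It could be called as `const result = await waitForOcr((id) => client.ocr.v1.retrieve(id), job.id);`. Passing the retrieve function in keeps the helper testable without network access.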

3. Download results

Provide extracted text, structured data, or a searchable PDF:
import fs from 'node:fs';

// Download plain text for your user
const text = await client.ocr.v1.download(job.id, 'text');

// Download searchable PDF (original with invisible text layer)
const pdf = await client.ocr.v1.download(job.id, 'pdf');
fs.writeFileSync('searchable-document.pdf', Buffer.from(pdf));

// Download structured JSON (with word coordinates for highlighting)
const json = await client.ocr.v1.download(job.id, 'json');
console.log(`Extracted ${json.pages.length} pages`);
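
Once the text is extracted, the original goal of searching the documents becomes a plain string operation. Here is a minimal sketch, assuming the `text` download is (or has been converted to) a string. `searchText` is a hypothetical helper, not an SDK method:

```javascript
// Hypothetical helper: return every line of OCR text that contains `term`,
// ignoring case, together with its 1-based line number.
function searchText(text, term) {
  const needle = term.toLowerCase();
  const matches = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (line.toLowerCase().includes(needle)) {
      matches.push({ line: index + 1, snippet: line.trim() });
    }
  });
  return matches;
}
```

For example, `searchText(text, 'indemnification')` would list each line that mentions the term. OCR output from blurry scans can contain misread characters, so exact matching like this may miss some hits.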

4. Analyze with AI

Enhance your feature with automatic data extraction:
// Extract key information for your user
const analysis = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'Extract key dates, parties, and claims from this document. Format as JSON.'
    },
    {
      role: 'user',
      content: text
    }
  ],
  temperature: 0  // Deterministic for factual extraction
});

// Return structured data to your user
console.log(analysis.choices[0].message.content);
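
The system prompt asks for JSON, but the reply arrives as a string and may not parse cleanly. Here is a minimal sketch of defensive parsing, assuming the model may wrap its JSON in a Markdown code fence. `parseJsonReply` is a hypothetical helper:

```javascript
// Hypothetical helper: parse a model reply that should contain JSON.
// If the reply contains a Markdown code fence, parse only the fenced body.
function parseJsonReply(content) {
  const fenced = content.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const candidate = (fenced ? fenced[1] : content).trim();
  try {
    return { ok: true, data: JSON.parse(candidate) };
  } catch (err) {
    // Keep the raw reply so the caller can log it or retry
    return { ok: false, error: err.message, raw: content };
  }
}
```

It could be used as `const parsed = parseJsonReply(analysis.choices[0].message.content);`, checking `parsed.ok` before returning `parsed.data` to your user.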

OCR engines

Choose the right engine based on your users’ document types:
| Engine | Best for | Speed |
| --- | --- | --- |
| doctr | Clean printed text | Fast |
| tesseract | Mixed print/handwriting | Medium |
| paddle | Tables, forms, complex layouts | Slower |
Recommendation: Start with doctr for most use cases. Switch to paddle if your users need table extraction or have complex document layouts.
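
The engine guidance above can be encoded as a small selection function. This is a sketch only: `pickEngine` and its `hasTables` and `hasHandwriting` flags are hypothetical and not SDK options, and giving tables priority over handwriting is an assumption.

```javascript
// Hypothetical helper that maps document traits to an engine name.
function pickEngine({ hasTables = false, hasHandwriting = false } = {}) {
  if (hasTables) return 'paddle';         // Tables, forms, complex layouts
  if (hasHandwriting) return 'tesseract'; // Mixed print/handwriting
  return 'doctr';                         // Clean printed text (default)
}
```

The returned value could be passed as the `engine` field when submitting a document, for example `engine: pickEngine({ hasTables: true })`.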