Submit a document for OCR. We extract text, detect tables, and optionally generate a searchable PDF. Processing is asynchronous: you get a job ID immediately, then poll for results or use webhooks.
curl -X POST https://api.case.dev/ocr/v1/process \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://storage.example.com/scanned-deposition.pdf"
  }'
{
  "id": "1f4a195e-026b-41ff-b367-c61089f5f367",
  "status": "pending",
  "document_url": "https://storage.example.com/scanned-deposition.pdf",
  "engine": "doctr",
  "created_at": "2025-11-04T09:30:12Z",
  "links": {
    "self": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367",
    "text": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367/download/text",
    "json": "https://api.case.dev/ocr/v1/1f4a195e-026b-41ff-b367-c61089f5f367/download/json"
  }
}
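The same request can be sent from Node.js 18+ with the built-in `fetch`. This is a minimal sketch that mirrors the curl call above. The helper names `buildProcessRequest` and `submitOcrJob` are illustrative and not part of any SDK, and error handling is limited to a non-2xx check.

```javascript
// Endpoint taken from the curl example above.
const OCR_PROCESS_URL = 'https://api.case.dev/ocr/v1/process';

// Build the fetch() arguments for a process request (hypothetical helper).
function buildProcessRequest(apiKey, body) {
  return {
    url: OCR_PROCESS_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Submit the job and return the parsed response (id, status, links, ...).
async function submitOcrJob(apiKey, body) {
  const { url, init } = buildProcessRequest(apiKey, body);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`OCR request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping the request construction separate from the network call makes the payload easy to inspect or log before anything is sent.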
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| document_url | string | URL to your document. HTTP/HTTPS or s3:// |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_id | string | auto-generated | Your internal reference ID |
| engine | string | doctr | OCR engine (see below) |
| callback_url | string | none | Webhook URL for completion notification |
| features | object | {} | Additional processing options |
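A client-side check can catch malformed payloads before they are submitted. The sketch below is illustrative only: `validateProcessParams` is a hypothetical helper, the rule that `callback_url` must be HTTP/HTTPS is an assumption, and the API performs its own validation regardless.

```javascript
// Engine names as listed in the "OCR engines" table.
const KNOWN_ENGINES = ['doctr', 'paddleocr'];

// Return a list of problems with a process payload; an empty list means it looks valid.
function validateProcessParams(params) {
  const errors = [];
  const url = params.document_url;
  if (typeof url !== 'string' || !/^(https?|s3):\/\/\S+$/i.test(url)) {
    errors.push('document_url must be an http://, https://, or s3:// URL');
  }
  if (params.engine !== undefined && !KNOWN_ENGINES.includes(params.engine)) {
    errors.push(`engine must be one of: ${KNOWN_ENGINES.join(', ')}`);
  }
  // ASSUMPTION: webhooks are delivered over HTTP(S) only.
  if (params.callback_url !== undefined && !/^https?:\/\/\S+$/i.test(params.callback_url)) {
    errors.push('callback_url must be an http:// or https:// URL');
  }
  const f = params.features;
  if (f !== undefined && (typeof f !== 'object' || f === null || Array.isArray(f))) {
    errors.push('features must be an object');
  }
  return errors;
}
```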
OCR engines
| Engine | Best for | Speed |
|---|---|---|
| doctr | Clean printed text, typed documents | Fast |
| paddleocr | Tables, forms, complex layouts, handwriting | Medium |
For legal documents: Start with doctr. If you’re getting poor results on forms or tables, try paddleocr.
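If you already know something about a document before submitting it, the guidance above can be encoded as a small helper. This is only a sketch: `pickEngine` and its trait flags (`hasTables`, `hasForms`, `hasHandwriting`, `complexLayout`) are hypothetical names, not API parameters.

```javascript
// Choose an engine from known document traits.
// paddleocr for tables, forms, complex layouts, or handwriting; doctr otherwise.
function pickEngine(traits = {}) {
  const needsPaddle = Boolean(
    traits.hasTables || traits.hasForms || traits.hasHandwriting || traits.complexLayout
  );
  return needsPaddle ? 'paddleocr' : 'doctr';
}
```

For example, `pickEngine({ hasTables: true })` selects `paddleocr`, while `pickEngine()` falls back to `doctr`.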
Features
Enable additional processing through the features object. Here embed generates a searchable PDF and tables extracts tables as CSV:
{
  "features": {
    "embed": {},
    "tables": {
      "format": "csv"
    }
  }
}
Checking status
Poll the job to check if processing is complete:
// `job` is the object returned by client.ocr.v1.process()
const result = await client.ocr.v1.retrieve(job.id);
if (result.status === 'completed') {
  // Download the extracted text
  const text = await client.ocr.v1.download(job.id, 'text');
  console.log(text);
}
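A single status check is rarely enough, so polling is usually wrapped in a loop with increasing delays. The sketch below shows one way to do that with capped exponential backoff. `backoffDelays` and `waitForOcrJob` are hypothetical helpers, and the `'failed'` status is an assumption: only `pending` and `completed` appear in this document.

```javascript
// Delays (ms) for successive polls: exponential backoff capped at maxMs.
function backoffDelays(attempts, baseMs = 1000, maxMs = 30000) {
  const delays = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, maxMs));
  }
  return delays;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the job reports 'completed'. `retrieve` is any function that
// returns the job, e.g. (id) => client.ocr.v1.retrieve(id).
async function waitForOcrJob(retrieve, jobId, { attempts = 10, baseMs = 1000, maxMs = 30000 } = {}) {
  for (const delay of backoffDelays(attempts, baseMs, maxMs)) {
    const job = await retrieve(jobId);
    if (job.status === 'completed') return job;
    // ASSUMPTION: a terminal 'failed' status exists; adjust to the real values.
    if (job.status === 'failed') throw new Error(`OCR job ${jobId} failed`);
    await sleep(delay);
  }
  throw new Error(`OCR job ${jobId} did not complete after ${attempts} polls`);
}
```

With the SDK client used elsewhere on this page, a call might look like `await waitForOcrJob((id) => client.ocr.v1.retrieve(id), job.id)`.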
Using webhooks
For large documents, use webhooks instead of polling:
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/500-page-discovery.pdf',
  callback_url: 'https://your-app.com/api/ocr-complete'
});
We POST the completed job to your callback URL when processing finishes.
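On the receiving side, the handler needs to parse the POSTed body and pull out the fields it cares about. The following is a framework-agnostic sketch. It assumes the payload is the job JSON with the same shape as the process response (`id`, `status`, `links`), which this page does not spell out, so check it against a real delivery. `parseOcrWebhook` is a hypothetical name.

```javascript
// Parse a raw webhook body into the fields most handlers need.
// ASSUMPTION: the payload matches the process response shape (id, status, links).
function parseOcrWebhook(rawBody) {
  let job;
  try {
    job = JSON.parse(rawBody);
  } catch {
    throw new Error('Webhook body is not valid JSON');
  }
  if (!job || typeof job.id !== 'string' || typeof job.status !== 'string') {
    throw new Error('Webhook body is missing id or status');
  }
  return {
    id: job.id,
    status: job.status,
    textUrl: job.links?.text ?? null, // null when no download link is present
    jsonUrl: job.links?.json ?? null,
  };
}
```

Rejecting malformed bodies early keeps the rest of the handler free of defensive checks. Respond to the POST quickly and do any heavy downloading afterwards.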
S3 URLs
If your document is in S3, use an s3:// URL:
const job = await client.ocr.v1.process({
  document_url: 's3://your-bucket/documents/deposition.pdf'
});
We automatically generate a presigned URL to access the file.
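If your application stores bucket names and object keys separately, a small helper can assemble the `s3://` form expected by `document_url`. This is a sketch with hypothetical helper names (`toS3Url`, `parseS3Url`). It does no encoding of special characters in keys, since how the service interprets them is not documented here.

```javascript
// Build an s3:// URL from a bucket and object key.
function toS3Url(bucket, key) {
  if (!bucket || !key) throw new Error('bucket and key are required');
  // Drop leading slashes so the result never contains "bucket//key".
  return `s3://${bucket}/${key.replace(/^\/+/, '')}`;
}

// Split an s3:// URL into bucket and key; returns null if it does not match.
function parseS3Url(url) {
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(url);
  return match ? { bucket: match[1], key: match[2] } : null;
}
```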
Examples
Scanned deposition
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/deposition-smith.pdf',
  document_id: 'smith-depo-2024',
  engine: 'doctr',
  features: { embed: {} } // Generate searchable PDF
});
Medical records with tables
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/patient-records.pdf',
  engine: 'paddleocr', // Better for tables and forms
  features: {
    tables: { format: 'csv' },
    embed: {}
  },
  callback_url: 'https://your-app.com/webhooks/ocr'
});
Handwritten notes
const job = await client.ocr.v1.process({
  document_url: 'https://storage.example.com/witness-notes.jpg',
  engine: 'paddleocr' // Better for handwriting
});