The problem: Opposing counsel sent you 500 pages of blurry photocopies. You need to search them, but they’re just images.
The solution: Run OCR to extract text, then search or analyze with AI.
1. Submit for OCR
curl -X POST https://api.case.dev/ocr/v1/process \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://your-storage.com/user-upload.pdf",
    "engine": "doctr",
    "features": {"embed": {}}
  }'
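For teams working in Python, the same request can be sketched with the standard library alone. This is a minimal sketch, not the official SDK: the function names `build_ocr_request` and `submit` are hypothetical, and the shape of the response (including where the job ID lives) is not shown in this guide, so inspect the returned JSON before relying on any field.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
OCR_URL = "https://api.case.dev/ocr/v1/process"


def build_ocr_request(
    document_url: str, api_key: str, engine: str = "doctr"
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    payload = {
        "document_url": document_url,
        "engine": engine,
        "features": {"embed": {}},
    }
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body.

    The response schema is an unknown here; print the result to find
    the job identifier used in the next step.
    """
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.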
2. Wait for completion
OCR runs asynchronously. Poll for status or use webhooks to notify your users:
# Poll for status
curl "https://api.case.dev/ocr/v1/$JOB_ID" \
  -H "Authorization: Bearer $CASEDEV_API_KEY"
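A polling loop with exponential backoff might look like the sketch below. The guide does not document the response body, so the `status` field and the values `completed` and `failed` are assumptions to verify against a real response; `fetch_status` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed status values; the guide does not list them.
DONE_STATES = {"completed"}
FAILED_STATES = {"failed"}


def fetch_status(job_id: str, api_key: str) -> dict:
    """GET the job endpoint from the cURL example and decode the JSON body."""
    req = urllib.request.Request(
        f"https://api.case.dev/ocr/v1/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job finishes, doubling the delay up to max_interval."""
    waited = 0.0
    while True:
        job = fetch()
        status = job.get("status")  # field name is an assumption
        if status in DONE_STATES:
            return job
        if status in FAILED_STATES:
            raise RuntimeError(f"OCR job failed: {job}")
        if waited >= timeout:
            raise TimeoutError(f"OCR job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
```

In use, `fetch` would be something like `lambda: fetch_status(job_id, api_key)`. Passing `fetch` and `sleep` as parameters keeps the loop testable without network access or real delays.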
3. Download results
Provide extracted text, structured data, or a searchable PDF:
# Download text
curl "https://api.case.dev/ocr/v1/$JOB_ID/download/text" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -o extracted.txt

# Download searchable PDF
curl "https://api.case.dev/ocr/v1/$JOB_ID/download/pdf" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -o searchable.pdf
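The download step, plus a first pass at the search the guide opens with, could be sketched in Python as follows. Only the `text` and `pdf` paths come from the cURL example; the helper names `download_url`, `download`, and `search_lines` are hypothetical, and the search is a plain case-insensitive substring match rather than anything the API provides.

```python
import shutil
import urllib.request

BASE_URL = "https://api.case.dev/ocr/v1"
FORMATS = ("text", "pdf")  # the two download paths shown in the cURL example


def download_url(job_id: str, fmt: str) -> str:
    """Build the download URL for a finished job."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {FORMATS}")
    return f"{BASE_URL}/{job_id}/download/{fmt}"


def download(job_id: str, fmt: str, dest: str, api_key: str) -> None:
    """Stream the result to disk without loading it all into memory."""
    req = urllib.request.Request(
        download_url(job_id, fmt),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def search_lines(text: str, query: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain the query, ignoring case."""
    needle = query.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.casefold()
    ]
```

After `download(job_id, "text", "extracted.txt", api_key)`, reading the file and calling `search_lines(contents, "indemnif")` would list every line mentioning indemnification, which is often enough to locate the relevant pages in the searchable PDF.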
4. Analyze with AI
Pass the extracted text to an LLM to pull out structured data such as dates, parties, and claims:
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "system", "content": "Extract key dates, parties, and claims. Format as JSON."},
      {"role": "user", "content": "[OCR TEXT]"}
    ],
    "temperature": 0
  }'
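Building the payload from real OCR text and decoding the reply might be sketched as below. The response shape is an assumption: the `/chat/completions` path suggests the common `choices[0].message.content` layout, but this guide does not confirm it. `build_extraction_payload` and `parse_extraction` are hypothetical helper names.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that some models wrap JSON in

# System prompt copied from the cURL example.
SYSTEM_PROMPT = "Extract key dates, parties, and claims. Format as JSON."


def build_extraction_payload(
    ocr_text: str, model: str = "anthropic/claude-sonnet-4.5"
) -> dict:
    """Mirror the request body from the cURL example with real OCR text."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ocr_text},
        ],
        "temperature": 0,
    }


def parse_extraction(response: dict) -> dict:
    """Decode the model's JSON reply.

    Assumes the reply text is at choices[0].message.content (unverified).
    """
    content = response["choices"][0]["message"]["content"].strip()
    if content.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence on.
        content = content.split("\n", 1)[1] if "\n" in content else ""
        content = content.rsplit(FENCE, 1)[0]
    return json.loads(content)
```

`json.loads` raises `json.JSONDecodeError` when the model returns something other than JSON, so callers should catch that and decide whether to retry or flag the document for manual review.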
OCR engines
Choose the right engine based on your users’ document types:
Engine       Best for                          Speed
doctr        Clean printed text                Fast
paddleocr    Tables, forms, complex layouts    Slower
Recommendation: Start with doctr for most use cases. Switch to paddleocr if your users need table extraction or have complex document layouts.
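The recommendation above can be encoded as a small default so callers do not hard-code an engine name. This is an illustrative sketch: `choose_engine` and its two flags are hypothetical, and how you detect tables or complex layouts in an upload is left to your application.

```python
def choose_engine(has_tables: bool = False, complex_layout: bool = False) -> str:
    """Pick an OCR engine: doctr by default, paddleocr for tables or complex layouts."""
    if has_tables or complex_layout:
        return "paddleocr"
    return "doctr"
```

The returned string goes straight into the `engine` field of the OCR request.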