The most powerful use of Case.dev workflows is processing documents at scale. This guide shows you how to combine Vaults (document storage + search) with Workflows (automation pipelines) to build intelligent document processing systems.
The Document Processing Pattern
Every document workflow follows this pattern:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Upload to  │ ──▶ │   Ingest    │ ──▶ │   Search    │ ──▶ │   Analyze   │
│    Vault    │     │   (OCR +    │     │    (Find    │     │   (LLM +    │
│             │     │   Embed)    │     │  Relevant)  │     │   Format)   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
Key Concepts
Vaults as the Foundation
Vaults are the foundation layer for document workflows. A vault is an encrypted, isolated document repository with:
- Automatic OCR and text extraction on upload
- Semantic search (vector + keyword hybrid)
- An optional knowledge graph (GraphRAG)
Rule of thumb: One vault per matter/case. This keeps search results focused and data isolated.
Workflow Steps for Documents
| Step Type | Action Type | Purpose |
|---|---|---|
| Upload | case-vault/vault (method: upload) | Get a presigned URL to upload files |
| Ingest | case-vault/vault (method: ingest) | OCR + chunk + embed a document |
| Search | case-search/vault-search | Find relevant passages |
| Analyze | case-llm/llm | Summarize, extract, or analyze |
Complete Example: Document Analyzer
This workflow accepts a vault ID and query, searches the vault, and generates an analysis.
{
"name": "Document Analyzer",
"description": "Search vault documents and generate AI analysis",
"nodes": [
{
"id": "trigger",
"type": "trigger",
"label": "Webhook",
"config": { "triggerType": "Webhook" }
},
{
"id": "search",
"type": "action",
"label": "Vault Search",
"config": {
"actionType": "case-search/vault-search",
"vaultId": "{{vaultId}}",
"query": "{{query}}",
"limit": 10,
"searchMethod": "hybrid"
}
},
{
"id": "analyze",
"type": "action",
"label": "Analyze",
"config": {
"actionType": "case-llm/llm",
"method": "chat",
"model": "gpt-4o",
"systemPrompt": "You are a legal document analyst. Analyze the provided document excerpts and answer the user's question with specific citations.",
"userPrompt": "Based on these document excerpts:\n\n{{results.Vault_Search.output.chunks}}\n\nAnswer: {{query}}"
}
}
],
"edges": [
{ "source": "trigger", "target": "search" },
{ "source": "search", "target": "analyze" }
]
}
Deploy and Execute
# Create the workflow
curl -X POST https://api.case.dev/workflows/v1/create \
-H "Authorization: Bearer $CASEDEV_API_KEY" \
-H "Content-Type: application/json" \
-d @document-analyzer.json
# Response includes webhook URL and secret
# {
# "id": "wf_abc123",
# "webhookUrl": "https://api.case.dev/workflows/v1/wf_abc123/webhook",
# "webhookSecret": "whsec_..."
# }
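# If you script the deployment, you can capture both values from the
# create response directly. A minimal sketch, assuming jq is installed:
CREATE_RESPONSE=$(curl -s -X POST https://api.case.dev/workflows/v1/create \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d @document-analyzer.json)
WEBHOOK_URL=$(echo "$CREATE_RESPONSE" | jq -r '.webhookUrl')
WEBHOOK_SECRET=$(echo "$CREATE_RESPONSE" | jq -r '.webhookSecret')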
# Execute with your vault
curl -X POST "https://api.case.dev/workflows/v1/wf_abc123/webhook" \
-H "X-Webhook-Secret: whsec_..." \
-H "Content-Type: application/json" \
-d '{
"vaultId": "zm03mnhgedsuzqmm30daa3hc",
"query": "What are the key facts about post-operative care?"
}'
Step-by-Step: Building Document Workflows
Step 1: Create a Vault
Before your workflow can search documents, you need a vault with indexed content.
# Create a vault
curl -X POST https://api.case.dev/vault \
-H "Authorization: Bearer $CASEDEV_API_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "Case Documents 2024"}'
# Response: { "id": "zm03mnhgedsuzqmm30daa3hc", ... }
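If you're scripting these steps, capture the vault ID for the later steps (a minimal sketch, assuming jq is installed):
# Create the vault and store its ID for reuse below
VAULT_ID=$(curl -s -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Case Documents 2024"}' | jq -r '.id')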
Step 2: Upload Documents
# Get presigned upload URL
curl -X POST https://api.case.dev/vault/VAULT_ID/upload \
-H "Authorization: Bearer $CASEDEV_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"filename": "deposition.pdf",
"contentType": "application/pdf"
}'
# Upload file to the presigned URL
curl -X PUT "$UPLOAD_URL" \
-H "Content-Type: application/pdf" \
--data-binary @deposition.pdf
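The two calls chain naturally in a script. Here is a sketch assuming jq; note that the uploadUrl and objectId response field names are illustrative assumptions, so check the actual response shape in your environment:
# Request a presigned URL, then upload the file to it.
# NOTE: "uploadUrl" and "objectId" are assumed field names for illustration.
UPLOAD_RESPONSE=$(curl -s -X POST "https://api.case.dev/vault/$VAULT_ID/upload" \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filename": "deposition.pdf", "contentType": "application/pdf"}')
UPLOAD_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.uploadUrl')
OBJECT_ID=$(echo "$UPLOAD_RESPONSE" | jq -r '.objectId')
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @deposition.pdf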
Step 3: Ingest (Index) Documents
This is where the magic happens. Ingestion:
- Runs OCR on scanned PDFs
- Extracts text from digital PDFs
- Chunks text into searchable passages
- Generates embeddings for semantic search
# Trigger ingestion
curl -X POST https://api.case.dev/vault/VAULT_ID/ingest/OBJECT_ID \
-H "Authorization: Bearer $CASEDEV_API_KEY"
# Response: { "status": "processing", "workflowId": "wrun_..." }
# Poll for completion
curl https://api.case.dev/vault/VAULT_ID/objects/OBJECT_ID \
-H "Authorization: Bearer $CASEDEV_API_KEY"
# When complete: { "ingestionStatus": "completed", "chunkCount": 89 }
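If you need to wait in a script, a minimal polling loop using the endpoint and field names above (assumes jq):
# Poll every 5 seconds until ingestion completes
STATUS="processing"
while [ "$STATUS" != "completed" ]; do
  sleep 5
  STATUS=$(curl -s "https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID" \
    -H "Authorization: Bearer $CASEDEV_API_KEY" | jq -r '.ingestionStatus')
  echo "Ingestion status: $STATUS"
done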
Ingestion is async: documents must be fully ingested before they appear in search results. In workflows, use the case-vault/vault step with method: "ingest", which handles the waiting automatically.
Step 4: Add Vault Search to Your Workflow
Now you can search your indexed documents:
{
"id": "search-step",
"type": "action",
"label": "Search Documents",
"config": {
"actionType": "case-search/vault-search",
"vaultId": "{{vaultId}}",
"query": "{{query}}",
"limit": 10,
"searchMethod": "hybrid"
}
}
The output includes matching text chunks with relevance scores; hybridScore blends the semantic vectorScore with the keyword bm25Score:
{
"chunks": [
{
"text": "The witness testified that...",
"object_id": "abc123",
"hybridScore": 0.89,
"vectorScore": 0.92,
"bm25Score": 0.78
}
],
"sources": [
{ "id": "abc123", "filename": "deposition.pdf" }
]
}
Ingesting Documents in Workflows
You can also ingest documents as part of a workflow. This is useful for:
- Processing uploaded files automatically
- Building document pipelines that accept raw files
{
"id": "ingest-step",
"type": "action",
"label": "Ingest Document",
"config": {
"actionType": "case-vault/vault",
"method": "ingest",
"vaultId": "{{vaultId}}",
"objectId": "{{objectId}}"
}
}
Deployed workflows handle ingestion automatically. The workflow engine waits for ingestion to complete before proceeding to the next step. No polling required.
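For example, a minimal ingest-then-search pipeline might look like this (a sketch reusing the step configs shown above; the name and node IDs are illustrative):
{
  "name": "Ingest and Search",
  "nodes": [
    { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
    {
      "id": "ingest",
      "type": "action",
      "label": "Ingest Document",
      "config": {
        "actionType": "case-vault/vault",
        "method": "ingest",
        "vaultId": "{{vaultId}}",
        "objectId": "{{objectId}}"
      }
    },
    {
      "id": "search",
      "type": "action",
      "label": "Search Documents",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "{{query}}",
        "limit": 10,
        "searchMethod": "hybrid"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "ingest" },
    { "source": "ingest", "target": "search" }
  ]
}
Because the ingest step waits for completion, the search step only runs once the document is indexed.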
Template Reference
Access data from your trigger input and previous steps:
Trigger Input (Webhook Body)
{{vaultId}} → Vault ID from request body
{{query}} → Search query from request body
{{objectId}} → Document ID from request body
Previous Step Output
{{results.Vault_Search.output.chunks}}
{{results.Vault_Search.output.chunks[0].text}}
{{results.Vault_Search.output.sources}}
{{results.Analyze.output.choices[0].message.content}}
Note: Step labels with spaces become underscores: “Vault Search” → Vault_Search
Search Methods
| Method | Best For |
|---|---|
| hybrid | General queries. Combines semantic + keyword matching. Default. |
| vector | Finding conceptually similar content |
| global | "What are the main themes?" (GraphRAG, corpus-wide) |
| local | "What did Dr. Smith say?" (GraphRAG, entity-focused) |
Example Workflows
Entity Extraction
Extract named entities (people, organizations, dates, locations) from vault documents:
{
"name": "Entity Extraction",
"nodes": [
{ "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
{
"id": "search",
"type": "action",
"label": "Find People",
"config": {
"actionType": "case-search/vault-search",
"vaultId": "{{vaultId}}",
"query": "people names individuals mentioned",
"limit": 20
}
},
{
"id": "extract",
"type": "action",
"label": "Extract Entities",
"config": {
"actionType": "case-llm/llm",
"method": "chat",
"model": "gpt-4o",
"systemPrompt": "Extract named entities (PERSON, ORG, DATE, LOCATION) as JSON.",
"userPrompt": "{{results.Find_People.output.chunks}}"
}
}
],
"edges": [
{ "source": "trigger", "target": "search" },
{ "source": "search", "target": "extract" }
]
}
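Deploy and execute it the same way as the analyzer; only a vaultId is needed, since the query is baked into the workflow:
# Execute (use the webhookUrl and webhookSecret from the create response)
curl -X POST "https://api.case.dev/workflows/v1/WORKFLOW_ID/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{"vaultId": "zm03mnhgedsuzqmm30daa3hc"}'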
Multi-Query Research
Search for multiple aspects and synthesize:
{
"name": "Comprehensive Analysis",
"nodes": [
{ "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
{
"id": "search-facts",
"type": "action",
"label": "Search Facts",
"config": {
"actionType": "case-search/vault-search",
"vaultId": "{{vaultId}}",
"query": "key facts timeline events",
"limit": 10
}
},
{
"id": "search-testimony",
"type": "action",
"label": "Search Testimony",
"config": {
"actionType": "case-search/vault-search",
"vaultId": "{{vaultId}}",
"query": "witness testimony statements",
"limit": 10
}
},
{
"id": "synthesize",
"type": "action",
"label": "Synthesize",
"config": {
"actionType": "case-llm/llm",
"method": "chat",
"model": "gpt-4o",
"systemPrompt": "Synthesize the facts and testimony into a comprehensive case summary.",
"userPrompt": "Facts:\n{{results.Search_Facts.output.chunks}}\n\nTestimony:\n{{results.Search_Testimony.output.chunks}}"
}
}
],
"edges": [
{ "source": "trigger", "target": "search-facts" },
{ "source": "trigger", "target": "search-testimony" },
{ "source": "search-facts", "target": "synthesize" },
{ "source": "search-testimony", "target": "synthesize" }
]
}
Best Practices
- Pre-index your documents. Ingestion takes time. Index documents before you need to search them.
- Use hybrid search. It combines the best of semantic and keyword matching.
- Set appropriate limits. More results = more context for LLMs, but also more tokens and cost.
- Include document sources. The sources array tells you which documents matched.
- Chain LLM steps for complex analysis. First extract, then analyze, then format (see the sketch below).
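For example, the analysis stage can be split into two chained LLM nodes (a sketch; the prompts and labels are illustrative, and Search_Documents refers to a search step labeled "Search Documents"):
{
  "id": "extract",
  "type": "action",
  "label": "Extract",
  "config": {
    "actionType": "case-llm/llm",
    "method": "chat",
    "model": "gpt-4o",
    "systemPrompt": "Extract the key facts from the excerpts as a JSON list.",
    "userPrompt": "{{results.Search_Documents.output.chunks}}"
  }
},
{
  "id": "format",
  "type": "action",
  "label": "Format",
  "config": {
    "actionType": "case-llm/llm",
    "method": "chat",
    "model": "gpt-4o",
    "systemPrompt": "Rewrite the extracted facts as a concise, client-ready memo.",
    "userPrompt": "{{results.Extract.output.choices[0].message.content}}"
  }
}
Connect them with an edge: { "source": "extract", "target": "format" }.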
Next Steps