The most powerful use of Case.dev workflows is processing documents at scale. This guide shows you how to combine Vaults (document storage + search) with Workflows (automation pipelines) to build intelligent document processing systems.

The Document Processing Pattern

Every document workflow follows this pattern:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Upload to  │ ──▶ │   Ingest    │ ──▶ │   Search    │ ──▶ │   Analyze   │
│    Vault    │     │  (OCR +     │     │  (Find      │     │  (LLM +     │
│             │     │  Embed)     │     │  Relevant)  │     │  Format)    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Key Concepts

Vaults as the Foundation

Vaults are the foundation layer for document workflows. A vault is an encrypted, isolated document repository that provides:
  • Automatic OCR and text extraction on upload
  • Semantic search (hybrid vector + keyword)
  • An optional knowledge graph (GraphRAG)
Rule of thumb: One vault per matter/case. This keeps search results focused and data isolated.
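For example, you might create a dedicated vault when a new matter is opened (same endpoint as Step 1 below; the matter name is illustrative):
# One vault per matter keeps search scoped and data isolated
curl -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Smith v. Jones (2024-CV-1234)"}'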

Workflow Steps for Documents

Step Type | Action Type                       | Purpose
Upload    | case-vault/vault (method: upload) | Get a presigned URL to upload files
Ingest    | case-vault/vault (method: ingest) | OCR + chunk + embed a document
Search    | case-search/vault-search          | Find relevant passages
Analyze   | case-llm/llm                      | Summarize, extract, or analyze

Complete Example: Document Analyzer

This workflow accepts a vault ID and query, searches the vault, and generates an analysis.
{
  "name": "Document Analyzer",
  "description": "Search vault documents and generate AI analysis",
  "nodes": [
    {
      "id": "trigger",
      "type": "trigger",
      "label": "Webhook",
      "config": { "triggerType": "Webhook" }
    },
    {
      "id": "search",
      "type": "action",
      "label": "Vault Search",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "{{query}}",
        "limit": 10,
        "searchMethod": "hybrid"
      }
    },
    {
      "id": "analyze",
      "type": "action",
      "label": "Analyze",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "You are a legal document analyst. Analyze the provided document excerpts and answer the user's question with specific citations.",
        "userPrompt": "Based on these document excerpts:\n\n{{results.Vault_Search.output.chunks}}\n\nAnswer: {{query}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search" },
    { "source": "search", "target": "analyze" }
  ]
}

Deploy and Execute

# Create the workflow
curl -X POST https://api.case.dev/workflows/v1/create \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d @document-analyzer.json

# Response includes webhook URL and secret
# {
#   "id": "wf_abc123",
#   "webhookUrl": "https://api.case.dev/workflows/v1/wf_abc123/webhook",
#   "webhookSecret": "whsec_..."
# }

# Execute with your vault
curl -X POST "https://api.case.dev/workflows/v1/wf_abc123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{
    "vaultId": "zm03mnhgedsuzqmm30daa3hc",
    "query": "What are the key facts about post-operative care?"
  }'
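If the execution response mirrors the template paths in the Template Reference below (an assumption; the actual response shape may differ), you can pull out just the model's answer with jq:
# Hypothetical: extract the final analysis text (response shape is an assumption)
curl -s -X POST "https://api.case.dev/workflows/v1/wf_abc123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{"vaultId": "zm03mnhgedsuzqmm30daa3hc", "query": "What are the key facts about post-operative care?"}' \
  | jq -r '.results.Analyze.output.choices[0].message.content'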

Step-by-Step: Building Document Workflows

Step 1: Create a Vault

Before your workflow can search documents, you need a vault with indexed content.
# Create a vault
curl -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Case Documents 2024"}'

# Response: { "id": "zm03mnhgedsuzqmm30daa3hc", ... }
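In scripts, it is handy to capture the vault ID directly (a sketch using jq; the id field comes from the response above):
# Create the vault and keep its ID for later steps
VAULT_ID=$(curl -s -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Case Documents 2024"}' | jq -r '.id')
echo "Created vault: $VAULT_ID"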

Step 2: Upload Documents

# Get presigned upload URL
curl -X POST https://api.case.dev/vault/VAULT_ID/upload \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "deposition.pdf",
    "contentType": "application/pdf"
  }'

# Upload file to the presigned URL
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @deposition.pdf
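The two commands above can be stitched together by parsing the presigned-URL response. The field names here (uploadUrl, objectId) are assumptions; check your actual response:
# Request a presigned URL and capture it (field names are assumptions)
RESPONSE=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filename": "deposition.pdf", "contentType": "application/pdf"}')
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.uploadUrl')   # assumed field name
OBJECT_ID=$(echo "$RESPONSE" | jq -r '.objectId')     # assumed field name

# Upload the file to the presigned URL
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @deposition.pdf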

Step 3: Ingest (Index) Documents

This is where the magic happens. Ingestion:
  1. Runs OCR on scanned PDFs
  2. Extracts text from digital PDFs
  3. Chunks text into searchable passages
  4. Generates embeddings for semantic search
# Trigger ingestion
curl -X POST https://api.case.dev/vault/VAULT_ID/ingest/OBJECT_ID \
  -H "Authorization: Bearer $CASEDEV_API_KEY"

# Response: { "status": "processing", "workflowId": "wrun_..." }

# Poll for completion
curl https://api.case.dev/vault/VAULT_ID/objects/OBJECT_ID \
  -H "Authorization: Bearer $CASEDEV_API_KEY"

# When complete: { "ingestionStatus": "completed", "chunkCount": 89 }
Ingestion is async. Documents must be fully ingested before they appear in search results. For workflows, use the case-vault/vault step with method: "ingest" which handles waiting automatically.
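If you are ingesting outside a workflow, a simple polling loop might look like this (a sketch; the ingestionStatus field matches the sample response above):
# Poll until ingestion finishes (adjust the interval to taste)
while true; do
  STATUS=$(curl -s https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID \
    -H "Authorization: Bearer $CASEDEV_API_KEY" | jq -r '.ingestionStatus')
  [ "$STATUS" = "completed" ] && break
  echo "Ingestion status: $STATUS, retrying in 5s..."
  sleep 5
done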

Step 4: Add Vault Search to Your Workflow

Now you can search your indexed documents:
{
  "id": "search-step",
  "type": "action",
  "label": "Search Documents",
  "config": {
    "actionType": "case-search/vault-search",
    "vaultId": "{{vaultId}}",
    "query": "{{query}}",
    "limit": 10,
    "searchMethod": "hybrid"
  }
}
The output includes matching text chunks with relevance scores:
{
  "chunks": [
    {
      "text": "The witness testified that...",
      "object_id": "abc123",
      "hybridScore": 0.89,
      "vectorScore": 0.92,
      "bm25Score": 0.78
    }
  ],
  "sources": [
    { "id": "abc123", "filename": "deposition.pdf" }
  ]
}

Ingesting Documents in Workflows

You can also ingest documents as part of a workflow. This is useful for:
  • Processing uploaded files automatically
  • Building document pipelines that accept raw files
{
  "id": "ingest-step",
  "type": "action",
  "label": "Ingest Document",
  "config": {
    "actionType": "case-vault/vault",
    "method": "ingest",
    "vaultId": "{{vaultId}}",
    "objectId": "{{objectId}}"
  }
}
Deployed workflows handle ingestion automatically. The workflow engine waits for ingestion to complete before proceeding to the next step. No polling required.
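A pipeline like this is triggered with both IDs in the webhook body (the workflow ID here is illustrative; the objectId is the one returned at upload time):
# Trigger ingestion through a deployed workflow
curl -X POST "https://api.case.dev/workflows/v1/wf_ingest123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{
    "vaultId": "zm03mnhgedsuzqmm30daa3hc",
    "objectId": "abc123"
  }'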

Template Reference

Access data from your trigger input and previous steps:

Trigger Input (Webhook Body)

{{vaultId}}           → Vault ID from request body
{{query}}             → Search query from request body
{{objectId}}          → Document ID from request body

Previous Step Output

{{results.Vault_Search.output.chunks}}
{{results.Vault_Search.output.chunks[0].text}}
{{results.Vault_Search.output.sources}}
{{results.Analyze.output.choices[0].message.content}}
Note: Step labels with spaces become underscores: “Vault Search” → Vault_Search

Search Methods

Method | Best For
hybrid | General queries. Combines semantic + keyword matching. Default.
vector | Finding conceptually similar content
global | “What are the main themes?” (GraphRAG, corpus-wide)
local  | “What did Dr. Smith say?” (GraphRAG, entity-focused)

Select a method with the searchMethod field on the vault-search step; the examples above use "hybrid".

Example Workflows

Entity Extraction Pipeline

Extract named entities and build a knowledge graph:
{
  "name": "Entity Extraction",
  "nodes": [
    { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
    {
      "id": "search",
      "type": "action",
      "label": "Find People",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "people names individuals mentioned",
        "limit": 20
      }
    },
    {
      "id": "extract",
      "type": "action",
      "label": "Extract Entities",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "Extract named entities (PERSON, ORG, DATE, LOCATION) as JSON.",
        "userPrompt": "{{results.Find_People.output.chunks}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search" },
    { "source": "search", "target": "extract" }
  ]
}

Multi-Query Research

Search for multiple aspects in parallel, then synthesize. Both search steps feed the synthesize step, which runs once they complete:
{
  "name": "Comprehensive Analysis",
  "nodes": [
    { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
    {
      "id": "search-facts",
      "type": "action",
      "label": "Search Facts",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "key facts timeline events",
        "limit": 10
      }
    },
    {
      "id": "search-testimony",
      "type": "action",
      "label": "Search Testimony",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "witness testimony statements",
        "limit": 10
      }
    },
    {
      "id": "synthesize",
      "type": "action",
      "label": "Synthesize",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "Synthesize the facts and testimony into a comprehensive case summary.",
        "userPrompt": "Facts:\n{{results.Search_Facts.output.chunks}}\n\nTestimony:\n{{results.Search_Testimony.output.chunks}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search-facts" },
    { "source": "trigger", "target": "search-testimony" },
    { "source": "search-facts", "target": "synthesize" },
    { "source": "search-testimony", "target": "synthesize" }
  ]
}

Best Practices

  1. Pre-index your documents. Ingestion takes time, so index documents before you need to search them.
  2. Use hybrid search. It combines the best of semantic and keyword matching.
  3. Set appropriate limits. More results = more context for LLMs, but also more tokens and cost.
  4. Include document sources. The sources array tells you which documents matched.
  5. Chain LLM steps for complex analysis. First extract, then analyze, then format, as sketched below.
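For practice 5, here is a sketch of chained LLM steps, created via the same endpoint as above (node ids, labels, and prompts are illustrative):
# Chain extract -> format: the second LLM step consumes the first's output
curl -X POST https://api.case.dev/workflows/v1/create \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Extract Then Format",
    "nodes": [
      { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
      { "id": "search", "type": "action", "label": "Search",
        "config": { "actionType": "case-search/vault-search", "vaultId": "{{vaultId}}", "query": "{{query}}", "limit": 10 } },
      { "id": "extract", "type": "action", "label": "Extract",
        "config": { "actionType": "case-llm/llm", "method": "chat", "model": "gpt-4o",
          "systemPrompt": "Extract the key facts as a JSON list.",
          "userPrompt": "{{results.Search.output.chunks}}" } },
      { "id": "format", "type": "action", "label": "Format",
        "config": { "actionType": "case-llm/llm", "method": "chat", "model": "gpt-4o",
          "systemPrompt": "Format the facts as a clean memo section.",
          "userPrompt": "{{results.Extract.output.choices[0].message.content}}" } }
    ],
    "edges": [
      { "source": "trigger", "target": "search" },
      { "source": "search", "target": "extract" },
      { "source": "extract", "target": "format" }
    ]
  }'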

Next Steps