The most powerful use of Case.dev workflows is processing documents at scale. This guide shows you how to combine Vaults (document storage + search) with Workflows (automation pipelines) to build intelligent document processing systems.

The Document Processing Pattern

Every document workflow follows this pattern:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Upload to  │ ──▶ │   Ingest    │ ──▶ │   Search    │ ──▶ │   Analyze   │
│    Vault    │     │  (OCR +     │     │  (Find      │     │  (LLM +     │
│             │     │  Embed)     │     │  Relevant)  │     │  Format)    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Key Concepts

Vaults as the Foundation

Vaults are the foundation layer for document workflows. A vault is an encrypted, isolated document repository that provides:
  • Automatic OCR and text extraction on upload
  • Semantic search (hybrid vector + keyword)
  • An optional knowledge graph (GraphRAG)
Rule of thumb: One vault per matter/case. This keeps search results focused and data isolated.
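For example, you might create a dedicated vault when a new matter is opened (same endpoint as Step 1 below; the matter name is illustrative):
# One vault per matter keeps search scoped and data isolated
curl -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Smith v. Jones (2024-CV-1234)"}'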

Workflow Steps for Documents

Step Type | Action Type                       | Purpose
Upload    | case-vault/vault (method: upload) | Get a presigned URL to upload files
Ingest    | case-vault/vault (method: ingest) | OCR + chunk + embed a document
Search    | case-search/vault-search          | Find relevant passages
Analyze   | case-llm/llm                      | Summarize, extract, or analyze

Complete Example: Document Analyzer

This workflow accepts a vault ID and query, searches the vault, and generates an analysis.
{
  "name": "Document Analyzer",
  "description": "Search vault documents and generate AI analysis",
  "nodes": [
    {
      "id": "trigger",
      "type": "trigger",
      "label": "Webhook",
      "config": { "triggerType": "Webhook" }
    },
    {
      "id": "search",
      "type": "action",
      "label": "Vault Search",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "{{query}}",
        "limit": 10,
        "searchMethod": "hybrid"
      }
    },
    {
      "id": "analyze",
      "type": "action",
      "label": "Analyze",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "You are a legal document analyst. Analyze the provided document excerpts and answer the user's question with specific citations.",
        "userPrompt": "Based on these document excerpts:\n\n{{results.Vault_Search.output.chunks}}\n\nAnswer: {{query}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search" },
    { "source": "search", "target": "analyze" }
  ]
}

Deploy and Execute

# Create the workflow
curl -X POST https://api.case.dev/workflows/v1/create \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d @document-analyzer.json

# Response includes webhook URL and secret
# {
#   "id": "wf_abc123",
#   "webhookUrl": "https://api.case.dev/workflows/v1/wf_abc123/webhook",
#   "webhookSecret": "whsec_..."
# }

# Execute with your vault
curl -X POST "https://api.case.dev/workflows/v1/wf_abc123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{
    "vaultId": "zm03mnhgedsuzqmm30daa3hc",
    "query": "What are the key facts about post-operative care?"
  }'
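If the execution response mirrors the template paths in the Template Reference below (an assumption; the actual response shape may differ), you can pull out just the model's answer with jq:
# Hypothetical: extract the final analysis text (response shape is an assumption)
curl -s -X POST "https://api.case.dev/workflows/v1/wf_abc123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{"vaultId": "zm03mnhgedsuzqmm30daa3hc", "query": "What are the key facts about post-operative care?"}' \
  | jq -r '.results.Analyze.output.choices[0].message.content'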

Step-by-Step: Building Document Workflows

Step 1: Create a Vault

Before your workflow can search documents, you need a vault with indexed content.
# Create a vault
curl -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Case Documents 2024"}'

# Response: { "id": "zm03mnhgedsuzqmm30daa3hc", ... }
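In scripts, it is handy to capture the vault ID directly (a sketch using jq; the id field comes from the response above):
# Create the vault and keep its ID for later steps
VAULT_ID=$(curl -s -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Case Documents 2024"}' | jq -r '.id')
echo "Created vault: $VAULT_ID"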

Step 2: Upload Documents

# Get presigned upload URL
curl -X POST https://api.case.dev/vault/VAULT_ID/upload \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "deposition.pdf",
    "contentType": "application/pdf"
  }'

# Upload file to the presigned URL
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @deposition.pdf
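The two commands above can be stitched together by parsing the presigned-URL response. The field names here (uploadUrl, objectId) are assumptions; check your actual response:
# Request a presigned URL and capture it (field names are assumptions)
RESPONSE=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filename": "deposition.pdf", "contentType": "application/pdf"}')
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.uploadUrl')   # assumed field name
OBJECT_ID=$(echo "$RESPONSE" | jq -r '.objectId')     # assumed field name

# Upload the file to the presigned URL
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary @deposition.pdf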

Step 3: Ingest (Index) Documents

This is where the magic happens. Ingestion:
  1. Runs OCR on scanned PDFs
  2. Extracts text from digital PDFs
  3. Chunks text into searchable passages
  4. Generates embeddings for semantic search
# Trigger ingestion
curl -X POST https://api.case.dev/vault/VAULT_ID/ingest/OBJECT_ID \
  -H "Authorization: Bearer $CASEDEV_API_KEY"

# Response: { "status": "processing", "workflowId": "wrun_..." }

# Poll for completion
curl https://api.case.dev/vault/VAULT_ID/objects/OBJECT_ID \
  -H "Authorization: Bearer $CASEDEV_API_KEY"

# When complete: { "ingestionStatus": "completed", "chunkCount": 89 }
Ingestion is async. Documents must be fully ingested before they appear in search results. For workflows, use the case-vault/vault step with method: "ingest" which handles waiting automatically.
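If you are ingesting outside a workflow, a simple polling loop might look like this (a sketch; the ingestionStatus field matches the sample response above):
# Poll until ingestion finishes (adjust the interval to taste)
while true; do
  STATUS=$(curl -s https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID \
    -H "Authorization: Bearer $CASEDEV_API_KEY" | jq -r '.ingestionStatus')
  [ "$STATUS" = "completed" ] && break
  echo "Ingestion status: $STATUS, retrying in 5s..."
  sleep 5
done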

Step 4: Add Vault Search to Your Workflow

Now you can search your indexed documents:
{
  "id": "search-step",
  "type": "action",
  "label": "Search Documents",
  "config": {
    "actionType": "case-search/vault-search",
    "vaultId": "{{vaultId}}",
    "query": "{{query}}",
    "limit": 10,
    "searchMethod": "hybrid"
  }
}
The output includes matching text chunks with relevance scores:
{
  "chunks": [
    {
      "text": "The witness testified that...",
      "object_id": "abc123",
      "hybridScore": 0.89,
      "vectorScore": 0.92,
      "bm25Score": 0.78
    }
  ],
  "sources": [
    { "id": "abc123", "filename": "deposition.pdf" }
  ]
}

Ingesting Documents in Workflows

You can also ingest documents as part of a workflow. This is useful for:
  • Processing uploaded files automatically
  • Building document pipelines that accept raw files
{
  "id": "ingest-step",
  "type": "action",
  "label": "Ingest Document",
  "config": {
    "actionType": "case-vault/vault",
    "method": "ingest",
    "vaultId": "{{vaultId}}",
    "objectId": "{{objectId}}"
  }
}
Deployed workflows handle ingestion automatically. The workflow engine waits for ingestion to complete before proceeding to the next step. No polling required.
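A pipeline like this is triggered with both IDs in the webhook body (the workflow ID here is illustrative; the objectId is the one returned at upload time):
# Trigger ingestion through a deployed workflow
curl -X POST "https://api.case.dev/workflows/v1/wf_ingest123/webhook" \
  -H "X-Webhook-Secret: whsec_..." \
  -H "Content-Type: application/json" \
  -d '{
    "vaultId": "zm03mnhgedsuzqmm30daa3hc",
    "objectId": "abc123"
  }'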

Template Reference

Access data from your trigger input and previous steps:

Trigger Input (Webhook Body)

{{vaultId}}           → Vault ID from request body
{{query}}             → Search query from request body
{{objectId}}          → Document ID from request body

Previous Step Output

{{results.Vault_Search.output.chunks}}
{{results.Vault_Search.output.chunks[0].text}}
{{results.Vault_Search.output.sources}}
{{results.Analyze.output.choices[0].message.content}}
Note: Step labels with spaces become underscores: “Vault Search” → Vault_Search

Search Methods

Method | Best For
hybrid | General queries. Combines semantic + keyword matching. Default.
vector | Finding conceptually similar content
global | “What are the main themes?” (GraphRAG, corpus-wide)
local  | “What did Dr. Smith say?” (GraphRAG, entity-focused)

Select a method with the searchMethod field on the vault-search step; the examples above use "hybrid".

Example Workflows

Entity Extraction Pipeline

Extract named entities and build a knowledge graph:
{
  "name": "Entity Extraction",
  "nodes": [
    { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
    {
      "id": "search",
      "type": "action",
      "label": "Find People",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "people names individuals mentioned",
        "limit": 20
      }
    },
    {
      "id": "extract",
      "type": "action",
      "label": "Extract Entities",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "Extract named entities (PERSON, ORG, DATE, LOCATION) as JSON.",
        "userPrompt": "{{results.Find_People.output.chunks}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search" },
    { "source": "search", "target": "extract" }
  ]
}

Multi-Query Research

Search for multiple aspects in parallel, then synthesize. Both search steps feed the synthesize step, which runs once they complete:
{
  "name": "Comprehensive Analysis",
  "nodes": [
    { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
    {
      "id": "search-facts",
      "type": "action",
      "label": "Search Facts",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "key facts timeline events",
        "limit": 10
      }
    },
    {
      "id": "search-testimony",
      "type": "action",
      "label": "Search Testimony",
      "config": {
        "actionType": "case-search/vault-search",
        "vaultId": "{{vaultId}}",
        "query": "witness testimony statements",
        "limit": 10
      }
    },
    {
      "id": "synthesize",
      "type": "action",
      "label": "Synthesize",
      "config": {
        "actionType": "case-llm/llm",
        "method": "chat",
        "model": "gpt-4o",
        "systemPrompt": "Synthesize the facts and testimony into a comprehensive case summary.",
        "userPrompt": "Facts:\n{{results.Search_Facts.output.chunks}}\n\nTestimony:\n{{results.Search_Testimony.output.chunks}}"
      }
    }
  ],
  "edges": [
    { "source": "trigger", "target": "search-facts" },
    { "source": "trigger", "target": "search-testimony" },
    { "source": "search-facts", "target": "synthesize" },
    { "source": "search-testimony", "target": "synthesize" }
  ]
}

Best Practices

  1. Pre-index your documents. Ingestion takes time, so index documents before you need to search them.
  2. Use hybrid search. It combines the best of semantic and keyword matching.
  3. Set appropriate limits. More results = more context for LLMs, but also more tokens and cost.
  4. Include document sources. The sources array tells you which documents matched.
  5. Chain LLM steps for complex analysis. First extract, then analyze, then format, as sketched below.
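For practice 5, here is a sketch of chained LLM steps, created via the same endpoint as above (node ids, labels, and prompts are illustrative):
# Chain extract -> format: the second LLM step consumes the first's output
curl -X POST https://api.case.dev/workflows/v1/create \
  -H "Authorization: Bearer $CASEDEV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Extract Then Format",
    "nodes": [
      { "id": "trigger", "type": "trigger", "label": "Webhook", "config": { "triggerType": "Webhook" } },
      { "id": "search", "type": "action", "label": "Search",
        "config": { "actionType": "case-search/vault-search", "vaultId": "{{vaultId}}", "query": "{{query}}", "limit": 10 } },
      { "id": "extract", "type": "action", "label": "Extract",
        "config": { "actionType": "case-llm/llm", "method": "chat", "model": "gpt-4o",
          "systemPrompt": "Extract the key facts as a JSON list.",
          "userPrompt": "{{results.Search.output.chunks}}" } },
      { "id": "format", "type": "action", "label": "Format",
        "config": { "actionType": "case-llm/llm", "method": "chat", "model": "gpt-4o",
          "systemPrompt": "Format the facts as a clean memo section.",
          "userPrompt": "{{results.Extract.output.choices[0].message.content}}" } }
    ],
    "edges": [
      { "source": "trigger", "target": "search" },
      { "source": "search", "target": "extract" },
      { "source": "extract", "target": "format" }
    ]
  }'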

Next Steps