API Reference

Vault API endpoints and operations

Create a Vault

Create a new vault to store related documents.

Endpoint

POST /vault

Example Request

curl -X POST https://api.case.dev/vault \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Smith v. Hospital Case 2024",
    "description": "All discovery, depositions, and medical records for the Smith case"
  }'

Example Response

{
  "id": "sytp1b5f5j1yuj7uffzzxgw6",
  "name": "Smith v. Hospital Case 2024",
  "description": "All discovery, depositions, and medical records for the Smith case",
  "filesBucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6",
  "vectorBucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6-vectors",
  "indexName": "embeddings",
  "region": "us-west-2",
  "createdAt": "2025-11-04T09:29:02.681Z"
}

What Happens Behind the Scenes

When you create a vault, we automatically:

  1. Create two S3 buckets:
    • Files bucket: Stores your actual documents
    • Vector bucket: Stores embeddings for semantic search
  2. Set up encryption: KMS encryption on both buckets
  3. Create vector index: 1536-dimension index (OpenAI embeddings)
  4. Configure for semantic search: Cosine similarity, optimized for text
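Step 4 names cosine similarity as the ranking metric. As a rough illustration of what that metric computes (not the service's implementation, which runs server-side on 1536-dimension embeddings), the sketch below scores two small vectors with awk. The function name `cosine_similarity` is hypothetical.

```shell
# cosine_similarity "a1 a2 ..." "b1 b2 ..."
# Prints the cosine similarity of two equal-length, space-separated vectors.
# Illustrative only: real vault embeddings have 1536 components.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) {
      print "error: vectors must have equal, non-zero length" > "/dev/stderr"
      exit 1
    }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared norm of a
      nb  += y[i] * y[i]   # squared norm of b
    }
    if (na == 0 || nb == 0) { print "error: zero vector" > "/dev/stderr"; exit 1 }
    printf "%.3f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine_similarity "1 2 3" "2 4 6"   # parallel vectors, prints 1.000
cosine_similarity "1 0" "0 1"       # orthogonal vectors, prints 0.000
```

A score near 1 means the two texts point in the same direction in embedding space; a score near 0 means they are unrelated.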

Request Parameters

Required:

  • name (string): Vault name (e.g., case name, matter number)

Optional:

  • description (string): What's stored in this vault

Response Fields

  • id: Vault ID - use this in all subsequent requests
  • filesBucket: S3 bucket name for documents
  • vectorBucket: S3 bucket name for embeddings
  • indexName: Vector index name (always "embeddings")
  • region: AWS region where vault is stored
  • createdAt: When the vault was created

Get Vault Information

Retrieve detailed information about a specific vault including bucket names, configuration, and usage statistics.

Endpoint

GET /vault/:id

Example Request

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6 \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "id": "sytp1b5f5j1yuj7uffzzxgw6",
  "name": "Smith v. Hospital Case 2024",
  "description": "All discovery, depositions, and medical records for the Smith case",
  "filesBucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6",
  "vectorBucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6-vectors",
  "indexName": "embeddings",
  "region": "us-west-2",
  "kmsKeyId": "arn:aws:kms:us-west-2:123456789:key/abc-def",
  "chunkStrategy": {
    "method": "semantic",
    "chunkSize": 512,
    "overlap": 50,
    "minChunkSize": 100
  },
  "enableGraph": true,
  "totalObjects": 42,
  "totalBytes": 125829120,
  "totalVectors": 3847,
  "metadata": {
    "vectorBucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6-vectors"
  },
  "createdAt": "2025-11-04T09:29:02.681Z",
  "updatedAt": "2025-11-04T15:42:18.329Z"
}

Response Fields

Basic Information:

  • id: Unique vault identifier
  • name: Vault name (your case or matter name)
  • description: Vault description

Storage Configuration:

  • filesBucket: S3 bucket storing actual documents
  • vectorBucket: S3 bucket storing embeddings for semantic search
  • indexName: Vector index name (always "embeddings")
  • region: AWS region where vault is stored
  • kmsKeyId: KMS key ARN used for encryption

Processing Configuration:

  • chunkStrategy: How documents are split for processing
    • method: Chunking algorithm (semantic respects paragraphs/sentences)
    • chunkSize: Target chunk size in tokens (512 tokens ≈ 380 words)
    • overlap: Token overlap between chunks for context preservation
    • minChunkSize: Minimum chunk size (prevents tiny fragments)

Features:

  • enableGraph: Whether knowledge graph extraction is enabled

Usage Statistics:

  • totalObjects: Number of documents stored in vault
  • totalBytes: Total storage used (in bytes)
  • totalVectors: Total number of embeddings generated

Metadata:

  • metadata: Additional vault configuration and settings
  • createdAt: When vault was created
  • updatedAt: Last modification time

Use Cases

Check Storage Usage:

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6 \
  -H "Authorization: Bearer sk_case_..." \
  | jq '{name, totalObjects, totalBytes, totalVectors}'
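Because totalBytes is a raw byte count, a small conversion helper can make the number easier to read. The sketch below is one way to do it; `bytes_to_mib` is a hypothetical helper name, not part of the API.

```shell
# bytes_to_mib BYTES: print BYTES as mebibytes (1 MiB = 1048576 bytes),
# rounded to one decimal place.
bytes_to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

bytes_to_mib 125829120   # totalBytes from the example response, prints 120.0
```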

Verify Configuration:

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6 \
  -H "Authorization: Bearer sk_case_..." \
  | jq '{region, chunkStrategy, enableGraph}'

Get Bucket Names for Direct S3 Access:

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6 \
  -H "Authorization: Bearer sk_case_..." \
  | jq '{filesBucket, vectorBucket}'

Upload Document

Get a presigned upload URL to securely upload a document to your vault.

Endpoint

POST /vault/:id/upload

Example Request

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/upload \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "deposition-transcript.pdf",
    "contentType": "application/pdf",
    "metadata": {
      "case_id": "2024-1234",
      "document_type": "deposition",
      "witness": "Dr. Sarah Johnson",
      "date": "2024-11-04"
    }
  }'

Example Response

{
  "objectId": "i5ar122d3h11a1802a3mogob",
  "uploadUrl": "https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/i5ar122d3h11a1802a3mogob/deposition-transcript.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&...",
  "expiresIn": 3600,
  "s3Key": "objects/i5ar122d3h11a1802a3mogob/deposition-transcript.pdf",
  "instructions": {
    "method": "PUT",
    "headers": {
      "Content-Type": "application/pdf"
    },
    "note": "Upload your file to the uploadUrl using PUT request"
  }
}

Request Parameters

Required:

  • filename (string): Name of the file
  • contentType (string): MIME type
    • PDF: application/pdf
    • Word: application/vnd.openxmlformats-officedocument.wordprocessingml.document
    • Text: text/plain
    • Image: image/jpeg, image/png

Optional:

  • metadata (object): Custom metadata about this document
    • Searchable key-value pairs
    • Add case IDs, dates, parties, document types, etc.
    • Used to filter search results later

How to Upload

Step 1: Get the presigned upload URL (above)

Step 2: Upload your file with a PUT request, sending the Content-Type header listed under instructions.headers in the response:

curl -X PUT "https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/..." \
  -H "Content-Type: application/pdf" \
  --data-binary "@deposition-transcript.pdf"

Step 3: The file is now stored! Proceed to ingestion for semantic search.

Complete Upload Example

#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

API_KEY="sk_case_your_api_key_here"
VAULT_ID="sytp1b5f5j1yuj7uffzzxgw6"
FILE="deposition-transcript.pdf"

# 1. Request a presigned upload URL
RESPONSE=$(curl -sS --fail -X POST "https://api.case.dev/vault/$VAULT_ID/upload" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"$FILE\",
    \"contentType\": \"application/pdf\",
    \"metadata\": {
      \"case_id\": \"2024-1234\",
      \"witness\": \"Dr. Johnson\"
    }
  }")

# Quote $RESPONSE so the shell does not word-split or glob the JSON
UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.uploadUrl')
OBJECT_ID=$(echo "$RESPONSE" | jq -r '.objectId')

# 2. Upload the actual file with the same Content-Type declared in step 1
curl -sS --fail -X PUT "$UPLOAD_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary "@$FILE"

echo "Uploaded! Object ID: $OBJECT_ID"

# 3. Trigger ingestion for semantic search (optional but recommended)
curl -sS --fail -X POST "https://api.case.dev/vault/$VAULT_ID/ingest/$OBJECT_ID" \
  -H "Authorization: Bearer $API_KEY"

echo "Ingestion started. The document will be searchable once processing completes."

List Vault Objects

See all documents stored in a vault.

Endpoint

GET /vault/:id/objects

Example Request

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/objects \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "vaultId": "sytp1b5f5j1yuj7uffzzxgw6",
  "objects": [
    {
      "id": "i5ar122d3h11a1802a3mogob",
      "filename": "deposition-transcript.pdf",
      "contentType": "application/pdf",
      "sizeBytes": 2458624,
      "ingestionStatus": "completed",
      "pageCount": 150,
      "textLength": 45230,
      "chunkCount": 89,
      "vectorCount": 89,
      "tags": ["deposition", "medical"],
      "metadata": {
        "case_id": "2024-1234",
        "witness": "Dr. Sarah Johnson",
        "date": "2024-11-04"
      },
      "createdAt": "2025-11-04T09:29:47.466Z",
      "ingestionCompletedAt": "2025-11-04T09:35:12.891Z"
    },
    {
      "id": "j3kr9d2k1m5p8q4n6r7s9t2v",
      "filename": "medical-records.pdf",
      "contentType": "application/pdf",
      "sizeBytes": 8945120,
      "ingestionStatus": "processing",
      "pageCount": 350,
      "textLength": null,
      "chunkCount": null,
      "vectorCount": null,
      "metadata": {
        "case_id": "2024-1234",
        "document_type": "medical_records"
      },
      "createdAt": "2025-11-04T09:45:23.123Z"
    }
  ],
  "count": 2
}

Response Fields

Per Object:

  • id: Object ID (use for downloads/ingestion)
  • filename: Original filename
  • contentType: MIME type
  • sizeBytes: File size in bytes
  • ingestionStatus: Search readiness
    • pending: Not yet processed
    • processing: OCR/embedding in progress
    • completed: Fully searchable
    • failed: Processing failed
  • pageCount: Number of pages (PDFs only)
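  • textLength: Number of characters of extracted text (null until ingestion completes)
  • chunkCount: Number of chunks the document was split into (null until ingestion completes)
  • vectorCount: Number of embeddings created (null until ingestion completes)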
  • metadata: Your custom metadata
  • tags: Auto-detected or manual tags

Download Document

Get a presigned URL to download a specific document from your vault.

Endpoint

GET /vault/:vaultId/objects/:objectId

Example Request

curl https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/objects/i5ar122d3h11a1802a3mogob \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "id": "i5ar122d3h11a1802a3mogob",
  "vaultId": "sytp1b5f5j1yuj7uffzzxgw6",
  "filename": "deposition-transcript.pdf",
  "contentType": "application/pdf",
  "sizeBytes": 2458624,
  "downloadUrl": "https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/i5ar122d3h11a1802a3mogob/deposition-transcript.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&...",
  "expiresIn": 3600,
  "ingestionStatus": "completed",
  "pageCount": 150,
  "textLength": 45230,
  "chunkCount": 89,
  "vectorCount": 89,
  "metadata": {
    "case_id": "2024-1234",
    "witness": "Dr. Sarah Johnson"
  },
  "createdAt": "2025-11-04T09:29:47.466Z",
  "ingestionCompletedAt": "2025-11-04T09:35:12.891Z"
}

Download the File

Use the downloadUrl directly:

curl -o deposition-transcript.pdf "https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/..."

Note: Download URLs expire in 1 hour. Generate a new one if expired.
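A client that caches a downloadUrl can check whether it is still usable before starting a transfer. The sketch below assumes you recorded the Unix time at which you requested the URL; `url_is_expired` and the 30-second safety margin are illustrative choices, not API behavior.

```shell
# url_is_expired ISSUED_EPOCH EXPIRES_IN [NOW_EPOCH]
# Succeeds (exit 0) when the URL should be treated as expired.
# The 30-second margin avoids starting a download just before the URL lapses.
url_is_expired() {
  issued=$1
  ttl=$2
  now=${3:-$(date +%s)}   # default to the current time
  [ "$now" -ge $((issued + ttl - 30)) ]
}

# Example: a URL issued at t=1000 with expiresIn=3600
url_is_expired 1000 3600 2000 || echo "still valid"
url_is_expired 1000 3600 4600 && echo "expired, request a new downloadUrl"
```

When the check reports an expired URL, call the download endpoint again to obtain a fresh one.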


Generate Custom Presigned URL

Generate presigned URLs for S3 operations with custom expiry times and operation types. This gives you more control than the basic download endpoint.

Endpoint

POST /vault/:vaultId/objects/:objectId/presigned-url

Example Request

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/objects/i5ar122d3h11a1802a3mogob/presigned-url \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "GET",
    "expiresIn": 7200
  }'

Example Response

{
  "objectId": "i5ar122d3h11a1802a3mogob",
  "vaultId": "sytp1b5f5j1yuj7uffzzxgw6",
  "filename": "deposition-transcript.pdf",
  "s3Key": "objects/i5ar122d3h11a1802a3mogob/deposition-transcript.pdf",
  "operation": "GET",
  "presignedUrl": "https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/...",
  "expiresIn": 7200,
  "expiresAt": "2025-11-04T11:29:47.466Z",
  "instructions": {
    "method": "GET",
    "description": "Use this URL to download the file",
    "example": "curl -o \"deposition-transcript.pdf\" \"https://case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6.s3.us-west-2.amazonaws.com/objects/...\""
  },
  "metadata": {
    "contentType": "application/pdf",
    "sizeBytes": 2458624,
    "bucket": "case-vault-myorg-sytp1b5f5j1yuj7uffzzxgw6",
    "region": "us-west-2"
  }
}

Request Parameters

Optional:

  • operation (string): S3 operation type (default: GET)
    • GET - Download/read the file
    • PUT - Upload/replace the file
    • DELETE - Delete the file from S3
    • HEAD - Get file metadata without downloading
  • expiresIn (number): URL expiry in seconds (default: 3600)
    • Minimum: 60 seconds (1 minute)
    • Maximum: 604800 seconds (7 days)
  • contentType (string): Content type for PUT operations
    • Only needed for PUT operations
    • Defaults to the object's stored content type

Use Cases

Long-lived Download Links (7 days):

{
  "operation": "GET",
  "expiresIn": 604800
}

Replace/Update a Document:

{
  "operation": "PUT",
  "expiresIn": 3600,
  "contentType": "application/pdf"
}

Then upload using the presigned URL:

curl -X PUT "$PRESIGNED_URL" \
  -H "Content-Type: application/pdf" \
  --data-binary "@updated-document.pdf"

Delete a Document:

{
  "operation": "DELETE",
  "expiresIn": 300
}

Check File Metadata:

{
  "operation": "HEAD",
  "expiresIn": 3600
}

Security Notes

  • Presigned URLs bypass normal authentication - anyone with the URL can perform the operation
  • Choose appropriate expiry times based on your security requirements
  • DELETE operations are permanent - the file is removed from S3 (database record remains)
  • PUT operations will replace existing files completely

Trigger Document Ingestion

Process a document for semantic search. This extracts text (via OCR if needed), chunks it, generates embeddings, and stores vectors for fast similarity search.

Endpoint

POST /vault/:vaultId/ingest/:objectId

Example Request

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/ingest/i5ar122d3h11a1802a3mogob \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "objectId": "i5ar122d3h11a1802a3mogob",
  "workflowId": "wrun_01K972ABCDEFGHIJK",
  "status": "processing",
  "message": "OCR ingestion started. Processing may take several minutes."
}

What Happens During Ingestion

This kicks off a Vercel Workflow that:

  1. Submits document to Vision OCR API
    • Extracts all text
    • Preserves layout and structure
    • Returns page count and text
  2. Chunks the text (when OCR completes)
    • Semantic chunking (512 tokens per chunk)
    • 50 token overlap between chunks
    • Preserves context across boundaries
  3. Generates embeddings
    • Uses OpenAI text-embedding-3-small
    • 1536 dimensions per chunk
    • Stores in vector bucket
  4. Updates object metadata
    • Sets ingestionStatus to completed
    • Records pageCount, textLength, chunkCount, vectorCount

Processing Times

Document Type     Pages   OCR Time              Total Time
Clean PDF         10      0s (no OCR needed)    ~10s
Scanned PDF       50      ~2 min                ~2.5 min
Large discovery   300     ~12 min               ~13 min

The workflow runs asynchronously: you get an immediate response while processing continues in the background.
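Since ingestion is asynchronous, a client typically polls until ingestionStatus reaches a final state. The sketch below polls GET /vault/:vaultId/objects/:objectId, which returns ingestionStatus as shown under Download Document. The CASE_API_KEY variable, the function names, the 15-second interval, and the 60-attempt limit are all assumptions chosen for illustration.

```shell
# Assumed environment: CASE_API_KEY holds your API key (variable name is illustrative).
API_BASE="https://api.case.dev"

# is_terminal_status STATUS: succeed when ingestion has finished, either way.
is_terminal_status() {
  case "$1" in
    completed|failed) return 0 ;;
    *)                return 1 ;;
  esac
}

# wait_for_ingestion VAULT_ID OBJECT_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final ingestionStatus; succeeds only if it is "completed".
wait_for_ingestion() {
  interval=${3:-15}
  attempts=${4:-60}
  while [ "$attempts" -gt 0 ]; do
    status=$(curl -sS "$API_BASE/vault/$1/objects/$2" \
      -H "Authorization: Bearer $CASE_API_KEY" | jq -r '.ingestionStatus')
    if is_terminal_status "$status"; then
      echo "$status"
      if [ "$status" = "completed" ]; then return 0; else return 1; fi
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "timed out waiting for ingestion" >&2
  return 1
}

# Usage (makes network calls):
# wait_for_ingestion sytp1b5f5j1yuj7uffzzxgw6 i5ar122d3h11a1802a3mogob
```

Given the processing times above, a longer interval may be more appropriate for large scanned documents.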


Search Vault

Search your vault documents using natural language. Several search methods are available, depending on your needs.

Endpoint

POST /vault/:id/search

Example Request

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/search \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "testimony about post-operative care and monitoring",
    "method": "hybrid",
    "topK": 10
  }'

Request Parameters

Parameter   Type     Required   Description
query       string   Yes        Your search query in natural language
method      string   Yes        Search method: fast, hybrid, entity, global, or local
topK        number   No         Number of results to return (default: 10)

Search Methods

Fast Search (method: "fast")

Instant semantic similarity search using S3 Vectors

  • Speed: < 500ms
  • Setup: Works as soon as ingestion completes (no GraphRAG needed!)
  • How: Semantic embedding similarity using OpenAI text-embedding-3-small
  • Best for: Quick searches, real-time applications, chatbots

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/search \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "post-operative care protocols",
    "method": "fast",
    "topK": 5
  }'

Response:

{
  "method": "fast",
  "query": "post-operative care protocols",
  "response": "Found 5 relevant chunks across 2 documents.",
  "chunks": [
    {
      "text": "Q: Can you describe the post-operative monitoring protocol? A: The standard protocol requires vital signs every 15 minutes...",
      "object_id": "i5ar122d3h11a1802a3mogob",
      "chunk_index": 12,
      "distance": 0.23,
      "key": "i5ar122d3h11a1802a3mogob_chunk_12"
    }
  ],
  "sources": [
    {
      "id": "i5ar122d3h11a1802a3mogob",
      "filename": "deposition-transcript.pdf",
      "pageCount": 145,
      "textLength": 87234,
      "chunkCount": 42
    }
  ],
  "vault_id": "sytp1b5f5j1yuj7uffzzxgw6"
}

Hybrid Search (method: "hybrid") 🎯 Best Accuracy

Combines semantic similarity (70%) with keyword matching (30%) using BM25

  • Speed: < 1 second
  • Setup: Works as soon as ingestion completes (no GraphRAG needed!)
  • How: S3 Vectors similarity + BM25 keyword scoring with normalized fusion
  • Best for: When you need both semantic understanding AND exact keyword matches

curl -X POST https://api.case.dev/vault/sytp1b5f5j1yuj7uffzzxgw6/search \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "post-operative care protocols",
    "method": "hybrid",
    "topK": 5
  }'

Response:

{
  "method": "hybrid",
  "query": "post-operative care protocols",
  "response": "Found 5 relevant chunks (α=0.7 vector + β=0.3 BM25)",
  "chunks": [
    {
      "text": "The post-operative care protocol requires...",
      "object_id": "i5ar122d3h11a1802a3mogob",
      "chunk_index": 12,
      "hybridScore": 0.92,
      "vectorScore": 0.89,
      "bm25Score": 1.0
    }
  ],
  "sources": [...],
  "vault_id": "sytp1b5f5j1yuj7uffzzxgw6"
}

How hybrid scoring works:

  • Retrieve 2 × topK candidates using vector similarity
  • Score candidates with BM25 keyword algorithm
  • Normalize both scores to 0-1 range
  • Combine: 0.7 * vectorScore + 0.3 * bm25Score
  • Return top K by hybrid score
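The fusion step can be sketched in a few lines of awk. In the example response above, 0.7 × 0.89 + 0.3 × 1.0 = 0.923, which matches the reported hybridScore of 0.92 after rounding. The sketch below applies the same weights to a small candidate list. Min-max normalization is an assumption here (the steps only say both scores are normalized to the 0-1 range), and `hybrid_rank` and its input format are hypothetical.

```shell
# hybrid_rank TOP_K: read "key vectorScore bm25Score" lines on stdin and print
# the top K candidates as "fusedScore key", best first.
# Assumption: min-max normalization per score column.
hybrid_rank() {
  awk '
    function norm(x, lo, hi) { return (hi > lo) ? (x - lo) / (hi - lo) : 1 }
    {
      key[NR] = $1; v[NR] = $2; b[NR] = $3
      # Track the min and max of each column for normalization
      if (NR == 1 || $2 < vmin) vmin = $2
      if (NR == 1 || $2 > vmax) vmax = $2
      if (NR == 1 || $3 < bmin) bmin = $3
      if (NR == 1 || $3 > bmax) bmax = $3
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%.3f %s\n", 0.7 * norm(v[i], vmin, vmax) + 0.3 * norm(b[i], bmin, bmax), key[i]
    }' | sort -rn | head -n "$1"
}

# Three candidates with raw vector and BM25 scores; keep the best two.
printf '%s\n' "chunk_12 0.89 7.1" "chunk_40 0.95 1.2" "chunk_03 0.60 4.0" | hybrid_rank 2
# prints:
# 0.880 chunk_12
# 0.700 chunk_40
```

Note how chunk_12 outranks chunk_40 despite a lower vector score: its strong keyword match lifts the fused score.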

Entity Search (method: "entity")

GraphRAG entity-based keyword search

  • Speed: 2-3 seconds
  • Setup: Requires POST /vault/:id/graphrag/init first
  • How: Keyword matching on GraphRAG extracted entity names/descriptions
  • Best for: Finding specific entities, people, organizations

Global Search (method: "global")

Comprehensive knowledge graph analysis

  • Speed: 15-20 seconds
  • Setup: Requires GraphRAG indexing
  • How: Analyzes community summaries across entire graph
  • Best for: "What are the main themes?", "Summarize key points"

Local Search (method: "local")

Entity-focused graph search

  • Speed: 5-10 seconds
  • Setup: Requires GraphRAG indexing
  • How: Focuses on specific entities and their relationships
  • Best for: "Tell me about X", "Who is connected to Y?"

Vault Architecture

How It Works

Your Document
    ↓
[Upload via Presigned URL]
    ↓
Files Bucket (S3)
    ├─ objects/
    │   ├─ {objectId}/
    │   │   └─ document.pdf
    │   └─ ...
    │
[Ingestion Workflow Triggered]
    ↓
Vision OCR API (if scanned)
    ↓
Text Extraction
    ↓
Semantic Chunking
    ↓
Embedding Generation (OpenAI)
    ↓
Vector Bucket (S3 Vectors)
    └─ embeddings/
        ├─ chunk-1 → [0.123, -0.456, ...]
        ├─ chunk-2 → [0.789, 0.234, ...]
        └─ ...

[Ready for Semantic Search]

Chunking Strategy

Documents are split intelligently:

  • Chunk size: 512 tokens (~380 words)
  • Overlap: 50 tokens to preserve context
  • Min chunk: 100 tokens (no tiny fragments)
  • Method: Semantic (respects sentences/paragraphs)

This means:

  • ~2 chunks per page (standard deposition)
  • 150-page deposition = ~300 chunks
  • Each chunk is searchable independently
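For capacity planning, the chunk count can be approximated from a token count. The sketch below uses a fixed-window model in which every chunk after the first contributes (chunk size - overlap) new tokens. This is an assumption: semantic chunking respects sentence and paragraph boundaries, so real counts will differ. `estimate_chunks` is a hypothetical helper.

```shell
# estimate_chunks TOKENS [CHUNK_SIZE] [OVERLAP]
# Fixed-window approximation of the number of chunks for a document.
# Defaults follow the chunking strategy above: 512-token chunks, 50-token overlap.
estimate_chunks() {
  awk -v t="$1" -v size="${2:-512}" -v overlap="${3:-50}" 'BEGIN {
    if (t <= 0)    { print 0; exit }
    if (t <= size) { print 1; exit }
    stride = size - overlap          # new tokens contributed by each later chunk
    n = (t - overlap) / stride
    r = (n == int(n)) ? n : int(n) + 1   # round up to a whole chunk
    print r
  }'
}

estimate_chunks 138650   # prints 300
```

Under this model, the ~300 chunks quoted for a 150-page deposition correspond to roughly 138,650 tokens.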