API Reference

LLM API endpoints and operations

List Available Models

Retrieve the list of available AI models, including pricing, capabilities, and specifications.

Endpoint

GET /llm/v1/models

Example Request

curl https://api.case.dev/llm/v1/models \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "object": "list",
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "object": "model",
      "created": 1755815280,
      "owned_by": "anthropic",
      "name": "Claude Sonnet 4.5",
      "description": "Claude Sonnet 4.5 is the newest model...",
      "context_window": 200000,
      "max_tokens": 64000,
      "type": "language",
      "tags": ["file-input", "reasoning", "tool-use", "vision"],
      "pricing": {
        "input": "0.000003",
        "output": "0.000015",
        "input_cache_read": "0.0000003",
        "input_cache_write": "0.00000375"
      }
    }
    // ... 130+ more models
  ]
}

Response Fields

  • id: Model identifier to use in chat completions (e.g., anthropic/claude-sonnet-4.5)
  • name: Human-readable model name
  • description: What the model is good at
  • context_window: Maximum input tokens the model can handle
  • max_tokens: Maximum output tokens per request
  • type: language for chat models, embedding for embedding models
  • tags: Capabilities like vision, tool-use, reasoning, file-input
  • pricing: Cost per token in USD
    • input: Cost per input token
    • output: Cost per output token
    • input_cache_read: Cost for reading from cache (if supported)
    • input_cache_write: Cost for writing to cache (if supported)

Use Cases

  • Choosing the right model: Compare capabilities and pricing
  • Cost estimation: Calculate expected costs for your use case
  • Feature discovery: Find models with specific capabilities (vision, tool use, etc.)
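As a sketch of feature discovery, the snippet below filters a models response by capability tag and sorts by input price. The `models` list is a hand-written stand-in for the `data` array returned by GET /llm/v1/models, and `models_with_tag` is an illustrative helper, not part of the API.

```python
# Toy stand-in for the "data" array returned by GET /llm/v1/models.
models = [
    {"id": "anthropic/claude-sonnet-4.5", "type": "language",
     "tags": ["file-input", "reasoning", "tool-use", "vision"],
     "pricing": {"input": "0.000003", "output": "0.000015"}},
    {"id": "openai/text-embedding-3-small", "type": "embedding",
     "tags": [], "pricing": {"input": "0.00000002", "output": "0"}},
]

def models_with_tag(models, tag):
    """Return language models carrying a capability tag, cheapest input first."""
    hits = [m for m in models if m["type"] == "language" and tag in m["tags"]]
    return sorted(hits, key=lambda m: float(m["pricing"]["input"]))

vision_models = models_with_tag(models, "vision")
```

The same filter works unchanged on the live response body, since pricing values are returned as decimal strings.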

Chat Completions

Send messages to AI models and get intelligent responses. This is the main endpoint for conversational AI.

Endpoint

POST /llm/v1/chat/completions

Basic Example

curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-4.5-haiku",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points"}
    ],
    "max_tokens": 500
  }'
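The same call can be assembled from any HTTP client; as a sketch, the helper below builds the URL, headers, and JSON body matching the curl example above. `build_chat_request` is an illustrative name, not part of the API.

```python
API_URL = "https://api.case.dev/llm/v1/chat/completions"

def build_chat_request(api_key, model, user_content, max_tokens=500):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,
    }
    return API_URL, headers, body

url, headers, body = build_chat_request(
    "sk_case_your_api_key_here",
    "anthropic/claude-4.5-haiku",
    "Summarize this deposition in 3 bullet points",
)
# To send it with the requests library: requests.post(url, headers=headers, json=body)
```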

Example Response

{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "created": 1762247909,
  "model": "anthropic/claude-4.5-haiku",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified about...\n• Documents reviewed include...\n• Timeline established from...",
        "provider_metadata": {
          "anthropic": {
            "usage": {
              "input_tokens": 245,
              "output_tokens": 87,
              "cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 0
            }
          }
        }
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105,
    "market_cost": 0.000105,
    "is_byok": true
  }
}

Request Parameters

Required:

  • model (string): Model ID from /llm/v1/models (e.g., anthropic/claude-sonnet-4.5)
  • messages (array): Conversation history
    • role: user, assistant, or system
    • content: Message text or multimodal content

Optional:

  • max_tokens (number): Maximum tokens to generate (default: 4096)
  • temperature (number): Randomness, 0-2 (default: 1)
  • top_p (number): Nucleus sampling, 0-1 (default: 1)
  • stream (boolean): Stream responses token-by-token (default: false)
  • stop (array): Stop sequences to end generation
  • presence_penalty (number): Penalize repeated topics, -2 to 2
  • frequency_penalty (number): Penalize repeated tokens, -2 to 2

Multi-Turn Conversations

Build context by including previous messages:

{
  "model": "openai/gpt-5",
  "messages": [
    { "role": "user", "content": "What is a deposition?" },
    { "role": "assistant", "content": "A deposition is sworn testimony..." },
    { "role": "user", "content": "How long do they typically last?" }
  ],
  "max_tokens": 200
}
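In practice the client keeps the full history itself, appending each assistant reply before the next user turn. A minimal sketch (the `extend_conversation` helper is illustrative, not part of the API):

```python
def extend_conversation(messages, assistant_reply, next_user_turn):
    """Append the assistant's reply and the next user message to the history."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_turn},
    ]

history = [{"role": "user", "content": "What is a deposition?"}]
history = extend_conversation(
    history,
    "A deposition is sworn testimony...",
    "How long do they typically last?",
)
# "history" is now the messages array shown in the request above.
```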

System Prompts

Set behavior and context with system messages:

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a legal assistant specializing in medical malpractice cases. Be concise and cite relevant case law when possible."
    },
    {
      "role": "user",
      "content": "Review this medical record and identify potential issues of negligence."
    }
  ]
}

Vision Models

Send images along with text (works with models tagged with vision):

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What medical equipment is visible in this photo?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/hospital-room.jpg" }
        }
      ]
    }
  ]
}

Streaming Responses

Get responses token-by-token as they're generated:

curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [{"role": "user", "content": "Write a detailed case summary"}],
    "stream": true
  }' \
  --no-buffer

Response format (Server-Sent Events):

data: {"id":"gen_123","choices":[{"delta":{"content":"The"}}]}

data: {"id":"gen_123","choices":[{"delta":{"content":" case"}}]}

data: {"id":"gen_123","choices":[{"delta":{"content":" involves"}}]}

data: [DONE]
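A streaming client reads `data:` lines, parses each JSON chunk, and concatenates the `delta.content` pieces until the `[DONE]` sentinel. A minimal parser, using the example lines above as input:

```python
import json

def collect_stream(sse_lines):
    """Reassemble the completion text from the 'data:' lines of an SSE stream."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = [
    'data: {"id":"gen_123","choices":[{"delta":{"content":"The"}}]}',
    'data: {"id":"gen_123","choices":[{"delta":{"content":" case"}}]}',
    'data: {"id":"gen_123","choices":[{"delta":{"content":" involves"}}]}',
    "data: [DONE]",
]
```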

Use Cases

Document Analysis:

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "content": "Extract key facts, dates, and parties from legal documents."
    },
    {
      "role": "user",
      "content": "Document text here..."
    }
  ],
  "max_tokens": 2000
}

Deposition Summarization:

{
  "model": "openai/gpt-5",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this 300-page deposition transcript, focusing on admissions and inconsistencies:\n\n[transcript text]"
    }
  ],
  "max_tokens": 4000
}

Medical Record Review:

{
  "model": "anthropic/claude-opus-4.1",
  "messages": [
    {
      "role": "system",
      "content": "You are a medical-legal expert. Identify standard-of-care deviations and temporal inconsistencies."
    },
    {
      "role": "user",
      "content": "[medical records text]"
    }
  ],
  "max_tokens": 5000
}

Cost Control

Monitor costs in real-time using the usage object in responses:

{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896, // Your actual cost
    "market_cost": 0.004896, // Market rate cost
    "is_byok": true // Using your own API keys
  }
}

Cost calculation:

  • cost = (input_tokens × input_price) + (output_tokens × output_price)
  • Prices are per token (see /llm/v1/models for rates)
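The formula above can be checked directly. Using the per-token rates for Claude Sonnet 4.5 from the models listing (0.000003 input, 0.000015 output) and the token counts from the earlier example response:

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD: token counts times the per-token rates from /llm/v1/models."""
    return input_tokens * input_price + output_tokens * output_price

# Claude Sonnet 4.5 per-token rates from the models listing above.
cost = request_cost(245, 87, 0.000003, 0.000015)
# 245 x 0.000003 + 87 x 0.000015 = 0.00204 USD
```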

Text Embeddings

Convert text into numerical vectors for semantic search, clustering, and similarity comparisons.

Endpoint

POST /llm/v1/embeddings

Example Request

curl -X POST https://api.case.dev/llm/v1/embeddings \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "Plaintiff alleges negligence in post-operative care"
  }'

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.016882256,
        0.02250519,
        -0.011252595
        // ... 1536 dimensions total
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}

Available Embedding Models

Model                          Dimensions  Use Case                     Cost per 1K tokens
openai/text-embedding-3-small  1536        General purpose, fast        $0.00002
openai/text-embedding-3-large  3072        Higher quality               $0.00013
voyage/voyage-law-2            1024        Legal documents (optimized)  $0.00012
voyage/voyage-3.5              1536        General purpose              $0.00006
cohere/embed-v4.0              1024        Multilingual                 $0.00012

Batch Embeddings

Embed multiple texts in one request (more efficient):

{
  "model": "openai/text-embedding-3-small",
  "input": [
    "Medical record from January 2024",
    "Deposition transcript page 45",
    "Expert witness report summary"
  ]
}

Response:

{
  "data": [
    {"index": 0, "embedding": [...]},
    {"index": 1, "embedding": [...]},
    {"index": 2, "embedding": [...]}
  ]
}
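Each result carries an `index` field; match results to inputs by that field rather than assuming list order. A sketch with toy two-dimensional vectors standing in for real embeddings (`pair_embeddings` is an illustrative helper, not part of the API):

```python
def pair_embeddings(inputs, data):
    """Map each input text to its vector, matching on the response's index field."""
    by_index = {item["index"]: item["embedding"] for item in data}
    return {text: by_index[i] for i, text in enumerate(inputs)}

inputs = ["Medical record from January 2024", "Deposition transcript page 45"]
data = [
    {"index": 1, "embedding": [0.2, 0.9]},  # toy vectors, deliberately out of order
    {"index": 0, "embedding": [0.1, 0.5]},
]
paired = pair_embeddings(inputs, data)
```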

Use Cases

Semantic Document Search:

  1. Embed all your case documents
  2. Store embeddings in a vector database
  3. When searching, embed the query
  4. Find documents with similar embeddings

Case Clustering:

  • Group similar cases by embedding case summaries
  • Find patterns across depositions
  • Identify related medical incidents

Document Similarity:

# Embed two documents
curl -X POST https://api.case.dev/llm/v1/embeddings \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voyage/voyage-law-2",
    "input": [
      "Plaintiff expert testimony regarding standard of care",
      "Defense expert rebuttal on treatment protocols"
    ]
  }'

# Calculate cosine similarity between the two embeddings
# Similarity close to 1 = very similar, close to 0 = different
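The cosine similarity mentioned above is the dot product of the two vectors divided by the product of their magnitudes. A self-contained sketch, using toy three-dimensional vectors in place of real 1024-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical vectors score 1.0; orthogonal vectors score 0.0.
sim = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```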

Smart Contract Review:

  • Embed contract clauses
  • Find similar precedents
  • Identify unusual terms by comparing to standard embeddings