API Reference

LLM API endpoints and operations

List Available Models

Retrieve the list of available AI models, including pricing, capabilities, and specifications.

Endpoint

GET /llm/v1/models

Example Request

curl https://api.case.dev/llm/v1/models \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "object": "list",
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "object": "model",
      "created": 1755815280,
      "owned_by": "anthropic",
      "name": "Claude Sonnet 4.5",
      "description": "Claude Sonnet 4.5 is the newest model...",
      "context_window": 200000,
      "max_tokens": 64000,
      "type": "language",
      "tags": ["file-input", "reasoning", "tool-use", "vision"],
      "pricing": {
        "input": "0.000003",
        "output": "0.000015",
        "input_cache_read": "0.0000003",
        "input_cache_write": "0.00000375"
      }
    }
    // ... 130+ more models
  ]
}

Response Fields

  • id: Model identifier to use in chat completions (e.g., anthropic/claude-sonnet-4.5)
  • name: Human-readable model name
  • description: What the model is good at
  • context_window: Maximum input tokens the model can handle
  • max_tokens: Maximum output tokens per request
  • type: language for chat models, embedding for embedding models
  • tags: Capabilities like vision, tool-use, reasoning, file-input
  • pricing: Cost per token in USD
    • input: Cost per input token
    • output: Cost per output token
    • input_cache_read: Cost for reading from cache (if supported)
    • input_cache_write: Cost for writing to cache (if supported)

Use Cases

  • Choosing the right model: Compare capabilities and pricing
  • Cost estimation: Calculate expected costs for your use case
  • Feature discovery: Find models with specific capabilities (vision, tool use, etc.)
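As a sketch of feature discovery, the snippet below filters a models response by capability tag and sorts by input price. The `models` list is a hand-written stand-in for the `data` array returned by GET /llm/v1/models, and `models_with_tag` is an illustrative helper, not part of the API.

```python
# Toy stand-in for the "data" array returned by GET /llm/v1/models.
models = [
    {"id": "anthropic/claude-sonnet-4.5", "type": "language",
     "tags": ["file-input", "reasoning", "tool-use", "vision"],
     "pricing": {"input": "0.000003", "output": "0.000015"}},
    {"id": "openai/text-embedding-3-small", "type": "embedding",
     "tags": [], "pricing": {"input": "0.00000002", "output": "0"}},
]

def models_with_tag(models, tag):
    """Return language models carrying a capability tag, cheapest input first."""
    hits = [m for m in models if m["type"] == "language" and tag in m["tags"]]
    return sorted(hits, key=lambda m: float(m["pricing"]["input"]))

vision_models = models_with_tag(models, "vision")
```

The same filter works unchanged on the live response body, since pricing values are returned as decimal strings.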

Chat Completions

Send messages to AI models and get intelligent responses. This is the main endpoint for conversational AI.

Endpoint

POST /llm/v1/chat/completions

Basic Example

curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-4.5-haiku",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points"}
    ],
    "max_tokens": 500
  }'
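The same call can be assembled from any HTTP client; as a sketch, the helper below builds the URL, headers, and JSON body matching the curl example above. `build_chat_request` is an illustrative name, not part of the API.

```python
API_URL = "https://api.case.dev/llm/v1/chat/completions"

def build_chat_request(api_key, model, user_content, max_tokens=500):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,
    }
    return API_URL, headers, body

url, headers, body = build_chat_request(
    "sk_case_your_api_key_here",
    "anthropic/claude-4.5-haiku",
    "Summarize this deposition in 3 bullet points",
)
# To send it with the requests library: requests.post(url, headers=headers, json=body)
```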

Example Response

{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "created": 1762247909,
  "model": "anthropic/claude-4.5-haiku",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified about...\n• Documents reviewed include...\n• Timeline established from...",
        "provider_metadata": {
          "anthropic": {
            "usage": {
              "input_tokens": 245,
              "output_tokens": 87,
              "cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 0
            }
          }
        }
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105,
    "market_cost": 0.000105,
    "is_byok": true
  }
}

Request Parameters

Required:

  • model (string): Model ID from /llm/v1/models (e.g., anthropic/claude-sonnet-4.5)
  • messages (array): Conversation history
    • role: user, assistant, or system
    • content: Message text or multimodal content

Optional:

  • max_tokens (number): Maximum tokens to generate (default: 4096)
  • temperature (number): Randomness, 0-2 (default: 1)
  • top_p (number): Nucleus sampling, 0-1 (default: 1)
  • stream (boolean): Stream responses token-by-token (default: false)
  • stop (array): Stop sequences to end generation
  • presence_penalty (number): Penalize repeated topics, -2 to 2
  • frequency_penalty (number): Penalize repeated tokens, -2 to 2

Multi-Turn Conversations

Build context by including previous messages:

{
  "model": "openai/gpt-5",
  "messages": [
    { "role": "user", "content": "What is a deposition?" },
    { "role": "assistant", "content": "A deposition is sworn testimony..." },
    { "role": "user", "content": "How long do they typically last?" }
  ],
  "max_tokens": 200
}
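In practice the client keeps the full history itself, appending each assistant reply before the next user turn. A minimal sketch (the `extend_conversation` helper is illustrative, not part of the API):

```python
def extend_conversation(messages, assistant_reply, next_user_turn):
    """Append the assistant's reply and the next user message to the history."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_turn},
    ]

history = [{"role": "user", "content": "What is a deposition?"}]
history = extend_conversation(
    history,
    "A deposition is sworn testimony...",
    "How long do they typically last?",
)
# "history" is now the messages array shown in the request above.
```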

System Prompts

Set behavior and context with system messages:

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a legal assistant specializing in medical malpractice cases. Be concise and cite relevant case law when possible."
    },
    {
      "role": "user",
      "content": "Review this medical record and identify potential issues of negligence."
    }
  ]
}

Vision Models

Send images along with text (works with models tagged with vision):

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What medical equipment is visible in this photo?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/hospital-room.jpg" }
        }
      ]
    }
  ]
}

Streaming Responses

Get responses token-by-token as they're generated:

curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5-mini",
    "messages": [{"role": "user", "content": "Write a detailed case summary"}],
    "stream": true
  }' \
  --no-buffer

Response format (Server-Sent Events):

data: {"id":"gen_123","choices":[{"delta":{"content":"The"}}]}

data: {"id":"gen_123","choices":[{"delta":{"content":" case"}}]}

data: {"id":"gen_123","choices":[{"delta":{"content":" involves"}}]}

data: [DONE]
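A streaming client reads `data:` lines, parses each JSON chunk, and concatenates the `delta.content` pieces until the `[DONE]` sentinel. A minimal parser, using the example lines above as input:

```python
import json

def collect_stream(sse_lines):
    """Reassemble the completion text from the 'data:' lines of an SSE stream."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

stream = [
    'data: {"id":"gen_123","choices":[{"delta":{"content":"The"}}]}',
    'data: {"id":"gen_123","choices":[{"delta":{"content":" case"}}]}',
    'data: {"id":"gen_123","choices":[{"delta":{"content":" involves"}}]}',
    "data: [DONE]",
]
```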

Use Cases

Document Analysis:

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {
      "role": "system",
      "content": "Extract key facts, dates, and parties from legal documents."
    },
    {
      "role": "user",
      "content": "Document text here..."
    }
  ],
  "max_tokens": 2000
}

Deposition Summarization:

{
  "model": "openai/gpt-5",
  "messages": [
    {
      "role": "user",
      "content": "Summarize this 300-page deposition transcript, focusing on admissions and inconsistencies:\n\n[transcript text]"
    }
  ],
  "max_tokens": 4000
}

Medical Record Review:

{
  "model": "anthropic/claude-opus-4.1",
  "messages": [
    {
      "role": "system",
      "content": "You are a medical-legal expert. Identify standard-of-care deviations and temporal inconsistencies."
    },
    {
      "role": "user",
      "content": "[medical records text]"
    }
  ],
  "max_tokens": 5000
}

Cost Control

Monitor costs in real-time using the usage object in responses:

{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896, // Your actual cost
    "market_cost": 0.004896, // Market rate cost
    "is_byok": true // Using your own API keys
  }
}

Cost calculation:

  • cost = (input_tokens × input_price) + (output_tokens × output_price)
  • Prices are per token (see /llm/v1/models for rates)
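The formula above can be checked directly. Using the per-token rates for Claude Sonnet 4.5 from the models listing (0.000003 input, 0.000015 output) and the token counts from the earlier example response:

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD: token counts times the per-token rates from /llm/v1/models."""
    return input_tokens * input_price + output_tokens * output_price

# Claude Sonnet 4.5 per-token rates from the models listing above.
cost = request_cost(245, 87, 0.000003, 0.000015)
# 245 x 0.000003 + 87 x 0.000015 = 0.00204 USD
```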

Text Embeddings

Convert text into numerical vectors for semantic search, clustering, and similarity comparisons.

Endpoint

POST /llm/v1/embeddings

Example Request

curl -X POST https://api.case.dev/llm/v1/embeddings \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "Plaintiff alleges negligence in post-operative care"
  }'

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.016882256,
        0.02250519,
        -0.011252595
        // ... 1536 dimensions total
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}

Available Embedding Models

Model                          Dimensions  Use Case                     Cost per 1K tokens
openai/text-embedding-3-small  1536        General purpose, fast        $0.00002
openai/text-embedding-3-large  3072        Higher quality               $0.00013
voyage/voyage-law-2            1024        Legal documents (optimized)  $0.00012
voyage/voyage-3.5              1536        General purpose              $0.00006
cohere/embed-v4.0              1024        Multilingual                 $0.00012

Batch Embeddings

Embed multiple texts in one request (more efficient):

{
  "model": "openai/text-embedding-3-small",
  "input": [
    "Medical record from January 2024",
    "Deposition transcript page 45",
    "Expert witness report summary"
  ]
}

Response:

{
  "data": [
    {"index": 0, "embedding": [...]},
    {"index": 1, "embedding": [...]},
    {"index": 2, "embedding": [...]}
  ]
}
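Each result carries an `index` field; match results to inputs by that field rather than assuming list order. A sketch with toy two-dimensional vectors standing in for real embeddings (`pair_embeddings` is an illustrative helper, not part of the API):

```python
def pair_embeddings(inputs, data):
    """Map each input text to its vector, matching on the response's index field."""
    by_index = {item["index"]: item["embedding"] for item in data}
    return {text: by_index[i] for i, text in enumerate(inputs)}

inputs = ["Medical record from January 2024", "Deposition transcript page 45"]
data = [
    {"index": 1, "embedding": [0.2, 0.9]},  # toy vectors, deliberately out of order
    {"index": 0, "embedding": [0.1, 0.5]},
]
paired = pair_embeddings(inputs, data)
```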

Use Cases

Semantic Document Search:

  1. Embed all your case documents
  2. Store embeddings in a vector database
  3. When searching, embed the query
  4. Find documents with similar embeddings

Case Clustering:

  • Group similar cases by embedding case summaries
  • Find patterns across depositions
  • Identify related medical incidents

Document Similarity:

# Embed two documents
curl -X POST https://api.case.dev/llm/v1/embeddings \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voyage/voyage-law-2",
    "input": [
      "Plaintiff expert testimony regarding standard of care",
      "Defense expert rebuttal on treatment protocols"
    ]
  }'

# Calculate cosine similarity between the two embeddings
# Similarity close to 1 = very similar, close to 0 = different
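The cosine similarity mentioned above is the dot product of the two vectors divided by the product of their magnitudes. A self-contained sketch, using toy three-dimensional vectors in place of real 1024-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Identical vectors score 1.0; orthogonal vectors score 0.0.
sim = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```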

Smart Contract Review:

  • Embed contract clauses
  • Find similar precedents
  • Identify unusual terms by comparing to standard embeddings