This is the core endpoint for all AI-powered features: summarization, extraction, analysis, and drafting.
Endpoint
POST /llm/v1/chat/completions
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points."}
    ]
  }'
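The same request via the TypeScript client used throughout this page (a sketch; how `client` is constructed is not shown in these docs, so assume it has been initialized with your API key):
TypeScript
// Assumes `client` has been initialized with your sk_case_... API key.
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    { role: 'user', content: 'Summarize this deposition in 3 bullet points.' }
  ]
});

console.log(response.choices[0].message.content);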
Response
{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified that...\n• Documents reviewed include...\n• Timeline established from..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105
  }
}

Parameters

Required

| Parameter | Type | Description |
| --- | --- | --- |
| `messages` | array | The conversation. Each message has a `role` and `content`. |

Optional

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | `casemark/casemark-core-1` | Which model to use. See Models. |
| `max_tokens` | number | `4096` | Maximum number of tokens to generate. |
| `temperature` | number | `1` | Randomness (0-2). Use `0` for factual tasks. |
| `stream` | boolean | `false` | Stream the response token by token. |
| `stop` | array | `null` | Stop generation when any of these strings appear. |
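For example, a factual-extraction request that pins `temperature` to 0, caps the output length, and halts at a custom delimiter (the prompt and stop string here are illustrative):
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'List the parties named in this filing.' }],
  temperature: 0,      // deterministic output for factual tasks
  max_tokens: 1000,    // cap the generated length
  stop: ['\n\n---']    // stop generating when this delimiter appears
});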

Messages

Each message in the messages array:
| Field | Type | Description |
| --- | --- | --- |
| `role` | string | `system`, `user`, or `assistant` |
| `content` | string | The message text |

System prompts

Set the AI’s behavior with a system message:
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'You are a legal assistant. Be concise. Cite case law when relevant.'
    },
    {
      role: 'user',
      content: 'What are the elements of negligence?'
    }
  ]
});

Multi-turn conversations

Include previous messages to maintain context:
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is a deposition?' },
    { role: 'assistant', content: 'A deposition is sworn testimony taken outside of court...' },
    { role: 'user', content: 'How long do they typically last?' }
  ]
});
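To keep the exchange going, append the assistant's reply to the array before sending the next user turn (a sketch; the message shape matches the table above):
TypeScript
const messages = [
  { role: 'user', content: 'What is a deposition?' }
];

const first = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages
});

// Keep the assistant's answer in the history so the follow-up has context.
messages.push(first.choices[0].message);
messages.push({ role: 'user', content: 'How long do they typically last?' });

const second = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages
});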

Streaming

Get responses token-by-token as they’re generated:
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
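If you also need the complete text once streaming finishes, accumulate the deltas as they arrive (a variant of the loop above, using the same chunk shape):
TypeScript
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || '';
  fullText += delta;            // accumulate the complete response
  process.stdout.write(delta);  // ...while still rendering incrementally
}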

Vision

Send images to models that support vision (Claude, GPT-4o):
TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/exhibit-a.jpg' } }
      ]
    }
  ]
});
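The example above uses a hosted URL. If the exhibit lives on disk, OpenAI-compatible vision endpoints typically also accept base64 `data:` URLs; that support is an assumption for this API, so verify against the Models page:
TypeScript
import { readFileSync } from 'node:fs';

// Assumption: the endpoint accepts data: URLs like OpenAI-compatible APIs do.
const imageB64 = readFileSync('exhibit-a.jpg').toString('base64');

const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageB64}` } }
      ]
    }
  ]
});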

Usage and costs

Every response includes token counts and cost:
Response
{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896
  }
}
Reduce costs: Use temperature: 0 for factual extraction. Try cheaper models like deepseek/deepseek-chat or qwen/qwen-2.5-72b-instruct for simpler tasks.
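Since cost comes back on every response, a small helper can keep a running total per matter or per batch job (a sketch; the field names match the usage object above):
TypeScript
let totalCost = 0;
let totalTokens = 0;

// Call after each completion to accumulate spend.
function trackUsage(response: { usage: { total_tokens: number; cost: number } }) {
  totalCost += response.usage.cost;
  totalTokens += response.usage.total_tokens;
}

trackUsage(response);
console.log(`Spent $${totalCost.toFixed(6)} across ${totalTokens} tokens`);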

Common patterns

Deposition summary

TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `Summarize depositions with:
1. Key admissions
2. Timeline of events
3. Credibility issues
4. Contradictions with other testimony`
    },
    { role: 'user', content: depositionText }
  ],
  temperature: 0.3,
  max_tokens: 2000
});

Contract clause extraction

TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Extract all indemnification clauses. Return JSON: [{clause_text, page, party_protected}]'
    },
    { role: 'user', content: contractText }
  ],
  temperature: 0
});
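Because the prompt asks for raw JSON, parse the reply defensively; models occasionally wrap JSON in code fences or add surrounding prose:
TypeScript
const raw = response.choices[0].message.content;

// Strip optional markdown fences before parsing.
const jsonText = raw.replace(/^```(?:json)?\s*|\s*```$/g, '');

try {
  const clauses: { clause_text: string; page: number; party_protected: string }[] =
    JSON.parse(jsonText);
  console.log(`Found ${clauses.length} indemnification clauses`);
} catch {
  console.error('Model did not return valid JSON:', raw);
}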

Medical record review

TypeScript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-opus-4',
  messages: [
    {
      role: 'system',
      content: 'You are a medical-legal expert. Identify standard-of-care deviations and timeline inconsistencies.'
    },
    { role: 'user', content: medicalRecords }
  ],
  max_tokens: 5000
});