This is the core endpoint for all AI-powered features: summarization, extraction, analysis, and drafting.
POST /llm/v1/chat/completions
```bash
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points."}
    ]
  }'
```
The response:

```json
{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified that...\n• Documents reviewed include...\n• Timeline established from..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105
  }
}
```
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| messages | array | The conversation. Each message has a role and content. |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | casemark/casemark-core-1 | Which model to use. See Models. |
| max_tokens | number | 4096 | Maximum tokens to generate. |
| temperature | number | 1 | Randomness (0-2). Use 0 for factual tasks. |
| stream | boolean | false | Stream the response token by token. |
| stop | array | null | Stop generation when these strings appear. |
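The optional parameters combine freely. A minimal sketch using the same client as the examples below (the prompt, token cap, and stop string here are illustrative):

```typescript
// Deterministic, bounded output: temperature 0 for factual tasks,
// max_tokens caps generation, stop halts at a sentinel string.
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'List the parties named in this filing.' }],
  temperature: 0,
  max_tokens: 500,
  stop: ['\n\n---']
});
```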
Messages
Each message in the messages array:
| Field | Type | Description |
|---|---|---|
| role | string | system, user, or assistant |
| content | string | The message text |
System prompts
Set the AI’s behavior with a system message:
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'You are a legal assistant. Be concise. Cite case law when relevant.'
    },
    {
      role: 'user',
      content: 'What are the elements of negligence?'
    }
  ]
});
```
Multi-turn conversations
Include previous messages to maintain context:
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is a deposition?' },
    { role: 'assistant', content: 'A deposition is sworn testimony taken outside of court...' },
    { role: 'user', content: 'How long do they typically last?' }
  ]
});
```
Streaming
Get responses token-by-token as they’re generated:
```typescript
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
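To keep the full text while still rendering it incrementally, accumulate the deltas as they arrive (a variant of the loop above, assuming the same chunk shape):

```typescript
let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || '';
  fullText += delta;           // build the complete response
  process.stdout.write(delta); // and still print as it streams
}
```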
Vision
Send images to models that support vision (Claude, GPT-4o):
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/exhibit-a.jpg' } }
      ]
    }
  ]
});
```
Usage and costs
Every response includes token counts and cost:
```json
{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896
  }
}
```
Reduce costs: Use temperature: 0 for factual extraction. Try cheaper models like deepseek/deepseek-chat or qwen/qwen-2.5-72b-instruct for simpler tasks.
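Because usage arrives on every response, you can track spend per request without a separate metering call. A sketch (the logging format and cost threshold are illustrative):

```typescript
const { usage } = response;
console.log(
  `tokens: ${usage.total_tokens} (prompt ${usage.prompt_tokens}, ` +
  `completion ${usage.completion_tokens}), cost: $${usage.cost.toFixed(6)}`
);

// Illustrative guardrail: flag unusually expensive calls.
if (usage.cost > 0.05) {
  console.warn('High-cost completion; consider a cheaper model for this task.');
}
```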
Common patterns
Deposition summary
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `Summarize depositions with:
1. Key admissions
2. Timeline of events
3. Credibility issues
4. Contradictions with other testimony`
    },
    { role: 'user', content: depositionText }
  ],
  temperature: 0.3,
  max_tokens: 2000
});
```
Contract clause extraction

```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Extract all indemnification clauses. Return JSON: [{clause_text, page, party_protected}]'
    },
    { role: 'user', content: contractText }
  ],
  temperature: 0
});
```
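Since the system prompt asks for JSON, the reply can be parsed directly. Parse defensively in case the model wraps the array in prose (a minimal sketch):

```typescript
// Extract the first JSON array from the reply, tolerating surrounding text.
const raw = response.choices[0].message.content;
const match = raw.match(/\[[\s\S]*\]/);
const clauses = match ? JSON.parse(match[0]) : [];
console.log(`Found ${clauses.length} indemnification clauses.`);
```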
Medical record review
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-opus-4',
  messages: [
    {
      role: 'system',
      content: 'You are a medical-legal expert. Identify standard-of-care deviations and timeline inconsistencies.'
    },
    { role: 'user', content: medicalRecords }
  ],
  max_tokens: 5000
});
```