This is the core endpoint for all AI-powered features: summarization, extraction, analysis, and drafting.
POST /llm/v1/chat/completions
```bash
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points."}
    ]
  }'
```
The response:

```json
{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified that...\n• Documents reviewed include...\n• Timeline established from..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105
  }
}
```
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| messages | array | The conversation. Each message has a role and content. |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | casemark/casemark-core-1 | Which model to use. See Models. |
| max_tokens | number | 4096 | Maximum tokens to generate. |
| temperature | number | 1 | Randomness (0-2). Use 0 for factual tasks. |
| stream | boolean | false | Stream the response token by token. |
| stop | array | null | Stop generation when these strings appear. |
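The optional parameters combine freely. A minimal sketch using the same client as the examples below (the prompt, token cap, and stop string here are illustrative):

```typescript
// Deterministic, bounded output: temperature 0 for factual tasks,
// max_tokens caps generation, stop halts at a sentinel string.
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'List the parties named in this filing.' }],
  temperature: 0,
  max_tokens: 500,
  stop: ['\n\n---']
});
```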
Messages
Each message in the messages array:
| Field | Type | Description |
|---|---|---|
| role | string | system, user, or assistant |
| content | string | The message text |
System prompts
Set the AI’s behavior with a system message:
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'You are a legal assistant. Be concise. Cite case law when relevant.'
    },
    {
      role: 'user',
      content: 'What are the elements of negligence?'
    }
  ]
});
```
Multi-turn conversations
Include previous messages to maintain context:
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is a deposition?' },
    { role: 'assistant', content: 'A deposition is sworn testimony taken outside of court...' },
    { role: 'user', content: 'How long do they typically last?' }
  ]
});
```
Streaming
Get responses token-by-token as they’re generated:
```typescript
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
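To keep the full text while still rendering it incrementally, accumulate the deltas as they arrive (a variant of the loop above, assuming the same chunk shape):

```typescript
let fullText = '';
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || '';
  fullText += delta;           // build the complete response
  process.stdout.write(delta); // and still print as it streams
}
```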
Vision
Send images to models that support vision (Claude, GPT-4o):
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/exhibit-a.jpg' } }
      ]
    }
  ]
});
```
Usage and costs
Every response includes token counts and cost:
```json
{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896
  }
}
```
Reduce costs: Use temperature: 0 for factual extraction. Try cheaper models like deepseek/deepseek-chat or qwen/qwen-2.5-72b-instruct for simpler tasks.
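Because usage arrives on every response, you can track spend per request without a separate metering call. A sketch (the logging format and cost threshold are illustrative):

```typescript
const { usage } = response;
console.log(
  `tokens: ${usage.total_tokens} (prompt ${usage.prompt_tokens}, ` +
  `completion ${usage.completion_tokens}), cost: $${usage.cost.toFixed(6)}`
);

// Illustrative guardrail: flag unusually expensive calls.
if (usage.cost > 0.05) {
  console.warn('High-cost completion; consider a cheaper model for this task.');
}
```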
Common patterns
Deposition summary
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `Summarize depositions with:
1. Key admissions
2. Timeline of events
3. Credibility issues
4. Contradictions with other testimony`
    },
    { role: 'user', content: depositionText }
  ],
  temperature: 0.3,
  max_tokens: 2000
});
```
Contract clause extraction

```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Extract all indemnification clauses. Return JSON: [{clause_text, page, party_protected}]'
    },
    { role: 'user', content: contractText }
  ],
  temperature: 0
});
```
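Since the system prompt asks for JSON, the reply can be parsed directly. Parse defensively in case the model wraps the array in prose (a minimal sketch):

```typescript
// Extract the first JSON array from the reply, tolerating surrounding text.
const raw = response.choices[0].message.content;
const match = raw.match(/\[[\s\S]*\]/);
const clauses = match ? JSON.parse(match[0]) : [];
console.log(`Found ${clauses.length} indemnification clauses.`);
```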
Medical record review
```typescript
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-opus-4',
  messages: [
    {
      role: 'system',
      content: 'You are a medical-legal expert. Identify standard-of-care deviations and timeline inconsistencies.'
    },
    { role: 'user', content: medicalRecords }
  ],
  max_tokens: 5000
});
```