This is the core endpoint for all AI-powered features — summarization, extraction, analysis, drafting.
POST /llm/v1/chat/completions
curl -X POST https://api.case.dev/llm/v1/chat/completions \
  -H "Authorization: Bearer sk_case_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Summarize this deposition in 3 bullet points."}
    ]
  }'
Response:
{
  "id": "gen_01K972J7KV4Y0MJZ3SRTA6YYMH",
  "object": "chat.completion",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are the key points:\n\n• Witness testified that...\n• Documents reviewed include...\n• Timeline established from..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 245,
    "completion_tokens": 87,
    "total_tokens": 332,
    "cost": 0.000105
  }
}
Parameters
Required
| Parameter | Type | Description |
|---|---|---|
| messages | array | The conversation. Each message has a role and content. |
Optional
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | casemark/casemark-core-1 | Which model to use. Browse all 195+ models → |
| max_tokens | number | 4096 | Maximum tokens to generate |
| temperature | number | 1 | Randomness (0-2). Use 0 for factual tasks. |
| stream | boolean | false | Stream the response token-by-token |
| stop | array | null | Stop generation when these strings appear |
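For example, a request that sets several of the optional parameters at once might look like this (a minimal sketch; it assumes a client initialized as in the TypeScript examples below, and the prompt and stop string are placeholders):

const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'List every exhibit referenced in this transcript.' }],
  max_tokens: 1000,      // cap the length of the completion
  temperature: 0,        // deterministic output for factual tasks
  stop: ['---END---']    // stop generating if this string appears (placeholder value)
});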
Messages
Each message in the messages array:
| Field | Type | Description |
|---|---|---|
| role | string | system, user, or assistant |
| content | string | The message text |
System prompts
Set the AI’s behavior with a system message:
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: 'You are a legal assistant. Be concise. Cite case law when relevant.'
    },
    {
      role: 'user',
      content: 'What are the elements of negligence?'
    }
  ]
});
Multi-turn conversations
Include previous messages to maintain context:
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    { role: 'user', content: 'What is a deposition?' },
    { role: 'assistant', content: 'A deposition is sworn testimony taken outside of court...' },
    { role: 'user', content: 'How long do they typically last?' }
  ]
});
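To keep a conversation going programmatically, push the assistant's reply from each response back onto the messages array before sending the next turn (a sketch reusing the request and response shapes shown above):

const messages = [{ role: 'user', content: 'What is a deposition?' }];

const first = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages
});

// Keep the assistant's answer in the history so the follow-up has context
messages.push(first.choices[0].message);
messages.push({ role: 'user', content: 'How long do they typically last?' });

const second = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages
});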
Streaming
Get responses token-by-token as they’re generated:
const stream = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [{ role: 'user', content: 'Write a case summary.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Vision
Send images to models that support vision (Claude, GPT-4o):
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What medical equipment is visible in this image?' },
        { type: 'image_url', image_url: { url: 'https://example.com/exhibit-a.jpg' } }
      ]
    }
  ]
});
Usage and costs
Every response includes token counts and cost:
{
  "usage": {
    "prompt_tokens": 1245,
    "completion_tokens": 387,
    "total_tokens": 1632,
    "cost": 0.004896
  }
}
Reduce costs: Use temperature: 0 for factual extraction. Try cheaper models like deepseek/deepseek-chat or qwen/qwen-2.5-72b-instruct for simpler tasks.
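To track spend, read the usage block off each response. For example (a sketch: the per-matter label and logging are illustrative, not part of the API):

const response = await client.llm.v1.chat.createCompletion({
  model: 'deepseek/deepseek-chat',   // cheaper model for a simple classification task
  messages: [{ role: 'user', content: 'Classify this filing: motion, order, or exhibit?' }],
  temperature: 0
});

// Token counts and dollar cost are returned on every response
const { total_tokens, cost } = response.usage;
console.log(`matter-1234: ${total_tokens} tokens, $${cost}`);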
Common patterns
Deposition summary
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `Summarize depositions with:
1. Key admissions
2. Timeline of events
3. Credibility issues
4. Contradictions with other testimony`
    },
    { role: 'user', content: depositionText }
  ],
  temperature: 0.3,
  max_tokens: 2000
});
Contract clause extraction
const response = await client.llm.v1.chat.createCompletion({
  model: 'openai/gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'Extract all indemnification clauses. Return JSON: [{clause_text, page, party_protected}]'
    },
    { role: 'user', content: contractText }
  ],
  temperature: 0
});
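Because the system prompt asks for JSON, the reply can be parsed directly; a fallback is worth keeping since models occasionally wrap JSON in extra text (a sketch; the field types are assumptions based on the prompt above):

const raw = response.choices[0].message.content;

let clauses: Array<{ clause_text: string; page: number; party_protected: string }> = [];
try {
  clauses = JSON.parse(raw);
} catch {
  console.error('Model did not return valid JSON:', raw);
}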
Medical record review
const response = await client.llm.v1.chat.createCompletion({
  model: 'anthropic/claude-opus-4',
  messages: [
    {
      role: 'system',
      content: 'You are a medical-legal expert. Identify standard-of-care deviations and timeline inconsistencies.'
    },
    { role: 'user', content: medicalRecords }
  ],
  max_tokens: 5000
});