Model Selection

Choosing the right model for your use case

Choosing the Right Model

  • Best: anthropic/claude-sonnet-4.5 - Excellent at long documents, nuanced reasoning
  • Fast: anthropic/claude-4.5-haiku - Quick summaries, fast turnaround
  • Deep Analysis: anthropic/claude-opus-4.1 - Most thorough, best for complex cases

For Coding/Structured Output

  • Best: openai/gpt-5-codex - Optimized for code generation
  • Fast: deepseek/deepseek-v3.1 - Great reasoning, very affordable

For Cost-Sensitive Tasks

  • Ultra-cheap: alibaba/qwen-3-32b - $0.0001/1M input tokens
  • Balanced: deepseek/deepseek-v3.1 - $0.0003/1M tokens (excellent quality)
  • Fast & Cheap: xai/grok-4-fast-non-reasoning - $0.0002/1M tokens

For Medical Records

  • Best: anthropic/claude-opus-4.1 - Understands medical terminology deeply
  • Vision: anthropic/claude-sonnet-4.5 - Can read charts, scans, handwriting
  • Budget: google/gemini-2.5-flash - Good reasoning at lower cost

For Reasoning Tasks

  • Best: openai/gpt-5-pro - Deepest thinking, multi-hour reasoning
  • Fast: openai/o4-mini - Quick reasoning for math/logic
  • Open: deepseek/deepseek-r1 - Transparent reasoning traces

Understanding Response Metadata

Provider Metadata

The API routes requests through multiple providers for reliability:

{
  "provider_metadata": {
    "gateway": {
      "routing": {
        "originalModelId": "anthropic/claude-4.5-haiku",
        "resolvedProvider": "anthropic",
        "fallbacksAvailable": ["vertexAnthropic", "bedrock"],
        "finalProvider": "anthropic",
        "attempts": [
          {
            "provider": "anthropic",
            "credentialType": "byok",
            "success": true,
            "statusCode": 200
          }
        ]
      },
      "cost": "0.0000336",
      "marketCost": "0.0000336"
    }
  }
}
  • originalModelId: The model you requested
  • resolvedProvider: Which provider fulfilled the request
  • fallbacksAvailable: Backup providers if primary fails
  • credentialType: byok (bring your own key) or system (CaseMark's keys)
  • attempts: Shows retry history if there were failures

Finish Reasons

  • stop: Model naturally completed the response
  • length: Hit max_tokens limit (response may be incomplete)
  • content_filter: Response blocked by safety filters
  • tool_calls: Model wants to call a function (for tool use)

Model Selection Cheat Sheet

TaskRecommended ModelWhy
Quick Q&Aanthropic/claude-4.5-haikuFast, cheap, smart enough
Deep legal analysisanthropic/claude-opus-4.1Best reasoning for complex cases
Medical recordsanthropic/claude-sonnet-4.5 + visionCan read charts, handwriting
Contract reviewopenai/gpt-5Excellent at structured extraction
Deposition summaryanthropic/claude-sonnet-4.5Great at long docs, nuance
Cost-sensitivedeepseek/deepseek-v3.110x cheaper, still very good
Coding/automationopenai/gpt-5-codexOptimized for code
Multi-hour reasoningopenai/gpt-5-proCan think for hours on hard problems
Embeddings (general)openai/text-embedding-3-smallFast, cheap, good quality
Embeddings (legal)voyage/voyage-law-2Optimized for legal text