Text-to-Speech

Convert text to professional audio with AI voices

Text-to-Speech Synthesis

Convert text to natural-sounding speech with professional voice actors. Perfect for generating audio versions of legal documents, client communications, or courtroom announcements.

List Available Voices

Get all available voices with filtering and search capabilities.

Endpoint

GET /voice/v1/voices
curl -X GET "https://api.case.dev/voice/v1/voices?category=premade&page_size=20" \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json"

Example Request

curl "https://api.case.dev/voice/v1/voices?category=premade" \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "category": "premade",
      "description": "Professional, clear, and authoritative female voice",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "use_case": "narration"
      },
      "preview_url": "https://..."
    },
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Adam",
      "category": "premade",
      "description": "Deep, professional male voice",
      "labels": {
        "accent": "american",
        "age": "middle_aged",
        "gender": "male",
        "use_case": "narration"
      }
    }
  ],
  "has_more": false,
  "total_count": 45
}

Query Parameters

Pagination:

  • page_size (number): Results per page (default: 10, max: 100)
  • next_page_token (string): Token from previous response for pagination

Search & Filtering:

  • search (string): Search across name, description, labels, and category
  • voice_type (string): Filter by type
    • personal - Your custom voices
    • community - Community-created voices
    • default - ElevenLabs premade voices
  • category (string): Voice category
    • premade - Professional voice actors
    • cloned - Voice clones
    • generated - AI-generated voices
    • professional - Professional voice clones

Sorting:

  • sort (string): Sort field (created_at_unix or name)
  • sort_direction (string): asc or desc

Advanced:

  • include_total_count (boolean): Include total count (impacts performance)
  • voice_ids (array): Look up specific voice IDs (max 100)
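
To walk through every page of results, the pagination parameters above can be combined in a small loop. The following is a minimal sketch rather than an official client. It assumes `jq` is installed, that `API_KEY` is set in the environment, and that the response carries the token for the next page in a field named `next_page_token` (the example response above shows only `has_more`, so that field name is an assumption). The function names are illustrative.

```shell
# Fetch one page of GET /voice/v1/voices.
# Usage: fetch_voices_page CATEGORY PAGE_SIZE [NEXT_PAGE_TOKEN]
fetch_voices_page() {
  local category="$1"
  local page_size="$2"
  local token="${3:-}"

  # -G sends the --data-urlencode pairs as a URL-encoded query string,
  # which avoids shell quoting problems with ? and & in the URL.
  set -- -sf -G "https://api.case.dev/voice/v1/voices" \
    -H "Authorization: Bearer ${API_KEY:-}" \
    --data-urlencode "category=$category" \
    --data-urlencode "page_size=$page_size"
  if [ -n "$token" ]; then
    set -- "$@" --data-urlencode "next_page_token=$token"
  fi
  curl "$@"
}

# Print "voice_id<TAB>name" for every voice in a category, page by page.
list_all_voices() {
  local category="$1"
  local token=""
  local page

  while :; do
    page=$(fetch_voices_page "$category" 100 "$token") || return 1
    printf '%s\n' "$page" | jq -r '.voices[] | "\(.voice_id)\t\(.name)"'

    # Stop when the API reports no further pages.
    [ "$(printf '%s\n' "$page" | jq -r '.has_more')" = "true" ] || break

    # ASSUMPTION: the next-page token is returned as .next_page_token.
    token=$(printf '%s\n' "$page" | jq -r '.next_page_token // empty')
    [ -n "$token" ] || break
  done
}
```

Calling `list_all_voices premade` would then print one line per premade voice. A page size of 100 is used because it is the documented maximum.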

Generate Speech

Convert text to audio using high-quality AI voices.

Endpoint

POST /voice/v1/speak
curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
  "text": "The court finds in favor of the plaintiff. The defendant is hereby ordered to pay damages in the amount of two million five hundred thousand dollars.",
  "voice_id": "EXAVITQu4vr4xnSDxMaL",
  "model_id": "eleven_multilingual_v2"
}' \
  -o judgment.mp3

Example Request

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The court finds in favor of the plaintiff.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL"
  }' \
  -o judgment.mp3

Request Parameters

Required:

  • text (string): Text to convert to speech
    • Max length: 5000 characters
    • Supports multiple paragraphs

Optional:

  • voice_id (string): Voice to use (default: Rachel - EXAVITQu4vr4xnSDxMaL)
    • Get available voices from /voice/v1/voices
  • model_id (string): TTS model (default: eleven_multilingual_v2)
    • eleven_multilingual_v2 - Best quality, 29 languages
    • eleven_turbo_v2 - Fastest, low latency
    • eleven_monolingual_v1 - English only, legacy
  • output_format (string): Audio format (default: mp3_44100_128)
    • MP3: mp3_22050_32, mp3_44100_64, mp3_44100_128, mp3_44100_192
    • PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100
    • μ-law: ulaw_8000
  • voice_settings (object): Fine-tune voice characteristics
    • stability (0-1): Consistency vs expressiveness (default: 0.5)
    • similarity_boost (0-1): How close to original voice (default: 0.75)
    • style (0-1): Exaggeration level (default: 0)
    • speaker_boost (boolean): Enhance clarity (default: true)
  • optimize_streaming_latency (number): Latency optimization (0-4)
    • 0 - Best quality
    • 4 - Lowest latency (may mispronounce numbers/dates)
  • language_code (string): Language for multilingual models
    • en - English
    • es - Spanish
    • fr - French
    • de - German
    • ... 29 languages supported
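
Checking a request locally before sending it can catch the most common mistakes, such as text over the 5000-character limit or a mistyped output format. The sketch below only encodes the limits listed above. The function name `validate_speak_request` is illustrative, and the API may apply further checks that are not documented here.

```shell
# Check TEXT and OUTPUT_FORMAT against the documented limits before sending.
# Usage: validate_speak_request TEXT [OUTPUT_FORMAT]
# Returns 0 when the request looks valid, 1 (with a message on stderr) otherwise.
validate_speak_request() {
  local text="$1"
  local format="${2:-mp3_44100_128}"

  if [ -z "$text" ]; then
    echo "text is required" >&2
    return 1
  fi

  # ${#text} counts characters in bash under a UTF-8 locale;
  # some shells count bytes instead, which is stricter for non-ASCII text.
  if [ "${#text}" -gt 5000 ]; then
    echo "text is ${#text} characters; the documented maximum is 5000" >&2
    return 1
  fi

  # Accept only the formats listed for output_format.
  case "$format" in
    mp3_22050_32 | mp3_44100_64 | mp3_44100_128 | mp3_44100_192) ;;
    pcm_16000 | pcm_22050 | pcm_24000 | pcm_44100) ;;
    ulaw_8000) ;;
    *)
      echo "unsupported output_format: $format" >&2
      return 1
      ;;
  esac
}
```

A script could call `validate_speak_request "$TEXT" mp3_44100_192 || exit 1` just before the curl request.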

Response

The endpoint returns binary audio data directly. Save to a file:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from CaseMark"}' \
  -o output.mp3

Voice Settings Example

Fine-tune voice characteristics:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The defendant failed to meet the standard of care.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "voice_settings": {
      "stability": 0.7,
      "similarity_boost": 0.8,
      "style": 0.2,
      "speaker_boost": true
    }
  }' \
  -o statement.mp3

Voice Settings Guide:

  • High stability (0.7-1.0): Consistent, professional tone for legal documents
  • Low stability (0-0.3): More expressive, emotional delivery for dramatic readings
  • High similarity_boost: Closer to original voice actor
  • Style: Add emphasis and variation (use sparingly for legal content)
  • Speaker boost: Enhanced clarity, recommended for professional use
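
When several scripts share the same settings, it can help to build the voice_settings object in one place and reject out-of-range values early. This is a sketch under stated assumptions: the helper names `in_unit_range` and `voice_settings_json` are hypothetical, and the defaults are the ones listed under Request Parameters.

```shell
# Succeeds when VALUE is a decimal number between 0 and 1 inclusive.
in_unit_range() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^(0(\.[0-9]+)?|1(\.0+)?)$/) }'
}

# Print a voice_settings JSON object, or fail if a value is out of range.
# Usage: voice_settings_json [STABILITY] [SIMILARITY_BOOST] [STYLE] [SPEAKER_BOOST]
voice_settings_json() {
  local stability="${1:-0.5}"
  local similarity="${2:-0.75}"
  local style="${3:-0}"
  local boost="${4:-true}"
  local value

  for value in "$stability" "$similarity" "$style"; do
    if ! in_unit_range "$value"; then
      echo "expected a number from 0 to 1, got: $value" >&2
      return 1
    fi
  done

  case "$boost" in
    true | false) ;;
    *)
      echo "speaker_boost must be true or false, got: $boost" >&2
      return 1
      ;;
  esac

  printf '{"stability": %s, "similarity_boost": %s, "style": %s, "speaker_boost": %s}\n' \
    "$stability" "$similarity" "$style" "$boost"
}
```

For a formal announcement, `voice_settings_json 0.9 0.75 0.1 true` produces an object that can be placed in the request body as the value of voice_settings.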

Streaming Speech

Reduce the time to first audio by streaming the output as it is generated.

Endpoint

POST /voice/v1/speak/stream
curl -X POST https://api.case.dev/voice/v1/speak/stream \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
  "text": "All rise. The Honorable Judge Martinez presiding.",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "optimize_streaming_latency": 3
}' \
  -o announcement.mp3

Example Request

curl -X POST https://api.case.dev/voice/v1/speak/stream \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All rise. The court is now in session.",
    "optimize_streaming_latency": 3
  }' \
  -o announcement.mp3

Parameters

Accepts the same parameters as /voice/v1/speak. The optimize_streaming_latency setting matters most when streaming:

  • optimize_streaming_latency (number): Recommended for streaming
    • 0 - Best quality, higher latency
    • 1 - Balanced (recommended for streaming)
    • 2 - Lower latency
    • 3 - Very low latency (recommended for real-time)
    • 4 - Minimum latency (may affect quality)

Response

Audio data is streamed as it is generated, so a browser or other client can begin playback before the full file has arrived:

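// Browser example. Assumes '/api/voice/v1/speak/stream' is a route on your
// own backend that forwards the request to the API with the API key attached.
const response = await fetch('/api/voice/v1/speak/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello from the courtroom' })
});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);

// decodeAudioData() expects a complete file, so partial MP3 chunks are fed to
// a MediaSource buffer instead, which can play while data is still arriving.
// Support varies by browser: check MediaSource.isTypeSupported('audio/mpeg').
const mediaSource = new MediaSource();
const audio = new Audio(URL.createObjectURL(mediaSource));

mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
  const reader = response.body.getReader();

  // Append chunks as they arrive
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Wait until the buffer has accepted this chunk before appending the next
    sourceBuffer.appendBuffer(value);
    await new Promise((resolve) =>
      sourceBuffer.addEventListener('updateend', resolve, { once: true }));
  }
  mediaSource.endOfStream();
});

// Autoplay policies may require a prior user gesture (for example, a click)
audio.play();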

Use Cases

Real-time Announcements:

  • Court session openings
  • Live caption audio generation
  • Interactive voice responses

Low-latency Applications:

  • Phone systems
  • Virtual assistants
  • Live narration

Court Announcements

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All rise. The Honorable Judge Sarah Martinez presiding. Court is now in session for Case Number 2024-CV-1234, Smith versus Hospital Medical Center.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "voice_settings": {
      "stability": 0.9,
      "style": 0.1
    }
  }' \
  -o court-announcement.mp3

Document Narration

Convert legal briefs or summaries to audio:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Plaintiff alleges negligence in post-operative care, citing failure to monitor vital signs and delayed response to complications. Medical records show a pattern of inadequate staffing during overnight shifts.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "model_id": "eleven_multilingual_v2"
  }' \
  -o case-summary.mp3

Multilingual Client Communications

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Su caso ha sido archivado exitosamente. Nos pondremos en contacto con usted dentro de 48 horas.",
    "language_code": "es",
    "voice_id": "EXAVITQu4vr4xnSDxMaL"
  }' \
  -o client-message-spanish.mp3

Batch Document Processing

Generate audio for multiple sections:

#!/bin/bash
API_KEY="sk_case_your_api_key_here"
VOICE_ID="EXAVITQu4vr4xnSDxMaL"

# Array of sections
sections=(
  "Introduction to the case"
  "Plaintiff testimony summary"
  "Defense arguments overview"
  "Medical expert opinions"
  "Conclusion and verdict"
)

for i in "${!sections[@]}"; do
  echo "Generating section $((i+1))/${#sections[@]}: ${sections[$i]}"

  curl -s -X POST https://api.case.dev/voice/v1/speak \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"text\": \"${sections[$i]}\",
      \"voice_id\": \"$VOICE_ID\"
    }" \
    -o "section-$((i+1)).mp3"
done

echo "All sections generated!"

Recommended voices, based on testing with legal content:

Professional & Authoritative:

  • Rachel (EXAVITQu4vr4xnSDxMaL) - Clear, professional female voice
    • Best for: Narration, client communications, document reading
  • Adam (21m00Tcm4TlvDq8ikWAM) - Deep, authoritative male voice
    • Best for: Court announcements, formal statements, judicial content
  • Antoni (ErXwobaYiN019PkySvjV) - Calm, well-articulated male voice
    • Best for: Long-form content, depositions, educational material

Clear & Neutral:

  • Elli (MF3mGyEYCl7XYWbV9V6O) - Clear, neutral female voice
    • Best for: Instructions, procedural content, automated responses
  • Josh (TxGEqnHWrfWFTfGW9XjX) - Professional male voice
    • Best for: Business communications, case summaries

Multilingual: Use model_id: "eleven_multilingual_v2" with any voice for 29 languages including Spanish, French, German, Portuguese, Chinese, and Arabic.


Output Formats

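MP3 Formats

  • mp3_44100_128 (default) - 44.1 kHz at 128 kbps, good balance of quality and file size
  • mp3_44100_192 - 44.1 kHz at 192 kbps, highest MP3 quality
  • mp3_44100_64 - 44.1 kHz at 64 kbps, smaller files, lower quality
  • mp3_22050_32 - 22.05 kHz at 32 kbps, smallest files, voice-only quality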

PCM Formats (Uncompressed)

  • pcm_16000 - 16 kHz, phone quality
  • pcm_22050 - 22.05 kHz, standard
  • pcm_24000 - 24 kHz
  • pcm_44100 - 44.1 kHz, CD quality (requires Pro tier)

Other Formats

  • ulaw_8000 - 8 kHz μ-law encoding (telephony)

Recommendation for legal documents: Use mp3_44100_128 for best balance of quality and file size.


Pricing

Character-based pricing:

  • Standard: $0.30 per 1K characters
  • Turbo model: $0.30 per 1K characters (faster)

Example costs:

  • 100-word paragraph (~500 chars): $0.15
  • 1-page legal brief (~2000 chars): $0.60
  • 10-page document (~20,000 chars): $6.00
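
The example costs above follow directly from the per-character rate, so an estimate can be computed before any audio is generated. The sketch below assumes billing is pro-rated per character, as the $0.15 figure for 500 characters suggests. The function names are illustrative.

```shell
# Estimate the cost in USD of synthesizing CHARS characters
# at the listed rate of $0.30 per 1,000 characters.
estimate_cost() {
  LC_ALL=C awk -v chars="$1" 'BEGIN { printf "%.2f\n", chars * 0.30 / 1000 }'
}

# Estimate the cost of a text file. wc -m counts characters, not bytes.
estimate_file_cost() {
  estimate_cost "$(wc -m < "$1" | tr -d ' ')"
}
```

For instance, `estimate_cost 2000` prints 0.60, matching the one-page brief above, and `estimate_file_cost brief.txt` applies the same rate to a file on disk.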

No additional charges for:

  • Different voices
  • Voice settings customization
  • Language selection
  • Multiple audio formats

Advanced Features

Text Normalization

Control how text is processed before synthesis:

{
  "text": "The payment of $2,500,000 is due on 01/15/2024",
  "apply_text_normalization": "on"
}

Output: "The payment of two million five hundred thousand dollars is due on January fifteenth, twenty twenty four"

Options:

  • on - Always normalize
  • off - Never normalize
  • auto - Smart normalization (default)

Pronunciation Dictionaries

Add custom pronunciations for legal terms:

{
  "text": "The voir dire process began at 9 AM",
  "pronunciation_dictionary_locators": [
    {
      "pronunciation_dictionary_id": "dict_123",
      "version_id": "v1"
    }
  ]
}

Context Stitching

For better flow across multiple requests:

{
  "text": "This is the middle section.",
  "previous_text": "This was said before.",
  "next_text": "This comes after."
}

Helps maintain consistent intonation across segmented documents.
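
Context stitching is most useful when a document exceeds the 5000-character limit and has to be sent in pieces. The sketch below splits text at spaces and builds one request body per chunk, passing the neighbouring chunks as previous_text and next_text. It assumes `jq` is installed, and the function names are hypothetical. Whether the API prefers the full neighbouring chunk or only a short excerpt as context is not stated here, so treat that choice as an assumption.

```shell
# Split TEXT into chunks of at most MAX characters, breaking at spaces.
# Prints one chunk per line; newlines in TEXT are flattened to spaces first.
# fold measures width in bytes or columns depending on the implementation,
# so keep MAX conservative for non-ASCII text.
split_text() {
  local text="$1"
  local max="${2:-5000}"
  { printf '%s' "$text" | tr '\n' ' '; echo; } \
    | fold -s -w "$max" \
    | sed -e 's/ *$//' -e '/^$/d'
}

# Build one JSON body; empty context fields are omitted.
# Usage: stitch_body PREVIOUS_TEXT TEXT NEXT_TEXT
stitch_body() {
  jq -cn --arg prev "$1" --arg text "$2" --arg next "$3" '
    {text: $text}
    + (if $prev == "" then {} else {previous_text: $prev} end)
    + (if $next == "" then {} else {next_text: $next} end)'
}

# Read chunks (one per line) on stdin and print one JSON body per chunk.
stitched_bodies() {
  local prev=""
  local cur=""
  local next

  while IFS= read -r next; do
    if [ -n "$cur" ]; then
      stitch_body "$prev" "$cur" "$next"
      prev="$cur"
    fi
    cur="$next"
  done
  # The final chunk has no following text.
  if [ -n "$cur" ]; then
    stitch_body "$prev" "$cur" ""
  fi
}
```

Running `split_text "$TEXT_CONTENT" | stitched_bodies` prints one JSON body per line. Each line can then be sent to /voice/v1/speak with `--data-binary` and saved to its own numbered file.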


Vault Integration

Generate audio versions of documents stored in vaults.

Workflow: Document → Audio → Store

#!/bin/bash
API_KEY="sk_case_your_api_key_here"
VAULT_ID="sytp1b5f5j1yuj7uffzzxgw6"
OBJECT_ID="text-doc-123"

# Step 1: Get text document from vault
DOWNLOAD_URL=$(curl -s https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID \
  -H "Authorization: Bearer $API_KEY" \
  | jq -r '.downloadUrl')

# Download and extract text
TEXT_CONTENT=$(curl -s "$DOWNLOAD_URL")

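# Step 2: Generate speech from text
# jq builds the request body so quotes and newlines in the document are escaped.
# Documents longer than 5000 characters must be split into multiple requests.
jq -n --arg text "$TEXT_CONTENT" \
  '{text: $text, voice_id: "EXAVITQu4vr4xnSDxMaL", output_format: "mp3_44100_128"}' \
  | curl -X POST https://api.case.dev/voice/v1/speak \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      -o /tmp/audio-output.mp3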

# Step 3: Upload audio back to vault
AUDIO_UPLOAD=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"case-summary-audio.mp3\",
    \"contentType\": \"audio/mpeg\",
    \"metadata\": {
      \"source_document_id\": \"$OBJECT_ID\",
      \"voice\": \"Rachel\",
      \"generated_at\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"
    }
  }")

AUDIO_OBJECT_ID=$(echo "$AUDIO_UPLOAD" | jq -r '.objectId')
UPLOAD_URL=$(echo "$AUDIO_UPLOAD" | jq -r '.uploadUrl')

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/mpeg" \
  --data-binary "@/tmp/audio-output.mp3"

echo "✓ Audio version stored in vault: $AUDIO_OBJECT_ID"

Use Cases

Accessibility:

  • Generate audio versions of legal documents for visually impaired clients
  • Create spoken summaries of lengthy depositions

Client Communications:

  • Convert case updates to audio messages
  • Generate multilingual client notifications

Training Materials:

  • Create audio training modules
  • Narrate compliance procedures