Text-to-Speech

Convert text to professional audio with AI voices

Text-to-Speech Synthesis

Convert text to natural-sounding speech using AI voices modeled on professional voice actors. Perfect for generating audio versions of legal documents, client communications, or courtroom announcements.

List Available Voices

Get all available voices with filtering and search capabilities.

Endpoint

GET /voice/v1/voices

Code Examples

curl -X GET "https://api.case.dev/voice/v1/voices?category=premade&page_size=20" \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json"

Example Request

curl "https://api.case.dev/voice/v1/voices?category=premade" \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response

{
  "voices": [
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "name": "Rachel",
      "category": "premade",
      "description": "Professional, clear, and authoritative female voice",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "use_case": "narration"
      },
      "preview_url": "https://..."
    },
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "name": "Adam",
      "category": "premade",
      "description": "Deep, professional male voice",
      "labels": {
        "accent": "american",
        "age": "middle_aged",
        "gender": "male",
        "use_case": "narration"
      }
    }
  ],
  "has_more": false,
  "total_count": 45
}

Query Parameters

Pagination:

page_size (number): Results per page (default: 10, max: 100)
next_page_token (string): Token from previous response for pagination

Search & Filtering:

search (string): Search across name, description, labels, and category
voice_type (string): Filter by type
- personal - Your custom voices
- community - Community-created voices
- default - ElevenLabs premade voices
category (string): Voice category
- premade - Professional voice actors
- cloned - Voice clones
- generated - AI-generated voices
- professional - Professional voice clones

Sorting:

sort (string): Sort field (created_at_unix or name)
sort_direction (string): asc or desc

Advanced:

include_total_count (boolean): Include total count (impacts performance)
voice_ids (array): Look up specific voice IDs (max 100)
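A minimal Node 18+ sketch of building the query string and walking every page. It assumes each page returns a next_page_token field alongside has_more (only has_more appears in the example response above) and that array parameters such as voice_ids are sent as repeated keys; both are assumptions to verify. The helper names are hypothetical.

```javascript
// Sketch: list voices with query parameters and follow pagination.
// Assumption: pages carry `has_more` (shown above) plus a hypothetical
// `next_page_token` field mirroring the query parameter of the same name.
const API_BASE = "https://api.case.dev";

function buildVoicesUrl(params = {}) {
  const url = new URL("/voice/v1/voices", API_BASE);
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined || value === null) continue; // skip unset parameters
    if (Array.isArray(value)) {
      // Assumed encoding for arrays such as voice_ids: one key per element
      for (const item of value) url.searchParams.append(key, String(item));
    } else {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

async function listAllVoices(apiKey, params = {}, fetchImpl = fetch) {
  const voices = [];
  let pageToken;
  do {
    const url = buildVoicesUrl({ ...params, next_page_token: pageToken });
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`List voices failed: HTTP ${res.status}`);
    const page = await res.json();
    voices.push(...page.voices);
    // Stop when there are no more pages or no token to continue with
    pageToken = page.has_more ? page.next_page_token : undefined;
  } while (pageToken);
  return voices;
}
```

For example, listAllVoices(apiKey, { category: "premade", page_size: 100 }) would collect every premade voice using the maximum page size.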

Generate Speech

Convert text to audio using high-quality AI voices.

Endpoint

POST /voice/v1/speak

Code Examples

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The court finds in favor of the plaintiff. The defendant is hereby ordered to pay damages in the amount of two million five hundred thousand dollars.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "model_id": "eleven_multilingual_v2"
  }' \
  -o ruling.mp3

Example Request

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The court finds in favor of the plaintiff.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL"
  }' \
  -o judgment.mp3

Request Parameters

Required:

text (string): Text to convert to speech
- Max length: 5000 characters
- Supports multiple paragraphs

Optional:

voice_id (string): Voice to use (default: Rachel - EXAVITQu4vr4xnSDxMaL)
- Get available voices from /voice/v1/voices
model_id (string): TTS model (default: eleven_multilingual_v2)
- eleven_multilingual_v2 - Best quality, 29 languages
- eleven_turbo_v2 - Fastest, low latency
- eleven_monolingual_v1 - English only, legacy
output_format (string): Audio format (default: mp3_44100_128)
- MP3: mp3_22050_32, mp3_44100_64, mp3_44100_128, mp3_44100_192
- PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100
- μ-law: ulaw_8000
voice_settings (object): Fine-tune voice characteristics
- stability (0-1): Consistency vs expressiveness (default: 0.5)
- similarity_boost (0-1): How close to original voice (default: 0.75)
- style (0-1): Exaggeration level (default: 0)
- speaker_boost (boolean): Enhance clarity (default: true)
optimize_streaming_latency (number): Latency optimization (0-4)
- 0 - Best quality
- 4 - Lowest latency (may mispronounce numbers/dates)
language_code (string): Language for multilingual models
- en - English
- es - Spanish
- fr - French
- de - German
- ... 29 languages supported
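The limits above can be checked client-side before a request is sent. This sketch assumes the 5000-character limit is measured in JavaScript string length (UTF-16 code units), which may differ slightly from how the API counts; buildSpeakBody and checkRange are hypothetical helpers.

```javascript
// Sketch: assemble and sanity-check a /voice/v1/speak body against the documented limits.
// Only fields you pass are sent; the API applies its own defaults for the rest.
const MAX_TEXT_LENGTH = 5000;

function checkRange(name, value, min, max) {
  // Undefined means "not set", which is allowed for optional fields
  if (value !== undefined && !(typeof value === "number" && value >= min && value <= max)) {
    throw new RangeError(`${name} must be a number between ${min} and ${max}`);
  }
}

function buildSpeakBody(text, options = {}) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text must be a non-empty string");
  }
  if (text.length > MAX_TEXT_LENGTH) {
    throw new RangeError(`text is ${text.length} characters; the limit is ${MAX_TEXT_LENGTH}`);
  }
  const settings = options.voice_settings || {};
  for (const key of ["stability", "similarity_boost", "style"]) {
    checkRange(`voice_settings.${key}`, settings[key], 0, 1);
  }
  checkRange("optimize_streaming_latency", options.optimize_streaming_latency, 0, 4);
  return { ...options, text };
}
```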

Response

The endpoint returns binary audio data directly. Save to a file:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from CaseMark"}' \
  -o output.mp3
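Outside of curl, the same binary response can be saved from Node 18+, which provides a global fetch. This is a sketch: speakToFile is a hypothetical helper, and the error branch assumes failed requests return a readable text or JSON body instead of audio.

```javascript
// Sketch (Node 18+): POST to /voice/v1/speak and write the binary audio to disk.
// fetchImpl is injectable so the helper can be exercised without network access.
const { writeFile } = require("fs/promises");

async function speakToFile(apiKey, body, outPath, fetchImpl = fetch) {
  const res = await fetchImpl("https://api.case.dev/voice/v1/speak", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Assumption: error responses carry a text/JSON body, not audio
    throw new Error(`Speech request failed: HTTP ${res.status}: ${await res.text()}`);
  }
  const audio = Buffer.from(await res.arrayBuffer());
  await writeFile(outPath, audio);
  return audio.length; // number of bytes written
}
```

Checking res.ok before writing avoids the curl pitfall of saving an error message under an .mp3 name.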

Voice Settings Example

Fine-tune voice characteristics:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The defendant failed to meet the standard of care.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "voice_settings": {
      "stability": 0.7,
      "similarity_boost": 0.8,
      "style": 0.2,
      "speaker_boost": true
    }
  }' \
  -o statement.mp3

Voice Settings Guide:

High stability (0.7-1.0): Consistent, professional tone for legal documents
Low stability (0-0.3): More expressive, emotional for dramatic readings
High similarity_boost: Closer to original voice actor
Style: Add emphasis and variation (use sparingly for legal content)
Speaker boost: Enhanced clarity, recommended for professional use
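One way to apply this guide consistently is to keep a few named presets. In this sketch the formal and narration values are copied from the examples on this page, while the dramatic value is an illustrative pick inside the low-stability band; the preset names and the voiceSettingsFor helper are hypothetical.

```javascript
// Sketch: named voice_settings presets (hypothetical names).
// "formal" matches the court announcement example, "narration" the Voice Settings Example;
// "dramatic" is an illustrative value within the 0-0.3 low-stability band.
const VOICE_PRESETS = {
  formal: { stability: 0.9, style: 0.1 },
  narration: { stability: 0.7, similarity_boost: 0.8, style: 0.2, speaker_boost: true },
  dramatic: { stability: 0.25 },
};

function voiceSettingsFor(preset, overrides = {}) {
  const base = VOICE_PRESETS[preset];
  if (!base) {
    throw new Error(`Unknown preset "${preset}"; expected one of ${Object.keys(VOICE_PRESETS).join(", ")}`);
  }
  // Return a fresh object so callers cannot mutate the shared preset
  return { ...base, ...overrides };
}
```

The result goes into the voice_settings field of the request body, e.g. { text, voice_settings: voiceSettingsFor("formal") }.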

Streaming Speech

Generate audio with lower latency by streaming audio as it's generated.

Endpoint

POST /voice/v1/speak/stream

Code Examples

curl -X POST https://api.case.dev/voice/v1/speak/stream \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All rise. The Honorable Judge Martinez presiding.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "optimize_streaming_latency": 3
  }' \
  -o session-open.mp3

Example Request

curl -X POST https://api.case.dev/voice/v1/speak/stream \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All rise. The court is now in session.",
    "optimize_streaming_latency": 3
  }' \
  -o announcement.mp3

Parameters

Accepts the same parameters as /voice/v1/speak. The optimize_streaming_latency setting matters most here:

optimize_streaming_latency (number): Recommended for streaming
- 0 - Best quality, higher latency
- 1 - Balanced (recommended for streaming)
- 2 - Lower latency
- 3 - Very low latency (recommended for real-time)
- 4 - Minimum latency (may affect quality)
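For server-side consumers, the stream can be written to disk as chunks arrive rather than after the whole file is ready. A minimal Node 18+ sketch, assuming the endpoint returns the audio as a chunked response body; streamSpeechToFile is a hypothetical name.

```javascript
// Sketch (Node 18+): write /voice/v1/speak/stream output to disk chunk by chunk.
const fs = require("fs");
const { Readable } = require("stream");
const { pipeline } = require("stream/promises");

async function streamSpeechToFile(apiKey, body, outPath, fetchImpl = fetch) {
  const res = await fetchImpl("https://api.case.dev/voice/v1/speak/stream", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: HTTP ${res.status}`);
  // res.body is a WHATWG ReadableStream; convert it to a Node stream and pipe it to the file.
  // pipeline() resolves once every chunk has been flushed and the file is closed.
  await pipeline(Readable.fromWeb(res.body), fs.createWriteStream(outPath));
}
```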

Response

Audio data is streamed as it is generated, so the client can begin playback before the full file has arrived:

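// Browser example: progressive MP3 playback with Media Source Extensions.
// '/api/voice/v1/speak/stream' is assumed to be your own backend route that
// forwards the request with the API key (don't ship sk_case_ keys to browsers).
const response = await fetch('/api/voice/v1/speak/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello from the courtroom' })
});

if (!MediaSource.isTypeSupported('audio/mpeg')) {
  throw new Error('This browser cannot stream MP3 through MediaSource');
}
const mediaSource = new MediaSource();
const audio = new Audio(URL.createObjectURL(mediaSource));
await new Promise((resolve) => mediaSource.addEventListener('sourceopen', resolve, { once: true }));
const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
audio.play(); // most browsers require this to follow a user gesture

// Append chunks as they arrive; decodeAudioData expects complete files, not stream fragments
const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  sourceBuffer.appendBuffer(value);
  // Wait for each append to finish before starting the next one
  await new Promise((resolve) => sourceBuffer.addEventListener('updateend', resolve, { once: true }));
}
mediaSource.endOfStream();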

Use Cases

Real-time Announcements:

Court session openings
Live caption audio generation
Interactive voice responses

Low-latency Applications:

Phone systems
Virtual assistants
Live narration

Legal Use Cases

Court Announcements

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "All rise. The Honorable Judge Sarah Martinez presiding. Court is now in session for Case Number 2024-CV-1234, Smith versus Hospital Medical Center.",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "voice_settings": {
      "stability": 0.9,
      "style": 0.1
    }
  }' \
  -o court-announcement.mp3

Document Narration

Convert legal briefs or summaries to audio:

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Plaintiff alleges negligence in post-operative care, citing failure to monitor vital signs and delayed response to complications. Medical records show a pattern of inadequate staffing during overnight shifts.",
    "voice_id": "EXAVITQu4vr4xnSDxMaL",
    "model_id": "eleven_multilingual_v2"
  }' \
  -o case-summary.mp3

Multilingual Client Communications

curl -X POST https://api.case.dev/voice/v1/speak \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Su caso ha sido archivado exitosamente. Nos pondremos en contacto con usted dentro de 48 horas.",
    "language_code": "es",
    "voice_id": "EXAVITQu4vr4xnSDxMaL"
  }' \
  -o client-message-spanish.mp3

Batch Document Processing

Generate audio for multiple sections:

#!/bin/bash
API_KEY="sk_case_your_api_key_here"
VOICE_ID="EXAVITQu4vr4xnSDxMaL"

# Array of sections
sections=(
  "Introduction to the case"
  "Plaintiff testimony summary"
  "Defense arguments overview"
  "Medical expert opinions"
  "Conclusion and verdict"
)

for i in "${!sections[@]}"; do
  echo "Generating section $((i+1))/${#sections[@]}: ${sections[$i]}"

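  # jq builds the JSON body so quotes or newlines in the text are escaped safely
  jq -n --arg text "${sections[$i]}" --arg voice "$VOICE_ID" \
      '{text: $text, voice_id: $voice}' |
    curl -sS --fail -X POST https://api.case.dev/voice/v1/speak \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      -o "section-$((i+1)).mp3" ||
    echo "Failed to generate section $((i+1))" >&2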
done

echo "All sections generated!"
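Real documents often exceed the 5000-character limit of a single request. A sketch of splitting text into request-sized chunks on sentence boundaries; the sentence detection is a simple heuristic (it also splits after abbreviations such as "v." or "Inc."), and splitForSpeech is a hypothetical helper.

```javascript
// Sketch: split text into chunks of at most maxLength characters,
// preferring breaks after ".", "!" or "?" followed by whitespace.
function splitForSpeech(text, maxLength = 5000) {
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (let sentence of sentences) {
    // Hard-split any single sentence longer than the limit, at a space if possible
    while (sentence.length > maxLength) {
      let cut = sentence.lastIndexOf(" ", maxLength);
      if (cut <= 0) cut = maxLength;
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(sentence.slice(0, cut));
      sentence = sentence.slice(cut).trimStart();
    }
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLength) {
      current = candidate; // sentence still fits in the current chunk
    } else {
      chunks.push(current); // close the chunk and start a new one
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be sent as its own request, exactly like the sections in the script above.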

Recommended Voices for Legal Use

Based on testing with legal content:

Professional & Authoritative:

Rachel (EXAVITQu4vr4xnSDxMaL) - Clear, professional female voice
- Best for: Narration, client communications, document reading
Adam (21m00Tcm4TlvDq8ikWAM) - Deep, authoritative male voice
- Best for: Court announcements, formal statements, judicial content
Antoni (ErXwobaYiN019PkySvjV) - Calm, well-articulated male voice
- Best for: Long-form content, depositions, educational material

Clear & Neutral:

Elli (MF3mGyEYCl7XYWbV9V6O) - Clear, neutral female voice
- Best for: Instructions, procedural content, automated responses
Josh (TxGEqnHWrfWFTfGW9XjX) - Professional male voice
- Best for: Business communications, case summaries

Multilingual: Use model_id: "eleven_multilingual_v2" with any voice for 29 languages including Spanish, French, German, Portuguese, Chinese, and Arabic.

Output Formats

MP3 Formats (Recommended)

mp3_44100_128 (default) - 44.1kHz, 128 kbps; good balance of quality and file size
mp3_44100_192 - 44.1kHz, 192 kbps; highest MP3 quality
mp3_44100_64 - 44.1kHz, 64 kbps; smaller files, lower quality
mp3_22050_32 - 22.05kHz, 32 kbps; smallest files, adequate for speech only

PCM Formats (Uncompressed)

pcm_16000 - 16kHz, phone quality
pcm_22050 - 22kHz, standard
pcm_24000 - 24kHz
pcm_44100 - 44.1kHz, CD quality (requires Pro tier)

Other Formats

ulaw_8000 - 8kHz μ-law encoding (telephony)

Recommendation for legal documents: Use mp3_44100_128 for best balance of quality and file size.
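Each format name encodes the codec, the sample rate and, for MP3, the bitrate, which is enough to estimate file size. This sketch assumes constant-bitrate MP3 and 16-bit mono PCM; under those assumptions mp3_44100_128 comes to about 0.96 MB per minute versus about 5.3 MB per minute for pcm_44100. parseOutputFormat is a hypothetical helper.

```javascript
// Sketch: decode an output_format string and estimate bytes per second/minute.
// Assumptions: MP3 is constant bitrate, PCM is 16-bit mono, μ-law is 8-bit mono.
function parseOutputFormat(format) {
  const [codec, rate, kbps] = format.split("_");
  const sampleRateHz = Number(rate);
  let bytesPerSecond = NaN;
  if (codec === "mp3") {
    bytesPerSecond = (Number(kbps) * 1000) / 8; // kilobits per second -> bytes per second
  } else if (codec === "pcm") {
    bytesPerSecond = sampleRateHz * 2; // 2 bytes per 16-bit sample
  } else if (codec === "ulaw") {
    bytesPerSecond = sampleRateHz; // 1 byte per μ-law sample
  }
  if (!Number.isInteger(sampleRateHz) || sampleRateHz <= 0 || !Number.isFinite(bytesPerSecond)) {
    throw new Error(`Unrecognized output_format: ${format}`);
  }
  return { codec, sampleRateHz, bytesPerSecond, bytesPerMinute: bytesPerSecond * 60 };
}
```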

Pricing

Character-based pricing:

Standard: $0.30 per 1K characters
Turbo model: $0.30 per 1K characters (faster)

Example costs:

100-word paragraph (~500 chars): $0.15
1-page legal brief (~2000 chars): $0.60
10-page document (~20,000 chars): $6.00
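The example costs follow directly from the character count: characters ÷ 1,000 × $0.30. A small estimator, computed in cents to avoid floating-point drift; rounding and billing increments on actual invoices are not documented here and may differ. estimateCostUSD is a hypothetical helper.

```javascript
// Sketch: estimate synthesis cost at $0.30 (30 cents) per 1,000 characters.
const CENTS_PER_1K_CHARS = 30;

function estimateCostUSD(charCount) {
  const cents = (charCount * CENTS_PER_1K_CHARS) / 1000;
  return cents / 100;
}
```

For a string, pass its length, e.g. estimateCostUSD(text.length).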

No additional charges for:

Different voices
Voice settings customization
Language selection
Multiple audio formats

Advanced Features

Text Normalization

Control how text is processed before synthesis:

{
  "text": "The payment of $2,500,000 is due on 01/15/2024",
  "apply_text_normalization": "on"
}

Spoken as: "The payment of two million five hundred thousand dollars is due on January fifteenth, twenty twenty-four"

Options:

on - Always normalize
off - Never normalize
auto - Smart normalization (default)

Pronunciation Dictionaries

Add custom pronunciations for legal terms:

{
  "text": "The voir dire process began at 9 AM",
  "pronunciation_dictionary_locators": [
    {
      "pronunciation_dictionary_id": "dict_123",
      "version_id": "v1"
    }
  ]
}

Context Stitching

For better flow across multiple requests:

{
  "text": "This is the middle section.",
  "previous_text": "This was said before.",
  "next_text": "This comes after."
}

Helps maintain consistent intonation across segmented documents.
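When a long document has been split into segments, each request can carry its neighbors as context. A sketch that builds one request body per segment; buildStitchedBodies is a hypothetical helper, and passing the full neighboring segment (rather than, say, only its last sentence) is an assumption.

```javascript
// Sketch: one /voice/v1/speak body per segment, with neighboring segments as context.
function buildStitchedBodies(segments, voiceId) {
  return segments.map((text, i) => {
    const body = { text, voice_id: voiceId };
    if (i > 0) body.previous_text = segments[i - 1]; // omitted for the first segment
    if (i < segments.length - 1) body.next_text = segments[i + 1]; // omitted for the last
    return body;
  });
}
```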

Vault Integration

Generate audio versions of documents stored in vaults.

Workflow: Document → Audio → Store

#!/bin/bash
set -euo pipefail

API_KEY="sk_case_your_api_key_here"
VAULT_ID="sytp1b5f5j1yuj7uffzzxgw6"
OBJECT_ID="text-doc-123"

# Step 1: Get text document from vault
DOWNLOAD_URL=$(curl -sS --fail "https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID" \
  -H "Authorization: Bearer $API_KEY" \
  | jq -r '.downloadUrl')

# Download the text (must be plain text, at most 5000 characters for a single request)
TEXT_CONTENT=$(curl -sS --fail "$DOWNLOAD_URL")

# Step 2: Generate speech from text
# jq escapes quotes and newlines in the document so the JSON body stays valid
jq -n --arg text "$TEXT_CONTENT" '{
    text: $text,
    voice_id: "EXAVITQu4vr4xnSDxMaL",
    output_format: "mp3_44100_128"
  }' |
  curl -sS --fail -X POST https://api.case.dev/voice/v1/speak \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    -o /tmp/audio-output.mp3

# Step 3: Upload audio back to vault
AUDIO_UPLOAD=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"case-summary-audio.mp3\",
    \"contentType\": \"audio/mpeg\",
    \"metadata\": {
      \"source_document_id\": \"$OBJECT_ID\",
      \"voice\": \"Rachel\",
      \"generated_at\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"
    }
  }")

AUDIO_OBJECT_ID=$(echo "$AUDIO_UPLOAD" | jq -r '.objectId')
UPLOAD_URL=$(echo "$AUDIO_UPLOAD" | jq -r '.uploadUrl')

curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/mpeg" \
  --data-binary "@/tmp/audio-output.mp3"

echo "✓ Audio version stored in vault: $AUDIO_OBJECT_ID"

Use Cases

Accessibility:

Generate audio versions of legal documents for visually impaired clients
Create spoken summaries of lengthy depositions

Client Communications:

Convert case updates to audio messages
Generate multilingual client notifications

Training Materials:

Create audio training modules
Narrate compliance procedures
