Text-to-Speech
Convert text to professional audio with AI voices
Text-to-Speech Synthesis
Convert text to natural-sounding speech with professional voice actors. Perfect for generating audio versions of legal documents, client communications, or courtroom announcements.
List Available Voices
Get all available voices with filtering and search capabilities.
Endpoint
GET /voice/v1/voices
Example Request
curl -X GET "https://api.case.dev/voice/v1/voices?category=premade&page_size=20" \
-H "Authorization: Bearer sk_case_your_api_key_here" \
-H "Content-Type: application/json"
Query Parameters
Pagination:
- page_size (number): Results per page (default: 10, max: 100)
- next_page_token (string): Token from the previous response, used to fetch the next page
Search & Filtering:
- search (string): Search across name, description, labels, and category
- voice_type (string): Filter by type
  - personal: Your custom voices
  - community: Community-created voices
  - default: ElevenLabs premade voices
- category (string): Voice category
  - premade: Professional voice actors
  - cloned: Voice clones
  - generated: AI-generated voices
  - professional: Professional voice clones
Sorting:
- sort (string): Sort field (created_at_unix or name)
- sort_direction (string): asc or desc
Advanced:
- include_total_count (boolean): Include the total count (impacts performance)
- voice_ids (array): Look up specific voice IDs (max 100)
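A minimal Python sketch of listing voices with the query parameters above, using only the standard library. This section does not show the response body, so the JSON keys voices and next_page_token in the response, lowercase true/false for booleans, and repeating voice_ids once per ID in the query string are all assumptions. The fetch_json argument is injectable so the paging loop can be exercised without network access; voices_url and iter_voices are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_BASE = "https://api.case.dev"


def voices_url(**params) -> str:
    """Build a /voice/v1/voices URL; list values (e.g. voice_ids) become repeated keys."""
    # Drop unset parameters so the server applies its own defaults.
    query = {k: v for k, v in params.items() if v is not None}
    # ASSUMPTION: booleans are sent as lowercase "true"/"false".
    query = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in query.items()}
    return f"{API_BASE}/voice/v1/voices?{urllib.parse.urlencode(query, doseq=True)}"


def http_get_json(url: str, api_key: str) -> dict:
    """Perform an authenticated GET and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_voices(api_key: str,
                fetch_json: Callable[[str, str], dict] = http_get_json,
                **params) -> Iterator[dict]:
    """Yield voices page by page, following next_page_token until it is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_json(voices_url(next_page_token=token, **params), api_key)
        # ASSUMPTION: voices live under "voices" and the cursor under "next_page_token".
        yield from page.get("voices", [])
        token = page.get("next_page_token")
        if not token:
            break
```

For example, list(iter_voices(api_key, category="premade", page_size=100)) walks every premade voice, issuing one request per page.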
Generate Speech
Convert text to audio using high-quality AI voices.
Endpoint
POST /voice/v1/speak
Example Request
curl -X POST https://api.case.dev/voice/v1/speak \
-H "Authorization: Bearer sk_case_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "The court finds in favor of the plaintiff. The defendant is hereby ordered to pay damages in the amount of two million five hundred thousand dollars.",
"voice_id": "EXAVITQu4vr4xnSDxMaL",
"model_id": "eleven_multilingual_v2"
}'
Request Parameters
Required:
- text (string): Text to convert to speech
  - Max length: 5000 characters
  - Supports multiple paragraphs
Optional:
- voice_id (string): Voice to use (default: Rachel, EXAVITQu4vr4xnSDxMaL)
  - Get available voices from /voice/v1/voices
- model_id (string): TTS model (default: eleven_multilingual_v2)
  - eleven_multilingual_v2: Best quality, 29 languages
  - eleven_turbo_v2: Fastest, low latency
  - eleven_monolingual_v1: English only, legacy
- output_format (string): Audio format (default: mp3_44100_128)
  - MP3: mp3_22050_32, mp3_44100_64, mp3_44100_128, mp3_44100_192
  - PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100
  - μ-law: ulaw_8000
- voice_settings (object): Fine-tune voice characteristics
  - stability (0-1): Consistency vs. expressiveness (default: 0.5)
  - similarity_boost (0-1): How closely the output matches the original voice (default: 0.75)
  - style (0-1): Exaggeration level (default: 0)
  - speaker_boost (boolean): Enhance clarity (default: true)
- optimize_streaming_latency (number): Latency optimization (0-4)
  - 0: Best quality
  - 4: Lowest latency (may mispronounce numbers and dates)
- language_code (string): Language for multilingual models, for example:
  - en: English
  - es: Spanish
  - fr: French
  - de: German
  - ... (29 languages supported)
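The limits above can be checked client-side before spending a request. This sketch builds a /voice/v1/speak body from the documented parameters and rejects out-of-range values. build_speak_payload is a hypothetical helper, and sending output_format in the JSON body (as the streaming example does for optimize_streaming_latency) is an assumption.

```python
from typing import Optional

MAX_TEXT_CHARS = 5000
OUTPUT_FORMATS = {
    "mp3_22050_32", "mp3_44100_64", "mp3_44100_128", "mp3_44100_192",
    "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100", "ulaw_8000",
}
MODELS = {"eleven_multilingual_v2", "eleven_turbo_v2", "eleven_monolingual_v1"}


def build_speak_payload(text: str, voice_id: Optional[str] = None,
                        model_id: Optional[str] = None,
                        output_format: Optional[str] = None,
                        voice_settings: Optional[dict] = None,
                        optimize_streaming_latency: Optional[int] = None,
                        language_code: Optional[str] = None) -> dict:
    """Return a request body for POST /voice/v1/speak, raising ValueError on bad input."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1-{MAX_TEXT_CHARS} characters")
    if model_id is not None and model_id not in MODELS:
        raise ValueError(f"unknown model_id: {model_id}")
    if output_format is not None and output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format}")
    if optimize_streaming_latency is not None and optimize_streaming_latency not in range(5):
        raise ValueError("optimize_streaming_latency must be 0-4")
    # The three numeric voice settings are documented as 0-1.
    for name in ("stability", "similarity_boost", "style"):
        value = (voice_settings or {}).get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"voice_settings.{name} must be between 0 and 1")
    payload = {
        "text": text, "voice_id": voice_id, "model_id": model_id,
        "output_format": output_format, "voice_settings": voice_settings,
        "optimize_streaming_latency": optimize_streaming_latency,
        "language_code": language_code,
    }
    # Omit unset fields so the documented server-side defaults apply.
    return {k: v for k, v in payload.items() if v is not None}
```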
Response
The endpoint returns binary audio data directly, not JSON. Save the response body to a file, for example by adding --output ruling.mp3 to the curl command above.
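A standard-library Python sketch of the same call that writes the binary response to disk. The endpoint, headers, and JSON body mirror the curl example; speak_request and speak_to_file are hypothetical names, and the API key is passed in rather than hard-coded.

```python
import json
import urllib.request

API_BASE = "https://api.case.dev"


def speak_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST /voice/v1/speak request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/voice/v1/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def speak_to_file(payload: dict, api_key: str, path: str) -> int:
    """Send the request, write the audio body to path, and return the byte count."""
    with urllib.request.urlopen(speak_request(payload, api_key)) as resp:
        audio = resp.read()  # raw audio bytes (MP3 by default), not JSON
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

For example, speak_to_file({"text": "The court finds in favor of the plaintiff.", "voice_id": "EXAVITQu4vr4xnSDxMaL"}, api_key, "ruling.mp3"). Because urlopen raises urllib.error.HTTPError on non-2xx statuses, an error response is never written out as if it were audio.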
Voice Settings Example
Fine-tune voice characteristics by passing a voice_settings object in the request body.
Voice Settings Guide:
- High stability (0.7-1.0): Consistent, professional tone for legal documents
- Low stability (0-0.3): More expressive, emotional for dramatic readings
- High similarity_boost: Closer to original voice actor
- Style: Add emphasis and variation (use sparingly for legal content)
- Speaker boost: Enhanced clarity, recommended for professional use
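One way to apply the guide is to keep named presets and attach them to request bodies. The field names follow the voice_settings parameter documented above; the preset names and exact numbers are illustrative picks inside the recommended ranges, not values prescribed by the API.

```python
# Illustrative presets following the guide above (values are example choices).
VOICE_PRESETS = {
    # High stability, no style exaggeration: consistent tone for legal documents.
    "legal_formal": {"stability": 0.8, "similarity_boost": 0.75,
                     "style": 0.0, "speaker_boost": True},
    # Low stability plus some style: more expressive, e.g. dramatic readings.
    "expressive": {"stability": 0.25, "similarity_boost": 0.75,
                   "style": 0.4, "speaker_boost": True},
}


def payload_with_preset(text: str, voice_id: str, preset: str) -> dict:
    """Attach a named voice_settings preset to a /voice/v1/speak request body."""
    settings = dict(VOICE_PRESETS[preset])  # copy so callers cannot mutate the preset
    return {"text": text, "voice_id": voice_id, "voice_settings": settings}
```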
Streaming Speech
Generate audio with lower latency by streaming audio as it's generated.
Endpoint
POST /voice/v1/speak/stream
Example Request
curl -X POST https://api.case.dev/voice/v1/speak/stream \
-H "Authorization: Bearer sk_case_your_api_key_here" \
-H "Content-Type: application/json" \
-d '{
"text": "All rise. The Honorable Judge Martinez presiding.",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"optimize_streaming_latency": 3
}'
Parameters
Same as /voice/v1/speak with additional streaming optimizations:
- optimize_streaming_latency (number): Recommended for streaming
  - 0: Best quality, higher latency
  - 1: Balanced (recommended for streaming)
  - 2: Lower latency
  - 3: Very low latency (recommended for real-time)
  - 4: Minimum latency (may affect quality)
Response
Audio data is streamed as it is generated, so the browser or client can start playback before synthesis finishes.
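A Python sketch of consuming the stream incrementally with the standard library. Each chunk is written as soon as it arrives, which is where a player would begin playback; the 4096-byte chunk size and the helper names copy_audio_stream and stream_speech are assumptions, not API requirements.

```python
import io
import json
import urllib.request

API_BASE = "https://api.case.dev"


def copy_audio_stream(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                      chunk_size: int = 4096) -> int:
    """Copy audio chunks from src to dst as they arrive; return total bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes: the server has finished the stream
            break
        dst.write(chunk)  # a player could consume each chunk here instead
        total += len(chunk)
    return total


def stream_speech(payload: dict, api_key: str, dst: io.BufferedIOBase) -> int:
    """POST to /voice/v1/speak/stream and forward the audio to dst incrementally."""
    req = urllib.request.Request(
        f"{API_BASE}/voice/v1/speak/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # HTTPResponse is a BufferedIOBase
        return copy_audio_stream(resp, dst)
```

For example, with open("announcement.mp3", "wb") as f: stream_speech({"text": "All rise.", "optimize_streaming_latency": 3}, api_key, f).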
Use Cases
Real-time Announcements:
- Court session openings
- Live caption audio generation
- Interactive voice responses
Low-latency Applications:
- Phone systems
- Virtual assistants
- Live narration
Legal Use Cases
Court Announcements
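A sketch of templated courtroom announcements sent to /voice/v1/speak/stream. It reuses the voice ID recommended for court announcements below and the optimize_streaming_latency value from the streaming example; the recess template, the settings values, and the helper name announcement_payload are hypothetical.

```python
# Voice ID listed below under "Professional & Authoritative" for court announcements.
COURT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"

# Hypothetical templates; the opening line matches the streaming example above.
ANNOUNCEMENT_TEMPLATES = {
    "opening": "All rise. The Honorable Judge {judge} presiding.",
    "recess": "The court will now take a {minutes}-minute recess.",
}


def announcement_payload(kind: str, **fields: str) -> dict:
    """Fill an announcement template and return a /voice/v1/speak/stream body."""
    text = ANNOUNCEMENT_TEMPLATES[kind].format(**fields)
    return {
        "text": text,
        "voice_id": COURT_VOICE_ID,
        "optimize_streaming_latency": 3,  # very low latency for live playback
        # High stability and no style exaggeration for a steady, formal delivery.
        "voice_settings": {"stability": 0.8, "similarity_boost": 0.75,
                           "style": 0.0, "speaker_boost": True},
    }
```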
Document Narration
Convert legal briefs or summaries to audio.
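Briefs often exceed the 5000-character limit on text, so they must be split across requests. This sketch breaks at paragraph boundaries first, then at sentence ends, and mid-sentence only if a single sentence is too long. The regex-based sentence detection is a simplification that will mis-split after abbreviations such as "v." or "Inc.", and chunk_for_tts is a hypothetical name.

```python
import re
from typing import List, Tuple

MAX_CHARS = 5000  # documented per-request text limit for /voice/v1/speak


def chunk_for_tts(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking at paragraphs
    first, then at sentence ends, and mid-sentence only as a last resort."""
    # Each piece carries the separator that originally preceded it.
    pieces: List[Tuple[str, str]] = []
    for para in re.split(r"\n\s*\n", text.strip()):
        sentences = [para] if len(para) <= limit else re.split(r"(?<=[.!?])\s+", para)
        for i, sentence in enumerate(sentences):
            pieces.append((sentence, "\n\n" if i == 0 else " "))

    chunks: List[str] = []
    current = ""
    for piece, sep in pieces:
        # Hard-split a single sentence that is longer than the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        candidate = current + sep + piece if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)  # current is non-empty here
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, in order.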
Multilingual Client Communications
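A sketch of building a client notice in the client's language with the multilingual model. It assumes the message is already written in that language, since language_code is described as the language for multilingual models rather than as a translation option. The accepted codes are limited to the examples listed under language_code and are meant to be extended; client_notice_payload is a hypothetical name.

```python
# Codes taken from the language_code examples above; extend as needed.
SUPPORTED_CODES = {"en", "es", "fr", "de"}


def client_notice_payload(message: str, language_code: str,
                          voice_id: str = "EXAVITQu4vr4xnSDxMaL") -> dict:
    """Build a /voice/v1/speak body for a notice already written in the target language."""
    if language_code not in SUPPORTED_CODES:
        raise ValueError(f"add {language_code!r} to SUPPORTED_CODES if the API supports it")
    return {
        "text": message,
        "voice_id": voice_id,
        "model_id": "eleven_multilingual_v2",  # the documented multilingual model
        "language_code": language_code,
    }
```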
Batch Document Processing
Generate audio for multiple sections of a document.
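A sketch of batch generation: each section becomes one request and one numbered MP3 file, so playback order matches the document. The synthesize callable (for example, a wrapper around the speak request) is injected so the planning step can be tested offline; plan_batch, run_batch, and the filename scheme are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def plan_batch(sections: Dict[str, str],
               voice_id: str = "EXAVITQu4vr4xnSDxMaL") -> List[Tuple[str, dict]]:
    """Pair each section with a numbered filename and a /voice/v1/speak body."""
    plan = []
    for index, (title, text) in enumerate(sections.items(), start=1):
        # Zero-padded prefix keeps the files sorted in document order.
        slug = "".join(c if c.isalnum() else "_" for c in title.lower())
        body = {"text": text, "voice_id": voice_id, "output_format": "mp3_44100_128"}
        plan.append((f"{index:02d}_{slug}.mp3", body))
    return plan


def run_batch(sections: Dict[str, str], out_dir: Path,
              synthesize: Callable[[dict], bytes]) -> List[Path]:
    """Synthesize every planned section and write the audio files to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, body in plan_batch(sections):
        path = out_dir / filename
        path.write_bytes(synthesize(body))  # one request per section, sequentially
        written.append(path)
    return written
```

Sections longer than 5000 characters need to be split before they are added to the batch.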
Recommended Voices for Legal Use
Based on testing with legal content:
Professional & Authoritative:
- Rachel (EXAVITQu4vr4xnSDxMaL): Clear, professional female voice
  - Best for: Narration, client communications, document reading
- Adam (21m00Tcm4TlvDq8ikWAM): Deep, authoritative male voice
  - Best for: Court announcements, formal statements, judicial content
- Antoni (ErXwobaYiN019PkySvjV): Calm, well-articulated male voice
  - Best for: Long-form content, depositions, educational material
Clear & Neutral:
- Elli (MF3mGyEYCl7XYWbV9V6O): Clear, neutral female voice
  - Best for: Instructions, procedural content, automated responses
- Josh (TxGEqnHWrfWFTfGW9XjX): Professional male voice
  - Best for: Business communications, case summaries
Multilingual:
Use model_id: "eleven_multilingual_v2" with any voice for 29 languages including Spanish, French, German, Portuguese, Chinese, and Arabic.
Output Formats
MP3 Formats (Recommended)
- mp3_44100_128 (default): CD quality, good file size
- mp3_44100_192: Highest MP3 quality
- mp3_44100_64: Smaller files, lower quality
- mp3_22050_32: Smallest files, voice-only quality
PCM Formats (Uncompressed)
- pcm_16000: 16 kHz, phone quality
- pcm_22050: 22.05 kHz, standard
- pcm_24000: 24 kHz
- pcm_44100: 44.1 kHz, CD quality (requires Pro tier)
Other Formats
- ulaw_8000: 8 kHz μ-law encoding (telephony)
Recommendation for legal documents: Use mp3_44100_128 for best balance of quality and file size.
Pricing
Character-based pricing:
- Standard: $0.30 per 1K characters
- Turbo model: $0.30 per 1K characters (faster)
Example costs:
- 100-word paragraph (~500 chars): $0.15
- 1-page legal brief (~2000 chars): $0.60
- 10-page document (~20,000 chars): $6.00
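The example costs follow directly from the per-character rate: characters / 1000 × $0.30. A small estimator using decimal arithmetic avoids floating-point rounding in currency values; rounding to the nearest cent and counting characters with len() are assumptions about how billing is computed.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_1K_CHARS = Decimal("0.30")  # same documented rate for standard and turbo


def estimate_cost(char_count: int) -> Decimal:
    """Estimated USD cost for char_count characters, rounded to the nearest cent."""
    raw = Decimal(char_count) * PRICE_PER_1K_CHARS / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def estimate_text_cost(text: str) -> Decimal:
    """Estimate the cost of a specific text, counting every character (assumption)."""
    return estimate_cost(len(text))
```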
No additional charges for:
- Different voices
- Voice settings customization
- Language selection
- Multiple audio formats
Advanced Features
Text Normalization
Control how numbers, currency amounts, and dates are converted into words before synthesis. Example output with normalization applied:
"The payment of two million five hundred thousand dollars is due on January fifteenth, twenty twenty four"
Options:
- on: Always normalize
- off: Never normalize
- auto: Smart normalization (default)
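This section lists the modes but not the request field that carries them. ElevenLabs' own text-to-speech API names that field apply_text_normalization, so this sketch assumes Case.dev accepts the same name; confirm it before depending on it. with_normalization is a hypothetical helper.

```python
NORMALIZATION_MODES = ("on", "off", "auto")

# ASSUMPTION: field name borrowed from ElevenLabs; not confirmed for Case.dev.
NORMALIZATION_FIELD = "apply_text_normalization"


def with_normalization(payload: dict, mode: str = "auto") -> dict:
    """Return a copy of a /voice/v1/speak body with a text-normalization mode set."""
    if mode not in NORMALIZATION_MODES:
        raise ValueError(f"mode must be one of {NORMALIZATION_MODES}")
    return {**payload, NORMALIZATION_FIELD: mode}
```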
Pronunciation Dictionaries
Add custom pronunciations for legal terms.
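The request format for attaching a pronunciation dictionary is not shown in this section, so rather than guess at a field name, this sketch uses a different, client-side technique: replacing hard legal terms with phonetic respellings before the text is sent. The respellings are illustrative and should be tuned by listening to the output; apply_respellings is a hypothetical name.

```python
import re
from typing import Dict

# Illustrative respellings (not an official pronunciation dictionary).
LEGAL_RESPELLINGS: Dict[str, str] = {
    "voir dire": "vwahr deer",
    "certiorari": "sur-shee-uh-rair-ee",
    "amicus curiae": "uh-mee-kus kyoor-ee-eye",
}


def apply_respellings(text: str, respellings: Dict[str, str] = LEGAL_RESPELLINGS) -> str:
    """Replace whole-word legal terms (case-insensitively) with phonetic respellings."""
    if not respellings:
        return text
    # Longest terms first so multi-word phrases win over shorter overlapping terms.
    terms = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: respellings[m.group(0).lower()], text)
```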
Context Stitching
Pass context between related requests for better flow across them. This helps maintain consistent intonation across segmented documents.
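A sketch of preparing segment requests that carry neighboring text as context. The field names previous_text and next_text come from ElevenLabs' API and are assumed, not confirmed, to pass through Case.dev; the 300-character context window is an arbitrary choice, and stitched_payloads is a hypothetical name. Keeping one voice_id for every segment also helps consistency.

```python
from typing import List

# ASSUMPTION: ElevenLabs field names, assumed to be accepted by Case.dev.
CONTEXT_CHARS = 300  # how much neighboring text to send as context (illustrative)


def stitched_payloads(segments: List[str], voice_id: str) -> List[dict]:
    """Build one request body per segment, each carrying its neighbors as context."""
    bodies = []
    for i, segment in enumerate(segments):
        body = {"text": segment, "voice_id": voice_id}
        if i > 0:
            # Tail of the previous segment: what the listener just heard.
            body["previous_text"] = segments[i - 1][-CONTEXT_CHARS:]
        if i < len(segments) - 1:
            # Head of the next segment: lets intonation anticipate what follows.
            body["next_text"] = segments[i + 1][:CONTEXT_CHARS]
        bodies.append(body)
    return bodies
```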
Vault Integration
Generate audio versions of documents stored in vaults.
Workflow: Document → Audio → Store
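The Document → Audio → Store workflow can be sketched as a small pipeline. Vault endpoints are not documented in this section, so reading the document and storing the audio are injected callables with hypothetical signatures; only the synthesis body follows the /voice/v1/speak parameters above. The fixed-width split is deliberately naive.

```python
from typing import Callable, List

# Hypothetical hooks: the vault endpoints are not documented in this section.
FetchText = Callable[[str, str], str]          # (vault_id, document_id) -> text
Synthesize = Callable[[dict], bytes]           # /voice/v1/speak body -> audio bytes
StoreAudio = Callable[[str, str, bytes], str]  # (vault_id, filename, audio) -> stored id


def document_to_audio(vault_id: str, document_id: str, fetch_text: FetchText,
                      synthesize: Synthesize, store_audio: StoreAudio,
                      limit: int = 5000) -> List[str]:
    """Document -> Audio -> Store: read a document, voice it, save the parts back."""
    text = fetch_text(vault_id, document_id)
    # Naive fixed-width split to respect the 5000-character request limit;
    # splitting at paragraph or sentence breaks gives smoother audio.
    parts = [text[i:i + limit] for i in range(0, len(text), limit)]
    stored = []
    for n, part in enumerate(parts, start=1):
        audio = synthesize({"text": part})
        stored.append(store_audio(vault_id, f"{document_id}_part{n:02d}.mp3", audio))
    return stored
```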
Use Cases
Accessibility:
- Generate audio versions of legal documents for visually impaired clients
- Create spoken summaries of lengthy depositions
Client Communications:
- Convert case updates to audio messages
- Generate multilingual client notifications
Training Materials:
- Create audio training modules
- Narrate compliance procedures