Transcription

Speech-to-text transcription with speaker diarization and PII redaction

Create Transcription

Submit audio or video files for transcription. Supports 100+ languages with speaker diarization, PII redaction, and advanced features.

Endpoint

POST /voice/transcription

Example Request

curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://your-storage.com/deposition-audio.m4a",
    "speaker_labels": true,
    "language_code": "en"
  }'

Example Response

{
  "id": "5f9c5c3e-1234-5678-9abc-def012345678",
  "status": "queued",
  "audio_url": "https://your-storage.com/deposition-audio.m4a",
  "language_code": "en",
  "speaker_labels": true
}

Request Parameters

Required:

  • audio_url (string): Publicly accessible URL to your audio/video file
    • Supports: M4A, MP3, MP4, WAV, FLAC, OGG, WebM, and more
    • Max file size: 5GB
    • Max duration: 7 hours

Optional:

  • language_code (string): Language for transcription (default: auto-detect)
    • en - English
    • es - Spanish
    • fr - French
    • de - German
    • pt - Portuguese
    • zh - Chinese
    • ja - Japanese
    • ... 100+ languages supported
  • speaker_labels (boolean): Enable speaker diarization (default: false)
    • Identifies different speakers as "Speaker A", "Speaker B", etc.
    • Perfect for depositions, interviews, meetings
  • auto_highlights (boolean): Automatically detect key phrases (default: false)
    • Identifies important moments in the audio
  • content_safety_labels (boolean): Detect sensitive content (default: false)
    • Flags potentially sensitive topics
  • redact_pii (boolean): Redact personally identifiable information (default: false)
    • Removes names, addresses, SSNs, credit cards, etc.
    • Essential for HIPAA compliance
  • redact_pii_policies (array): Specific PII types to redact
    • Options: name, address, email, phone_number, ssn, credit_card, date_of_birth, medical, bank_account
  • webhook_url (string): URL to receive completion notification
    • Webhook called when transcription completes
    • Recommended for long audio files
  • language_detection (boolean): Detect language automatically (default: false)
    • Useful for multilingual audio
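
The parameters above can be assembled into a request body with a small helper. This is an illustrative sketch, not part of any SDK: `buildTranscriptionRequest` and `PII_POLICIES` are hypothetical names, but the field names and allowed policy values match the documentation above.

```javascript
// Allowed values for redact_pii_policies, per the parameter list above.
const PII_POLICIES = new Set([
  'name', 'address', 'email', 'phone_number', 'ssn',
  'credit_card', 'date_of_birth', 'medical', 'bank_account',
]);

// Build a POST /voice/transcription body, validating PII policies early
// so a typo fails locally instead of at the API.
function buildTranscriptionRequest(audioUrl, options = {}) {
  if (!audioUrl) throw new Error('audio_url is required');
  const body = { audio_url: audioUrl, ...options };
  for (const policy of body.redact_pii_policies ?? []) {
    if (!PII_POLICIES.has(policy)) {
      throw new Error(`Unknown PII policy: ${policy}`);
    }
  }
  return body;
}
```

The returned object can be passed directly as the JSON payload of the curl examples above.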

Get Transcription Status

Retrieve transcription status and completed transcript.

Endpoint

GET /voice/transcription/:id

Example Request

curl https://api.case.dev/voice/transcription/5f9c5c3e-1234-5678-9abc-def012345678 \
  -H "Authorization: Bearer sk_case_your_api_key_here"

Example Response (Processing)

{
  "id": "5f9c5c3e-1234-5678-9abc-def012345678",
  "status": "processing",
  "audio_url": "https://your-storage.com/deposition-audio.m4a",
  "audio_duration": 3847.2
}

Example Response (Completed)

{
  "id": "5f9c5c3e-1234-5678-9abc-def012345678",
  "status": "completed",
  "audio_url": "https://your-storage.com/deposition-audio.m4a",
  "text": "Speaker A: Please state your name for the record. Speaker B: My name is Dr. Sarah Johnson...",
  "words": [
    {
      "text": "Please",
      "start": 100,
      "end": 350,
      "confidence": 0.99,
      "speaker": "A"
    },
    {
      "text": "state",
      "start": 400,
      "end": 650,
      "confidence": 0.98,
      "speaker": "A"
    }
  ],
  "utterances": [
    {
      "text": "Please state your name for the record.",
      "start": 100,
      "end": 2450,
      "confidence": 0.97,
      "speaker": "A"
    },
    {
      "text": "My name is Dr. Sarah Johnson.",
      "start": 3100,
      "end": 5200,
      "confidence": 0.96,
      "speaker": "B"
    }
  ],
  "audio_duration": 3847.2,
  "confidence": 0.95,
  "language_code": "en"
}

Status Values

  • queued: Job accepted, waiting to start
  • processing: Transcription in progress
  • completed: Finished successfully
  • error: Failed (check error message)

Response Fields (Completed)

  • text (string): Full transcript with speaker labels
  • words (array): Word-level timing and confidence
    • text: The word
    • start: Start time in milliseconds
    • end: End time in milliseconds
    • confidence: Accuracy score (0-1)
    • speaker: Speaker label if diarization enabled
  • utterances (array): Sentence-level speaker turns
    • Groups words into complete sentences per speaker
  • audio_duration (number): Duration in seconds
  • confidence (number): Overall transcription confidence
  • language_code (string): Detected or specified language
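
The `utterances` array can be joined into a readable, speaker-labeled transcript. A minimal sketch (`formatUtterances` is a hypothetical helper, not part of the API; it uses only the fields documented above):

```javascript
// Turn the API's utterances array into "[mm:ss] Speaker X: ..." lines.
// start is in milliseconds, per the response field documentation.
function formatUtterances(utterances) {
  return utterances
    .map((u) => {
      const totalSeconds = Math.floor(u.start / 1000);
      const mm = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
      const ss = String(totalSeconds % 60).padStart(2, '0');
      return `[${mm}:${ss}] Speaker ${u.speaker}: ${u.text}`;
    })
    .join('\n');
}
```

Running this over the completed-response example above yields one timestamped line per speaker turn.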

Deposition Transcription

curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://vault.s3.amazonaws.com/deposition-2024-1234.m4a",
    "speaker_labels": true,
    "language_code": "en",
    "redact_pii": true,
    "redact_pii_policies": ["name", "address", "ssn", "medical"],
    "webhook_url": "https://your-app.com/transcription-complete"
  }'

Perfect for:

  • Depositions with multiple speakers
  • Witness interviews
  • Client consultations
  • Court proceedings

Medical Record Audio Notes

curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://storage.com/doctor-notes.mp3",
    "redact_pii": true,
    "redact_pii_policies": ["name", "medical", "date_of_birth"],
    "content_safety_labels": true
  }'

HIPAA Compliant:

  • Automatically redacts PHI
  • Flags sensitive medical topics
  • Maintains compliance logs

Multilingual Witness Interviews

curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://storage.com/spanish-interview.m4a",
    "language_detection": true,
    "speaker_labels": true
  }'

Supports:

  • Automatic language detection
  • 100+ languages including Spanish, Mandarin, Arabic
  • Speaker labels work across languages

Processing Times

| Audio Duration | Typical Processing Time |
|---|---|
| 5 minutes | 30-60 seconds |
| 30 minutes | 2-4 minutes |
| 1 hour | 5-8 minutes |
| 3 hours | 15-25 minutes |

Processing speed:

  • ~0.15x realtime (1 hour audio = ~9 minutes processing)
  • Higher accuracy than real-time transcription
  • Use webhooks for files over 30 minutes
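
The ~0.15x-realtime figure above gives a rough planning estimate. A sketch (illustrative only; actual times vary with queue depth and file format, and `estimateProcessingSeconds` is a hypothetical helper):

```javascript
// Rough processing-time estimate from the ~0.15x-realtime figure above.
function estimateProcessingSeconds(audioDurationSeconds) {
  return audioDurationSeconds * 0.15;
}

// Per the guidance above, prefer webhooks for audio over 30 minutes.
function shouldUseWebhook(audioDurationSeconds) {
  return audioDurationSeconds > 30 * 60;
}
```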

Polling for Completion

For shorter audio files without webhooks:

#!/bin/bash
TRANSCRIPT_ID="5f9c5c3e-1234-5678-9abc-def012345678"

while true; do
  RESPONSE=$(curl -s https://api.case.dev/voice/transcription/$TRANSCRIPT_ID \
    -H "Authorization: Bearer sk_case_...")

  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  echo "Status: $STATUS"

  if [ "$STATUS" = "completed" ]; then
    echo "Transcription complete!"
    echo "$RESPONSE" | jq -r '.text' > transcript.txt
    break
  elif [ "$STATUS" = "error" ]; then
    echo "Transcription failed"
    break
  fi

  sleep 5
done

Webhook Notifications

When transcription completes, we POST to your webhook_url:

{
  "transcript_id": "5f9c5c3e-1234-5678-9abc-def012345678",
  "status": "completed",
  "text": "Full transcript text...",
  "audio_duration": 1847.3,
  "confidence": 0.96
}

Webhook verification:

  • Includes X-Signature header with HMAC-SHA256
  • Verify requests are from CaseMark

Pricing

Per-minute pricing:

  • Voice transcription: $0.30 per minute ($18.00 per hour)

Example costs:

  • 1-hour deposition: $18.00
  • 3-hour medical interview: $54.00
  • 30-minute client call: $9.00

No additional charges for:

  • Language detection
  • Multiple languages
  • Webhook delivery
  • Word-level timestamps
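
The example costs above follow directly from the $0.30/minute rate. A sketch (billing granularity is not specified here, so rounding up to whole minutes is a conservative assumption, and `estimateCostUSD` is a hypothetical helper):

```javascript
// Cost estimate at $0.30 per minute. Assumption: partial minutes are
// rounded up; check your billing terms for the actual granularity.
function estimateCostUSD(audioDurationSeconds) {
  const minutes = Math.ceil(audioDurationSeconds / 60);
  return minutes * 0.30;
}
```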

Supported Audio Formats

  • M4A (recommended for iOS recordings)
  • MP3 (universal compatibility)
  • MP4 (video files - audio extracted)
  • WAV (uncompressed, highest quality)
  • FLAC (lossless compression)
  • OGG/Opus (web optimized)
  • WebM (browser recordings)
  • AMR (phone recordings)

Video formats supported:

  • MP4, MOV, AVI, MKV (audio extracted automatically)

Accuracy & Features

Industry-leading accuracy:

  • 95%+ for clear audio
  • 90%+ for phone/courtroom recordings
  • Works with background noise, accents, technical jargon

Advanced features:

  • Speaker diarization: Identify who said what
  • PII redaction: HIPAA/GDPR compliant
  • 100+ languages: Auto-detect or specify
  • Custom vocabulary: Coming soon for legal terms
  • Paragraph formatting: Natural text structure
  • Timestamps: Word and sentence level

Vault Integration

Transcribe audio files stored in vaults without downloading. The transcription API accepts S3 URLs directly for seamless integration.

Using Presigned URLs from Vault

# Get vault object with audio file
VAULT_ID="sytp1b5f5j1yuj7uffzzxgw6"
OBJECT_ID="audio123"

# Get presigned download URL (valid for 1 hour)
DOWNLOAD_URL=$(curl -s https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID \
  -H "Authorization: Bearer sk_case_..." \
  | jq -r '.downloadUrl')

# Submit for transcription
curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d "{
    \"audio_url\": \"$DOWNLOAD_URL\",
    \"speaker_labels\": true,
    \"language_code\": \"en\"
  }"

Using Long-Lived Presigned URLs

For audio files that take longer to process, generate a presigned URL with extended expiry:

# Generate 24-hour presigned URL
PRESIGNED_RESPONSE=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID/presigned-url \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d '{"operation": "GET", "expiresIn": 86400}')

AUDIO_URL=$(echo "$PRESIGNED_RESPONSE" | jq -r '.presignedUrl')

# Submit for transcription
curl -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer sk_case_..." \
  -H "Content-Type: application/json" \
  -d "{\"audio_url\": \"$AUDIO_URL\", \"speaker_labels\": true}"

Complete Workflow: Upload → Transcribe → Store

#!/bin/bash
API_KEY="sk_case_your_api_key_here"
VAULT_ID="sytp1b5f5j1yuj7uffzzxgw6"
AUDIO_FILE="deposition-recording.m4a"

# Step 1: Upload audio to vault
UPLOAD_RESPONSE=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"filename\": \"$AUDIO_FILE\",
    \"contentType\": \"audio/m4a\",
    \"metadata\": {
      \"case\": \"2024-CV-1234\",
      \"type\": \"deposition\",
      \"date\": \"2024-11-10\"
    }
  }")

OBJECT_ID=$(echo "$UPLOAD_RESPONSE" | jq -r '.objectId')
UPLOAD_URL=$(echo "$UPLOAD_RESPONSE" | jq -r '.uploadUrl')

# Upload the file
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/m4a" \
  --data-binary "@$AUDIO_FILE"

echo "✓ Audio uploaded to vault: $OBJECT_ID"

# Step 2: Get download URL
DOWNLOAD_URL=$(curl -s https://api.case.dev/vault/$VAULT_ID/objects/$OBJECT_ID \
  -H "Authorization: Bearer $API_KEY" \
  | jq -r '.downloadUrl')

# Step 3: Submit for transcription
TRANSCRIPT_RESPONSE=$(curl -s -X POST https://api.case.dev/voice/transcription \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"audio_url\": \"$DOWNLOAD_URL\",
    \"speaker_labels\": true,
    \"language_code\": \"en\",
    \"redact_pii\": true
  }")

TRANSCRIPT_ID=$(echo "$TRANSCRIPT_RESPONSE" | jq -r '.id')
echo "✓ Transcription started: $TRANSCRIPT_ID"

# Step 4: Poll for completion
while true; do
  STATUS_RESPONSE=$(curl -s https://api.case.dev/voice/transcription/$TRANSCRIPT_ID \
    -H "Authorization: Bearer $API_KEY")

  STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
  echo "Transcription status: $STATUS"

  if [ "$STATUS" = "completed" ]; then
    echo "✓ Transcription complete!"

    # Save transcript
    echo "$STATUS_RESPONSE" | jq -r '.text' > transcript.txt

    # Step 5: Upload transcript back to vault
    TRANSCRIPT_UPLOAD=$(curl -s -X POST https://api.case.dev/vault/$VAULT_ID/upload \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d "{
        \"filename\": \"${AUDIO_FILE%.m4a}-transcript.txt\",
        \"contentType\": \"text/plain\",
        \"metadata\": {
          \"source_audio_id\": \"$OBJECT_ID\",
          \"transcript_id\": \"$TRANSCRIPT_ID\"
        }
      }")

    TRANSCRIPT_OBJECT_ID=$(echo "$TRANSCRIPT_UPLOAD" | jq -r '.objectId')
    TRANSCRIPT_UPLOAD_URL=$(echo "$TRANSCRIPT_UPLOAD" | jq -r '.uploadUrl')

    curl -X PUT "$TRANSCRIPT_UPLOAD_URL" \
      -H "Content-Type: text/plain" \
      --data-binary "@transcript.txt"

    echo "✓ Transcript uploaded to vault: $TRANSCRIPT_OBJECT_ID"
    break
  elif [ "$STATUS" = "error" ]; then
    echo "✗ Transcription failed"
    exit 1
  fi

  sleep 10
done

echo ""
echo "=== Complete! ==="
echo "Audio in vault: $OBJECT_ID"
echo "Transcript in vault: $TRANSCRIPT_OBJECT_ID"
echo "Transcript ID: $TRANSCRIPT_ID"

Key Benefits

No Downloads Required

  • Transcription service accesses audio directly from S3
  • Eliminates local file handling

Secure

  • Presigned URLs expire automatically
  • Audio files stay encrypted in vault

Integrated Workflow

  • Upload → Transcribe → Store all in one platform
  • Keep audio and transcripts together

Cost Effective

  • Avoid S3 egress charges
  • Pay only for transcription time

Streaming Transcription

Real-time speech-to-text for live audio streams via WebSocket. Get transcripts as you speak with ultra-low latency.

Endpoint

wss://casemark-ai--websocket-stream-helper-fastapi-app.modal.run/ws?token=sk_case_your_api_key_here

Features

Ultra-Fast Transcription

  • 300ms P50 latency on word emission
  • 91% word accuracy rate
  • Intelligent endpointing for turn detection

Pricing

  • $0.30 per minute ($18.00 per hour)
  • Same rate as async transcription
  • Unlimited concurrent streams
  • No setup fees or minimums

Use Cases

  • Live deposition transcription with real-time captions
  • Phone call transcription as conversations happen
  • Court proceeding transcription with live display
  • Voice agent applications with immediate feedback

Connect to Streaming

WebSocket Connection

Connect via WebSocket with your API key in the query string:

const token = 'sk_case_your_api_key_here';
const ws = new WebSocket(`wss://casemark-ai--websocket-stream-helper-fastapi-app.modal.run/ws?token=${token}`);

ws.onopen = () => {
  console.log('Connected to streaming transcription');
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Transcript:', data);
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = (event) => {
  console.log('Connection closed:', event.code, event.reason);
};

Authentication

Pass your API key as a query parameter:

  • Parameter: token
  • Format: ?token=sk_case_your_api_key_here
  • Required: Yes

The WebSocket connection will be rejected if:

  • No token is provided
  • Token is invalid or expired
  • API key doesn't have voice, transcription, or streaming permissions

Send Audio Data

Audio Format Requirements

Required format:

  • Encoding: PCM signed 16-bit little-endian
  • Sample rate: 16,000 Hz (16kHz)
  • Channels: Mono (1 channel)

Send Audio Frames

Send raw audio bytes as binary WebSocket messages:

// Example: Send audio from microphone
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    const mediaRecorder = new MediaRecorder(stream);
    const audioContext = new AudioContext({ sampleRate: 16000 });
    
    mediaRecorder.ondataavailable = async (event) => {
      const audioData = await event.data.arrayBuffer();
      
      // MediaRecorder emits compressed audio (e.g. WebM/Opus), so this
      // step must decode and downsample to 16kHz PCM16 mono first.
      // convertToPCM16 is a placeholder for that conversion; the
      // AudioWorklet approach in the Browser Client example avoids it.
      const pcmData = convertToPCM16(audioData);
      
      // Send to WebSocket
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(pcmData);
      }
    };
    
    mediaRecorder.start(100); // Send every 100ms
  });
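
The example above leaves `convertToPCM16` undefined. A common building block is converting Web Audio Float32 samples (range -1..1) to 16-bit signed PCM; `float32ToPCM16` below is a hypothetical helper sketching that final step. Note that it assumes you already have decoded Float32 samples (as an AudioWorklet provides), not MediaRecorder's compressed output.

```javascript
// Convert Float32 audio samples (-1..1) to 16-bit signed PCM.
// Clamps out-of-range samples rather than wrapping.
function float32ToPCM16(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp
    // Scale asymmetrically: int16 range is -32768..32767
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

Send `out.buffer` over the WebSocket. Int16Array uses platform byte order, which is little-endian on virtually all browsers and servers; use a DataView if you must guarantee it.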

Audio Chunk Size

  • Recommended: 100-250ms chunks
  • Minimum: 50ms
  • Maximum: 1000ms

Smaller chunks = lower latency, but more overhead.
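
For the required format (16kHz, 16-bit, mono), chunk duration maps directly to bytes. A small sketch (`chunkBytes` is a hypothetical helper):

```javascript
// Bytes per audio chunk at the required format:
// 16,000 samples/sec * 2 bytes/sample (16-bit) * mono.
function chunkBytes(chunkMs) {
  const SAMPLE_RATE = 16000;
  const BYTES_PER_SAMPLE = 2;
  return (SAMPLE_RATE * BYTES_PER_SAMPLE * chunkMs) / 1000;
}
```

So the recommended 100ms chunk is 3,200 bytes of raw PCM, and the 50ms minimum is 1,600 bytes.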


Receive Transcripts

Message Types

The WebSocket will send JSON messages with different types:

1. Session Begins

{
  "type": "session_begins",
  "session_id": "abc123",
  "message": "Streaming session started"
}

2. Partial Transcript (Real-time)

{
  "message_type": "PartialTranscript",
  "text": "Hello, this is a test",
  "audio_start": 0,
  "audio_end": 2000,
  "confidence": 0.95,
  "words": [
    {
      "text": "Hello",
      "start": 0,
      "end": 400,
      "confidence": 0.98
    },
    {
      "text": "this",
      "start": 400,
      "end": 600,
      "confidence": 0.97
    }
  ]
}

Partial transcripts are interim results that may change as more audio is processed.

3. Final Transcript (Immutable)

{
  "message_type": "FinalTranscript",
  "text": "Hello, this is a test.",
  "audio_start": 0,
  "audio_end": 2500,
  "confidence": 0.96,
  "punctuated": true,
  "words": [
    {
      "text": "Hello",
      "start": 0,
      "end": 400,
      "confidence": 0.98
    },
    {
      "text": "this",
      "start": 400,
      "end": 600,
      "confidence": 0.97
    },
    {
      "text": "is",
      "start": 600,
      "end": 750,
      "confidence": 0.96
    },
    {
      "text": "a",
      "start": 750,
      "end": 850,
      "confidence": 0.95
    },
    {
      "text": "test",
      "start": 850,
      "end": 2500,
      "confidence": 0.97
    }
  ]
}

Final transcripts are immutable and won't change. Use these for official records.

4. Session Terminated

{
  "message_type": "SessionTerminated"
}

Sent when the session ends (either by client or server).


End Session

Graceful Termination

Send a JSON message to end the session cleanly:

ws.send(JSON.stringify({ terminate: true }));

// Wait a moment for final transcripts
setTimeout(() => {
  ws.close();
}, 1000);

Automatic Timeout

Sessions automatically end after:

  • 5 minutes of silence (no audio data received)
  • Connection errors
  • Client disconnect

Complete Example

Node.js Client

const WebSocket = require('ws');
const fs = require('fs');

const token = 'sk_case_your_api_key_here';
const ws = new WebSocket(`wss://casemark-ai--websocket-stream-helper-fastapi-app.modal.run/ws?token=${token}`);

ws.on('open', () => {
  console.log('✓ Connected to streaming transcription');
  
  // Read audio file (16kHz, PCM 16-bit, mono)
  const audioFile = fs.readFileSync('./audio.raw');
  
  // Send in chunks (100ms at 16kHz = 3200 bytes)
  const chunkSize = 3200;
  let offset = 0;
  
  const sendChunk = () => {
    if (offset < audioFile.length) {
      const chunk = audioFile.slice(offset, offset + chunkSize);
      ws.send(chunk);
      offset += chunkSize;
      setTimeout(sendChunk, 100); // Send every 100ms
    } else {
      // End of audio
      console.log('✓ Audio sent, waiting for final transcripts...');
      ws.send(JSON.stringify({ terminate: true }));
      setTimeout(() => ws.close(), 2000);
    }
  };
  
  sendChunk();
});

ws.on('message', (data) => {
  const message = JSON.parse(data.toString());
  
  if (message.message_type === 'FinalTranscript') {
    console.log('Final:', message.text);
  } else if (message.message_type === 'PartialTranscript') {
    console.log('Partial:', message.text);
  } else {
    console.log('Message:', message);
  }
});

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
});

ws.on('close', (code, reason) => {
  console.log(`Connection closed: ${code} ${reason}`);
});

Browser Client

// Get microphone access
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 16000,
    channelCount: 1,
    echoCancellation: true,
    noiseSuppression: true,
  }
});

const token = 'sk_case_your_api_key_here';
const ws = new WebSocket(`wss://casemark-ai--websocket-stream-helper-fastapi-app.modal.run/ws?token=${token}`);

// Set up AudioWorklet for PCM processing
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);

await audioContext.audioWorklet.addModule('/audio-processor.js');
const processor = new AudioWorkletNode(audioContext, 'pcm-processor');

processor.port.onmessage = (event) => {
  // Send PCM data to WebSocket
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(event.data);
  }
};

source.connect(processor);
processor.connect(audioContext.destination);

// Handle transcripts
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.message_type === 'FinalTranscript') {
    document.getElementById('transcript').textContent += data.text + ' ';
  } else if (data.message_type === 'PartialTranscript') {
    document.getElementById('interim').textContent = data.text;
  }
};

// Stop recording
document.getElementById('stop').onclick = () => {
  ws.send(JSON.stringify({ terminate: true }));
  stream.getTracks().forEach(track => track.stop());
  audioContext.close();
  ws.close();
};

Error Handling

Error Messages

The WebSocket may send error messages:

{
  "error": "Authentication failed",
  "code": "AUTH_INVALID"
}

Common error codes:

  • AUTH_INVALID - Invalid or missing API key
  • PERMISSION_DENIED - API key lacks streaming permissions
  • SERVICE_UNAVAILABLE - Streaming service temporarily down
  • UPSTREAM_ERROR - AssemblyAI service error
  • PROCESSING_ERROR - Failed to process audio data
  • SESSION_NOT_FOUND - No active session

Close Codes

WebSocket close codes indicate why the connection ended:

| Code | Reason | Description |
|---|---|---|
| 1000 | Normal closure | Session ended normally |
| 1008 | Policy violation | Authentication failed or insufficient permissions |
| 1011 | Internal error | Server error (temporary) |
| 4000 | Bad audio format | Audio doesn't meet requirements |
| 4001 | Rate limit exceeded | Too many concurrent connections |

Best Practices

Audio Quality

Optimize for accuracy:

  • Use noise cancellation when capturing microphone input
  • Minimize background noise
  • Use high-quality microphones for depositions
  • Test with your specific audio setup

Latency Optimization

Minimize end-to-end latency:

  • Send smaller chunks (100ms) for real-time display
  • Use wired internet connection (not WiFi when possible)
  • Host your application close to your users
  • Process partial transcripts for immediate feedback

Error Recovery

Handle transient failures:

let reconnectAttempts = 0;
const maxReconnects = 3;

function connect() {
  const ws = new WebSocket(`wss://casemark-ai--websocket-stream-helper-fastapi-app.modal.run/ws?token=${token}`);
  
  ws.onclose = (event) => {
    if (event.code !== 1000 && reconnectAttempts < maxReconnects) {
      reconnectAttempts++;
      console.log(`Reconnecting... (${reconnectAttempts}/${maxReconnects})`);
      setTimeout(connect, 1000 * reconnectAttempts);
    }
  };
  
  ws.onopen = () => {
    reconnectAttempts = 0; // Reset on successful connection
  };
  
  return ws;
}

Usage Tracking

Monitor your usage:

  • Track connection duration to estimate costs
  • Implement automatic disconnection after inactivity
  • Set session time limits for budget control
  • Monitor concurrent connections

const startTime = Date.now();

ws.onclose = () => {
  const durationSeconds = (Date.now() - startTime) / 1000;
  const cost = (durationSeconds / 60) * 0.30; // $0.30 per minute
  console.log(`Session duration: ${durationSeconds}s, Est. cost: $${cost.toFixed(2)}`);
};

Comparison: Async vs Streaming

| Feature | Async Transcription | Streaming Transcription |
|---|---|---|
| Latency | Minutes | 300ms |
| Protocol | HTTP REST | WebSocket |
| Use Case | Pre-recorded files | Live audio |
| Pricing | $0.30/minute | $0.30/minute |
| Input | Audio URL | Raw audio stream |
| Output | Complete transcript | Progressive transcripts |
| Speaker Labels | ✓ Yes | Coming soon |
| Auto Highlights | ✓ Yes | ✗ No |
| Content Safety | ✓ Yes | ✗ No |

When to use async:

  • Transcribing pre-recorded depositions
  • Batch processing multiple files
  • Need speaker diarization or advanced features

When to use streaming:

  • Live courtroom transcription
  • Real-time phone call transcription
  • Voice assistant applications
  • Interactive voice agents