WebSocket API for asynchronous, low-latency audio translation, designed for medical consultations.
Main Features:
Architecture: The system uses a pipeline architecture with two parallel branches:
Processing Flow:
Development server for audio streaming and orchestration.
⚠️ Connection Requirements (HTTP/1.1 Only): This endpoint strictly requires a standard HTTP/1.1 WebSocket Handshake.
Clients must send the following headers:
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: [Base64-encoded 16-byte random key]
Note: HTTP/2 is not supported for the handshake. If using clients like curl or strict proxies, you must force HTTP/1.1 or ensure the client does not attempt an HTTP/2 connection, as this will strip the required Upgrade headers.
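The required headers can be assembled by hand when a client library is not available. The following is a minimal sketch, not the server's reference client: the host `example.invalid` and the path `/ws` are placeholders, and the function names are hypothetical. It only builds the HTTP/1.1 upgrade request and does not open a socket.

```python
import base64
import os


def build_handshake_headers(host: str) -> dict[str, str]:
    """Return the upgrade headers listed above, plus the mandatory Host header."""
    # Sec-WebSocket-Key is 16 random bytes, Base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    return {
        "Host": host,
        "Connection": "Upgrade",
        "Upgrade": "websocket",
        "Sec-WebSocket-Version": "13",
        "Sec-WebSocket-Key": key,
    }


def build_handshake_request(host: str, path: str) -> bytes:
    """Serialize an HTTP/1.1 GET request carrying the upgrade headers."""
    headers = build_handshake_headers(host)
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    # Placeholder host and path, for illustration only.
    print(build_handshake_request("example.invalid", "/ws").decode("ascii"))
```

Because the request line says `HTTP/1.1` explicitly, this form cannot be silently upgraded to HTTP/2 by the client itself; an intermediate proxy may still need to be configured separately.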
Main WebSocket channel for audio streaming and receiving translation results.
Communication Sequence:
Timeouts:
WebSocket Close Codes (1XXX): The server may close the connection with the following standard WebSocket status codes:
1000 (Normal Closure):
1003 (Unsupported Data):
1008 (Policy Violation):
language_medic, language_patient).
1011 (Internal Error):
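A client may want to map the close codes above to readable names and decide whether to reconnect. The sketch below uses the standard RFC 6455 names for the four codes. The retry rule (reconnect only after 1011) is an assumption made for illustration, not documented server behavior, and the function names are hypothetical.

```python
# Close codes named in this section, with their standard RFC 6455 names.
CLOSE_CODES = {
    1000: "Normal Closure",
    1003: "Unsupported Data",
    1008: "Policy Violation",
    1011: "Internal Error",
}

# Assumption: only a server-side failure (1011) is worth an automatic retry.
# 1003 and 1008 indicate the client sent something the server rejected,
# so retrying with the same input is unlikely to help.
RETRYABLE = {1011}


def describe_close(code: int) -> str:
    """Format a close code as 'code (name)', or mark it as unlisted."""
    name = CLOSE_CODES.get(code, "Unlisted")
    return f"{code} ({name})"


def should_retry(code: int) -> bool:
    """Return True if this sketch's retry policy would reconnect."""
    return code in RETRYABLE
```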
Send configuration and audio
Available only on servers:
Accepts one of the following messages:
First JSON message sent by client to configure the session
French doctor → English patient consultation
{
"language_medic": "fr-FR",
"gender_medic": "female",
"language_patient": "en-GB",
"gender_patient": "male"
}
Consultation with Arabic-speaking patient
{
"language_medic": "fr-FR",
"gender_medic": "male",
"language_patient": "ar-SA",
"gender_patient": "male"
}
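The configuration message can be built programmatically rather than written by hand. This is a minimal sketch, assuming all four fields shown in the examples are required non-empty strings. The function name `build_config` is hypothetical, and the check is a client-side convenience rather than the server's validation logic.

```python
import json


def build_config(language_medic: str, gender_medic: str,
                 language_patient: str, gender_patient: str) -> str:
    """Serialize the first JSON message of a session."""
    config = {
        "language_medic": language_medic,
        "gender_medic": gender_medic,
        "language_patient": language_patient,
        "gender_patient": gender_patient,
    }
    # Assumption: every field must be a non-empty string.
    for name, value in config.items():
        if not isinstance(value, str) or not value:
            raise ValueError(f"{name} must be a non-empty string")
    return json.dumps(config)


if __name__ == "__main__":
    # French doctor, English patient, as in the first example above.
    print(build_config("fr-FR", "female", "en-GB", "male"))
```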
Raw audio data streamed continuously after handshake
Raw PCM audio (s16le, 16kHz, mono). Recommended size: 4096 bytes per chunk.
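At 16 kHz, mono, with 2 bytes per sample (s16le), one second of audio is 32,000 bytes, so a 4096-byte chunk carries 2048 samples, or 0.128 seconds of audio. The sketch below shows that arithmetic and one way to split a PCM buffer into chunks of the recommended size. The helper names are hypothetical, and a short final chunk is passed through as-is, which is an assumption about what the server tolerates.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # s16le: signed 16-bit little-endian
CHANNELS = 1           # mono
CHUNK_BYTES = 4096     # recommended chunk size


def chunk_duration_seconds(chunk_bytes: int = CHUNK_BYTES) -> float:
    """Duration of audio carried by one chunk of the given size."""
    return chunk_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of pcm; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    chunks = list(iter_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[-1]), chunk_duration_seconds())
```

A client streaming in real time would typically send one such chunk roughly every 128 ms.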
Receive transcription and translation results
Available only on servers:
Accepts one of the following messages:
Confirmation that server accepted the configuration
{
"type": "handshake_success",
"message": "Configuration received and accepted. Ready to receive audio stream.",
"medic_lang": "fr-FR",
"gender_medic": "female",
"patient_lang": "en-GB",
"gender_patient": "male"
}
ASR transcription of a detected sentence with speaker identification
{
"type": "asr_result",
"id": 1,
"text": "Bonjour, comment vous sentez-vous aujourd'hui ?",
"language": "fr-FR",
"speaker": "medic"
}
{
"type": "asr_result",
"id": 2,
"text": "I have a headache since yesterday",
"language": "en-GB",
"speaker": "patient"
}
Literal translation with back-translation and TTS audio
{
"type": "branch_a_result",
"sentence_id": 1,
"original_text": "Bonjour, comment vous sentez-vous aujourd'hui ?",
"translated_text": "Hello, how are you feeling today?",
"back_translated_text": "Bonjour, comment vous sentez-vous aujourd'hui ?",
"audio_format": "wav"
}
Translation with contextual reformulation, back-translation and TTS audio
{
"type": "branch_b_result",
"sentence_id": 1,
"reformulated_source": "Bonjour, pouvez-vous me décrire votre état de santé actuel ?",
"translated_reformulation": "Hello, can you describe your current health condition?",
"back_translated_reformulation": "Bonjour, pouvez-vous décrire votre état de santé actuel ?",
"audio_format": "wav"
}
{
"type": "error",
"service": "VAD",
"message": "VAD unavailable."
}
{
"type": "error",
"service": "Speech Recognition",
"message": "Speech recognition service is not responding. Please try again later."
}
{
"type": "error",
"service": "Translation Pipeline",
"message": "Translation service request timed out. Please try again."
}
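Since every server message carries a `type` field, a client can dispatch on it. The following is a minimal sketch based only on the fields visible in the examples above. The function name `describe_message` is hypothetical, and the idea that `sentence_id` in branch results refers to the `id` of an `asr_result` is an assumption drawn from the matching values in the examples.

```python
import json


def describe_message(raw: str) -> str:
    """Turn one server JSON message into a one-line summary."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "handshake_success":
        return f"ready: {msg['medic_lang']} <-> {msg['patient_lang']}"
    if kind == "asr_result":
        return f"[{msg['id']}] {msg['speaker']} ({msg['language']}): {msg['text']}"
    if kind == "branch_a_result":
        # Literal translation branch.
        return f"[{msg['sentence_id']}] literal: {msg['translated_text']}"
    if kind == "branch_b_result":
        # Contextual reformulation branch.
        return f"[{msg['sentence_id']}] reformulated: {msg['translated_reformulation']}"
    if kind == "error":
        return f"error from {msg['service']}: {msg['message']}"
    # Unknown types are reported rather than raised, so new message
    # kinds do not crash the client.
    return f"unhandled message type: {kind!r}"


if __name__ == "__main__":
    print(describe_message('{"type": "error", "service": "VAD", "message": "VAD unavailable."}'))
```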
Retrieve the list of languages supported by the ASR engine. This is a standard HTTP GET request.
Get supported languages
Available only on servers:
Accepts the following message:
Sorted list of BCP-47 locale codes supported by the ASR engine
[
"ar-SA",
"de-DE",
"en-GB",
"fr-FR"
]
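The returned list can be used to check a session configuration before opening the WebSocket. This is a minimal sketch that works on an already-fetched list, so no endpoint URL is assumed. The function name `unsupported_languages` is hypothetical.

```python
def unsupported_languages(config: dict[str, str], supported: list[str]) -> list[str]:
    """Return the language codes in config that are not in the supported list."""
    allowed = set(supported)
    return [
        config[field]
        for field in ("language_medic", "language_patient")
        if config[field] not in allowed
    ]


if __name__ == "__main__":
    supported = ["ar-SA", "de-DE", "en-GB", "fr-FR"]  # example response above
    config = {"language_medic": "fr-FR", "language_patient": "es-ES"}
    print(unsupported_languages(config, supported))
```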
Error notification or service degradation