Orchestrator WebSocket API 1.3.0 documentation

Orchestrator WebSocket API 1.3.0

Proprietary
ulysse@aalia.tech

WebSocket API for asynchronous, low-latency audio translation, designed for medical consultations.

Main Features:

Speech-to-Text (ASR)
Voice Activity Detection (VAD)
Bidirectional, asynchronous, low-latency translation
Contextual reformulation
Text-to-Speech (TTS)
Back-translation for validation

Architecture: The system uses a pipeline architecture with two parallel branches:

Branch A: Literal translation
Branch B: Translation with contextual reformulation

Processing Flow:

Client sends handshake with language/gender configuration
Client streams PCM audio (16 kHz, mono, 16-bit)
Automatic sentence detection via VAD
ASR → Translation → TTS (parallel on 2 branches)
Results sent to client in JSON with Base64 audio

Servers

staging.fr.vokaalia.com/v1/orchestrator
Protocol: ws
Server name: orchestrator
Development server for audio streaming and orchestration.

⚠️ Connection Requirements (HTTP/1.1 Only): This endpoint strictly requires a standard HTTP/1.1 WebSocket Handshake.

Clients must send the following headers:

Connection: Upgrade

Upgrade: websocket

Sec-WebSocket-Version: 13

Sec-WebSocket-Key: [Base64-encoded 16-byte random key]

Note: HTTP/2 is not supported for the handshake. If using clients like curl or strict proxies, you must force HTTP/1.1 or ensure the client does not attempt an HTTP/2 connection, as this will strip the required Upgrade headers.
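
As a rough illustration of the headers listed above, the following Python sketch builds a raw HTTP/1.1 upgrade request. The helper names make_websocket_key and build_upgrade_request are hypothetical, and the host and path are left to the caller. Most WebSocket client libraries generate these headers themselves, so this is only meant to show what goes on the wire.

```python
import base64
import os


def make_websocket_key() -> str:
    """Return a Base64-encoded 16-byte random value for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def build_upgrade_request(host: str, path: str) -> bytes:
    """Build the HTTP/1.1 request line and headers for a WebSocket handshake."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        "Sec-WebSocket-Version: 13",
        f"Sec-WebSocket-Key: {make_websocket_key()}",
    ]
    # The header block is terminated by an empty line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Sending these bytes over a plain HTTP/1.1 connection avoids the HTTP/2 negotiation issue described in the note above.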

Operations

PUB /ws/audio
Main WebSocket channel for audio streaming and receiving translation results.

Communication Sequence:

Client → Server: HandshakeRequest (JSON)

Server → Client: HandshakeSuccess (JSON)

Client → Server: AudioChunk (bytes) [continuous streaming]

Server → Client: ASRResult, BranchAResult, BranchBResult (JSON) [asynchronous]

Timeouts:

Handshake: 10 seconds

Client inactivity: 60 seconds

Max session: 9000 seconds
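
A client may want to track these limits locally so it can warn the user before the server closes the connection. The sketch below is one possible approach: the class name SessionClock and its methods are hypothetical, and it assumes the inactivity timer restarts each time an audio chunk is sent, which the timeouts above suggest but do not state explicitly.

```python
import time

HANDSHAKE_TIMEOUT_S = 10    # time allowed to send the configuration
INACTIVITY_TIMEOUT_S = 60   # time allowed without audio
MAX_SESSION_S = 9000        # total session length


class SessionClock:
    """Estimate how long remains before the server is expected to close."""

    def __init__(self, now=None):
        self.started = time.monotonic() if now is None else now
        self.last_audio = self.started

    def mark_audio_sent(self, now=None):
        """Record that an audio chunk was just sent."""
        self.last_audio = time.monotonic() if now is None else now

    def seconds_left(self, now=None):
        """Return the smaller of the inactivity and session budgets."""
        now = time.monotonic() if now is None else now
        idle_left = INACTIVITY_TIMEOUT_S - (now - self.last_audio)
        session_left = MAX_SESSION_S - (now - self.started)
        return max(0.0, min(idle_left, session_left))
```

The optional now argument exists only to make the helper easy to test with fixed timestamps.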

WebSocket Close Codes (1XXX): The server may close the connection with the following standard WebSocket status codes:

1000 (Normal Closure):

Handshake timeout (client didn't send configuration within 10s).

Client inactivity (no audio received for 60s).

1003 (Unsupported Data):

Invalid JSON format in handshake message.

1008 (Policy Violation):

Missing required fields in handshake (e.g., language_medic, language_patient).

Unsupported language: The provided language code is not supported by the ASR engine.

1011 (Internal Error):

Server processing error or failure to send handshake confirmation.
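
The close codes above can be mapped to client-side handling with a small lookup table. This is a sketch under assumptions: the names CLOSE_REASONS, RETRYABLE and describe_close are hypothetical, and treating only 1011 as retryable is a design choice made for illustration, not something this API specifies.

```python
# Close codes and reasons as listed in this document.
CLOSE_REASONS = {
    1000: "Normal closure: handshake timeout or client inactivity",
    1003: "Unsupported data: invalid JSON in handshake message",
    1008: "Policy violation: missing required field or unsupported language",
    1011: "Internal error: server processing or handshake confirmation failure",
}

# Assumption: only server-side failures are worth retrying unchanged.
RETRYABLE = {1011}


def describe_close(code: int) -> tuple[str, bool]:
    """Return (reason, should_retry) for a WebSocket close code."""
    reason = CLOSE_REASONS.get(code, "Close code not listed in this document")
    return reason, code in RETRYABLE
```

Codes 1003 and 1008 indicate a problem with the handshake payload, so reconnecting with the same configuration would likely fail again.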
Send configuration and audio
Operation ID: sendAudioStream
Available only on servers:
- orchestrator
Accepts one of the following messages:
- #0 Initial session configuration
  First JSON message sent by client to configure the session
  application/json
  object
  language_medic
  required
  string
  Language code for doctor/healthcare provider. Accepted formats (all normalized to BCP-47 internally):
  
  BCP-47 locale: "fr-FR", "en-GB", "ar-SA"
  
  ISO 639-1: "fr", "en", "ar"
  
  ISO 639-3: "fra", "eng", "arb"
  
  When a bare ISO 639-1 code is provided for a language with regional variants, it resolves to the default locale (e.g. "en" → "en-GB").
  
  gender_medic
  string
  TTS voice gender for the doctor. Used for voice synthesis of translations intended for the patient.
  
  Allowed values:
  "male"
  "female"
  language_patient
  required
  string
  Language code for patient. Accepted formats (all normalized to BCP-47 internally):
  
  BCP-47 locale: "fr-FR", "en-GB", "ar-SA"
  
  ISO 639-1: "fr", "en", "ar"
  
  ISO 639-3: "fra", "eng", "arb"
  
  When a bare ISO 639-1 code is provided for a language with regional variants, it resolves to the default locale (e.g. "en" → "en-GB").
  
  gender_patient
  string
  TTS voice gender for the patient. Used for voice synthesis of translations intended for the doctor.
  
  Allowed values:
  "male"
  "female"
  patient_age
  string
  Age category of the patient. Used to adapt the reformulation style (e.g., simpler language for children). Defaults to "adult" if not provided.
  
  Allowed values:
  "adult"
  "child"
  "adolescent"
  "senior"
  Note: language_medic and language_patient are required. The connection is closed with code 1008 if either is missing.
  Additional properties are allowed.
  Examples
  #1 Example - FrenchEnglishConsultation
  French doctor → English patient consultation
  { "language_medic": "fr-FR", "gender_medic": "female", "language_patient": "en-GB", "gender_patient": "male" }
  
  #2 Example - ArabicConsultation
  Consultation with Arabic-speaking patient
  { "language_medic": "fr-FR", "gender_medic": "male", "language_patient": "ar-SA", "gender_patient": "male" }
- #1 PCM audio chunk
  Raw audio data streamed continuously after handshake
  application/octet-stream
  Payload
  string
  format: binary
  Raw PCM audio (s16le, 16kHz, mono). Recommended size: 4096 bytes per chunk.
  
SUB /ws/audio
Main WebSocket channel for audio streaming and receiving translation results.

Receive transcription and translation results
Operation ID: receiveResults
Available only on servers:
- orchestrator
Accepts one of the following messages:
- #0 Configuration confirmation
  Confirmation that server accepted the configuration
  application/json
  object
  type
  required
  string
  Message type
  
  Const:"handshake_success"
  message
  required
  string
  Confirmation message
  
  medic_lang
  required
  string
  Confirmed doctor language, normalized to BCP-47
  
  gender_medic
  string
  Confirmed doctor gender
  
  patient_lang
  required
  string
  Confirmed patient language, normalized to BCP-47
  
  gender_patient
  string
  Confirmed patient gender
  
  patient_age
  string
  Confirmed patient age category
  
  Additional properties are allowed.
  Examples
  #1 Example - SuccessfulConfirmation
  { "type": "handshake_success", "message": "Configuration received and accepted. Ready to receive audio stream.", "medic_lang": "fr-FR", "gender_medic": "female", "patient_lang": "en-GB", "gender_patient": "male" }
- #1 Transcription result
  ASR transcription of a detected sentence with speaker identification
  application/json
  object
  type
  required
  string
  Message type
  
  Const:"asr_result"
  id
  required
  integer
  Unique sentence ID (incremental per session). Allows tracking processing in branches A and B.
  
  text
  required
  string
  Sentence transcription
  
  language
  required
  string
  Language automatically detected by the speech recognition engine, normalized to BCP-47 (e.g. "fr-FR", "en-GB").
  
  speaker
  required
  string
  Identified speaker role by comparing detected language with language_medic and language_patient from handshake.
  
  Allowed values:
  "medic"
  "patient"
  "unknown"
  Additional properties are allowed.
  Examples
  #1 Example - DoctorTranscription
  { "type": "asr_result", "id": 1, "text": "Bonjour, comment vous sentez-vous aujourd'hui ?", "language": "fr-FR", "speaker": "medic" }
  
  #2 Example - PatientTranscription
  { "type": "asr_result", "id": 2, "text": "I have a headache since yesterday", "language": "en-GB", "speaker": "patient" }
- #2 Literal translation result (Branch A)
  Literal translation with back-translation and TTS audio
  application/json
  object
  type
  required
  string
  Message type
  
  Const:"branch_a_result"
  sentence_id
  required
  integer
  Sentence ID (corresponds to ASRResult.id)
  
  original_text
  required
  string
  Original transcribed text (copy of ASRResult.text)
  
  translated_text
  required
  string
  Literal translation to target language
  
  back_translated_text
  string
  Reverse translation to source language (for validation). May be null in case of translation service error.
  
  audio_data
  string
  format: byte
  Base64-encoded TTS audio (WAV format). May be null in case of TTS service error.
  
  audio_format
  string
  Audio format (always WAV)
  
  Const:"wav"
  Additional properties are allowed.
  Examples
  #1 Example - LiteralTranslation
  { "type": "branch_a_result", "sentence_id": 1, "original_text": "Bonjour, comment vous sentez-vous aujourd'hui ?", "translated_text": "Hello, how are you feeling today?", "back_translated_text": "Bonjour, comment vous sentez-vous aujourd'hui ?", "audio_format": "wav" }
- #3 Reformulated translation result (Branch B)
  Translation with contextual reformulation, back-translation and TTS audio
  application/json
  object
  type
  required
  string
  Message type
  
  Const:"branch_b_result"
  sentence_id
  required
  integer
  Sentence ID
  
  reformulated_source
  required
  string
  Reformulated source text
  
  translated_reformulation
  required
  string
  Translation of the reformulation
  
  back_translated_reformulation
  string
  Reverse translation of the reformulation
  
  audio_data
  string
  format: byte
  Base64-encoded TTS audio
  
  audio_format
  string
  Audio format
  
  Const:"wav"
  Additional properties are allowed.
  Examples
  #1 Example - ReformulatedTranslation
  { "type": "branch_b_result", "sentence_id": 1, "reformulated_source": "Bonjour, pouvez-vous me décrire votre état de santé actuel ?", "translated_reformulation": "Hello, can you describe your current health condition?", "back_translated_reformulation": "Bonjour, pouvez-vous décrire votre état de santé actuel ?", "audio_format": "wav" }
- #4 Error message
  Error notification or service degradation
  application/json
  object
  type
  required
  string
  Const:"error"
  service
  required
  string
  The service module that failed
  
  message
  required
  string
  Human readable error message
  
  sentence_id
  integer
  Optional ID of the sentence being processed when error occurred
  
  Additional properties are allowed.
  Examples
  #1 Example - VADServiceUnavailable
  { "type": "error", "service": "VAD", "message": "VAD unavailable." }
  
  #2 Example - TranscriptionError
  { "type": "error", "service": "Speech Recognition", "message": "Speech recognition service is not responding. Please try again later." }
  
  #3 Example - TranslationError
  { "type": "error", "service": "Translation Pipeline", "message": "Translation service request timed out. Please try again." }
SUB /languages
Retrieves the list of languages supported by the ASR engine. This is a standard HTTP GET request.

Get supported languages
Operation ID: getLanguages
Available only on servers:
- orchestrator
Accepts the following message:
Supported Languages
Sorted list of BCP-47 locale codes supported by the ASR engine
application/json
array<string>
Items:
string
BCP-47 locale code (e.g. "fr-FR", "en-GB", "ar-SA")

Examples
[ "ar-SA", "de-DE", "en-GB", "fr-FR" ]


Messages

#1 Initial session configuration
First JSON message sent by client to configure the session
- application/json
Message ID: HandshakeRequest
object
language_medic
required
string
Language code for doctor/healthcare provider. Accepted formats (all normalized to BCP-47 internally):

BCP-47 locale: "fr-FR", "en-GB", "ar-SA"

ISO 639-1: "fr", "en", "ar"

ISO 639-3: "fra", "eng", "arb"

When a bare ISO 639-1 code is provided for a language with regional variants, it resolves to the default locale (e.g. "en" → "en-GB").

gender_medic
string
TTS voice gender for the doctor. Used for voice synthesis of translations intended for the patient.

Allowed values:
"male"
"female"
language_patient
required
string
Language code for patient. Accepted formats (all normalized to BCP-47 internally):

BCP-47 locale: "fr-FR", "en-GB", "ar-SA"

ISO 639-1: "fr", "en", "ar"

ISO 639-3: "fra", "eng", "arb"

When a bare ISO 639-1 code is provided for a language with regional variants, it resolves to the default locale (e.g. "en" → "en-GB").

gender_patient
string
TTS voice gender for the patient. Used for voice synthesis of translations intended for the doctor.

Allowed values:
"male"
"female"
patient_age
string
Age category of the patient. Used to adapt the reformulation style (e.g., simpler language for children). Defaults to "adult" if not provided.

Allowed values:
"adult"
"child"
"adolescent"
"senior"
Note: language_medic and language_patient are required. The connection is closed with code 1008 if either is missing.
Additional properties are allowed.
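
To make the field rules concrete, here is a minimal sketch that assembles a HandshakeRequest and checks optional values against the allowed values listed above. The function name build_handshake and the client-side validation are assumptions made for illustration; the server remains the authority on what it accepts.

```python
import json

ALLOWED_GENDERS = {"male", "female"}
ALLOWED_AGES = {"adult", "child", "adolescent", "senior"}


def build_handshake(language_medic, language_patient, gender_medic=None,
                    gender_patient=None, patient_age=None) -> str:
    """Serialize a HandshakeRequest, checking values against the documented enums."""
    if not language_medic or not language_patient:
        raise ValueError("language_medic and language_patient are required")
    payload = {
        "language_medic": language_medic,
        "language_patient": language_patient,
    }
    for key, value, allowed in (
        ("gender_medic", gender_medic, ALLOWED_GENDERS),
        ("gender_patient", gender_patient, ALLOWED_GENDERS),
        ("patient_age", patient_age, ALLOWED_AGES),
    ):
        if value is None:
            continue  # optional field: omit it entirely
        if value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
        payload[key] = value
    return json.dumps(payload)
```

Calling build_handshake("fr-FR", "en-GB", gender_medic="female", gender_patient="male") yields a payload with the same fields as the FrenchEnglishConsultation example.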
#2 PCM audio chunk
Raw audio data streamed continuously after handshake
- application/octet-stream
Message ID: AudioChunk
Payload
string
format: binary
Raw PCM audio (s16le, 16kHz, mono). Recommended size: 4096 bytes per chunk.
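
For reference, s16le mono audio at 16 kHz uses 2 bytes per sample, or 32,000 bytes per second, so a 4096-byte chunk holds 2048 samples and lasts 128 ms. The sketch below shows that arithmetic and one way to slice a buffer into chunks; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2      # s16le
CHANNELS = 1              # mono
CHUNK_BYTES = 4096        # recommended size from this document


def chunk_duration_ms(n_bytes: int = CHUNK_BYTES) -> float:
    """Duration of n_bytes of s16le, 16 kHz, mono PCM in milliseconds."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return 1000 * n_bytes / bytes_per_second


def iter_chunks(pcm: bytes, size: int = CHUNK_BYTES):
    """Yield consecutive slices of at most `size` bytes from a PCM buffer."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

The last chunk yielded by iter_chunks may be shorter than the recommended size; whether the server expects padding is not stated here, so the sketch sends it as-is.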
#3 Configuration confirmation
Confirmation that server accepted the configuration
- application/json
Message ID: HandshakeSuccess
object
type
required
string
Message type

Const:"handshake_success"
message
required
string
Confirmation message

medic_lang
required
string
Confirmed doctor language, normalized to BCP-47

gender_medic
string
Confirmed doctor gender

patient_lang
required
string
Confirmed patient language, normalized to BCP-47

gender_patient
string
Confirmed patient gender

patient_age
string
Confirmed patient age category

Additional properties are allowed.
#4 Transcription result
ASR transcription of a detected sentence with speaker identification
- application/json
Message ID: ASRResult
object
type
required
string
Message type

Const:"asr_result"
id
required
integer
Unique sentence ID (incremental per session). Allows tracking processing in branches A and B.

text
required
string
Sentence transcription

language
required
string
Language automatically detected by the speech recognition engine, normalized to BCP-47 (e.g. "fr-FR", "en-GB").

speaker
required
string
Identified speaker role by comparing detected language with language_medic and language_patient from handshake.

Allowed values:
"medic"
"patient"
"unknown"
Additional properties are allowed.
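
The speaker rule described above, comparing the detected language with the two handshake languages, can be illustrated as follows. This is a client-side sketch, not the server's implementation: the function name is hypothetical, all three codes are assumed to be normalized BCP-47 already, and returning "unknown" when both roles share a language is an assumption.

```python
def identify_speaker(detected: str, language_medic: str, language_patient: str) -> str:
    """Map a detected BCP-47 language to "medic", "patient" or "unknown"."""
    is_medic = detected == language_medic
    is_patient = detected == language_patient
    if is_medic and not is_patient:
        return "medic"
    if is_patient and not is_medic:
        return "patient"
    # No match, or both roles share a language (treated as ambiguous here).
    return "unknown"
```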
#5 Literal translation result (Branch A)
Literal translation with back-translation and TTS audio
- application/json
Message ID: BranchAResult
object
type
required
string
Message type

Const:"branch_a_result"
sentence_id
required
integer
Sentence ID (corresponds to ASRResult.id)

original_text
required
string
Original transcribed text (copy of ASRResult.text)

translated_text
required
string
Literal translation to target language

back_translated_text
string
Reverse translation to source language (for validation). May be null in case of translation service error.

audio_data
string
format: byte
Base64-encoded TTS audio (WAV format). May be null in case of TTS service error.

audio_format
string
Audio format (always WAV)

Const:"wav"
Additional properties are allowed.
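
Because audio_data is Base64-encoded WAV and may be null or absent, a client needs to guard before decoding. The following sketch uses Python's standard base64 and wave modules; the function name decode_tts_audio is hypothetical, and it assumes the WAV payload is PCM-encoded, which the wave module requires.

```python
import base64
import io
import wave


def decode_tts_audio(message: dict):
    """Return (pcm_frames, sample_rate) from a branch result, or None if absent."""
    encoded = message.get("audio_data")
    if encoded is None:
        return None  # field omitted or null, e.g. after a TTS service error
    wav_bytes = base64.b64decode(encoded)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.readframes(wav.getnframes()), wav.getframerate()
```

The same helper applies to branch_b_result messages, since both carry audio_data and audio_format.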
#6 Reformulated translation result (Branch B)
Translation with contextual reformulation, back-translation and TTS audio
- application/json
Message ID: BranchBResult
object
type
required
string
Message type

Const:"branch_b_result"
sentence_id
required
integer
Sentence ID

reformulated_source
required
string
Reformulated source text

translated_reformulation
required
string
Translation of the reformulation

back_translated_reformulation
string
Reverse translation of the reformulation

audio_data
string
format: byte
Base64-encoded TTS audio

audio_format
string
Audio format

Const:"wav"
Additional properties are allowed.
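
Since results are sent asynchronously, a client will typically group the three message types for one sentence by ID. The sketch below shows one way to do that; the class name SentenceTracker is hypothetical, and it assumes messages for a sentence can arrive in any order.

```python
from collections import defaultdict


class SentenceTracker:
    """Group asr_result, branch_a_result and branch_b_result by sentence ID."""

    def __init__(self):
        self.sentences = defaultdict(dict)

    def add(self, message: dict) -> dict:
        """Store a message and return everything known about its sentence."""
        kind = message["type"]
        # asr_result carries the ID in "id"; branch results use "sentence_id".
        key = message["id"] if kind == "asr_result" else message["sentence_id"]
        self.sentences[key][kind] = message
        return self.sentences[key]

    def is_complete(self, sentence_id: int) -> bool:
        """True once all three result types have arrived for this sentence."""
        expected = {"asr_result", "branch_a_result", "branch_b_result"}
        return expected <= set(self.sentences.get(sentence_id, {}))
```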
#7 Error message
Error notification or service degradation
- application/json
Message ID: ErrorMessage
object
type
required
string
Const:"error"
service
required
string
The service module that failed

message
required
string
Human readable error message

sentence_id
integer
Optional ID of the sentence being processed when error occurred

Additional properties are allowed.
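
Every server JSON message carries a type field, so incoming frames can be routed on it. The sketch below is a hedged example of that routing plus a formatter for error messages; the names KNOWN_TYPES, parse_server_message and format_error are hypothetical, and rejecting unlisted types is a choice made for this illustration.

```python
import json

KNOWN_TYPES = {
    "handshake_success", "asr_result",
    "branch_a_result", "branch_b_result", "error",
}


def parse_server_message(raw: str) -> tuple[str, dict]:
    """Decode a server JSON frame and return (type, message)."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind not in KNOWN_TYPES:
        # An unlisted type is not described in this document;
        # surface it instead of guessing how to handle it.
        raise ValueError(f"unexpected message type: {kind!r}")
    return kind, message


def format_error(message: dict) -> str:
    """Render an error message, including sentence_id when present."""
    suffix = ""
    if message.get("sentence_id") is not None:
        suffix = f" (sentence {message['sentence_id']})"
    return f"[{message['service']}] {message['message']}{suffix}"
```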
#8 Supported Languages
Sorted list of BCP-47 locale codes supported by the ASR engine
- application/json
Message ID: LanguagesResponse
array<string>
Items:
string
BCP-47 locale code (e.g. "fr-FR", "en-GB", "ar-SA")
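
A client could check a locale against this list before opening a session, to avoid a 1008 close for an unsupported language. This is a minimal sketch with hypothetical function names; it only handles full BCP-47 locale codes, since the mapping from bare ISO codes to default locales is done by the server and only partly described here. The comparison ignores case because BCP-47 tags are case-insensitive.

```python
import json


def parse_languages(body: str) -> list[str]:
    """Parse the /languages response body into a list of BCP-47 codes."""
    codes = json.loads(body)
    if not isinstance(codes, list) or not all(isinstance(c, str) for c in codes):
        raise ValueError("expected a JSON array of strings")
    return codes


def is_supported(code: str, supported: list[str]) -> bool:
    """Case-insensitive membership test for a full BCP-47 locale code."""
    return code.lower() in {c.lower() for c in supported}
```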

