Changelog
All notable changes to AALIATALK services.
Orchestrator
2026-06-22 v1.5.2 Speed Control, Session Control Messages & Pipeline Improvements
- Added `speed` optional handshake parameter - initial TTS playback speed multiplier (default 1.0, range 0.25-4.0). Echoed back in `HandshakeSuccess`.
- Added `set_speed` control message - change TTS speed mid-session without reconnecting. Takes effect on the next sentence. Server responds with `speed_updated`.
- Added `flag` control message - flag a specific sentence or the entire conversation for review. Accepts a `flag_type` (`mistranslation`, `wrong_speaker`, `bug`, `other`) and an optional free-text note. Server responds with `flag_ack`.
- Added `correct_speaker` control message - override the speaker role assigned by ASR for a specific sentence. Server responds with `correction_ack`.
- Added `trigger_branch_c` control message - run the alternate-speaker hypothesis for a single sentence on demand, regardless of the `enable_dual_direction` handshake setting. Server responds immediately with `branch_c_triggered`, then asynchronously with `branch_c_result`.
- Added `enable_dual_direction` handshake parameter - when true, the alternate-speaker hypothesis runs automatically after the literal translation for every sentence in the session.
- Added `conversation_exit` control message - graceful client-triggered session end. Server drains in-flight pipelines, delivers the archive if requested, then closes the WebSocket with code 1000. Server responds with `exit_acknowledged`.
- Changed Conversation archive is now delivered as a `conversation_archive` WebSocket message at session end, built entirely in memory. No files are written to disk at any point during or after a session.
- Changed If the archive cannot be built, the server now sends an explicit `error` message with `service: "Conversation Archive"` instead of silently sending nothing.
- Changed Reformulation backend updated.
- Fixed Sentence-level `flag` and `correct_speaker` control messages always returned "Sentence not found" in production. The internal sentence lookup dictionary was declared but never populated at detection time. Both the ordered history and the lookup dictionary are now kept in sync using the same object reference.
- Fixed Session-end server logs always reported 0 sentences processed. The snapshot was taken after cleanup had already cleared the sentence history. Fixed by snapshotting before cleanup runs.
- Removed Per-session conversation download token and associated download route. Archives are now delivered directly via the WebSocket at session end.
- Removed `conversation_id` field from session metrics.
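As an illustration of the new control messages in v1.5.2, the following is a minimal client-side sketch that builds `set_speed` and `flag` payloads. The message names, the `flag_type` values and the speed range come from the entries above; the JSON key names (`type`, `speed`, `sentence_id`, `note`) and the helper functions are assumptions made for this example and are not confirmed by the changelog.

```python
import json
from typing import Optional

# Values listed in the changelog entry for the flag control message.
FLAG_TYPES = {"mistranslation", "wrong_speaker", "bug", "other"}
# Documented range for the speed multiplier.
SPEED_MIN, SPEED_MAX = 0.25, 4.0


def build_set_speed(speed: float) -> str:
    """Serialize a set_speed message, rejecting out-of-range values client-side."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be within [{SPEED_MIN}, {SPEED_MAX}]")
    # "type" and "speed" are assumed key names (hypothetical wire format).
    return json.dumps({"type": "set_speed", "speed": speed})


def build_flag(flag_type: str,
               sentence_id: Optional[str] = None,
               note: Optional[str] = None) -> str:
    """Serialize a flag message for one sentence or the whole conversation."""
    if flag_type not in FLAG_TYPES:
        raise ValueError(f"unknown flag_type: {flag_type!r}")
    message = {"type": "flag", "flag_type": flag_type}
    # Assumption: omitting sentence_id flags the entire conversation.
    if sentence_id is not None:
        message["sentence_id"] = sentence_id
    if note:
        message["note"] = note
    return json.dumps(message)
```

A client would send the returned string over the open WebSocket and wait for `speed_updated` or `flag_ack` respectively.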
2026-05-14 v1.4.6 Internal Metrics, Pipeline Throughput & Monitoring
- Added Prometheus metrics endpoint mounted at `/metrics`, scraped by Grafana Agent every 15 seconds.
- Added Per-session metrics tracking: sentences processed, empty, errored and flagged, peak concurrent pipelines, warnings and errors.
- Added Pipeline throughput instrumentation: latency tracking from sentence detection through ASR, translation, reformulation and TTS for each branch.
- Added TTS metrics: per-request latency in milliseconds, audio output size in bytes, error rate by failure type.
- Added Reformulation metrics: per-request latency, error rate by failure type.
- Added Back-translation metrics: success and error counters per sentence.
- Added Branch-level completion and error counters for branches A, B and C. Branch C skip counters with reason labels (`unknown_speaker_role`, `same_translation_direction`).
- Added Session lifecycle metrics: connection count, disconnection count, session duration histogram, total audio bytes processed per session.
- Added Handshake rejection counters by reason (`missing_fields`, `unsupported_language`, `timeout`, `invalid_json`, `send_failed`).
- Added WebSocket close counters split by normal and abnormal closure. Health check failure counters by service (`vad`, `asr`).
- Changed `/health` endpoint now actively probes downstream services (VAD model, ASR) on each call rather than returning a static status, and increments failure counters on degraded results.
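The active `/health` behaviour described in v1.4.6 can be sketched roughly as follows. This is not the service's implementation: the probe callables, the response shape and the use of `collections.Counter` as a stand-in for a labelled Prometheus counter are all assumptions made to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, Dict

# Stand-in for a Prometheus counter labelled by service (e.g. "vad", "asr").
health_check_failures: Counter = Counter()


def check_health(probes: Dict[str, Callable[[], bool]]) -> dict:
    """Run every probe on each call and count failures per service."""
    services = {}
    for service, probe in probes.items():
        try:
            ok = bool(probe())
        except Exception:
            # A probe that raises is treated the same as one that reports failure.
            ok = False
        if not ok:
            health_check_failures[service] += 1
        services[service] = "ok" if ok else "degraded"
    overall = "ok" if all(state == "ok" for state in services.values()) else "degraded"
    # Hypothetical response shape; the real endpoint's fields are not documented here.
    return {"status": overall, "services": services}
```

For example, `check_health({"vad": lambda: True, "asr": lambda: False})` reports an overall `degraded` status and increments only the `asr` failure count.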
2026-04-10 v1.3.1 ASR/LID Overhaul & Pipeline Improvements
- Added Local text-based language identification integrated into the speech recognition pipeline. Language detection now runs on the transcribed text rather than relying on the speech service's locale field, which was unreliable for short utterances.
- Added Additional reformulation backend option.
- Fixed Speaker role detection (`medic`/`patient`) returning `unknown` for all sentences, caused by the speech service reporting an incorrect locale regardless of what was spoken.
- Fixed Translation service connectivity failure in containerised deployments.
- Changed Locale list for speech recognition is now built with the medic language first. When the speech service cannot determine the language of a short utterance, it falls back to the first locale in the list, and the medic is the more common speaker in a medical consultation.
- Changed VAD silence counter threshold increased from 10 to 20 frames (~0.64s) to reduce false sentence boundaries on natural speech pauses.
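To illustrate the effect of the VAD threshold change in v1.3.1, here is a small sketch of a silence-counter boundary detector. The 10 and 20 frame thresholds come from the entry above; the frame duration of about 32 ms is only inferred from "20 frames (~0.64s)", and the function itself is a hypothetical simplification rather than the service's detector.

```python
from typing import Iterable, List

OLD_THRESHOLD_FRAMES = 10
NEW_THRESHOLD_FRAMES = 20
# Assumption: inferred from 20 frames being roughly 0.64 s.
FRAME_SECONDS = 0.032


def find_boundaries(speech_flags: Iterable[bool], threshold: int) -> List[int]:
    """Return the frame indices at which a sentence boundary is declared.

    A boundary fires once `threshold` consecutive silent frames follow speech.
    """
    boundaries = []
    silence = 0
    in_sentence = False
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            in_sentence = True
            silence = 0  # any speech frame resets the silence counter
        elif in_sentence:
            silence += 1
            if silence >= threshold:
                boundaries.append(index)
                in_sentence = False
                silence = 0
    return boundaries


# Speech, a 12-frame pause (about 0.38 s), more speech, then a long silence.
frames = [True] * 5 + [False] * 12 + [True] * 5 + [False] * 25
split_by_old = find_boundaries(frames, OLD_THRESHOLD_FRAMES)
split_by_new = find_boundaries(frames, NEW_THRESHOLD_FRAMES)
```

With the old threshold the 12-frame pause splits the utterance in two; with the new threshold only the final long silence ends the sentence.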
2026-04-08 v1.3.0 Language Handling Overhaul
- Added Multi-format language code acceptance at handshake. `language_medic` and `language_patient` now accept BCP-47 (`fr-FR`), ISO 639-1 (`fr`), and ISO 639-3 (`fra`), all normalized to BCP-47 internally.
- Added New language normalization module. Single source of truth for all language code conversions across the pipeline.
- Added Structured language data files keyed by BCP-47, containing voice synthesis metadata, speech recognition availability flags, and operational status per locale.
- Fixed Speaker role detection was silently returning `unknown` for every sentence due to a format mismatch between speech recognition output and stored session languages. Detection is now dialect-aware.
- Fixed Translation and voice synthesis services were receiving incorrectly formatted language codes.
- Fixed `asr_result` now returns a normalized BCP-47 locale in the `language` field instead of the raw code returned by the speech recognition engine.
- Changed `HandshakeSuccess` response now echoes normalized BCP-47 in `medic_lang` and `patient_lang`.
- Changed Breaking `/languages` endpoint now returns a sorted JSON array of BCP-47 locale strings instead of a `{ "fr": "Français" }` object.
- Changed Internal module structure clarified. Logging, error formatting, and session management moved to dedicated helper modules.
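The multi-format acceptance introduced in v1.3.0 might look roughly like the sketch below. The lookup tables, the choice of default locale per language and the function name are hypothetical; the actual module reads its mappings from the structured language data files, which are not reproduced here.

```python
# Hypothetical lookup tables with illustrative default locales.
ISO_639_1_TO_BCP47 = {"fr": "fr-FR", "en": "en-US", "es": "es-ES"}
ISO_639_3_TO_BCP47 = {"fra": "fr-FR", "eng": "en-US", "spa": "es-ES"}


def normalize_language(code: str) -> str:
    """Map a BCP-47, ISO 639-1 or ISO 639-3 code to a BCP-47 locale string."""
    cleaned = code.strip()
    if "-" in cleaned:
        # Simplification: only language-region tags are handled, not script subtags.
        language, region = cleaned.split("-", 1)
        return f"{language.lower()}-{region.upper()}"
    lowered = cleaned.lower()
    if lowered in ISO_639_1_TO_BCP47:
        return ISO_639_1_TO_BCP47[lowered]
    if lowered in ISO_639_3_TO_BCP47:
        return ISO_639_3_TO_BCP47[lowered]
    raise ValueError(f"unsupported language code: {code!r}")
```

Under these assumed tables, `fr`, `fra` and `fr-fr` all normalize to `fr-FR`, which is the form echoed back in `HandshakeSuccess`.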
Microservices
2026-04-10 API v1.3.0 · Translator v0.9.0 Translation Service Refactor & Language API Overhaul
- Added `GET /translation/languages` now returns rich language metadata per entry: BCP-47 locale, ISO 639-1, ISO 639-3, display name, and an opaque language code for use in translate requests.
- Added Language data is now sourced from a unified language map shared with the orchestrator. Only languages flagged as available and marked as the default locale for their language group are returned - one entry per language, no dialect duplicates.
- Changed Breaking `GET /translation/languages` response shape changed. `languages` is now a sorted array instead of a keyed object. `source_language` field removed from the response. Each entry now contains `bcp47`, `iso_639_1`, `iso_639_3`, `code`, and `name` instead of the previous `code`, `name`, `script`.
- Changed Breaking `POST /translation/translate` now requires language codes in the format returned by `GET /translation/languages`. ISO 639-1 short codes (e.g. `fr`, `en`) are no longer accepted.
- Changed `POST /translation/translate` response: `source_language` and `target_language` are now BCP-47 strings (e.g. `fr-FR`) instead of `LanguageInfo` objects.
- Removed ISO 639-1 to internal code mapping. The orchestrator now sends language codes directly in the format expected by the translation service; no server-side conversion is performed.
- Removed `source_language` fixed default. Both source and target language are always supplied by the caller.
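For callers migrating to the new `GET /translation/languages` shape, a minimal parsing sketch is shown below. The five entry fields and the top-level `languages` array come from the entries above; the sample values (in particular the opaque `code` strings), the class name and the helper functions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

ENTRY_FIELDS = ("bcp47", "iso_639_1", "iso_639_3", "code", "name")


@dataclass(frozen=True)
class LanguageEntry:
    bcp47: str
    iso_639_1: str
    iso_639_3: str
    code: str  # opaque value to pass to translate requests
    name: str


def parse_languages(payload: dict) -> List[LanguageEntry]:
    """Read the `languages` array, ignoring any fields this sketch does not know."""
    return [
        LanguageEntry(**{field: item[field] for field in ENTRY_FIELDS})
        for item in payload["languages"]
    ]


def codes_by_locale(entries: List[LanguageEntry]) -> Dict[str, str]:
    """Index the opaque translate codes by BCP-47 locale."""
    return {entry.bcp47: entry.code for entry in entries}


# Hypothetical payload; real `code` values are opaque and service-defined.
sample = {
    "languages": [
        {"bcp47": "en-US", "iso_639_1": "en", "iso_639_3": "eng",
         "code": "opaque-en", "name": "English"},
        {"bcp47": "fr-FR", "iso_639_1": "fr", "iso_639_3": "fra",
         "code": "opaque-fr", "name": "Français"},
    ]
}
translate_codes = codes_by_locale(parse_languages(sample))
```

Looking up the code from the listing, rather than constructing it from an ISO 639-1 short code, matches the requirement that translate requests use the format returned by the languages endpoint.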