Changelog
All notable changes to AALIATALK services.
Orchestrator
2026-04-10 v1.3.1 ASR/LID Overhaul & Pipeline Improvements
- Added Local text-based language identification integrated into the speech recognition pipeline. Language detection now runs on the transcribed text rather than relying on the speech service's locale field, which was unreliable for short utterances.
- Added Additional reformulation backend option.
- Fixed Speaker role detection (medic/patient) returning `unknown` for all sentences, caused by the speech service reporting an incorrect locale regardless of what was spoken.
- Fixed Translation service connectivity failure in containerised deployments.
- Changed Locale list for speech recognition is now built with the medic language first. When the speech service cannot determine the language of a short utterance, it falls back to the first locale in the list, and the medic is the more common speaker in a medical consultation.
- Changed VAD silence counter threshold increased from 10 to 20 frames (~0.64s) to reduce false sentence boundaries on natural speech pauses.
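The new silence threshold can be illustrated with a minimal sketch of a frame-based silence counter, assuming 32 ms frames (20 frames × 32 ms ≈ 0.64 s). Class and method names here are illustrative, not the actual pipeline code.

```python
SILENCE_THRESHOLD_FRAMES = 20  # raised from 10 to tolerate natural pauses


class SentenceBoundaryDetector:
    """Counts consecutive silent VAD frames to decide sentence boundaries."""

    def __init__(self, threshold: int = SILENCE_THRESHOLD_FRAMES):
        self.threshold = threshold
        self.silence_frames = 0

    def feed(self, frame_is_speech: bool) -> bool:
        """Return True once enough consecutive silence closes a sentence."""
        if frame_is_speech:
            self.silence_frames = 0  # any speech resets the counter
            return False
        self.silence_frames += 1
        return self.silence_frames >= self.threshold
```

With the old threshold of 10, a ~0.32 s pause mid-sentence would trigger a boundary; doubling it trades a little end-of-utterance latency for fewer false splits.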
2026-04-08 v1.3.0 Language Handling Overhaul
- Added Multi-format language code acceptance at handshake. `language_medic` and `language_patient` now accept BCP-47 (`fr-FR`), ISO 639-1 (`fr`), and ISO 639-3 (`fra`), all normalized to BCP-47 internally.
- Added New language normalization module. Single source of truth for all language code conversions across the pipeline.
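A hedged sketch of the normalization described above: accept BCP-47, ISO 639-1, or ISO 639-3 input and return a canonical BCP-47 locale. The mapping tables here are a tiny illustrative subset, not the real language data files.

```python
_DEFAULT_LOCALE = {  # primary subtag -> assumed default BCP-47 locale
    "fr": "fr-FR",
    "en": "en-GB",
    "ar": "ar-MA",
}
_ISO3_TO_ISO1 = {"fra": "fr", "eng": "en", "ara": "ar"}


def normalize_language(code: str) -> str:
    """Normalize fr-FR / fr / fra style codes to a BCP-47 locale."""
    code = code.strip()
    if "-" in code:  # already BCP-47: just canonicalise the case
        lang, _, region = code.partition("-")
        return f"{lang.lower()}-{region.upper()}"
    # ISO 639-3 -> ISO 639-1, then pick the default locale for the language
    short = _ISO3_TO_ISO1.get(code.lower(), code.lower())
    try:
        return _DEFAULT_LOCALE[short]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}")
```

Centralising this in one module means the handshake, speech, translation, and synthesis paths all agree on what `fr` means.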
- Added Structured language data files keyed by BCP-47, containing voice synthesis metadata, speech recognition availability flags, and operational status per locale.
- Fixed Speaker role detection was silently returning `unknown` for every sentence due to a format mismatch between speech recognition output and stored session languages. Detection is now dialect-aware.
- Fixed Translation and voice synthesis services were receiving incorrectly formatted language codes.
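Dialect-aware detection can be sketched as comparing primary language subtags, so that, for example, ASR output of `fr-CA` still matches a stored session language of `fr-FR`. Function and parameter names below are hypothetical.

```python
def primary_subtag(bcp47: str) -> str:
    """Extract the primary language subtag: 'fr-CA' -> 'fr'."""
    return bcp47.split("-")[0].lower()


def detect_role(asr_language: str, medic_lang: str, patient_lang: str) -> str:
    """Map the recognised language to a speaker role, ignoring dialect."""
    detected = primary_subtag(asr_language)
    if detected == primary_subtag(medic_lang):
        return "medic"
    if detected == primary_subtag(patient_lang):
        return "patient"
    return "unknown"
```

An exact string comparison of `fr-CA` against `fr-FR` is what silently failed before; comparing subtags makes the dialect irrelevant to role assignment.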
- Fixed `asr_result` now returns a normalized BCP-47 locale in the `language` field instead of the raw code returned by the speech recognition engine.
- Changed `HandshakeSuccess` response now echoes normalized BCP-47 in `medic_lang` and `patient_lang`.
- Changed (Breaking) `/languages` endpoint now returns a sorted JSON array of BCP-47 locale strings instead of a `{ "fr": "Français" }` object.
- Changed Internal module structure clarified. Logging, error formatting, and session management moved to dedicated helper modules.
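A before/after sketch of the breaking `/languages` change, with illustrative values: the old shape keyed display names by short code, the new shape is a sorted array of BCP-47 locale strings.

```python
# Old response shape (pre-v1.3.0): display names keyed by ISO 639-1 code.
old_response = {"ar": "العربية", "en": "English", "fr": "Français"}

# New response shape: sorted JSON array of BCP-47 locale strings.
new_response = sorted(["fr-FR", "ar-MA", "en-GB"])
```

Clients that iterated over the object's keys need updating, since entries are now dialect-qualified strings rather than bare two-letter codes.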
Microservices
2026-04-10 API v1.3.0 · Translator v0.9.0 Translation Service Refactor & Language API Overhaul
- Added `GET /translation/languages` now returns rich language metadata per entry: BCP-47 locale, ISO 639-1, ISO 639-3, display name, and an opaque language code for use in translate requests.
- Added Language data is now sourced from a unified language map shared with the orchestrator. Only languages flagged as available and marked as the default locale for their language group are returned, one entry per language, no dialect duplicates.
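One entry in the new `GET /translation/languages` payload might look like the following sketch; the opaque `code` value is a made-up placeholder, since its real format is engine-specific.

```python
# Illustrative single entry from the "languages" array.
entry = {
    "bcp47": "fr-FR",
    "iso_639_1": "fr",
    "iso_639_3": "fra",
    "code": "opaque-fr",  # placeholder: pass this back in translate requests
    "name": "Français",
}
```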
- Changed (Breaking) `GET /translation/languages` response shape changed. `languages` is now a sorted array instead of a keyed object. The `source_language` field was removed from the response. Each entry now contains `bcp47`, `iso_639_1`, `iso_639_3`, `code`, and `name` instead of the previous `code`, `name`, `script`.
- Changed (Breaking) `POST /translation/translate` now requires language codes in the format returned by `GET /translation/languages`. ISO 639-1 short codes (e.g. `fr`, `en`) are no longer accepted.
- Changed `POST /translation/translate` response: `source_language` and `target_language` are now BCP-47 strings (e.g. `fr-FR`) instead of `LanguageInfo` objects.
- Removed ISO 639-1 to internal code mapping. The orchestrator now sends language codes directly in the format expected by the translation service; no server-side conversion is performed.
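A hedged sketch of a `POST /translation/translate` round trip under the new contract: the request uses opaque codes taken from `GET /translation/languages`, and the response reports plain BCP-47 strings. The code values and payload text are illustrative.

```python
# Request: language codes must come from GET /translation/languages;
# bare ISO 639-1 codes like "fr" are rejected as of this release.
request_body = {
    "text": "Où avez-vous mal ?",
    "source_language": "opaque-fr",  # placeholder opaque code
    "target_language": "opaque-ar",  # placeholder opaque code
}

# Response: languages are now BCP-47 strings, not LanguageInfo objects.
response_body = {
    "text": "أين تشعر بالألم؟",
    "source_language": "fr-FR",
    "target_language": "ar-MA",
}
```

Callers that previously read `response["source_language"]["code"]` now read the string directly.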
- Removed `source_language` fixed default. Both source and target language are always supplied by the caller.