Changelog

All notable changes to AALIATALK services.

Orchestrator
2026-04-10 v1.3.1 ASR/LID Overhaul & Pipeline Improvements
  • Added Local text-based language identification integrated into the speech recognition pipeline. Language detection now runs on the transcribed text rather than relying on the speech service's locale field, which was unreliable for short utterances.
  • Added Additional reformulation backend option.
  • Fixed Speaker role detection (medic / patient) returning unknown for all sentences, caused by the speech service reporting an incorrect locale regardless of what was spoken.
  • Fixed Translation service connectivity failure in containerised deployments.
  • Changed Locale list for speech recognition is now built with the medic language first. When the speech service cannot determine the language of a short utterance, it falls back to the first locale in the list, and the medic is the more common speaker in a medical consultation.
  • Changed VAD silence counter threshold increased from 10 to 20 frames (~0.64s) to reduce false sentence boundaries on natural speech pauses.
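The VAD change above can be sketched as follows. This is an illustrative model only: the frame duration (~32 ms, so 20 frames ≈ 0.64 s) is inferred from the changelog arithmetic, and the function and constant names are assumptions, not the actual implementation.

```python
FRAME_MS = 32           # assumed frame duration: 20 frames * 32 ms ≈ 0.64 s
SILENCE_THRESHOLD = 20  # was 10 (~0.32 s) before v1.3.1

def detect_sentence_boundaries(frames):
    """Yield frame indices where a sentence boundary is declared.

    `frames` is an iterable of booleans: True = voiced, False = silent.
    A boundary fires only after SILENCE_THRESHOLD consecutive silent
    frames following speech, so natural pauses shorter than ~0.64 s
    no longer split sentences.
    """
    silence_count = 0
    in_speech = False
    for i, voiced in enumerate(frames):
        if voiced:
            in_speech = True
            silence_count = 0
        elif in_speech:
            silence_count += 1
            if silence_count >= SILENCE_THRESHOLD:
                yield i          # boundary after ~0.64 s of silence
                in_speech = False
                silence_count = 0
```

With the old threshold of 10, a ~0.32 s pause (common in natural speech) would already have triggered a boundary; doubling the counter trades slightly later segmentation for fewer false splits.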
2026-04-08 v1.3.0 Language Handling Overhaul
  • Added Multi-format language code acceptance at handshake. language_medic and language_patient now accept BCP-47 (fr-FR), ISO 639-1 (fr), and ISO 639-3 (fra) — all normalized to BCP-47 internally.
  • Added New language normalization module. Single source of truth for all language code conversions across the pipeline.
  • Added Structured language data files keyed by BCP-47, containing voice synthesis metadata, speech recognition availability flags, and operational status per locale.
  • Fixed Speaker role detection was silently returning unknown for every sentence due to a format mismatch between speech recognition output and stored session languages. Detection is now dialect-aware.
  • Fixed Translation and voice synthesis services were receiving incorrectly formatted language codes.
  • Fixed asr_result now returns a normalized BCP-47 locale in the language field instead of the raw code returned by the speech recognition engine.
  • Changed HandshakeSuccess response now echoes normalized BCP-47 in medic_lang and patient_lang.
  • Changed Breaking /languages endpoint now returns a sorted JSON array of BCP-47 locale strings instead of a { "fr": "Français" } object.
  • Changed Internal module structure clarified. Logging, error formatting, and session management moved to dedicated helper modules.
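The multi-format acceptance and normalization described above can be sketched like this. The mapping tables and the function name are hypothetical stand-ins for the new normalization module; only the accepted input formats (BCP-47, ISO 639-1, ISO 639-3) and the BCP-47 output come from the changelog.

```python
# Hypothetical excerpts of the structured language data; the real module
# covers far more languages.
_ISO3_TO_ISO1 = {"fra": "fr", "eng": "en", "deu": "de"}
_DEFAULT_LOCALE = {"fr": "fr-FR", "en": "en-US", "de": "de-DE"}

def normalize_language(code: str) -> str:
    """Accept BCP-47 ('fr-FR'), ISO 639-1 ('fr'), or ISO 639-3 ('fra')
    and return a canonical BCP-47 locale."""
    if "-" in code:
        # Already BCP-47: canonicalize casing (lowercase language,
        # uppercase region), e.g. 'FR-fr' -> 'fr-FR'.
        lang, _, region = code.partition("-")
        return f"{lang.lower()}-{region.upper()}"
    code = code.lower()
    if len(code) == 3:
        code = _ISO3_TO_ISO1[code]       # ISO 639-3 -> ISO 639-1
    return _DEFAULT_LOCALE[code]         # ISO 639-1 -> default locale
```

Keeping this conversion in one module is what makes it a single source of truth: the handshake, `asr_result`, and the translation and voice synthesis calls can all route through the same function instead of each doing ad-hoc conversions, which is the class of format mismatch the role-detection fix addresses.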
Microservices
2026-04-10 API v1.3.0  ·  Translator v0.9.0 Translation Service Refactor & Language API Overhaul
  • Added GET /translation/languages now returns rich language metadata per entry: BCP-47 locale, ISO 639-1, ISO 639-3, display name, and an opaque language code for use in translate requests.
  • Added Language data is now sourced from a unified language map shared with the orchestrator. Only languages flagged as available and marked as the default locale for their language group are returned: one entry per language, no dialect duplicates.
  • Changed Breaking GET /translation/languages response shape changed. languages is now a sorted array instead of a keyed object. source_language field removed from the response. Each entry now contains bcp47, iso_639_1, iso_639_3, code, and name instead of the previous code, name, script.
  • Changed Breaking POST /translation/translate now requires language codes in the format returned by GET /translation/languages. ISO 639-1 short codes (e.g. fr, en) are no longer accepted.
  • Changed POST /translation/translate response: source_language and target_language are now BCP-47 strings (e.g. fr-FR) instead of LanguageInfo objects.
  • Removed ISO 639-1 to internal code mapping. The orchestrator now sends language codes directly in the format expected by the translation service — no server-side conversion is performed.
  • Removed source_language fixed default. Both source and target language are always supplied by the caller.
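A client-side sketch of the reshaped API. The field names (`bcp47`, `iso_639_1`, `iso_639_3`, `code`, `name`) come from the changelog; the sample entries, opaque code values, and helper function are illustrative assumptions.

```python
# Example shape of a GET /translation/languages response after v0.9.0:
# a sorted array of rich entries instead of a keyed object.
LANGUAGES_RESPONSE = {
    "languages": [
        {"bcp47": "en-US", "iso_639_1": "en", "iso_639_3": "eng",
         "code": "opaque-en", "name": "English"},       # codes are illustrative
        {"bcp47": "fr-FR", "iso_639_1": "fr", "iso_639_3": "fra",
         "code": "opaque-fr", "name": "Français"},
    ]
}

def build_translate_body(text, source_bcp47, target_bcp47, languages):
    """Build a POST /translation/translate body using the opaque `code`
    field from GET /translation/languages. Short ISO 639-1 codes such
    as 'fr' are no longer accepted by the service."""
    by_bcp47 = {entry["bcp47"]: entry["code"] for entry in languages}
    return {
        "text": text,
        "source_language": by_bcp47[source_bcp47],
        "target_language": by_bcp47[target_bcp47],
    }
```

Note the asymmetry the changelog implies: requests carry the opaque `code` values, while the translate *response* reports `source_language` and `target_language` as plain BCP-47 strings (e.g. `fr-FR`). Both must now always be supplied by the caller, since the fixed source-language default was removed.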