Changelog

All notable changes to AALIATALK services.

Orchestrator
2026-06-22 v1.5.2 Speed Control, Session Control Messages & Pipeline Improvements
  • Added optional speed handshake parameter - initial TTS playback speed multiplier (default 1.0, range 0.25-4.0). Echoed back in HandshakeSuccess.
  • Added set_speed control message - change TTS speed mid-session without reconnecting. Takes effect on the next sentence. Server responds with speed_updated.
  • Added flag control message - flag a specific sentence or the entire conversation for review. Accepts a flag_type (mistranslation, wrong_speaker, bug, other) and an optional free-text note. Server responds with flag_ack.
  • Added correct_speaker control message - override the speaker role assigned by ASR for a specific sentence. Server responds with correction_ack.
  • Added trigger_branch_c control message - run the alternate-speaker hypothesis for a single sentence on demand, regardless of the enable_dual_direction handshake setting. Server responds immediately with branch_c_triggered, then asynchronously with branch_c_result.
  • Added enable_dual_direction handshake parameter - when true, the alternate-speaker hypothesis runs automatically after the literal translation for every sentence in the session.
  • Added conversation_exit control message - graceful client-triggered session end. Server drains in-flight pipelines, delivers the archive if requested, then closes the WebSocket with code 1000. Server responds with exit_acknowledged.
  • Changed Conversation archive is now delivered as a conversation_archive WebSocket message at session end, built entirely in memory. No files are written to disk at any point during or after a session.
  • Changed If the archive cannot be built, the server now sends an explicit error message with service: "Conversation Archive" instead of silently sending nothing.
  • Changed Reformulation backend updated.
  • Fixed Sentence-level flag and correct_speaker control messages always returned "Sentence not found" in production. The internal sentence lookup dictionary was declared but never populated at detection time. Both the ordered history and the lookup dictionary are now kept in sync using the same object reference.
  • Fixed Session-end server logs always reported 0 sentences processed. The snapshot was taken after cleanup had already cleared the sentence history. Fixed by snapshotting before cleanup runs.
  • Removed Per-session conversation download token and associated download route. Archives are now delivered directly via the WebSocket at session end.
  • Removed conversation_id field from session metrics.
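A minimal client-side sketch of the v1.5.2 control messages. It assumes control messages are JSON text frames with a "type" field, and the sentence_id field name is hypothetical. Only speed, flag_type, note and the reply message names come from this entry, so check the protocol reference before relying on the payload shape.

```python
import json
from typing import Optional

# Assumption: control messages are JSON text frames with a "type" field.
# "sentence_id" is a hypothetical field name; speed, flag_type and note are
# the parameters named in the v1.5.2 changelog.
SPEED_MIN, SPEED_MAX = 0.25, 4.0
FLAG_TYPES = {"mistranslation", "wrong_speaker", "bug", "other"}

# Reply the server sends for each control message. trigger_branch_c is
# acknowledged immediately; branch_c_result arrives later, asynchronously.
EXPECTED_REPLY = {
    "set_speed": "speed_updated",
    "flag": "flag_ack",
    "correct_speaker": "correction_ack",
    "trigger_branch_c": "branch_c_triggered",
    "conversation_exit": "exit_acknowledged",
}


def check_speed(speed: float) -> float:
    """Reject speeds outside the documented 0.25-4.0 multiplier range."""
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be in [{SPEED_MIN}, {SPEED_MAX}], got {speed}")
    return speed


def set_speed(speed: float) -> str:
    """Change TTS speed mid-session; the server applies it from the next sentence."""
    return json.dumps({"type": "set_speed", "speed": check_speed(speed)})


def flag(flag_type: str, sentence_id: Optional[str] = None,
         note: Optional[str] = None) -> str:
    """Flag one sentence, or the whole conversation when sentence_id is None."""
    if flag_type not in FLAG_TYPES:
        raise ValueError(f"unknown flag_type: {flag_type!r}")
    msg = {"type": "flag", "flag_type": flag_type}
    if sentence_id is not None:
        msg["sentence_id"] = sentence_id  # hypothetical field name
    if note:
        msg["note"] = note
    return json.dumps(msg)


def conversation_exit() -> str:
    """Ask the server to drain in-flight pipelines and close with code 1000."""
    return json.dumps({"type": "conversation_exit"})
```

Keying pending requests on EXPECTED_REPLY lets a client match each speed_updated, flag_ack or exit_acknowledged to the message that triggered it.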
2026-05-14 v1.4.6 Internal Metrics, Pipeline Throughput & Monitoring
  • Added Prometheus metrics endpoint mounted at /metrics, scraped by Grafana Agent every 15 seconds.
  • Added Per-session metrics tracking: counts of sentences processed, empty, errored and flagged; peak concurrent pipelines; warnings and errors.
  • Added Pipeline throughput instrumentation: latency tracking from sentence detection through ASR, translation, reformulation and TTS for each branch.
  • Added TTS metrics: per-request latency in milliseconds, audio output size in bytes, error rate by failure type.
  • Added Reformulation metrics: per-request latency, error rate by failure type.
  • Added Back-translation metrics: success and error counters per sentence.
  • Added Branch-level completion and error counters for branches A, B and C. Branch C skip counters with reason labels (unknown_speaker_role, same_translation_direction).
  • Added Session lifecycle metrics: connection count, disconnection count, session duration histogram, total audio bytes processed per session.
  • Added Handshake rejection counters by reason (missing_fields, unsupported_language, timeout, invalid_json, send_failed).
  • Added WebSocket close counters split by normal and abnormal closure. Health check failure counters by service (vad, asr).
  • Changed /health endpoint now actively probes downstream services (VAD model, ASR) on each call rather than returning a static status, and increments failure counters on degraded results.
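A stdlib-only sketch for reading the /metrics endpoint from a script, for example to total handshake rejections by reason. It implements a simplified subset of the Prometheus text exposition format (no escaped quotes inside label values), and the metric name handshake_rejections_total is a hypothetical placeholder; the real names should be taken from the endpoint's output.

```python
import re
from collections import defaultdict

# Simplified reader for the Prometheus text exposition format: handles
# 'name value' and 'name{k="v",...} value' samples, skips comments and blank
# lines, ignores an optional trailing timestamp, and does not handle escaped
# quotes inside label values.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples


def totals_by_label(samples, metric, label):
    """Sum one metric's samples, grouped by the value of a single label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Illustrative scrape body; the metric name is a placeholder.
EXAMPLE = """
# HELP handshake_rejections_total Rejected handshakes.
# TYPE handshake_rejections_total counter
handshake_rejections_total{reason="timeout"} 3
handshake_rejections_total{reason="invalid_json"} 1
handshake_rejections_total{reason="timeout",instance="b"} 2
"""
```

Against a live service, the text would come from an HTTP GET on /metrics, for example urllib.request.urlopen(base_url + "/metrics").read().decode(), where base_url is the deployment's address.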
2026-04-10 v1.3.1 ASR/LID Overhaul & Pipeline Improvements
  • Added Local text-based language identification integrated into the speech recognition pipeline. Language detection now runs on the transcribed text rather than relying on the speech service's locale field, which was unreliable for short utterances.
  • Added Additional reformulation backend option.
  • Fixed Speaker role detection (medic / patient) returning unknown for all sentences, caused by the speech service reporting an incorrect locale regardless of what was spoken.
  • Fixed Translation service connectivity failure in containerised deployments.
  • Changed Locale list for speech recognition is now built with the medic language first. When the speech service cannot determine the language of a short utterance, it falls back to the first locale in the list, and the medic is the more common speaker in a medical consultation.
  • Changed VAD silence counter threshold increased from 10 to 20 frames (~0.64s) to reduce false sentence boundaries on natural speech pauses.
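The effect of the new threshold can be illustrated with a consecutive-silence counter. This is a minimal sketch, not the orchestrator's code: it assumes the VAD yields one speech/non-speech decision per frame and that a frame lasts about 32 ms (inferred from 20 frames ~0.64s).

```python
FRAME_MS = 32           # assumption: 20 frames ~ 0.64 s implies ~32 ms per frame
SILENCE_THRESHOLD = 20  # frames of silence that close a sentence (10 before v1.3.1)


def sentence_boundaries(frames, threshold=SILENCE_THRESHOLD):
    """Return the frame indices at which a sentence is closed.

    frames: iterable of booleans, True when the VAD classifies the frame as speech.
    A boundary is emitted once `threshold` consecutive silent frames follow speech.
    """
    boundaries = []
    silence = 0
    in_sentence = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            in_sentence = True
            silence = 0  # any speech frame resets the silence counter
        elif in_sentence:
            silence += 1
            if silence >= threshold:
                boundaries.append(i)
                in_sentence = False
                silence = 0
    return boundaries
```

With frames = [True]*5 + [False]*15 + [True]*5 + [False]*20, the old threshold of 10 splits the utterance at the 15-frame (~0.48 s) pause and returns [14, 34], while the new threshold keeps it as one sentence and returns [44].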
2026-04-08 v1.3.0 Language Handling Overhaul
  • Added Multi-format language code acceptance at handshake. language_medic and language_patient now accept BCP-47 (fr-FR), ISO 639-1 (fr), and ISO 639-3 (fra) — all normalized to BCP-47 internally.
  • Added New language normalization module. Single source of truth for all language code conversions across the pipeline.
  • Added Structured language data files keyed by BCP-47, containing voice synthesis metadata, speech recognition availability flags, and operational status per locale.
  • Fixed Speaker role detection was silently returning unknown for every sentence due to a format mismatch between speech recognition output and stored session languages. Detection is now dialect-aware.
  • Fixed Translation and voice synthesis services were receiving incorrectly formatted language codes.
  • Fixed asr_result now returns a normalized BCP-47 locale in the language field instead of the raw code returned by the speech recognition engine.
  • Changed HandshakeSuccess response now echoes normalized BCP-47 in medic_lang and patient_lang.
  • Changed Breaking /languages endpoint now returns a sorted JSON array of BCP-47 locale strings instead of a { "fr": "Français" } object.
  • Changed Internal module structure clarified. Logging, error formatting, and session management moved to dedicated helper modules.
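A minimal sketch of multi-format normalization and dialect-aware speaker role matching. The three-entry table and the choice of one default locale per language are illustrative assumptions; the actual module reads the structured language data files, and the function names here are hypothetical.

```python
# Illustrative excerpt of a language table keyed by BCP-47 (assumption: one
# default locale per language; the real data files carry more metadata).
LANGUAGES = {
    "fr-FR": {"iso_639_1": "fr", "iso_639_3": "fra"},
    "en-US": {"iso_639_1": "en", "iso_639_3": "eng"},
    "es-ES": {"iso_639_1": "es", "iso_639_3": "spa"},
}

# Lower-cased alias -> canonical BCP-47 tag.
_ALIASES = {}
for _tag, _info in LANGUAGES.items():
    _ALIASES[_tag.lower()] = _tag
    _ALIASES[_info["iso_639_1"]] = _tag
    _ALIASES[_info["iso_639_3"]] = _tag


def normalize(code: str) -> str:
    """Accept BCP-47, ISO 639-1 or ISO 639-3 and return the canonical BCP-47 tag."""
    key = code.strip().replace("_", "-").lower()
    if key not in _ALIASES:
        raise ValueError(f"unsupported language code: {code!r}")
    return _ALIASES[key]


def primary_subtag(tag: str) -> str:
    """Language part of a BCP-47 tag, e.g. 'fr' for 'fr-CA'."""
    return tag.split("-")[0].lower()


def speaker_role(detected: str, medic: str, patient: str) -> str:
    """Dialect-aware role detection: a detected fr-CA matches a medic stored as fr-FR.

    All three arguments are expected to be BCP-47 tags.
    """
    if primary_subtag(detected) == primary_subtag(medic):
        return "medic"
    if primary_subtag(detected) == primary_subtag(patient):
        return "patient"
    return "unknown"
```

Comparing primary subtags rather than full tags is what keeps a regional variant reported by speech recognition from falling through to unknown.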
Microservices
2026-04-10 API v1.3.0  ·  Translator v0.9.0 Translation Service Refactor & Language API Overhaul
  • Added GET /translation/languages now returns rich language metadata per entry: BCP-47 locale, ISO 639-1, ISO 639-3, display name, and an opaque language code for use in translate requests.
  • Added Language data is now sourced from a unified language map shared with the orchestrator. Only languages flagged as available and marked as the default locale for their language group are returned - one entry per language, no dialect duplicates.
  • Changed Breaking GET /translation/languages response shape changed. languages is now a sorted array instead of a keyed object. source_language field removed from the response. Each entry now contains bcp47, iso_639_1, iso_639_3, code, and name instead of the previous code, name, script.
  • Changed Breaking POST /translation/translate now requires language codes in the format returned by GET /translation/languages (the code field of each entry). ISO 639-1 short codes (e.g. fr, en) are no longer accepted.
  • Changed POST /translation/translate response: source_language and target_language are now BCP-47 strings (e.g. fr-FR) instead of LanguageInfo objects.
  • Removed ISO 639-1 to internal code mapping. The orchestrator now sends language codes directly in the format expected by the translation service — no server-side conversion is performed.
  • Removed source_language fixed default. Both source and target language are always supplied by the caller.
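A client-side sketch of the new two-step flow: read the opaque code for each locale from GET /translation/languages, then pass those codes to POST /translation/translate. The response fields (languages, bcp47, iso_639_1, iso_639_3, code, name) come from this entry; the request body field names and all sample values are hypothetical.

```python
import json

# Illustrative GET /translation/languages body. The "code" values are made up;
# real codes are opaque and must be read from the service.
SAMPLE_LANGUAGES = json.loads("""
{"languages": [
  {"bcp47": "en-US", "iso_639_1": "en", "iso_639_3": "eng", "code": "code-en", "name": "English"},
  {"bcp47": "fr-FR", "iso_639_1": "fr", "iso_639_3": "fra", "code": "code-fr", "name": "French"}
]}
""")


def codes_by_locale(languages_body: dict) -> dict:
    """Map each BCP-47 locale to the opaque code expected by translate requests."""
    return {entry["bcp47"]: entry["code"] for entry in languages_body["languages"]}


def translate_request(text: str, source: str, target: str, codes: dict) -> dict:
    """Build a POST /translation/translate body from BCP-47 locales.

    Field names in the returned dict are hypothetical. ISO 639-1 codes such as
    "fr" are not keys of `codes`, so they are rejected before any request is sent.
    """
    missing = [tag for tag in (source, target) if tag not in codes]
    if missing:
        raise ValueError(f"not offered by the translator: {missing}")
    return {
        "text": text,
        "source_language": codes[source],
        "target_language": codes[target],
    }
```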