Service Layer · high complexity · mobile

Dependencies: 1 · Dependents: 1 · Entities: 0 · Integrations: 1

Description

Core service that wraps the speech_to_text Flutter package to provide a unified interface over iOS SFSpeechRecognizer and Android SpeechRecognizer. Manages the full lifecycle of a speech recognition session — initialisation, permission negotiation, start, partial results streaming, stop, and error handling — without relying on any third-party cloud API.

Feature: Speech-to-Text Input

speech-recognition-service

Summaries

The Speech Recognition Service is the foundational capability that makes voice-driven report entry possible — and it does so entirely on-device, without routing audio data through any third-party cloud API. This architectural choice delivers two critical business advantages: it eliminates per-transcription API costs that would scale directly with user volume, and it ensures sensitive spoken content (clinical notes, inspection findings, legal observations) never leaves the device, reducing data privacy risk and supporting compliance with healthcare and regulatory frameworks. The service abstracts the differences between iOS and Android speech engines, meaning the business invests in one implementation rather than two platform-specific integrations. Support for multiple locales expands the potential addressable market to non-English-speaking user bases without requiring separate engineering efforts.

This is the highest-complexity component in the dictation system and the primary technical risk for the feature. It wraps the `speech_to_text` Flutter plugin, which introduces a third-party dependency that must be evaluated for version stability, maintenance cadence, and known issues before sprint planning. Permission flows differ meaningfully between iOS and Android — both require runtime permission requests, but iOS additionally requires a usage description in Info.plist and may involve a system-level privacy prompt that cannot be dismissed programmatically.

Plan for dedicated testing time on physical devices; simulators do not reliably replicate speech engine behaviour. Error handling paths (engine unavailable, permission denied mid-session, locale not supported) must each be tested explicitly. Locale support scope should be agreed with product before development begins to avoid late-sprint scope changes. This component must be delivered and stable before any other dictation component can be integration-tested end-to-end.

The Speech Recognition Service is a high-complexity Flutter service class that wraps the `speech_to_text` package, providing a clean, reactive interface over the divergent iOS `SFSpeechRecognizer` and Android `SpeechRecognizer` native APIs via `native-speech-api-bridge`. `initialise()` must be called once at app start or feature activation and caches availability status. `requestPermissions()` triggers the platform permission dialogs; `hasPermission()` returns the cached result without re-prompting. `startListening(SpeechConfig config)` accepts a config object carrying locale, partialResults flag, and timeout values, translating these to platform-specific parameters.
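A `SpeechConfig` along these lines would carry the values named above. This is a sketch only: apart from the locale, the partial-results flag, and the timeout values mentioned in the summary, the field names and defaults are assumptions, not the plugin's actual API.

```dart
/// Carries the session parameters that startListening() translates into
/// platform-specific arguments (sketch; names beyond the summary are assumed).
class SpeechConfig {
  final String localeId;     // platform locale id, e.g. 'en_US'
  final bool partialResults; // emit interim hypotheses while the user speaks
  final Duration listenFor;  // hard cap on session length (assumed default)
  final Duration pauseFor;   // silence window that ends a session (assumed default)

  const SpeechConfig({
    this.localeId = 'en_US',
    this.partialResults = true,
    this.listenFor = const Duration(seconds: 60),
    this.pauseFor = const Duration(seconds: 3),
  });
}
```

Keeping the config immutable (`const` constructor, `final` fields) lets the service cache and compare configs safely between sessions.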

`transcriptionStream()` returns a `Stream` emitting both partial and final result events with an `isFinal` discriminator. `stopListening()` triggers a graceful session end and flushes any buffered partial results before emitting the final event. Implement exponential backoff for transient engine errors and expose error codes as typed enums rather than raw strings to allow callers to make policy decisions. The service must be injected via the dependency injection container (e.g., GetIt or Riverpod) and mocked in unit tests using a `SpeechRecognitionPort` interface to avoid hardware dependencies in CI.
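The typed error codes and backoff policy might look like the following sketch. The enum members are assumptions inferred from the error paths named above, and the base delay and cap are illustrative defaults, not values from the source.

```dart
import 'dart:math';

/// Error codes exposed as a typed enum rather than raw platform strings;
/// the exact set is an assumption based on the error paths named above.
enum SpeechError {
  engineUnavailable,
  permissionDenied,
  localeNotSupported,
  transientEngineError,
}

/// Only transient engine failures are worth retrying; permission and locale
/// errors are policy decisions that belong to the caller.
bool isRetryable(SpeechError e) => e == SpeechError.transientEngineError;

/// Delay before the [attempt]-th retry (0-based): base * 2^attempt, capped.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(milliseconds: 250),
  Duration cap = const Duration(seconds: 8),
}) {
  final ms = base.inMilliseconds * pow(2, attempt).round();
  return Duration(milliseconds: min(ms, cap.inMilliseconds));
}
```

Surfacing `SpeechError` values through the stream's error channel (or a sealed result type) lets UI callers distinguish "retry silently" from "show a permissions prompt" without string matching.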

Responsibilities

  • Initialise the native speech engine and verify platform availability
  • Request and validate microphone and speech recognition permissions
  • Start and stop audio capture sessions on explicit user command only
  • Emit partial and final transcription results as a stream
  • Handle engine errors and locale/language configuration

Interfaces

initialise()
requestPermissions()
hasPermission()
startListening(SpeechConfig config)
stopListening()
cancelListening()
transcriptionStream()
isAvailable()
getSupportedLocales()
setLocale(String localeId)
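Collected as the `SpeechRecognitionPort` abstraction the summary recommends mocking against, the interface list above might be declared roughly as follows. Return types, the `TranscriptionResult` shape, and the fake's canned text are assumptions inferred from the prose, not the plugin's actual signatures.

```dart
import 'dart:async';

/// Result event carrying the isFinal discriminator described in the summary.
class TranscriptionResult {
  final String text;
  final bool isFinal;
  const TranscriptionResult(this.text, {required this.isFinal});
}

class SpeechConfig {} // fields elided; see startListening(SpeechConfig config) above

/// Abstraction callers receive via the DI container; mocked in unit tests to
/// keep CI free of speech-engine hardware dependencies.
abstract class SpeechRecognitionPort {
  Future<bool> initialise();                        // caches availability
  Future<bool> requestPermissions();                // triggers platform dialogs
  bool hasPermission();                             // cached, never re-prompts
  Future<void> startListening(SpeechConfig config);
  Future<void> stopListening();                     // flushes buffered partials
  Future<void> cancelListening();                   // discards the session
  Stream<TranscriptionResult> transcriptionStream();
  bool isAvailable();
  Future<List<String>> getSupportedLocales();
  Future<void> setLocale(String localeId);
}

/// Hardware-free fake of the kind used in unit tests: replays one partial
/// and one final result, mimicking the flush-on-stop contract.
class FakeSpeechPort implements SpeechRecognitionPort {
  final _results = StreamController<TranscriptionResult>();

  @override Future<bool> initialise() async => true;
  @override Future<bool> requestPermissions() async => true;
  @override bool hasPermission() => true;
  @override bool isAvailable() => true;
  @override Future<List<String>> getSupportedLocales() async => ['en_US'];
  @override Future<void> setLocale(String localeId) async {}
  @override Stream<TranscriptionResult> transcriptionStream() => _results.stream;

  @override
  Future<void> startListening(SpeechConfig config) async {
    _results.add(const TranscriptionResult('patient reports', isFinal: false));
  }

  @override
  Future<void> stopListening() async {
    _results.add(
        const TranscriptionResult('patient reports no pain', isFinal: true));
    await _results.close();
  }

  @override
  Future<void> cancelListening() => _results.close();
}
```

A test can drive the fake through start and stop, then assert it saw exactly one partial followed by one final event, without ever touching a microphone.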

Relationships

Dependencies (1)

Components this component depends on

Dependents (1)

Components that depend on this component

Used Integrations (1)

External integrations and APIs this component relies on

API Contract

REST /api/v1/speech-recognition (7 endpoints)
GET /api/v1/speech-recognition/sessions
GET /api/v1/speech-recognition/sessions/:id
POST /api/v1/speech-recognition/sessions
PUT /api/v1/speech-recognition/sessions/:id
DELETE /api/v1/speech-recognition/sessions/:id
GET /api/v1/speech-recognition/permission
+1 more