Speech Recognition Service
Component Detail
Description
Core service that wraps the speech_to_text Flutter package to provide a unified interface over iOS SFSpeechRecognizer and Android SpeechRecognizer. Manages the full lifecycle of a speech recognition session — initialisation, permission negotiation, start, partial results streaming, stop, and error handling — without relying on any third-party cloud API.
speech-recognition-service
Summaries
The Speech Recognition Service is the foundational capability that makes voice-driven report entry possible — and it does so entirely on-device, without routing audio data through any third-party cloud API. This architectural choice delivers two critical business advantages: it eliminates per-transcription API costs that would scale directly with user volume, and it ensures sensitive spoken content (clinical notes, inspection findings, legal observations) never leaves the device, reducing data privacy risk and supporting compliance with healthcare and regulatory frameworks. The service abstracts the differences between iOS and Android speech engines, meaning the business invests in one implementation rather than two platform-specific integrations. Support for multiple locales expands the potential addressable market to non-English-speaking user bases without requiring separate engineering efforts.
This is the highest-complexity component in the dictation system and the primary technical risk for the feature. It wraps the `speech_to_text` Flutter plugin, which introduces a third-party dependency that must be evaluated for version stability, maintenance cadence, and known issues before sprint planning. Permission flows differ meaningfully between iOS and Android — both require runtime permission requests, but iOS additionally requires a usage description in Info.plist and may involve a system-level privacy prompt that cannot be dismissed programmatically.
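The iOS usage-description requirement above corresponds to concrete platform configuration. A minimal sketch of the entries involved — the description strings are placeholders, not part of this spec:

```xml
<!-- ios/Runner/Info.plist -->
<key>NSSpeechRecognitionUsageDescription</key>
<string>Transcribes your dictation on-device.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Captures audio while you dictate.</string>

<!-- android/app/src/main/AndroidManifest.xml -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

On iOS, omitting either key causes the app to be terminated by the system when the corresponding API is first touched, which is why this belongs in the definition of done for the component rather than in later hardening work.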
Plan for dedicated testing time on physical devices; simulators do not reliably replicate speech engine behaviour. Error handling paths (engine unavailable, permission denied mid-session, locale not supported) must each be tested explicitly. Locale support scope should be agreed with product before development begins to avoid late-sprint scope changes. This component must be delivered and stable before any other dictation component can be integration-tested end-to-end.
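The error paths named above can be made explicit in the service's API surface so each one is individually testable. A minimal sketch in Dart — the type and field names are illustrative, not part of the spec:

```dart
/// Illustrative error model covering the paths the spec says must be
/// tested explicitly. A sealed hierarchy forces callers (and tests) to
/// handle every case exhaustively in a switch.
sealed class SpeechError {
  const SpeechError();
}

/// The native engine is missing or failed to initialise on this device.
class EngineUnavailable extends SpeechError {
  const EngineUnavailable();
}

/// Permission was revoked or denied while a session was in progress.
class PermissionDeniedMidSession extends SpeechError {
  const PermissionDeniedMidSession();
}

/// The requested locale is not among the engine's supported locales.
class LocaleNotSupported extends SpeechError {
  final String localeId;
  const LocaleNotSupported(this.localeId);
}
```

Modelling the failures as values (rather than thrown platform exceptions) also makes the physical-device test matrix easier to enumerate: one test per subclass.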
The Speech Recognition Service is a high-complexity Flutter service class that wraps the `speech_to_text` package, providing a clean, reactive interface over the divergent iOS `SFSpeechRecognizer` and Android `SpeechRecognizer` native APIs via `native-speech-api-bridge`. `initialise()` must be called once at app start or feature activation and caches availability status. `requestPermissions()` triggers the platform permission dialogs; `hasPermission()` returns the cached result without re-prompting. `startListening(SpeechConfig config)` accepts a config object carrying locale, partialResults flag, and timeout values, translating these to platform-specific parameters.
`transcriptionStream()` returns a `Stream` that emits partial results while the user is speaking and a final result when the session ends. `stopListening()` ends capture and finalises the in-flight result, while `cancelListening()` ends capture and discards it. `isAvailable()` reports the cached engine availability, and `getSupportedLocales()` together with `setLocale(String localeId)` handle locale discovery and configuration.
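A skeleton of how these interfaces might map onto the `speech_to_text` plugin. The plugin calls shown (`initialize`, `listen`, `stop`, `cancel`, `locales`, `hasPermission`) exist in the plugin's documented API, but option placement varies between plugin versions (newer versions move listen options into a separate options object), so treat this as a sketch rather than a build-ready implementation:

```dart
import 'dart:async';

import 'package:speech_to_text/speech_to_text.dart';
import 'package:speech_to_text/speech_recognition_result.dart';

/// Config object carried into startListening; fields per the spec above.
class SpeechConfig {
  final String localeId;
  final bool partialResults;
  final Duration timeout;
  const SpeechConfig({
    required this.localeId,
    this.partialResults = true,
    this.timeout = const Duration(seconds: 30),
  });
}

class SpeechRecognitionService {
  final SpeechToText _engine = SpeechToText();
  final _results = StreamController<SpeechRecognitionResult>.broadcast();
  bool _available = false;

  /// Call once at app start or feature activation; caches availability.
  Future<void> initialise() async {
    _available = await _engine.initialize(
      onError: (error) => _results.addError(error),
    );
  }

  bool isAvailable() => _available;

  Future<bool> hasPermission() => _engine.hasPermission;

  /// Starts capture on explicit user command only.
  Future<void> startListening(SpeechConfig config) => _engine.listen(
        localeId: config.localeId,
        listenFor: config.timeout,
        onResult: _results.add, // partial and final results flow here
      );

  Future<void> stopListening() => _engine.stop();

  Future<void> cancelListening() => _engine.cancel();

  /// Partial results carry finalResult == false; the session's last
  /// event carries finalResult == true.
  Stream<SpeechRecognitionResult> transcriptionStream() => _results.stream;

  Future<List<LocaleName>> getSupportedLocales() => _engine.locales();
}
```

Exposing a broadcast stream keeps the plugin's callback-style `onResult` API out of consuming components, which matches the spec's goal of a clean, reactive interface over the native engines.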
Responsibilities
- Initialise the native speech engine and verify platform availability
- Request and validate microphone and speech recognition permissions
- Start and stop audio capture sessions on explicit user command only
- Emit partial and final transcription results as a stream
- Handle engine errors and locale/language configuration
Interfaces
initialise()
requestPermissions()
hasPermission()
startListening(SpeechConfig config)
stopListening()
cancelListening()
transcriptionStream()
isAvailable()
getSupportedLocales()
setLocale(String localeId)
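Taken together, the interfaces above imply a session lifecycle: initialise once, negotiate permissions, start on user command, consume partial results, then stop explicitly. A self-contained sketch of that call order — the stand-in types and the `dictateOnce` helper are illustrative; only the method names come from this spec:

```dart
import 'dart:async';

/// Minimal stand-ins so the example is self-contained.
class SpeechConfig {
  final String localeId;
  const SpeechConfig(this.localeId);
}

class TranscriptionResult {
  final String text;
  final bool isFinal;
  const TranscriptionResult(this.text, this.isFinal);
}

abstract interface class SpeechRecognition {
  Future<void> initialise();
  Future<bool> requestPermissions();
  bool isAvailable();
  Future<void> startListening(SpeechConfig config);
  Future<void> stopListening();
  Stream<TranscriptionResult> transcriptionStream();
}

/// Drives one dictation session in the order the spec prescribes.
Future<String> dictateOnce(SpeechRecognition svc, String localeId) async {
  await svc.initialise();
  if (!svc.isAvailable() || !await svc.requestPermissions()) {
    throw StateError('speech engine unavailable or permission denied');
  }
  final done = Completer<String>();
  final sub = svc.transcriptionStream().listen((result) {
    if (result.isFinal) done.complete(result.text); // final result ends it
  });
  await svc.startListening(SpeechConfig(localeId));
  // ... user speaks; partial results stream in for live UI updates ...
  final text = await done.future;
  await svc.stopListening();
  await sub.cancel();
  return text;
}
```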
Relationships
Used Integrations (1)
External integrations and APIs this component relies on