speech_to_text Flutter Package (iOS SFSpeechRecognizer / Android SpeechRecognizer)
Third Party Library Integration by Flutter Community
Description
The speech_to_text Flutter package wraps iOS SFSpeechRecognizer and Android SpeechRecognizer to enable hands-free report dictation for peer mentors. Dictation is used post-session for writing activity notes and report fields, not during the session itself: recording during sensitive conversations was explicitly rejected by Norges Blindeforbund. This reduces the reporting barrier for users with motor impairments.
Detailed Analysis
Speech-to-text dictation directly addresses a key accessibility requirement identified by Norges Blindeforbund: reducing the reporting burden for peer mentors with motor impairments or visual disabilities. By enabling hands-free dictation for post-session activity notes and report fields, the app lowers the barrier to accurate and timely reporting — improving data quality for coordinators and reducing the risk of late or incomplete session records. Critically, dictation is scoped exclusively to post-session reporting; Norges Blindeforbund explicitly rejected microphone use during sensitive peer conversations, and the DictationScopeGuard component enforces this boundary technically. The integration uses iOS SFSpeechRecognizer and Android SpeechRecognizer via the open-source speech_to_text Flutter package, meaning there is no per-use licensing cost.
On-device recognition is prioritised, so audio never leaves the device for the majority of users, addressing privacy concerns without additional infrastructure. This is a zero-cost, high-accessibility-impact capability with a strong ethical foundation aligned to HLF's mission.
The speech-to-text integration spans nine components including the SpeechToTextAdapter, DictationMicrophoneButton, TranscriptionPreviewField, TranscriptionStateManager, DictationScopeGuard, and NativeSpeechApiBridge. Integration requires platform-specific permission declarations: NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription in iOS Info.plist, and RECORD_AUDIO in the Android Manifest. These must be reviewed and approved as part of App Store and Play Store submission. Key test scenarios include: permission grant and denial flows on both platforms, on-device recognition availability on iOS 13+ and Android 5.0+, partial transcription behaviour when a user navigates away mid-dictation, and DictationScopeGuard enforcement ensuring dictation cannot be activated during active mentor conversations.
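For reference, the permission declarations described above take the following form (the usage-description strings are illustrative placeholders, not the app's actual copy):

```xml
<!-- iOS: ios/Runner/Info.plist -->
<key>NSSpeechRecognitionUsageDescription</key>
<string>Used to transcribe your dictated post-session report notes.</string>
<key>NSMicrophoneUsageDescription</key>
<string>Used to record your voice while you dictate report notes.</string>
```

```xml
<!-- Android: android/app/src/main/AndroidManifest.xml -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```

Both stores review these strings at submission time, so they should explain the post-session-only scope in user-facing language.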
No external credentials or accounts are required, so setup complexity is low compared to the other integrations. Locale configuration for Norwegian (nb_NO) should be validated against both platform speech engines. Ongoing maintenance is minimal: both platform SDKs are OS-managed, and the Flutter package is community-maintained with stable versioning.
The integration wraps iOS SFSpeechRecognizer and Android SpeechRecognizer via the speech_to_text Flutter package (^6.0.0). No API keys or server-side credentials are required — authentication is entirely via OS-level permission grants (NSSpeechRecognitionUsageDescription, NSMicrophoneUsageDescription on iOS; RECORD_AUDIO on Android). On-device recognition is preferred via iOS on-device mode (iOS 13+) and Android SpeechRecognizer, avoiding audio upload to third-party servers. The NativeSpeechApiBridge abstracts platform differences; TranscriptionStateManager handles recognition lifecycle including partial result streaming (< 500ms first partial result target), silence timeout, and graceful interruption.
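As a sketch of how the NativeSpeechApiBridge might drive the package (the `SpeechToText` calls follow the speech_to_text ~6.x API and should be checked against the pinned version; everything else, including the function name and callback wiring, is an assumption):

```dart
import 'package:speech_to_text/speech_to_text.dart';

final SpeechToText _speech = SpeechToText();

/// Hypothetical entry point used by the dictation UI.
Future<bool> startDictation(void Function(String) onTranscript) async {
  // initialize() triggers the OS permission prompts on first use and
  // returns false if recognition is unavailable on this device.
  final available = await _speech.initialize();
  if (!available) return false; // caller hides the dictation button

  // Validate that the device's engine actually offers Norwegian.
  final locales = await _speech.locales();
  final hasNb = locales.any((l) => l.localeId.startsWith('nb'));
  if (!hasNb) return false; // or fall back to the device default locale

  await _speech.listen(
    localeId: 'nb_NO',
    partialResults: true, // stream partials toward the < 500ms target
    onDevice: true,       // prefer on-device recognition for privacy
    onResult: (result) => onTranscript(result.recognizedWords),
  );
  return true;
}
```

The `onDevice: true` flag requests on-device recognition where the platform supports it (iOS 13+), which is what keeps audio off third-party servers for the majority of users.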
The DictationScopeGuard is a critical safety component that prevents the microphone from activating during active peer conversations — enforced at the service layer. When speech recognition is unavailable on a device, the SpeechToTextFieldOverlay hides the dictation button entirely rather than showing an error. Partial transcriptions are preserved if the user navigates away mid-dictation. Locale is configurable to nb_NO for Norwegian.
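The source does not show DictationScopeGuard's implementation; a minimal sketch of the service-layer check it is described as enforcing might look like this (the session states and method names are assumptions for illustration):

```dart
/// Hypothetical session states the app might track.
enum SessionState { idle, activeConversation, postSessionReporting }

class DictationScopeGuard {
  SessionState _state = SessionState.idle;

  void updateState(SessionState state) => _state = state;

  /// Dictation is permitted only outside active peer conversations.
  bool get dictationAllowed => _state != SessionState.activeConversation;

  /// Service-layer gate: dictation callers must pass through here
  /// before any call to the speech recognizer is made.
  void assertDictationAllowed() {
    if (!dictationAllowed) {
      throw StateError('Dictation is blocked during active conversations');
    }
  }
}
```

Enforcing the rule in the service layer, rather than only hiding the button in the UI, means no code path can reach the recognizer during a conversation even if a screen forgets the check.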
The one-minute limit on iOS server-based recognition is avoided by preferring on-device recognition; for longer dictations, session-restart logic should still be implemented. Screen reader announcements on recording start and stop are required for accessibility compliance.
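One way to implement the session-restart logic mentioned above is to re-invoke `listen()` whenever a recognition segment ends while the user is still dictating, accumulating final results across segments. This is a hedged sketch: the `'done'` status string matches the package's documented status callback values, but the class and its wiring are assumptions.

```dart
import 'package:speech_to_text/speech_to_text.dart';

/// Hypothetical wrapper for dictations longer than one segment.
class LongDictationSession {
  final SpeechToText speech;
  final StringBuffer _transcript = StringBuffer();
  bool _userStillDictating = false;

  LongDictationSession(this.speech);

  Future<void> start() async {
    _userStillDictating = true;
    await speech.initialize(onStatus: _onStatus);
    await _listenSegment();
  }

  Future<void> _listenSegment() => speech.listen(
        localeId: 'nb_NO',
        onDevice: true,
        onResult: (r) {
          // Only append final results; partials are shown in the UI.
          if (r.finalResult) _transcript.write('${r.recognizedWords} ');
        },
      );

  void _onStatus(String status) {
    // When a segment ends (recognizer limit or silence timeout),
    // immediately start a new one if the user has not stopped.
    if (status == 'done' && _userStillDictating) _listenSegment();
  }

  Future<String> stop() async {
    _userStillDictating = false;
    await speech.stop();
    return _transcript.toString().trim();
  }
}
```

A brief gap between segments is unavoidable with this approach, so the UI should keep the partial transcript visible so users can see nothing was lost.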
Using Components (9)
Dependencies (4)
Authentication
| Type | None |
| Requirements | NSSpeechRecognitionUsageDescription in iOS Info.plist, NSMicrophoneUsageDescription in iOS Info.plist, RECORD_AUDIO permission in Android Manifest |
| Scopes | microphone, speech_recognition |
Configuration
Error Handling
Monitoring
Performance
| Latency | < 500ms for first partial result display |
| Availability | On-device recognition preferred for offline availability |
Cost Implications
| Pricing Model | Free open-source package; iOS on-device recognition is free |