Implement dispatcher retry logic with backoff
epic-pause-status-notifications-foundation-task-008 — Add exponential backoff retry logic to the FCM dispatcher for transient failures (network timeout, FCM 500 errors). Implement configurable max retry count, retry delay calculation, and a dead-letter outcome model that callers can inspect to determine final delivery status.
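The dead-letter outcome model can be sketched as a discriminated union that callers switch on. This is a minimal TypeScript sketch for the server-side Deno Edge Function, not the Dart client. `DeliveryOutcome`, `finalStatus`, `lastErrorCode`, and `attemptTimestamps` are hypothetical names. `deadLetter`, `invalidToken`, and `attemptCount` reuse terms from this task.

```typescript
// Hypothetical outcome model returned by the dispatcher's retry wrapper.
type DeliveryOutcome<T> =
  | { kind: "success"; value: T; attemptCount: number; attemptTimestamps: Date[] }
  | { kind: "invalidToken"; attemptCount: number; attemptTimestamps: Date[] }
  | {
      kind: "deadLetter";
      lastErrorCode: string;
      attemptCount: number;
      attemptTimestamps: Date[];
    };

// Callers inspect `kind` to decide what to do next. The `never` check makes
// an unhandled outcome kind a compile-time error.
function finalStatus<T>(outcome: DeliveryOutcome<T>): string {
  switch (outcome.kind) {
    case "success":
      return `delivered after ${outcome.attemptCount} attempt(s)`;
    case "invalidToken":
      return "token rejected; flag it and skip push";
    case "deadLetter":
      return `dead-lettered after ${outcome.attemptCount} attempt(s): ${outcome.lastErrorCode}`;
    default: {
      const unreachable: never = outcome;
      return unreachable;
    }
  }
}
```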
Execution Context
Tier 3 - 413 tasks
Can start after Tier 2 completes
Implementation Notes
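Implement RetryExecutor as a generic class (e.g. RetryExecutor<T>, where T is the result type of the wrapped dispatch call) so one retry policy can wrap any dispatch operation.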
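Use dart:math Random for jitter. Cap the computed delay at config.maxDelayMs before applying jitter. Store each attempt's timestamp using DateTime.now() so dead-letter records are auditable. If tests need to assert those timestamps under FakeAsync, read them through package:clock's clock.now(), which FakeAsync overrides, because DateTime.now() always returns wall-clock time. In the Edge Function context (server-side Deno), use setTimeout (wrapped in a Promise) rather than Future.delayed, which exists only in Dart.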
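Here is a sketch of the generic executor for the Deno side. It follows the notes above: cap at maxDelayMs and then apply jitter, record one timestamp per attempt, and sleep via setTimeout. Several names are illustrative inventions: `baseDelayMs`, `jitterRatio`, `DispatchError`, its `transient` flag, and the `permanentFailure` outcome. The code that throws `DispatchError` decides which errors count as transient (for example NETWORK_TIMEOUT, FCM_RATE_LIMITED, or an FCM 5xx).

```typescript
// Hypothetical config: only maxAttempts and maxDelayMs echo the task text.
interface RetryConfig {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the first retry (assumed field)
  maxDelayMs: number; // cap applied before jitter
  jitterRatio: number; // 0.2 means ±20%
}

type RetryOutcome<T> =
  | { kind: "success"; value: T; attemptTimestamps: Date[] }
  | { kind: "permanentFailure"; errorCode: string; attemptTimestamps: Date[] }
  | { kind: "deadLetter"; errorCode: string; attemptCount: number; attemptTimestamps: Date[] };

// Hypothetical error type; whoever throws it decides whether the code is transient.
class DispatchError extends Error {
  code: string;
  transient: boolean;
  constructor(code: string, transient: boolean) {
    super(code);
    this.code = code;
    this.transient = transient;
  }
}

// Delay after failed `attempt` (1-based): base * 2^(attempt-1), capped at
// maxDelayMs, then scaled by a random factor in [1 - jitterRatio, 1 + jitterRatio).
function computeDelay(attempt: number, cfg: RetryConfig, random: () => number = Math.random): number {
  const capped = Math.min(cfg.baseDelayMs * 2 ** (attempt - 1), cfg.maxDelayMs);
  return capped * (1 + cfg.jitterRatio * (2 * random() - 1));
}

// Deno provides setTimeout natively; wrap it in a Promise to await it.
const realSleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

class RetryExecutor<T> {
  private readonly cfg: RetryConfig;
  private readonly sleep: (ms: number) => Promise<void>;
  private readonly random: () => number;

  constructor(
    cfg: RetryConfig,
    sleep: (ms: number) => Promise<void> = realSleep,
    random: () => number = Math.random,
  ) {
    this.cfg = cfg;
    this.sleep = sleep;
    this.random = random;
  }

  async run(operation: () => Promise<T>): Promise<RetryOutcome<T>> {
    const attemptTimestamps: Date[] = [];
    let lastCode = "UNKNOWN";
    for (let attempt = 1; attempt <= this.cfg.maxAttempts; attempt++) {
      attemptTimestamps.push(new Date()); // one auditable timestamp per attempt
      try {
        const value = await operation();
        return { kind: "success", value, attemptTimestamps };
      } catch (e) {
        if (!(e instanceof DispatchError) || !e.transient) {
          // Permanent (e.g. invalid token) or unrecognised error: do not retry.
          const code = e instanceof DispatchError ? e.code : "UNEXPECTED";
          return { kind: "permanentFailure", errorCode: code, attemptTimestamps };
        }
        lastCode = e.code;
        if (attempt < this.cfg.maxAttempts) {
          await this.sleep(computeDelay(attempt, this.cfg, this.random));
        }
      }
    }
    return { kind: "deadLetter", errorCode: lastCode, attemptCount: attemptTimestamps.length, attemptTimestamps };
  }
}
```

Injecting `sleep` and `random` keeps tests free of wall-clock waits. The cap is applied before jitter, so the actual delay can exceed maxDelayMs by up to the jitter ratio (20% with the values in this task). Cap again after jitter if maxDelayMs must be a hard ceiling.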
Document that dead-letter results should trigger an alert (e.g. Supabase webhook or log-based alert) for ops visibility.
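A log-based alert needs something stable to match on. One minimal approach is to write exactly one JSON line per dead letter. A Supabase webhook could carry the same record instead. All names in this sketch are hypothetical: `reportDeadLetter`, the `push_dead_letter` marker, and the record fields.

```typescript
// Hypothetical dead-letter record shape.
interface DeadLetterRecord {
  notificationId: string;
  errorCode: string;
  attemptCount: number;
  attemptTimestamps: Date[];
}

// Emits one JSON line with a stable marker so a log-based alert rule can match it,
// and returns the line so callers (and tests) can inspect it.
function reportDeadLetter(
  rec: DeadLetterRecord,
  log: (line: string) => void = console.error,
): string {
  const line = JSON.stringify({
    event: "push_dead_letter", // hypothetical marker an alert rule would filter on
    notificationId: rec.notificationId,
    errorCode: rec.errorCode,
    attemptCount: rec.attemptCount,
    attempts: rec.attemptTimestamps.map((d) => d.toISOString()),
  });
  log(line);
  return line;
}
```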
Testing Requirements
Unit tests use flutter_test with a fake clock (FakeAsync from the fake_async package) so delays can be tested without real wall-clock waits. Required test cases: (1) transient failure on attempt 1 and success on attempt 2: result is success; (2) all 3 attempts fail with NETWORK_TIMEOUT: result is deadLetter with attemptCount=3; (3) permanent failure (invalidToken) on attempt 1: no retry is triggered and the result is invalidToken; (4) FCM_RATE_LIMITED triggers a retry with the correct backoff delay sequence; (5) jitter stays within ±20% of the calculated delay. Also test that RetryConfig overrides are respected (e.g. maxAttempts=1 exhausts after a single attempt).
Use FakeAsync.elapse to assert correct delay durations without sleeping.
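FakeAsync covers the Dart tests. On the TypeScript side, injecting the random source and computing delays directly, instead of sleeping, has the same effect. This standalone sketch checks the delay-sequence part of case (4) and the ±20% jitter bound of case (5). It carries its own copy of the backoff formula so it runs independently. The following are illustrative assumptions: `baseDelayMs`, the seeded `mulberry32` generator, and the sample counts.

```typescript
// Illustrative config; baseDelayMs and jitterRatio are assumed field names.
interface BackoffConfig {
  baseDelayMs: number;
  maxDelayMs: number;
  jitterRatio: number; // 0.2 means ±20%
}

// Backoff formula: cap first, then apply jitter.
function computeDelay(attempt: number, cfg: BackoffConfig, random: () => number): number {
  const capped = Math.min(cfg.baseDelayMs * 2 ** (attempt - 1), cfg.maxDelayMs);
  return capped * (1 + cfg.jitterRatio * (2 * random() - 1));
}

// mulberry32: small seeded PRNG returning floats in [0, 1), so a failure reproduces from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Case (5): every jittered delay stays within ±jitterRatio of the capped delay.
function jitterWithinBounds(cfg: BackoffConfig, attempts: number, samples: number, seed: number): boolean {
  const random = mulberry32(seed);
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const capped = Math.min(cfg.baseDelayMs * 2 ** (attempt - 1), cfg.maxDelayMs);
    for (let i = 0; i < samples; i++) {
      const d = computeDelay(attempt, cfg, random);
      if (d < capped * (1 - cfg.jitterRatio) || d > capped * (1 + cfg.jitterRatio)) return false;
    }
  }
  return true;
}

// Case (4), delay part: pin random() to 0.5 so the jitter factor is exactly 1.
function expectedDelaySequence(cfg: BackoffConfig, retries: number): number[] {
  const seq: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    seq.push(computeDelay(attempt, cfg, () => 0.5));
  }
  return seq;
}
```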
The org membership table structure used to resolve coordinator relationships may differ from what the repository assumes, causing incorrect coordinator lookup or missing rows for mentors in multi-chapter scenarios.
Mitigation & Contingency
Mitigation: Review the existing org membership table schema and RLS policies before writing repository queries. Align query logic with the patterns already used by peer-mentor-status-repository and multi-chapter-membership-service.
Contingency: If the schema differs, add an adapter layer in the repository that normalises membership resolution, and document the discrepancy for the data team. If the org membership join fails, fall back to coordinator lookup via the feature's own stored coordinator_id field.
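The adapter-plus-fallback contingency might look like the following sketch. The row shape, `resolveCoordinators`, and the injected `fetchMemberships` query are all hypothetical, because the real schema is exactly what the mitigation says to verify first. One deliberate choice departs from the text above: an empty result is treated like a failed join, since missing rows for multi-chapter mentors is one of the failure modes described.

```typescript
// Hypothetical normalised membership row; the real columns must be confirmed.
interface MembershipRow {
  mentorId: string;
  chapterId: string;
  coordinatorId: string | null;
}

interface CoordinatorLookup {
  coordinatorIds: string[];
  source: "membership" | "storedField" | "none";
}

// Adapter: collapse membership rows into a de-duplicated coordinator list
// (a mentor in several chapters may map to several coordinators), and fall
// back to the stored coordinator_id when the join fails or returns nothing.
async function resolveCoordinators(
  mentorId: string,
  fetchMemberships: (mentorId: string) => Promise<MembershipRow[]>,
  storedCoordinatorId: string | null,
): Promise<CoordinatorLookup> {
  try {
    const rows = await fetchMemberships(mentorId);
    const ids = Array.from(
      new Set(rows.map((r) => r.coordinatorId).filter((id): id is string => id !== null)),
    );
    if (ids.length > 0) return { coordinatorIds: ids, source: "membership" };
  } catch (err) {
    console.warn("membership join failed; falling back to stored coordinator_id", err);
  }
  return storedCoordinatorId !== null
    ? { coordinatorIds: [storedCoordinatorId], source: "storedField" }
    : { coordinatorIds: [], source: "none" };
}
```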
Device tokens stored in the database may be stale or unregistered, causing FCM dispatch failures that silently drop coordinator notifications, which are this feature's primary coordination safeguard.
Mitigation & Contingency
Mitigation: Implement token validation on every dispatch call, and handle FCM's NOT_REGISTERED error (reported as UNREGISTERED by the FCM HTTP v1 API) by flagging the token as invalid in the database. Reuse the token refresh pattern already established by fcm-token-manager.
Contingency: If push delivery fails after retry, ensure the in-app notification record is always written regardless of push outcome so coordinators can still see the event in the notification centre.
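The two safeguards combine naturally in one call path. The sketch writes the in-app record unconditionally, then attempts the push and flags any token that FCM reports as unregistered. The dependency interface, `notifyCoordinator`, and the `NO_TOKEN` code are hypothetical. In a real dispatcher, `sendPush` would be the retry-wrapped FCM call.

```typescript
// Hypothetical result and dependency shapes for the dispatch path.
type PushResult = { ok: true } | { ok: false; errorCode: string };

interface NotifyDeps {
  writeInAppNotification: (coordinatorId: string, body: string) => Promise<void>;
  sendPush: (token: string, body: string) => Promise<PushResult>;
  markTokenInvalid: (token: string) => Promise<void>;
}

// Codes treated as "this token will never work again". NOT_REGISTERED is the
// name used in this task; UNREGISTERED is how the FCM HTTP v1 API reports it.
const DEAD_TOKEN_CODES = new Set(["NOT_REGISTERED", "UNREGISTERED"]);

async function notifyCoordinator(
  deps: NotifyDeps,
  coordinatorId: string,
  token: string | null,
  body: string,
): Promise<PushResult> {
  // Write the in-app record first, so the event reaches the notification
  // centre whatever happens to the push.
  await deps.writeInAppNotification(coordinatorId, body);
  if (token === null) return { ok: false, errorCode: "NO_TOKEN" };
  const result = await deps.sendPush(token, body);
  if (!result.ok && DEAD_TOKEN_CODES.has(result.errorCode)) {
    await deps.markTokenInvalid(token);
  }
  return result;
}
```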
The optional reason field may contain special characters, emoji, or non-Latin scripts that take two to four bytes each in UTF-8. A reason within the 200-character limit can therefore still produce a larger payload than expected when FCM encodes it, causing delivery failures.
Mitigation & Contingency
Mitigation: Enforce the 200-character limit on Unicode code point count, not byte count, in the payload builder. In Dart, that means reason.runes.length rather than reason.length, which counts UTF-16 code units. Add a unit test with multi-byte input strings.
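The same pitfall exists in TypeScript. `String.prototype.length` counts UTF-16 code units, so one emoji counts as 2, whereas iterating a string (here via `Array.from`) yields code points. This is a minimal sketch, and the helper names are hypothetical. Code points are still not user-perceived characters (a flag emoji is two code points), so truncation can split such a sequence.

```typescript
// Hypothetical constant mirroring the task's 200-character limit.
const REASON_MAX_CODE_POINTS = 200;

// Counts Unicode code points: Array.from iterates a string by code point.
function codePointLength(s: string): number {
  return Array.from(s).length;
}

// Truncates to at most `max` code points without splitting a surrogate pair.
function enforceReasonLimit(reason: string, max: number = REASON_MAX_CODE_POINTS): string {
  const codePoints = Array.from(reason);
  return codePoints.length <= max ? reason : codePoints.slice(0, max).join("");
}
```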
Contingency: If an oversized payload is detected at dispatch time, strip the reason from the push notification body and replace it with the note 'See in-app notification for full reason' so the push is still delivered.
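This sketch of the oversized-payload fallback assumes the payload builder can measure the serialized notification before dispatch. FCM documents a 4096-byte maximum message payload. Measuring only the JSON of the title and body, as done here, is a simplifying assumption. `buildPushBody` and its parameters are hypothetical.

```typescript
const FALLBACK_NOTE = "See in-app notification for full reason";

interface PushBody {
  title: string;
  body: string;
}

// UTF-8 byte length computed from code points (1 to 4 bytes each).
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Includes the reason when the serialized body fits the byte budget;
// otherwise drops the reason and substitutes the fallback note.
function buildPushBody(
  title: string,
  statusText: string,
  reason: string | null,
  maxBytes: number = 4096,
): PushBody {
  if (!reason) return { title, body: statusText };
  const withReason = { title, body: `${statusText}: ${reason}` };
  if (utf8ByteLength(JSON.stringify(withReason)) <= maxBytes) return withReason;
  return { title, body: `${statusText}. ${FALLBACK_NOTE}` };
}
```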