Priority: high | Complexity: medium | Category: infrastructure | Status: pending | Owner: backend specialist | Tier: 3

Acceptance Criteria

Retry logic only activates for transient error codes: NETWORK_TIMEOUT, FCM_SERVER_ERROR (5xx), FCM_RATE_LIMITED (429)
Retry logic never activates for permanent error codes: FCM_AUTH_ERROR (401/403), invalidToken (400/404), payload validation failure
Default max retry count is 3 (configurable via RetryConfig, injected at construction time)
Retry delay follows exponential backoff: delay = baseDelay * 2^attemptIndex, where baseDelay defaults to 1 second
Jitter is applied to delay (±20% random) to prevent thundering herd across concurrent dispatches
After max retries are exhausted without success, result is DispatchResult.deadLetter containing: original error, attempt count, timestamps of each attempt, last error code
Each retry attempt is logged at WARN level with attempt number and delay
RetryConfig exposes: maxAttempts (int), baseDelayMs (int), maxDelayMs (int, caps the backoff)
Retry logic is implemented in an isolated RetryExecutor utility so it can be reused for other dispatch channels
Total retry wall-clock time (including delays) must not exceed 30 seconds for default config
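The last criterion can be sanity-checked directly: with the default config there are only maxAttempts - 1 delays, each bounded by the +20% jitter ceiling. A minimal sketch of that budget check (function name and the 8-second cap are illustrative, not from this spec):

```dart
import 'dart:math';

/// Worst-case total backoff delay in milliseconds: sum of
/// baseDelay * 2^i (capped at maxDelayMs) with the +20% jitter upper
/// bound applied to every delay. There are maxAttempts - 1 delays,
/// because no delay follows the final attempt.
int worstCaseTotalDelayMs(int maxAttempts, int baseDelayMs, int maxDelayMs) {
  var total = 0;
  for (var i = 0; i < maxAttempts - 1; i++) {
    final capped = min(baseDelayMs * pow(2, i).toInt(), maxDelayMs);
    total += (capped * 1.2).round(); // +20% jitter upper bound
  }
  return total;
}

void main() {
  // Default config: 3 attempts, 1 s base delay, illustrative 8 s cap.
  print(worstCaseTotalDelayMs(3, 1000, 8000)); // 1200 + 2400 = 3600
}
```

At 3.6 s worst case, the default config sits comfortably inside the 30-second wall-clock budget; the budget only becomes a concern for non-default, high-attempt configs.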

Technical Requirements

Frameworks
Flutter
Dart
Riverpod
APIs
FCM HTTP v1 API (indirectly, via task-006 dispatcher)
Data models
RetryConfig
DispatchResult.deadLetter
RetryAttemptRecord (timestamp, error_code, attempt_index)
Performance requirements
Backoff delay computation must be O(1)
Retry loop must not block the main isolate; all delays via Future.delayed
maxDelayMs cap prevents runaway delay in high-retry-count configs
Security requirements
Retry logs must not include FCM tokens or PII
Dead-letter records stored for audit must follow the same PII restrictions as dispatch logs

Execution Context

Execution Tier

Tier 3 (413 tasks in this tier)

Can start after Tier 2 completes

Implementation Notes

Implement RetryExecutor as a generic class: RetryExecutor&lt;T&gt; with a method Future&lt;T&gt; execute(Future&lt;T&gt; Function() operation, bool Function(T) isTransient, RetryConfig config). This keeps retry logic decoupled from FCM-specific code and reusable. The isTransient predicate receives the result and returns true if a retry should occur. For DispatchResult, define isTransient as: result is DispatchResult.failure &amp;&amp; [FCM_SERVER_ERROR, NETWORK_TIMEOUT, FCM_RATE_LIMITED].contains(result.errorCode).
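A minimal sketch of that executor under this spec's assumptions (RetryConfig field names come from the acceptance criteria; mapping an exhausted result to DispatchResult.deadLetter is left to the caller, since it is channel-specific):

```dart
import 'dart:async';
import 'dart:math';

/// Assumed shape of RetryConfig, per the acceptance criteria.
class RetryConfig {
  final int maxAttempts;
  final int baseDelayMs;
  final int maxDelayMs;
  const RetryConfig(
      {this.maxAttempts = 3, this.baseDelayMs = 1000, this.maxDelayMs = 8000});
}

class RetryExecutor<T> {
  final Random _random = Random();

  Future<T> execute(Future<T> Function() operation,
      bool Function(T) isTransient, RetryConfig config) async {
    var result = await operation();
    for (var attempt = 0;
        attempt < config.maxAttempts - 1 && isTransient(result);
        attempt++) {
      // Exponential backoff capped at maxDelayMs, then ±20% jitter.
      final capped =
          min(config.baseDelayMs * pow(2, attempt).toInt(), config.maxDelayMs);
      final jittered = (capped * (0.8 + _random.nextDouble() * 0.4)).round();
      // A real implementation also logs each retry at WARN here with
      // the attempt number and the computed delay.
      // Future.delayed yields to the event loop; the isolate is not blocked.
      await Future.delayed(Duration(milliseconds: jittered));
      result = await operation();
    }
    // If the result is still transient after maxAttempts, the caller maps
    // it to DispatchResult.deadLetter with its attempt records.
    return result;
  }
}
```

Keeping the executor ignorant of DispatchResult is what makes it reusable for other dispatch channels, as the criteria require.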

Use dart:math Random for jitter. Cap the computed delay at config.maxDelayMs before applying jitter. Store each attempt's timestamp using DateTime.now() so dead-letter records are auditable. In the server-side Edge Function context (Deno), use standard setTimeout/Promise-based delays rather than Future.delayed.

Document that dead-letter results should trigger an alert (e.g. Supabase webhook or log-based alert) for ops visibility.

Testing Requirements

Unit tests using flutter_test with a fake clock (FakeAsync from the fake_async package) to test delays without real wall-clock waits. Required test cases:

Transient failure on attempt 1, success on attempt 2: result is success
All 3 attempts fail with NETWORK_TIMEOUT: result is deadLetter with attemptCount=3
Permanent failure (invalidToken) on attempt 1: no retry triggered, result is invalidToken
FCM_RATE_LIMITED triggers retry with the correct backoff delay sequence
Jitter stays within ±20% of the calculated delay
RetryConfig overrides are respected (e.g. maxAttempts=1 exhausts after a single attempt)

Use FakeAsync.elapse to assert correct delay durations without sleeping.
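A sketch of the FakeAsync pattern, using a self-contained single-retry stand-in so the example compiles on its own (the real tests would exercise the RetryExecutor from the implementation notes):

```dart
import 'package:fake_async/fake_async.dart';
import 'package:flutter_test/flutter_test.dart';

// Minimal stand-in: retries once after a 1 s delay on a transient code.
Future<String> retryOnce(Future<String> Function() op) async {
  var result = await op();
  if (result == 'NETWORK_TIMEOUT') {
    await Future.delayed(const Duration(seconds: 1));
    result = await op();
  }
  return result;
}

void main() {
  test('elapse advances fake time through the backoff delay', () {
    fakeAsync((async) {
      var calls = 0;
      String? result;
      retryOnce(() async => ++calls == 1 ? 'NETWORK_TIMEOUT' : 'SUCCESS')
          .then((r) => result = r);
      async.flushMicrotasks();            // run up to the first delay
      expect(result, isNull);             // still parked on the fake clock
      async.elapse(const Duration(seconds: 1)); // no real sleep happens
      expect(calls, 2);
      expect(result, 'SUCCESS');
    });
  });
}
```

Because Future.delayed timers are captured by the fake zone, asserting on how much time elapse must advance before the operation re-runs is what verifies the backoff delay sequence.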

Component
FCM Notification Dispatcher
Category: infrastructure | Complexity: medium
Epic Risks (3)
Risk 1 (integration) | Impact: high | Probability: medium

The org membership table structure used to resolve coordinator relationships may differ from what the repository assumes, causing incorrect coordinator lookup or missing rows for mentors in multi-chapter scenarios.

Mitigation & Contingency

Mitigation: Review the existing org membership table schema and RLS policies before writing repository queries. Align query logic with the patterns already used by peer-mentor-status-repository and multi-chapter-membership-service.

Contingency: If schema differs, add an adapter layer in the repository that normalises the membership resolution and document the discrepancy for the data team. Fall back to coordinator lookup via the feature's own stored coordinator_id field if org membership join fails.

Risk 2 (technical) | Impact: high | Probability: medium

Device tokens stored in the database may be stale or unregistered, causing FCM dispatch failures that silently drop coordinator notifications — the primary coordination safeguard of this feature.

Mitigation & Contingency

Mitigation: Implement token validation on every dispatch call and handle FCM's NOT_REGISTERED error by flagging the token as invalid in the database. Reuse the token refresh pattern already established by fcm-token-manager.

Contingency: If push delivery fails after retry, ensure the in-app notification record is always written regardless of push outcome so coordinators can still see the event in the notification centre.

Risk 3 (technical) | Impact: medium | Probability: low

The optional reason field may contain special characters, emoji, or non-Latin scripts whose encoded form is larger than the character count suggests, so a reason that passes a naive 200-character check can still exceed the limit once FCM encodes the payload, causing delivery failures.

Mitigation & Contingency

Mitigation: Enforce the 200-character limit on Unicode code point count, not byte count, in the payload builder. Add a unit test with multi-byte input strings.

Contingency: If an oversized payload is detected at dispatch time, strip the reason field from the push notification body and substitute 'See in-app notification for full reason' to preserve delivery.
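The code-point-based limit from the mitigation above can be enforced with String.runes, since Dart's String.length counts UTF-16 code units and over-counts characters outside the Basic Multilingual Plane (most emoji). A sketch; the function name is illustrative:

```dart
/// Truncates reason to at most maxCodePoints Unicode code points.
/// reason.runes iterates code points, not UTF-16 code units, so a
/// 4-byte emoji counts as one unit here instead of two.
String clampReason(String reason, {int maxCodePoints = 200}) {
  final codePoints = reason.runes.toList();
  if (codePoints.length <= maxCodePoints) return reason;
  // String.fromCharCodes accepts code points and re-encodes surrogate
  // pairs correctly, so no emoji is split in half.
  return String.fromCharCodes(codePoints.take(maxCodePoints));
}
```

One caveat worth a unit test: user-perceived characters (grapheme clusters such as flag or skin-tone emoji) can span several code points, so if the limit is meant per visible character, the characters package's grapheme-aware counting would be the stricter unit.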