Priority: high · Complexity: medium · Category: integration · Status: pending · Assignee: integration specialist · Tier: 1

Acceptance Criteria

On a transient HTTP error (5xx, network timeout), the sync operation retries up to 3 times with delays of 2s, 4s, and 8s (exponential backoff) before declaring failure
Retry attempts are logged with structured fields: attempt_number, delay_ms, error_code, operation_type, certification_id
After 3 consecutive failures, the circuit breaker transitions to OPEN state and blocks further sync attempts for a configurable cool-down period (default: 60 seconds)
When the circuit is OPEN, a circuit-open event is logged with: opened_at timestamp, failure_count, last_error_code
Circuit transitions to HALF-OPEN after cool-down and allows one probe request; success → CLOSED, failure → OPEN again
Failed sync operations (after exhausting retries) are persisted to a `sync_retry_queue` table (or equivalent) in Supabase with fields: operation_payload, failure_reason, attempt_count, created_at, status='pending_manual_replay'
No certification status changes are silently discarded — every failure results in either a successful retry, a queued manual replay entry, or an explicit error event
4xx errors (client errors, e.g. 400, 404) are NOT retried — they are immediately queued as unrecoverable with status='requires_investigation'
A `getCircuitState()` method returns current state (CLOSED/OPEN/HALF_OPEN) for health-check endpoints
Unit test: 3 consecutive 503 responses → circuit opens, retry queue receives one entry with attempt_count=3
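The backoff schedule above (2s, 4s, 8s) is a base delay doubling per attempt. A minimal sketch of that computation, assuming the `baseDelayMs=2000` value from the implementation notes:

```dart
// Sketch of the backoff schedule from the acceptance criteria:
// attempt 1 -> 2s, attempt 2 -> 4s, attempt 3 -> 8s.
Duration backoffDelay(int attempt, {int baseDelayMs = 2000}) {
  assert(attempt >= 1);
  // Base delay doubles with each attempt: base * 2^(attempt - 1).
  return Duration(milliseconds: baseDelayMs * (1 << (attempt - 1)));
}

void main() {
  for (var attempt = 1; attempt <= 3; attempt++) {
    print('attempt $attempt -> ${backoffDelay(attempt).inSeconds}s');
  }
}
```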

Technical Requirements

Frameworks
Flutter
Dart
Riverpod
APIs
Microsoft Dynamics 365 REST API
Supabase PostgreSQL 15
Data Models
Certification
Performance Requirements
Retry delays must not block the main isolate — use Future.delayed with async/await
Circuit breaker state transitions must be atomic to prevent race conditions in concurrent sync operations
Queue insertion on failure must complete within 500ms to avoid blocking the sync pipeline
Security Requirements
Retry queue entries must not store decrypted credentials or full JWT tokens — store only operation metadata
Queue table must have RLS policy restricting access to service role only
Logged error context must not include PII from certification payloads
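One way to satisfy the metadata-only constraint is to sanitize the payload before it reaches the queue. A minimal sketch, assuming hypothetical field names (the blocked-key list and `toQueueEntry` helper are illustrations, not part of the spec):

```dart
/// Hypothetical helper: builds a retry-queue entry that stores only
/// operation metadata, never credentials or JWT tokens. Column names
/// follow the queue fields described in the acceptance criteria.
Map<String, Object?> toQueueEntry({
  required String operationType,
  required Map<String, Object?> payload,
  required String failureReason,
  required int attemptCount,
}) {
  // Drop any auth material that may have leaked into the payload.
  const blocked = {'authorization', 'access_token', 'refresh_token', 'jwt'};
  final sanitized = {
    for (final e in payload.entries)
      if (!blocked.contains(e.key.toLowerCase())) e.key: e.value,
  };
  return {
    'operation_type': operationType,
    'payload_json': sanitized,
    'failure_reason': failureReason,
    'attempt_count': attemptCount,
    'status': 'pending_manual_replay',
  };
}
```

A denylist like this is a floor, not a ceiling: PII scrubbing of certification payloads (per the logging requirement) would need its own allowlist of fields.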

Execution Context

Execution Tier
Tier 1 (540 tasks)

Can start after Tier 0 completes.

Implementation Notes

Implement the circuit breaker as a standalone `DynamicsCircuitBreaker` class with state enum `CircuitState { closed, open, halfOpen }`. Inject it into `HLFDynamicsSyncService` as a dependency for testability. Use a `RetryPolicy` value object to encapsulate `maxAttempts=3` and `baseDelayMs=2000` (doubling per attempt). The retry loop should be a `for` loop with `await Future.delayed()` — do NOT use recursive calls.
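A minimal sketch of that shape, using the names from these notes (the `state` getter stands in for `getCircuitState()`; error classification, structured logging, and queue persistence are omitted):

```dart
enum CircuitState { closed, open, halfOpen }

/// Value object encapsulating the retry schedule from the notes above.
class RetryPolicy {
  const RetryPolicy({this.maxAttempts = 3, this.baseDelayMs = 2000});
  final int maxAttempts;
  final int baseDelayMs;

  /// attempt 1 -> 2s, attempt 2 -> 4s, attempt 3 -> 8s.
  Duration delayFor(int attempt) =>
      Duration(milliseconds: baseDelayMs * (1 << (attempt - 1)));
}

class DynamicsCircuitBreaker {
  DynamicsCircuitBreaker({this.coolDown = const Duration(seconds: 60)});

  final Duration coolDown;
  CircuitState _state = CircuitState.closed;
  int _failureCount = 0;
  DateTime? _openedAt;

  /// Equivalent of getCircuitState() for health checks. The cool-down is
  /// derived from DateTime.now() comparisons, so it survives restarts.
  CircuitState get state {
    if (_state == CircuitState.open &&
        DateTime.now().difference(_openedAt!) >= coolDown) {
      _state = CircuitState.halfOpen; // allow a single probe request
    }
    return _state;
  }

  bool get allowsRequest => state != CircuitState.open;

  void recordSuccess() {
    _failureCount = 0;
    _state = CircuitState.closed; // probe success closes the circuit
  }

  void recordFailure() {
    _failureCount++;
    // A failed half-open probe, or 3 consecutive failures, opens it.
    if (_state == CircuitState.halfOpen || _failureCount >= 3) {
      _state = CircuitState.open;
      _openedAt = DateTime.now();
    }
  }
}

/// Retry loop: a plain for loop with await Future.delayed, no recursion.
Future<T> runWithRetry<T>(
  Future<T> Function() operation, {
  RetryPolicy policy = const RetryPolicy(),
  required DynamicsCircuitBreaker breaker,
}) async {
  Object? lastError;
  for (var attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    if (!breaker.allowsRequest) {
      throw StateError('circuit open: sync blocked until cool-down elapses');
    }
    try {
      final result = await operation();
      breaker.recordSuccess();
      return result;
    } catch (e) {
      lastError = e;
      breaker.recordFailure();
      if (attempt < policy.maxAttempts) {
        await Future.delayed(policy.delayFor(attempt));
      }
    }
  }
  throw lastError!; // caller persists the failure to the retry queue
}
```

Note this sketch is not yet atomic under concurrent use; the real implementation would need to guard state transitions (e.g. serialize sync operations or synchronize breaker updates) per the performance requirements.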

For the queue: create a Supabase table `dynamics_sync_retry_queue` with columns `id`, `operation_type`, `payload_json`, `failure_reason`, `attempt_count`, `status`, `created_at`. Insert via the Supabase client using the service role. The circuit breaker cool-down timer should use `DateTime.now()` comparisons rather than a running timer, so it survives isolate restarts. Log using a structured logger; avoid print statements in production code.
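A sketch of the corresponding migration, using the column names above. Types, defaults, and the check constraint are assumptions, not agreed schema:

```sql
-- Sketch: retry queue table for failed Dynamics sync operations.
create table if not exists dynamics_sync_retry_queue (
  id uuid primary key default gen_random_uuid(),
  operation_type text not null,
  payload_json jsonb not null,
  failure_reason text not null,
  attempt_count int not null default 0,
  status text not null default 'pending_manual_replay'
    check (status in ('pending_manual_replay', 'requires_investigation',
                      'replayed')),
  created_at timestamptz not null default now()
);

-- Service-role-only access: service-role connections bypass RLS, and
-- with RLS enabled and no permissive policies defined, anon and
-- authenticated clients are denied by default.
alter table dynamics_sync_retry_queue enable row level security;
```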

Testing Requirements

Unit tests with flutter_test and a mock HTTP client covering:
all 3 retry attempts exhausted → queue entry created
partial retry success (failure on attempt 1, success on attempt 2) → no queue entry
4xx error → immediate queue entry with status='requires_investigation'
circuit transitions CLOSED→OPEN→HALF_OPEN→CLOSED and CLOSED→OPEN→HALF_OPEN→OPEN
circuit in OPEN state blocks calls without hitting HTTP
Integration tests (against the sandbox): 503 retry sequence; circuit trip on 3 consecutive failures.
Test coverage target: 95%+ for the circuit breaker state machine and retry logic.
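A self-contained sketch of the first unit case, matching the acceptance-criteria test (3 consecutive 503s → circuit opens, one queue entry). Plain asserts and an inline fake are used here so the sketch runs standalone; the real suite would use flutter_test's `test`/`expect` with a mocked HTTP client against `HLFDynamicsSyncService`:

```dart
/// Hypothetical inline stand-in for the sync-under-test. Returns the
/// observations a unit test would assert on.
Future<Map<String, Object?>> runSyncAgainstAlways503() async {
  var httpCalls = 0;
  Future<int> fakeSend() async {
    httpCalls++;
    return 503; // mock server: every attempt fails transiently
  }

  final queue = <Map<String, Object?>>[];
  var failures = 0;
  for (var attempt = 1; attempt <= 3; attempt++) {
    if (await fakeSend() >= 500) failures++;
  }
  if (failures >= 3) {
    // Retries exhausted: circuit opens and one replay entry is queued.
    queue.add({'attempt_count': failures, 'status': 'pending_manual_replay'});
  }
  return {
    'http_calls': httpCalls,
    'circuit_open': failures >= 3,
    'queue_length': queue.length,
  };
}

Future<void> main() async {
  final r = await runSyncAgainstAlways503();
  assert(r['http_calls'] == 3);
  assert(r['circuit_open'] == true);
  assert(r['queue_length'] == 1);
  print('3x503 case: circuit open, one queue entry');
}
```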

Component
HLF Dynamics Sync Service
Type: service · Complexity: medium
Epic Risks (3)
Impact: high · Probability: medium · Category: integration

HLF Dynamics portal webhook API contract may be undocumented, subject to change, or require a separate authentication flow not yet agreed upon with HLF. If the contract changes post-implementation, the sync service silently fails and expired peer mentors remain on public listings.

Mitigation & Contingency

Mitigation: Obtain the official Dynamics webhook specification and test credentials from HLF before starting HLFDynamicsSyncService implementation. Agree on a versioned webhook contract and request a staging endpoint for integration testing.

Contingency: If the contract is unavailable, stub the sync service behind a feature flag and ship without Dynamics sync initially. Queue sync events locally and replay once the contract is confirmed.

Impact: high · Probability: medium · Category: security

Supabase RLS policies for certifications must correctly scope data to the coordinator's chapter without leaking cross-organisation data, which is particularly complex in multi-chapter membership scenarios. A misconfigured policy could expose peer mentor PII to the wrong coordinators.

Mitigation & Contingency

Mitigation: Write RLS policies against the established org-hierarchy schema used by other tables. Peer review all policies before migration deployment. Add integration tests that assert cross-organisation data isolation using test accounts with different org scopes.

Contingency: If a policy gap is discovered post-merge, immediately disable the affected query endpoint and apply a hotfix migration. Audit access logs in Supabase for any cross-org data access events.

Impact: medium · Probability: low · Category: technical

Storing renewal history as a JSONB field rather than a normalised table simplifies queries, but it makes retrospective schema changes (adding fields to history entries) harder and could cause issues if history grows very large for long-tenured mentors.

Mitigation & Contingency

Mitigation: Define a versioned JSONB entry schema (include a schema_version field in each entry) so future migrations can transform old entries. Add a size guard in the repository to warn if renewal_history exceeds 500 entries.
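A sketch of what a versioned entry and the size guard could look like. The `schema_version` field and 500-entry threshold come from the mitigation above; the other field names are assumptions:

```dart
const int renewalEntrySchemaVersion = 1;
const int renewalHistoryWarnThreshold = 500;

/// Hypothetical builder for one versioned renewal-history JSONB entry.
/// schema_version lets future migrations transform old entries.
Map<String, Object?> renewalHistoryEntry({
  required DateTime renewedAt,
  required String renewedBy,
}) =>
    {
      'schema_version': renewalEntrySchemaVersion,
      'renewed_at': renewedAt.toUtc().toIso8601String(),
      'renewed_by': renewedBy,
    };

/// Repository-side size guard: warn once history exceeds the threshold
/// at which a normalised table should be considered.
bool historyExceedsGuard(List<Object?> history) =>
    history.length > renewalHistoryWarnThreshold;
```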

Contingency: If JSONB approach proves limiting, add a normalised certification_renewal_events table and migrate history entries in a background job, keeping the JSONB field as a read cache.