high priority | low complexity | testing | pending | testing specialist | Tier 4

Acceptance Criteria

Test: flag enabled — `isEnabled('feature_x')` returns `true` when mock returns `is_enabled: true` with no rollout
Test: flag disabled — `isEnabled('feature_y')` returns `false` when mock returns `is_enabled: false`
Test: TTL expiry — after `fakeAsync` advances time past TTL, the next `isEnabled` call schedules a background re-fetch; mock Supabase is called a second time
Test: TTL not expired — within TTL window, Supabase mock is called exactly once regardless of how many times `isEnabled` is called
Test: org-switch invalidation — after loading org_A flags and switching to org_B, `isEnabled` for a flag that was `true` in org_A returns `false` (org_B has no such flag)
Test: rollout 0% — `isEnabled` returns `false` for all user IDs
Test: rollout 100% — `isEnabled` returns `true` for all user IDs
Test: rollout 50% boundary — for a flag with 50% rollout, the same user ID always returns the same deterministic result across 50 invocations
Test: Supabase unreachable — when mock throws a network error and no cache exists, `isEnabled` returns `false` without throwing
All 9 scenarios have dedicated named test cases and pass consistently
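
The "Supabase unreachable" criterion amounts to swallowing the fetch failure and treating an empty cache as "all flags off". The following is a minimal pure-Dart sketch of that behavior, not the real provider: `FlagFetcher`, `FlagCache`, and the key `test_flag_alpha` are hypothetical names introduced only for illustration.

```dart
import 'dart:async';

/// Hypothetical fetch signature standing in for the Supabase abstraction layer.
typedef FlagFetcher = Future<Map<String, bool>> Function(String orgId);

/// Illustrative cache, not the real FeatureFlagProvider.
class FlagCache {
  FlagCache(this._fetch);

  final FlagFetcher _fetch;
  Map<String, bool>? _flags; // null until the first successful load

  /// Loads flags; a failed fetch is swallowed and leaves the cache untouched.
  Future<void> load(String orgId) async {
    try {
      _flags = await _fetch(orgId);
    } on Exception {
      // Keep whatever was cached before (possibly nothing).
    }
  }

  /// Unknown keys and an empty cache both evaluate to false.
  bool isEnabled(String key) => _flags?[key] ?? false;
}

Future<void> main() async {
  // Simulated network failure with no prior cache.
  final cache = FlagCache((_) async => throw TimeoutException('network down'));
  await cache.load('org_test');
  print(cache.isEnabled('test_flag_alpha')); // false
}
```

A test for this scenario would assert both that `load` completes without throwing and that `isEnabled` returns `false` afterwards.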

Technical Requirements

Frameworks: flutter_test, Riverpod, mocktail or mockito
Data models: FeatureFlagState, org_feature_flags
Performance requirements: All tests must complete within 1 second using fakeAsync (no real timers or network)
Security requirements: Mock flag data must not use real flag keys from production, to avoid leaking roadmap information in test files

Execution Context

Execution Tier: Tier 4 (323 tasks)

Can start after Tier 3 completes.

Implementation Notes

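To test TTL expiry without real timers, `FeatureFlagProvider` must accept a `Clock` dependency (from the `clock` package, or a simple `DateTime Function()` callback) that can be overridden in tests. This is a common pattern in Dart: define `final Clock _clock` in the notifier, default it to the system clock (`const Clock()`) in production, and override it in tests with a fixed or fake clock. A `Clock.fixed(DateTime.now())` default would freeze time in production, so the TTL would never expire. For rollout determinism tests, pre-compute which user IDs fall above and below the 50% threshold, for example with `'user_001'.hashCode.abs() % 100`. Dart does not document `String.hashCode` as stable across platforms or SDK versions, so compute the values with the same bucketing function the provider uses and re-check them after SDK upgrades. Document these values in test comments so future maintainers understand why specific IDs were chosen.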
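
The clock-injection idea can be sketched with the simpler `DateTime Function()` callback variant, which needs no extra package. `TtlGate` is a hypothetical helper used only to illustrate the pattern; the real `FeatureFlagProvider` may structure this differently.

```dart
/// Hypothetical TTL gate; the real FeatureFlagProvider may differ.
class TtlGate {
  TtlGate({required this.ttl, DateTime Function()? now})
      : _now = now ?? (() => DateTime.now()); // production default: wall clock

  final Duration ttl;
  final DateTime Function() _now;
  DateTime? _lastFetch;

  /// True when nothing has been fetched yet or the TTL has elapsed.
  bool get isStale {
    final last = _lastFetch;
    return last == null || _now().difference(last) >= ttl;
  }

  /// Records the current time as the moment of the last successful fetch.
  void markFetched() => _lastFetch = _now();
}

void main() {
  // The test owns the time source and moves it forward explicitly.
  var fakeNow = DateTime.utc(2024, 1, 1, 12);
  final gate = TtlGate(ttl: const Duration(minutes: 15), now: () => fakeNow);

  gate.markFetched();
  fakeNow = fakeNow.add(const Duration(minutes: 14));
  print(gate.isStale); // false: still inside the TTL window

  fakeNow = fakeNow.add(const Duration(minutes: 6));
  print(gate.isStale); // true: 20 minutes have passed
}
```

Because the default reads the live wall clock, production code needs no special wiring, while tests control time completely.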

The org-switch test should verify not just that new flags are loaded but that calling `isEnabled` with a key only present in org_A returns `false` after switching to org_B — a subtle but important invariant.
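
The invariant holds when the active flag set is replaced rather than merged on an org switch. A minimal sketch, assuming a hypothetical `OrgScopedFlags` class and made-up org IDs and flag keys:

```dart
/// Illustrative in-memory store keyed by org; not the real provider.
class OrgScopedFlags {
  OrgScopedFlags(this._source);

  /// Hypothetical per-org flag data, standing in for org_feature_flags rows.
  final Map<String, Map<String, bool>> _source;
  Map<String, bool> _active = const {};

  /// Replaces (never merges) the active set so old-org keys cannot leak.
  void switchOrg(String orgId) {
    _active = Map.unmodifiable(_source[orgId] ?? const <String, bool>{});
  }

  /// Keys missing from the active org evaluate to false.
  bool isEnabled(String key) => _active[key] ?? false;
}

void main() {
  final flags = OrgScopedFlags({
    'org_A': {'test_flag_alpha': true},
    'org_B': {'test_flag_beta': true},
  });

  flags.switchOrg('org_A');
  print(flags.isEnabled('test_flag_alpha')); // true

  flags.switchOrg('org_B');
  print(flags.isEnabled('test_flag_alpha')); // false: key exists only in org_A
}
```

A merge-based implementation would still pass a test that only checks the org_B flags were loaded, which is why the test needs the explicit negative assertion on the org_A key.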

Testing Requirements

Pure unit tests using `flutter_test` and `fakeAsync`. Use `mocktail` for `MockFeatureFlagRepository` (or an equivalent mock of the Supabase abstraction layer). Use a Riverpod `ProviderContainer` with provider overrides to inject mocks. For TTL tests, call `async.elapse(const Duration(minutes: 20))` inside the `fakeAsync` callback to simulate time passing without real delays; 20 minutes exceeds the default 15-minute TTL, and `clock.now()` from the `clock` package reflects the elapsed fake time inside the zone.
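
The wiring described above might look roughly like the sketch below. It shows only the test harness (mock, provider override, fake time) and not the provider's TTL logic. `FeatureFlagRepository`, its `fetchFlags` method, and `featureFlagRepositoryProvider` are assumptions introduced for illustration, and the imports assume `clock`, `fake_async`, `flutter_riverpod`, and `mocktail` are listed as dev dependencies.

```dart
import 'package:clock/clock.dart';
import 'package:fake_async/fake_async.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:mocktail/mocktail.dart';

// Hypothetical repository interface; the real Supabase abstraction may differ.
abstract class FeatureFlagRepository {
  Future<Map<String, bool>> fetchFlags(String orgId);
}

class MockFeatureFlagRepository extends Mock
    implements FeatureFlagRepository {}

// Hypothetical provider that production code would expose.
final featureFlagRepositoryProvider = Provider<FeatureFlagRepository>(
  (ref) => throw UnimplementedError('override in tests'),
);

void main() {
  group('test harness wiring', () {
    test('injects the mock and advances fake time', () {
      final repo = MockFeatureFlagRepository();
      when(() => repo.fetchFlags(any()))
          .thenAnswer((_) async => {'test_flag_alpha': true});

      final container = ProviderContainer(
        overrides: [featureFlagRepositoryProvider.overrideWithValue(repo)],
      );
      addTearDown(container.dispose);

      fakeAsync((async) {
        final start = clock.now();
        Map<String, bool>? loaded;
        container
            .read(featureFlagRepositoryProvider)
            .fetchFlags('org_test')
            .then((flags) => loaded = flags);

        async.flushMicrotasks(); // completes the mocked future
        async.elapse(const Duration(minutes: 20)); // no real delay

        expect(loaded, {'test_flag_alpha': true});
        expect(clock.now().difference(start), const Duration(minutes: 20));
        verify(() => repo.fetchFlags('org_test')).called(1);
      });
    });
  });
}
```

The actual TTL tests would read the flag provider instead of the repository and assert on the call count before and after `async.elapse`.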

For rollout boundary tests, construct test cases using specific user ID strings whose hash values are known to fall at the boundaries (pre-compute these in a comment). Group tests: `group('toggle states', ...)`, `group('TTL behavior', ...)`, `group('org switching', ...)`, `group('rollout evaluation', ...)`, `group('error fallback', ...)`. Each group must exercise at least one passing (positive) state and one failing (negative) state.
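
One way to pre-compute bucket values is a small helper that prints them once so they can be copied into test comments. The sketch below swaps in an FNV-1a style hash instead of `String.hashCode`, purely as an assumption, because its output is fully defined by the algorithm. The real provider's bucketing function is not specified in this ticket, so `stableBucket` and `inRollout` are hypothetical names.

```dart
/// FNV-1a style 32-bit hash over the string's code units (identical to
/// byte-wise FNV-1a for ASCII IDs). An assumed stand-in for whatever hash the
/// real provider uses. Assumes native 64-bit ints (Dart VM); on the web the
/// multiplication would need to be split to avoid precision loss.
int stableBucket(String userId) {
  var hash = 0x811c9dc5; // FNV offset basis
  for (final unit in userId.codeUnits) {
    hash ^= unit;
    hash = (hash * 0x01000193) & 0xffffffff; // FNV prime, kept to 32 bits
  }
  return hash % 100; // bucket in 0..99
}

/// A user is in the rollout when their bucket is below the percentage.
bool inRollout(String userId, int percent) => stableBucket(userId) < percent;

void main() {
  const ids = ['user_001', 'user_002', 'user_003'];
  for (final id in ids) {
    // Print each bucket once, then record it in a test comment.
    print('$id -> bucket ${stableBucket(id)}');
  }

  // Determinism: repeated evaluation never changes the answer.
  final first = inRollout('user_001', 50);
  final stable = List.generate(50, (_) => inRollout('user_001', 50))
      .every((result) => result == first);
  print(stable); // true

  print(ids.any((id) => inRollout(id, 0))); // false: 0% excludes everyone
  print(ids.every((id) => inRollout(id, 100))); // true: 100% includes everyone
}
```

Because buckets range from 0 to 99, the strict `<` comparison gives the 0% and 100% guarantees without special cases.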

Component

Feature Flag Provider
infrastructure low

Epic Risks (3)

High impact, medium probability, technical

iOS Keychain and Android Keystore have meaningfully different failure modes and permission models. The secure storage plugin may throw platform-specific exceptions (e.g., biometric enrollment required, Keystore wipe after device re-enrolment) that crash higher-level flows if not caught at the adapter boundary.

Mitigation & Contingency

Mitigation: Wrap all storage plugin calls in try/catch at the adapter layer and expose a typed StorageResult<T> instead of throwing. Write integration tests for both platforms (iOS simulators and Android emulators) in CI using Fastlane. Document the exception matrix during the spike.
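
A typed result of this kind could be modelled with a Dart 3 sealed class. The sketch below is one possible shape, not the project's actual definition: only the name `StorageResult<T>` comes from this ticket, while `StorageSuccess`, `StorageFailure`, `guardStorage`, and `describe` are hypothetical.

```dart
/// Hypothetical shape for the typed result named in the mitigation.
sealed class StorageResult<T> {
  const StorageResult();
}

final class StorageSuccess<T> extends StorageResult<T> {
  const StorageSuccess(this.value);
  final T value;
}

final class StorageFailure<T> extends StorageResult<T> {
  const StorageFailure(this.error);
  final Object error;
}

/// Runs a plugin call and converts any thrown error into a StorageFailure,
/// so nothing escapes the adapter boundary as an exception.
Future<StorageResult<T>> guardStorage<T>(Future<T> Function() call) async {
  try {
    return StorageSuccess(await call());
  } catch (error) {
    return StorageFailure(error);
  }
}

/// Callers must handle both cases; the switch is checked for exhaustiveness.
String describe(StorageResult<String?> result) => switch (result) {
      StorageSuccess(:final value) => 'ok: $value',
      StorageFailure(:final error) => 'failed: $error',
    };

Future<void> main() async {
  final ok = await guardStorage<String?>(() async => 'token');
  final bad = await guardStorage<String?>(
    () async => throw Exception('keystore wiped'),
  );
  print(describe(ok)); // ok: token
  print(describe(bad)); // failed: Exception: keystore wiped
}
```

The sealed hierarchy lets the compiler reject any caller that forgets the failure branch, which is the property the mitigation relies on.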

Contingency: If a platform-specific failure cannot be handled gracefully, fall back to in-memory-only storage for the current session and surface a non-blocking warning to the user; log the event for investigation.

High impact, medium probability, integration

Setting a session-level Postgres variable (app.current_org_id) via a Supabase RPC requires that RLS policies on every table reference this variable. If the Supabase project schema has not yet defined these policies, the configurator will set the variable but queries will return unfiltered data, giving a false sense of security.

Mitigation & Contingency

Mitigation: Include a smoke-test RPC in the SupabaseRLSTenantConfigurator that verifies the variable is readable from a policy-scoped query before marking setup as complete. Coordinate with the database migration task to ensure RLS policies reference app.current_org_id before the configurator is shipped.

Contingency: If RLS policies are not in place at integration time, gate all data-fetching components behind a runtime check in SupabaseRLSTenantConfigurator.isRlsScopeVerified(); block data access and surface a developer warning until policies are confirmed.
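
The runtime gate could be as small as the sketch below. Only `isRlsScopeVerified()` is named in this ticket: `RlsScopeGate`, `OrgScopeProbe`, `verify`, and `guarded` are hypothetical, and the probe is assumed to be backed by a policy-scoped smoke-test RPC whose details are outside this sketch.

```dart
/// Hypothetical probe: returns the org id visible to a policy-scoped query,
/// or null when no scope is applied.
typedef OrgScopeProbe = Future<String?> Function();

/// Illustrative gate; the ticket names only isRlsScopeVerified().
class RlsScopeGate {
  RlsScopeGate(this._probe);

  final OrgScopeProbe _probe;
  bool _verified = false;

  bool isRlsScopeVerified() => _verified;

  /// Marks the scope verified only when the probe echoes the expected org.
  Future<void> verify(String expectedOrgId) async {
    try {
      _verified = (await _probe()) == expectedOrgId;
    } on Exception {
      _verified = false; // treat probe failures as "not verified"
    }
  }

  /// Refuses to run a query until the scope has been verified.
  Future<T> guarded<T>(Future<T> Function() query) {
    if (!_verified) {
      throw StateError('RLS scope not verified; data access blocked');
    }
    return query();
  }
}

Future<void> main() async {
  final gate = RlsScopeGate(() async => null); // policies not in place yet
  await gate.verify('org_test');
  print(gate.isRlsScopeVerified()); // false

  try {
    await gate.guarded(() async => ['row']);
  } on StateError catch (e) {
    print(e.message); // RLS scope not verified; data access blocked
  }
}
```

Failing closed here means a missing policy shows up as a loud developer error rather than as silently unfiltered data.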

Medium impact, medium probability, technical

Fetching feature flags from Supabase on every cold start adds network latency before the first branded screen renders. On slow connections this may cause a perceptible blank-screen gap or cause the app to render with default (unflagged) state before flags arrive.

Mitigation & Contingency

Mitigation: Persist the last-known flag set to disk in the FeatureFlagProvider and serve stale-while-revalidate on startup. Gate flag refresh behind a configurable TTL (default 15 minutes) so network calls are not made on every launch.
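
Stale-while-revalidate with a TTL can be sketched as follows. This is an illustration under stated assumptions, not the provider's implementation: `SwrFlags` is a hypothetical class, disk persistence is reduced to constructor arguments, and the time source is an injected callback.

```dart
import 'dart:async';

typedef FlagFetcher = Future<Map<String, bool>> Function();

/// Illustrative stale-while-revalidate cache. The persisted set is passed in
/// directly here, where the real provider would read it from disk.
class SwrFlags {
  SwrFlags({
    required this.fetch,
    required this.ttl,
    required this.now,
    Map<String, bool> persisted = const {},
    DateTime? persistedAt,
  })  : _flags = persisted,
        _fetchedAt = persistedAt;

  final FlagFetcher fetch;
  final Duration ttl;
  final DateTime Function() now;
  Map<String, bool> _flags;
  DateTime? _fetchedAt;
  Future<void>? _inFlight; // de-duplicates concurrent refreshes

  /// Answers from the cached set immediately and starts a background
  /// refresh when the cached set is older than the TTL.
  bool isEnabled(String key) {
    final at = _fetchedAt;
    if (at == null || now().difference(at) >= ttl) {
      _inFlight ??= _refresh().whenComplete(() => _inFlight = null);
    }
    return _flags[key] ?? false;
  }

  Future<void> _refresh() async {
    try {
      _flags = await fetch();
      _fetchedAt = now();
    } on Exception {
      // Keep serving the stale set if the refresh fails.
    }
  }
}

Future<void> main() async {
  var calls = 0;
  var fakeNow = DateTime.utc(2024, 1, 1);
  final flags = SwrFlags(
    fetch: () async {
      calls++;
      return {'test_flag_alpha': false};
    },
    ttl: const Duration(minutes: 15),
    now: () => fakeNow,
    persisted: {'test_flag_alpha': true},
    persistedAt: fakeNow,
  );

  print(flags.isEnabled('test_flag_alpha')); // true: served from persisted set
  print(calls); // 0: still inside the TTL window

  fakeNow = fakeNow.add(const Duration(minutes: 20));
  print(flags.isEnabled('test_flag_alpha')); // true: stale value, refresh started
  await Future<void>.delayed(Duration.zero); // let the refresh finish
  print(flags.isEnabled('test_flag_alpha')); // false: live value
  print(calls); // 1
}
```

The call that triggers the refresh still returns the stale value, which is the trade-off the contingency for this risk addresses.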

Contingency: If stale flags cause a feature to appear that should be hidden, add a post-load re-evaluation pass that reconciles the live flag set with the rendered widget tree and triggers a targeted rebuild where needed.
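
The reconciliation pass reduces to a diff between the flag set the UI was built with and the live set. A minimal sketch, assuming a hypothetical `changedFlags` helper and made-up flag keys, where the returned keys identify what needs a targeted rebuild:

```dart
/// Returns the keys whose effective value differs between the set the UI was
/// built with and the freshly loaded set. Missing keys count as false.
Set<String> changedFlags(Map<String, bool> rendered, Map<String, bool> live) {
  final keys = {...rendered.keys, ...live.keys};
  return {
    for (final key in keys)
      if ((rendered[key] ?? false) != (live[key] ?? false)) key,
  };
}

void main() {
  final changed = changedFlags(
    {'test_flag_alpha': true, 'test_flag_beta': false},
    {'test_flag_alpha': false, 'test_flag_gamma': true},
  );
  print(changed); // {test_flag_alpha, test_flag_gamma}
}
```

Treating missing keys as `false` keeps the diff consistent with how `isEnabled` evaluates unknown flags, so a flag that is absent in both sets or `false` in one and absent in the other never triggers a rebuild.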