Validate all manifest color pairs via ContrastRatioValidator
epic-visual-design-accessibility-foundation-task-010: Write a validation script that iterates over every color pair declared in the Accessibility Token Manifest and checks each one with the ContrastRatioValidator. The script must produce a pass/fail report and exit with a non-zero code if any pair falls below its WCAG 2.2 AA threshold (4.5:1 for normal text; 3:1 for large text and non-text UI components), so that CI can enforce the check.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 3 - 413 tasks
Can start after Tier 2 completes
Implementation Notes
Read the manifest file with `File(manifestPath).readAsStringSync()` and parse it with `jsonDecode` from `dart:convert`. Iterate over the pairs list and check each pair with `ContrastRatioValidator` (from task-009). Collect the results and print the report with `stdout.writeln`, using ANSI color codes for terminal readability (green for PASS, red for FAIL). Apply the ANSI codes only when `stdout.hasTerminal` is true, so that CI log output remains plain text.
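Use the top-level `exit()` function from `dart:io` to report failure: exit code 1 when any pair is below its threshold, and exit code 2 when the manifest JSON is malformed (the codes the Testing Requirements assert). Do not use `process.exit()` (a Node.js idiom) or `Platform.exit()` (no such method exists in Dart). Because `exit()` terminates the VM without waiting for pending asynchronous work, buffered report output can be lost: `await stdout.flush()` (and `stderr.flush()`) before calling it, or set `exitCode` and let `main` return. Add the script invocation to `.github/workflows/` (or the equivalent CI config) as a job step: `- run: dart bin/validate_manifest_contrast.dart`. The step should run on every PR that touches the manifest or the design token files, for example via a `paths` filter on the `pull_request` trigger.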
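A minimal sketch of the core of `bin/validate_manifest_contrast.dart`, under stated assumptions: the manifest schema (a `pairs` list whose entries have `name`, `foreground`, `background`, and an optional `minRatio`) is hypothetical, and the real `ContrastRatioValidator` API is not shown here, so the check is passed in as a plain function (`RatioOf`). Output sinks and the ANSI flag are parameters so the logic can be exercised without a terminal.

```dart
import 'dart:convert';

// Exit codes used by this sketch; 1 and 2 match the task's test scenarios.
const exitAllPass = 0;
const exitContrastFailure = 1;
const exitBadInput = 2;

// ANSI escape sequences, emitted only when useAnsi is true.
const _green = '\x1B[32m';
const _red = '\x1B[31m';
const _reset = '\x1B[0m';

/// Stand-in for ContrastRatioValidator (task-009); its real API is assumed.
typedef RatioOf = double Function(String foreground, String background);

/// Validates manifest JSON and returns the exit code the process should use.
/// Assumed (hypothetical) schema:
/// {"pairs": [{"name", "foreground", "background", "minRatio"?}]}.
int validateManifestSource(
  String source,
  RatioOf ratioOf,
  StringSink out,
  StringSink err, {
  bool useAnsi = false,
}) {
  Object? decoded;
  try {
    decoded = jsonDecode(source);
  } on FormatException catch (e) {
    err.writeln('error: malformed manifest JSON: ${e.message}');
    return exitBadInput;
  }
  final pairs = decoded is Map<String, dynamic> ? decoded['pairs'] : null;
  if (pairs is! List) {
    err.writeln('error: manifest must be a JSON object with a "pairs" list');
    return exitBadInput;
  }

  var failures = 0;
  for (final entry in pairs) {
    if (entry is! Map<String, dynamic>) {
      err.writeln('error: pair entry is not a JSON object: $entry');
      return exitBadInput;
    }
    final name = entry['name'];
    final fg = entry['foreground'];
    final bg = entry['background'];
    // Default to the normal-text AA threshold when a pair omits minRatio.
    final minRatio = entry['minRatio'] ?? 4.5;
    if (name is! String || fg is! String || bg is! String || minRatio is! num) {
      err.writeln('error: pair entry has missing or mistyped fields: $entry');
      return exitBadInput;
    }
    final ratio = ratioOf(fg, bg);
    final passed = ratio >= minRatio;
    if (!passed) failures++;
    final label = passed ? 'PASS' : 'FAIL';
    final shown = useAnsi ? '${passed ? _green : _red}$label$_reset' : label;
    out.writeln('$shown  $name  ${ratio.toStringAsFixed(2)}:1 '
        '(min ${minRatio.toStringAsFixed(1)}:1)');
  }
  out.writeln('${pairs.length - failures} passed, $failures failed');
  return failures == 0 ? exitAllPass : exitContrastFailure;
}
```

A `main` in that file would read the manifest with `File(manifestPath).readAsStringSync()`, call `validateManifestSource(source, ratioOf, stdout, stderr, useAnsi: stdout.hasTerminal)`, and assign the result to `exitCode` instead of calling `exit()` directly, so the report is flushed before the VM terminates.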
Testing Requirements
Write integration tests that invoke the script as a subprocess with `Process.run` from `dart:io`. Test scenarios: (1) a manifest JSON in which all pairs pass: assert exit code 0 and that stdout contains 'PASS' for each entry; (2) a manifest with one failing pair: assert exit code 1 and that stdout identifies the failing pair by name; (3) malformed JSON: assert exit code 2 and that stderr contains an error message. Keep the test manifest fixtures in `test/fixtures/` as small JSON files with 2-3 pairs each.
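One way to structure the subprocess side of these tests, assuming the `dart` executable is on PATH as it is in the CI step. `runDartScript` and `ScriptRun` are hypothetical helper names; the helper only wraps `Process.run` and captures the exit code and both output streams.

```dart
import 'dart:io';

/// Captured result of one subprocess run.
class ScriptRun {
  ScriptRun(this.exitCode, this.stdoutText, this.stderrText);
  final int exitCode;
  final String stdoutText;
  final String stderrText;
}

/// Runs `dart <script> <args>` in a subprocess, mirroring the CI step.
/// Assumes the `dart` executable is on PATH.
Future<ScriptRun> runDartScript(String script, List<String> args) async {
  final result = await Process.run('dart', [script, ...args]);
  // With the default systemEncoding, stdout and stderr arrive as Strings.
  return ScriptRun(
    result.exitCode,
    result.stdout as String,
    result.stderr as String,
  );
}
```

In the real suite, each of the three scenarios would be a `test()` case from `package:test` that calls `runDartScript('bin/validate_manifest_contrast.dart', [fixturePath])` with a fixture from `test/fixtures/` and asserts on `exitCode`, `stdoutText`, and `stderrText`. Passing the manifest path as a command-line argument is an assumption about the script's interface.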
The WCAG 2.2 relative luminance formula linearizes (gamma-decodes) each sRGB channel before weighting it. Floating-point rounding differences between the Dart implementation and reference implementations could misclassify pairs whose ratio sits right at a threshold, so a pair that just passes or just fails in CI could be classified differently at runtime or by a reference checker.
Mitigation & Contingency
Mitigation: Implement the algorithm directly from the WCAG 2.2 specification using its exact linearization constants (threshold 0.04045, divisor 12.92, offset 0.055 and 1.055, exponent 2.4) and luminance weights (0.2126, 0.7152, 0.0722). Validate the Dart implementation against the W3C reference test vectors and against a known-good JavaScript implementation for at least 50 color pairs spanning the compliance boundaries.
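The formula is small enough to verify by hand. A sketch in Dart, assuming opaque `#RRGGBB` colors (how `ContrastRatioValidator` actually represents colors is not specified here); the constants follow the WCAG 2.2 definition of relative luminance. Earlier WCAG text used 0.03928 rather than 0.04045 as the linearization threshold; the WCAG note states the change has no practical effect, which is worth knowing when comparing against older reference implementations.

```dart
import 'dart:math' as math;

/// Linearizes one 8-bit sRGB channel with the WCAG 2.2 constants.
double linearize(int channel) {
  final c = channel / 255.0;
  return c <= 0.04045
      ? c / 12.92
      : math.pow((c + 0.055) / 1.055, 2.4).toDouble();
}

/// Relative luminance of an opaque '#RRGGBB' color (alpha is not handled).
double relativeLuminance(String hex) {
  final digits = hex.startsWith('#') ? hex.substring(1) : hex;
  if (digits.length != 6) {
    throw FormatException('expected #RRGGBB', hex);
  }
  final value = int.parse(digits, radix: 16);
  return 0.2126 * linearize((value >> 16) & 0xFF) +
      0.7152 * linearize((value >> 8) & 0xFF) +
      0.0722 * linearize(value & 0xFF);
}

/// WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color.
double contrastRatio(String a, String b) {
  final la = relativeLuminance(a);
  final lb = relativeLuminance(b);
  return (math.max(la, lb) + 0.05) / (math.min(la, lb) + 0.05);
}
```

On white, `#767676` comes out at about 4.54:1 and `#777777` at about 4.48:1, which makes them a convenient near-threshold pair of test vectors around 4.5:1.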
Contingency: If discrepancies are found, add a configurable tolerance margin (e.g., ±0.005 on the ratio) and flag near-threshold pairs as warnings rather than hard failures, escalating to the design team for manual review.
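If the contingency is triggered, the tolerance band could be applied as follows. This is a sketch: `Verdict`, `classify`, and `exitCodeFor` are hypothetical names, and the band here is symmetric (ratios just above the threshold are also flagged for review); a team could equally choose to warn only below it. The 0.005 default is the example margin above, not a WCAG value.

```dart
/// Outcome for one pair once a tolerance band is applied around the threshold.
enum Verdict { pass, warn, fail }

/// Classifies [ratio] against [threshold] with a symmetric tolerance band.
/// Ratios within the band become warnings for manual design review
/// instead of hard CI failures. The default margin is illustrative only.
Verdict classify(double ratio, double threshold, {double tolerance = 0.005}) {
  if (ratio < threshold - tolerance) return Verdict.fail;
  if (ratio < threshold + tolerance) return Verdict.warn;
  return Verdict.pass;
}

/// Only hard failures make the CI step exit non-zero; warnings are reported
/// but do not block the build.
int exitCodeFor(Iterable<Verdict> verdicts) =>
    verdicts.contains(Verdict.fail) ? 1 : 0;
```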
The token manifest is a static data file. If developers add new color tokens to the design-token-provider without updating the manifest, the manifest becomes stale and the CI validator produces false negatives — passing builds that contain unvalidated color pairs.
Mitigation & Contingency
Mitigation: Add a CI step that cross-references every token constant exported by the design-token-provider against the manifest at build time, failing if any token is present in the provider but absent from the manifest. Document this requirement clearly in the contributing guide.
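A sketch of the cross-reference check. It assumes, hypothetically, that each manifest pair names its tokens under `foregroundToken` and `backgroundToken`; how the provider's exported token names are collected is left open (Flutter does not support `dart:mirrors`, so a generated list or a source scan are the likely options).

```dart
/// Collects every token name referenced by the manifest's pairs.
/// Assumes (hypothetically) 'foregroundToken' / 'backgroundToken' fields.
Set<String> manifestTokens(Map<String, dynamic> manifest) {
  final tokens = <String>{};
  for (final pair in (manifest['pairs'] as List<dynamic>? ?? const [])) {
    if (pair is! Map<String, dynamic>) continue;
    for (final key in const ['foregroundToken', 'backgroundToken']) {
      final name = pair[key];
      if (name is String) tokens.add(name);
    }
  }
  return tokens;
}

/// Tokens exported by the provider but absent from the manifest, sorted so
/// the CI failure message is stable. A non-empty result should fail the build.
List<String> missingFromManifest(
  Iterable<String> providerTokens,
  Map<String, dynamic> manifest,
) {
  final covered = manifestTokens(manifest);
  return providerTokens.where((t) => !covered.contains(t)).toSet().toList()
    ..sort();
}
```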
Contingency: If drift is detected post-merge, run a full manifest regeneration script and treat the resulting manifest diff as a blocking pull request with mandatory accessibility review.
The flutter_accessibility_lints package (or custom lint rules) may produce false positives on patterns the team deliberately uses — for example, decorative icon widgets that intentionally omit semantic labels. Excessive false positives will lead developers to add blanket ignore comments, undermining the entire lint strategy.
Mitigation & Contingency
Mitigation: Audit the full lint rule set against the existing codebase before enabling rules. Create a documented list of approved ignore-comment patterns with mandatory justification comments. Restrict ignore patterns to decorative-only contexts.
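The ignore-comment policy can also be enforced mechanically rather than by review alone. A minimal sketch, assuming a hypothetical convention in which every `// ignore:` or `// ignore_for_file:` comment must name a rule on an approved list and be preceded by a `// a11y-justification:` line; the rule names used with it are placeholders, not rules from any real lint package. It cannot tell whether a context is actually decorative, so that judgment still needs review.

```dart
/// Matches analyzer ignore comments and captures the comma-separated rules.
final _ignorePattern = RegExp(r'//\s*ignore(?:_for_file)?:\s*([\w,\s]+)');

/// One policy violation, reported with a 1-based line number.
class IgnoreViolation {
  IgnoreViolation(this.line, this.reason);
  final int line;
  final String reason;

  @override
  String toString() => 'line $line: $reason';
}

/// Flags ignore comments for unapproved rules, and ignore comments that are
/// not preceded by the (hypothetical) `// a11y-justification:` line.
List<IgnoreViolation> checkIgnoreComments(String source, Set<String> approvedRules) {
  final lines = source.split('\n');
  final violations = <IgnoreViolation>[];
  for (var i = 0; i < lines.length; i++) {
    final match = _ignorePattern.firstMatch(lines[i]);
    if (match == null) continue;
    final rules = match.group(1)!.split(',').map((r) => r.trim()).where((r) => r.isNotEmpty);
    for (final rule in rules) {
      if (!approvedRules.contains(rule)) {
        violations.add(IgnoreViolation(i + 1, 'rule "$rule" is not on the approved ignore list'));
      }
    }
    final previous = i > 0 ? lines[i - 1].trim() : '';
    if (!previous.startsWith('// a11y-justification:')) {
      violations.add(IgnoreViolation(i + 1, 'missing // a11y-justification: comment on the line above'));
    }
  }
  return violations;
}
```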
Contingency: If false positive rates exceed 10% of lint output, disable the highest-noise individual rules and replace them with targeted custom lint rules scoped to the specific patterns the team controls.