Implement multi-chapter deduplication algorithm
epic-bufdir-reporting-export-processing-services-task-002 — Extend the BufdirActivityQueryService with the multi-chapter deduplication algorithm that identifies and removes duplicate activity records when a peer mentor belongs to multiple chapters. Implement fingerprinting logic based on peer mentor ID, activity type, date, and duration to detect duplicates across chapter boundaries.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 1 - 540 tasks
Can start after Tier 0 completes
Implementation Notes
Implement deduplication as a separate BufdirDeduplicator class rather than embedding it in BufdirActivityQueryService; this keeps concerns separated and makes the algorithm independently testable. The fingerprint key should be: '${record.peerMentorId}|${record.activityType}|${record.activityDate.toIso8601String().substring(0, 10)}|${record.durationMinutes}'. Use a LinkedHashMap to preserve insertion order of surviving records. When a collision is detected, retain the record with the lexicographically smaller chapterId for full determinism.
Output the DeduplicationSummary as a simple value object: {originalCount, deduplicatedCount, removedCount, conflictingChapterPairs: List
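The notes above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the BufdirActivityRecord constructor, the DeduplicationResult wrapper, and the element type of conflictingChapterPairs (here plain 'kept|dropped' strings) are assumptions made so the example is self-contained.

```dart
import 'dart:collection';

// Stand-in for the project's record type. Field names follow the fingerprint
// expression in the notes; the constructor shape is an assumption.
class BufdirActivityRecord {
  BufdirActivityRecord({
    required this.peerMentorId,
    required this.activityType,
    required this.activityDate,
    required this.durationMinutes,
    required this.chapterId,
  });

  final String peerMentorId;
  final String activityType;
  final DateTime activityDate;
  final int durationMinutes;
  final String chapterId;
}

// Value object from the notes. The element type of conflictingChapterPairs is
// an assumption: 'keptChapterId|droppedChapterId' strings.
class DeduplicationSummary {
  DeduplicationSummary({
    required this.originalCount,
    required this.deduplicatedCount,
    required this.conflictingChapterPairs,
  });

  final int originalCount;
  final int deduplicatedCount;
  final List<String> conflictingChapterPairs;

  int get removedCount => originalCount - deduplicatedCount;
}

// Hypothetical wrapper pairing the surviving records with the summary.
class DeduplicationResult {
  DeduplicationResult(this.records, this.summary);
  final List<BufdirActivityRecord> records;
  final DeduplicationSummary summary;
}

class BufdirDeduplicator {
  // Fingerprint: mentor, activity type, calendar date (yyyy-MM-dd), duration.
  static String fingerprint(BufdirActivityRecord record) =>
      '${record.peerMentorId}|${record.activityType}|'
      '${record.activityDate.toIso8601String().substring(0, 10)}|'
      '${record.durationMinutes}';

  DeduplicationResult deduplicate(List<BufdirActivityRecord> records) {
    // Keyed by fingerprint; iteration follows first-insertion order.
    final survivors = LinkedHashMap<String, BufdirActivityRecord>();
    final pairs = <String>{};

    for (final record in records) {
      final key = fingerprint(record);
      final existing = survivors[key];
      if (existing == null) {
        survivors[key] = record;
        continue;
      }
      // Collision: keep the lexicographically smaller chapterId.
      final incomingWins = record.chapterId.compareTo(existing.chapterId) < 0;
      final kept = incomingWins ? record : existing;
      final dropped = incomingWins ? existing : record;
      if (kept.chapterId != dropped.chapterId) {
        pairs.add('${kept.chapterId}|${dropped.chapterId}');
      }
      // Overwriting an existing key keeps its original position in the map.
      survivors[key] = kept;
    }

    return DeduplicationResult(
      survivors.values.toList(),
      DeduplicationSummary(
        originalCount: records.length,
        deduplicatedCount: survivors.length,
        conflictingChapterPairs: pairs.toList(),
      ),
    );
  }
}
```

Two things to keep in mind with this sketch. The set of surviving records does not depend on input order, but the order of the returned list and the recorded chapter pairs do, so a shuffle-based determinism check would need to compare sorted output (or the method would need a final sort). Also, toIso8601String() formats the DateTime as stored without converting between local time and UTC, so records are assumed to carry dates in a consistent zone.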
Testing Requirements
Unit tests using flutter_test only; no Supabase dependency is needed since deduplication is a pure function. Required test cases: (1) list with no duplicates returns identical list, (2) two records with identical fingerprint: one removed, (3) three records from three chapters with same fingerprint: one remains, (4) two peer mentors with same activity fields but different peerMentorId: both retained, (5) empty list returns empty list, (6) list with 1000+ records (performance test with stopwatch assertion), (7) determinism test: shuffle input and assert same output. Use table-driven test style with a helper that builds BufdirActivityRecord test fixtures. Target 100% branch coverage on the deduplication method.
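One possible shape for the table-driven style and fixture helper is sketched below in plain Dart so that it runs without Flutter. In the real suite each row would presumably be wrapped in flutter_test's test() and expect() calls. The fixture() helper, the DedupCase class, and the deduplicate() stand-in are hypothetical names introduced for illustration, and the stopwatch-based performance case is left out.

```dart
import 'dart:math';

// Stand-in record type; the positional constructor is an assumption.
class BufdirActivityRecord {
  BufdirActivityRecord(this.peerMentorId, this.activityType, this.activityDate,
      this.durationMinutes, this.chapterId);
  final String peerMentorId;
  final String activityType;
  final DateTime activityDate;
  final int durationMinutes;
  final String chapterId;
}

// Fixture helper: defaults everywhere, so each case names only what differs.
BufdirActivityRecord fixture({
  String mentor = 'pm-1',
  String type = 'home-visit',
  String date = '2024-03-05',
  int minutes = 60,
  String chapter = 'chapter-a',
}) =>
    BufdirActivityRecord(mentor, type, DateTime.parse(date), minutes, chapter);

// Stand-in for the method under test (assumed rule: smaller chapterId wins).
List<BufdirActivityRecord> deduplicate(List<BufdirActivityRecord> input) {
  final kept = <String, BufdirActivityRecord>{};
  for (final r in input) {
    final key = '${r.peerMentorId}|${r.activityType}|'
        '${r.activityDate.toIso8601String().substring(0, 10)}|'
        '${r.durationMinutes}';
    final previous = kept[key];
    if (previous == null || r.chapterId.compareTo(previous.chapterId) < 0) {
      kept[key] = r;
    }
  }
  return kept.values.toList();
}

// One row of the test table.
class DedupCase {
  DedupCase(this.name, this.input, this.expectedCount);
  final String name;
  final List<BufdirActivityRecord> input;
  final int expectedCount;
}

final dedupCases = <DedupCase>[
  DedupCase('no duplicates', [fixture(type: 'call'), fixture(type: 'visit')], 2),
  DedupCase('two chapters', [fixture(chapter: 'chapter-b'), fixture()], 1),
  DedupCase(
      'three chapters',
      [fixture(chapter: 'chapter-c'), fixture(chapter: 'chapter-b'), fixture()],
      1),
  DedupCase('different mentors', [fixture(), fixture(mentor: 'pm-2')], 2),
  DedupCase('empty list', [], 0),
];

// Order-independent view of a result, used by the determinism check.
String canonical(List<BufdirActivityRecord> records) => (records
        .map((r) => '${r.peerMentorId}|${r.activityType}|${r.chapterId}')
        .toList()
      ..sort())
    .join(',');

// Runs every case and returns the names of those that failed.
List<String> runDedupCases() {
  final failures = <String>[];
  for (final c in dedupCases) {
    final result = deduplicate(c.input);
    // Determinism: a shuffled copy must yield the same surviving records.
    final shuffled = [...c.input]..shuffle(Random(42));
    final stable = canonical(deduplicate(shuffled)) == canonical(result);
    if (result.length != c.expectedCount || !stable) failures.add(c.name);
  }
  return failures;
}
```

Calling runDedupCases() returns an empty list when every row passes. Comparing a sorted, canonical form keeps the determinism check independent of the insertion order that a LinkedHashMap preserves.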
NHF contacts can belong to up to five local chapters simultaneously. If the deduplication logic in the activity query service incorrectly attributes cross-chapter activities, organisations will either under-report or over-report to Bufdir, which could trigger grant clawback or compliance investigations.
Mitigation & Contingency
Mitigation: Implement deduplication using the existing multi-chapter membership service as the source of truth for chapter affiliation. Write test fixtures covering all known multi-chapter edge cases and validate outputs against manually prepared reference exports from NHF.
Contingency: If deduplication cannot be made deterministic for complex hierarchies before release, gate the export behind an org-level feature flag and require NHF to validate a preview export against their manual Excel before enabling in production.
Server-side Dart libraries for Excel generation are less mature than equivalents in Node.js or Python. The chosen library may lack support for Bufdir-required formatting features (merged cells, data validation, specific date formats), requiring significant workaround effort or a library switch mid-implementation.
Mitigation & Contingency
Mitigation: Evaluate the top two Dart xlsx libraries (excel, spreadsheet_decoder) against a Bufdir template sample file before committing. Identify all required formatting features and verify library support in a spike.
Contingency: If no Dart library meets requirements, implement the Excel generation as a Supabase Edge Function in TypeScript using the well-supported ExcelJS library, exposing it to the Dart backend via an internal RPC call.
The attachment bundler must retrieve documents from Supabase Storage that were uploaded by the document attachments feature. If storage paths, RLS policies, or signed URL expiry have not been standardised across features, the bundler may fail to retrieve attachments at export time.
Mitigation & Contingency
Mitigation: Audit the document attachments feature's storage schema and RLS policies before implementing the bundler. Agree on a stable internal service-account access pattern for cross-feature storage reads.
Contingency: If cross-feature storage access cannot be made reliable, implement the bundler to include only attachments that can be retrieved successfully and produce a manifest listing any attachments that could not be bundled, rather than failing the entire export.
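The partial-failure behaviour in this contingency could look roughly like the sketch below. BundleResult, bundleAttachments, and the injected download callback are hypothetical names; the callback stands in for whatever Supabase Storage read the bundler ends up using, so no real storage API is assumed here.

```dart
// Hypothetical result type: the bundled bytes plus a manifest of failures.
class BundleResult {
  BundleResult(this.bundled, this.unavailable);

  // Storage path -> downloaded bytes for attachments that were retrieved.
  final Map<String, List<int>> bundled;

  // Storage path -> failure reason for attachments that could not be bundled.
  final Map<String, String> unavailable;
}

// Downloads each attachment in turn. A failed download is recorded in the
// manifest instead of aborting the whole export.
Future<BundleResult> bundleAttachments(
  Iterable<String> storagePaths,
  Future<List<int>> Function(String path) download,
) async {
  final bundled = <String, List<int>>{};
  final unavailable = <String, String>{};
  for (final path in storagePaths) {
    try {
      bundled[path] = await download(path);
    } on Exception catch (error) {
      unavailable[path] = error.toString();
    }
  }
  return BundleResult(bundled, unavailable);
}
```

Injecting the download function keeps the bundler testable without a storage backend, and the unavailable map can be serialised directly into the manifest that accompanies the export.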