Implement Duplicate Registration Detection in Aggregation
epic-bufdir-report-export-core-backend-task-004: Add duplicate detection logic to the aggregation service that identifies activities where the same peer mentor registered the same contact on the same date with overlapping time windows. Flag these records in the aggregation output with a duplicate_warning boolean and include both record IDs so the coordinator can review. Duplicate detection must not block export; it only annotates the payload for downstream consumers.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 2 - 518 tasks
Can start after Tier 1 completes
Implementation Notes
Implement the duplicate detection as a pure Dart function `List
For the overlap_description string, format it as `'{aStart}–{aEnd} overlaps with {bStart}–{bEnd}'` using 24-hour HH:mm notation for consistency with Norwegian conventions. Add a TODO comment for a future enhancement: push this detection to a SQL window function once the activity volume at national scope warrants it.
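The detection described in these notes could be sketched roughly as follows. This is a minimal illustration, not the agreed implementation: the function name `detectDuplicates`, the field names on `Activity` and `BufdirDuplicateFlag`, the string grouping key, and the use of the start timestamp's calendar day as "the date" are all assumptions. The en dash in the description string is also an assumption about the intended separator.

```dart
// Hypothetical minimal models; the real classes will carry more fields.
class Activity {
  final String id;
  final String peerMentorId;
  final String? contactId; // null contacts are excluded from detection
  final DateTime start;
  final DateTime end;
  Activity(this.id, this.peerMentorId, this.contactId, this.start, this.end);
}

class BufdirDuplicateFlag {
  final String recordIdA;
  final String recordIdB;
  final String overlapDescription;
  BufdirDuplicateFlag(this.recordIdA, this.recordIdB, this.overlapDescription);
}

// Formats a timestamp as 24-hour HH:mm.
String _hhmm(DateTime t) =>
    '${t.hour.toString().padLeft(2, '0')}:${t.minute.toString().padLeft(2, '0')}';

// TODO: push this detection to a SQL window function once the activity
// volume at national scope warrants it.
List<BufdirDuplicateFlag> detectDuplicates(List<Activity> activities) {
  // Group by peer mentor + contact + calendar day of the start time.
  // Assumes IDs never contain the '|' separator.
  final groups = <String, List<Activity>>{};
  for (final a in activities) {
    if (a.contactId == null) continue;
    final key = '${a.peerMentorId}|${a.contactId}|'
        '${a.start.year}-${a.start.month}-${a.start.day}';
    groups.putIfAbsent(key, () => <Activity>[]).add(a);
  }

  final flags = <BufdirDuplicateFlag>[];
  for (final group in groups.values) {
    // Pairwise comparison inside each group; groups are expected to be small.
    for (var i = 0; i < group.length; i++) {
      for (var j = i + 1; j < group.length; j++) {
        final a = group[i];
        final b = group[j];
        // Half-open windows: 09:00-10:00 and 10:00-11:00 do not overlap.
        if (a.start.isBefore(b.end) && b.start.isBefore(a.end)) {
          final description = '${_hhmm(a.start)}–${_hhmm(a.end)} overlaps with '
              '${_hhmm(b.start)}–${_hhmm(b.end)}';
          flags.add(BufdirDuplicateFlag(a.id, b.id, description));
        }
      }
    }
  }
  return flags;
}
```

Because every overlapping pair in a group produces its own flag, three records where A overlaps B and B overlaps C (but A does not overlap C) yield exactly two flags.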
Testing Requirements
Unit tests in Dart (flutter_test): test the duplicate detection algorithm in isolation by passing a list of mock `Activity` objects and asserting the returned list of `BufdirDuplicateFlag`. Test cases must include: (1) empty input → empty flags; (2) two records, same peer mentor + contact + date, overlapping times → one flag with correct record IDs and overlap description; (3) two records, same peer mentor + contact + date, non-overlapping times (e.g., 09:00–10:00 and 11:00–12:00) → zero flags; (4) null contact_id → excluded from detection, zero flags; (5) three records A/B/C where A overlaps B and B overlaps C → two flags (A-B and B-C); (6) duplicate detection throws an internal exception → caught, empty list returned, error logged. Integration test: verify the full `aggregateActivities` call returns a payload with populated `duplicate_warnings` when seeded data contains known duplicates.
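Test case (6) implies a guard around the detection call so that a failure never blocks the export. One possible shape for that guard is sketched below. The helper name `runNonBlocking` and its `onError` hook are hypothetical; `developer.log` from `dart:developer` stands in for whatever logger the project actually uses.

```dart
import 'dart:developer' as developer;

// Default error sink; assumed to be replaced by the project's own logger.
void _defaultLog(Object error, StackTrace stack) {
  developer.log('Duplicate detection failed', error: error, stackTrace: stack);
}

/// Runs [detect] and returns its result. If it throws, the error is passed
/// to [onError] (or logged by default) and an empty list is returned, so
/// the caller can continue building the export payload.
List<T> runNonBlocking<T>(
  List<T> Function() detect, {
  void Function(Object error, StackTrace stack)? onError,
}) {
  try {
    return detect();
  } catch (error, stack) {
    (onError ?? _defaultLog)(error, stack);
    return <T>[];
  }
}
```

Injecting `onError` lets a unit test assert that the error was logged without depending on a real logging backend.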
Supabase Edge Functions have a default execution timeout. For large national-scope exports aggregating tens of thousands of activities across 1,400 chapters, the edge function may time out before completing, leaving coordinators with a failed export and no partial output.
Mitigation & Contingency
Mitigation: Optimise the aggregation SQL using pre-materialised aggregation views or RPC functions that run inside the database rather than iterating records in Deno. Profile query execution time against realistic production data volumes early. Request an elevated timeout limit from Supabase if needed. Implement progress checkpointing so the export can be resumed from the last completed aggregation batch.
Contingency: For organisations exceeding a configurable threshold (e.g. >5,000 activities), switch to an asynchronous export pattern: the edge function writes a 'pending' audit record and enqueues the job; the client polls for completion and is notified via Supabase Realtime when the file is ready.
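On the client side, the asynchronous pattern in this contingency might look roughly like the sketch below. It is an illustration under stated assumptions: `ExportStatus`, `shouldExportAsync`, and `pollUntilDone` are hypothetical names, the status lookup is injected as a callback rather than tied to any specific Supabase call, and the 5-second interval and 60-attempt cap are placeholder values.

```dart
import 'dart:async';

// Hypothetical status values mirroring the 'pending' audit record.
enum ExportStatus { pending, completed, failed }

/// True when the activity count exceeds the configurable threshold
/// (the example figure from the contingency is 5,000).
bool shouldExportAsync(int activityCount, {int threshold = 5000}) =>
    activityCount > threshold;

/// Calls [fetchStatus] repeatedly until the export leaves the pending state.
/// Throws a [TimeoutException] if it is still pending after [maxAttempts].
Future<ExportStatus> pollUntilDone(
  Future<ExportStatus> Function() fetchStatus, {
  Duration interval = const Duration(seconds: 5),
  int maxAttempts = 60,
}) async {
  for (var attempt = 0; attempt < maxAttempts; attempt++) {
    final status = await fetchStatus();
    if (status != ExportStatus.pending) return status;
    await Future<void>.delayed(interval);
  }
  throw TimeoutException('Export still pending after $maxAttempts polls');
}
```

Polling is shown only as the fallback path; a Realtime notification, where available, would let the client stop polling early.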
Server-side PDF generation in a Deno Edge Function environment restricts library choices. Many popular PDF libraries require Node.js APIs not available in Deno, or produce large bundle sizes that exceed edge function limits. Choosing the wrong library could block the entire PDF generation path.
Mitigation & Contingency
Mitigation: Spike PDF library selection as the first task of this epic, evaluating at least two Deno-compatible options (e.g. pdf-lib, jsPDF with Deno compatibility shim). Test bundle size and basic rendering before committing to an implementation. Document the chosen library's constraints.
Contingency: If no suitable Deno-native PDF library is found, generate a well-structured HTML report from the edge function and use a headless Chromium service (e.g. Browserless, Gotenberg) for HTML-to-PDF conversion, or temporarily ship CSV-only export while the PDF path is resolved.
Peer mentors affiliated with multiple chapters (a documented NHF scenario) must not be double-counted in participant totals. Incorrect deduplication logic would overreport participation figures to Bufdir, which could be discovered during audit and damage organisational credibility.
Mitigation & Contingency
Mitigation: Define and document the deduplication contract explicitly before coding: deduplication is per-person per-period, not per-activity. Build dedicated unit tests with fixtures containing the exact multi-chapter membership patterns described in NHF's documentation. Have an NHF representative validate test fixture outputs against known-good manual counts.
Contingency: If deduplication logic produces results that cannot be verified against manual counts before launch, surface a deduplication warning in the export preview listing the affected peer mentor IDs, and require explicit coordinator acknowledgement before finalising the export.
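The per-person per-period contract, and the list of affected peer mentor IDs mentioned in the contingency, can be illustrated with a small sketch. The `Participation` model, its field names, and both function names are hypothetical, and the period is assumed to be a half-open range (start inclusive, end exclusive).

```dart
// Hypothetical row: one peer mentor taking part in one chapter's activity.
class Participation {
  final String peerMentorId;
  final String chapterId;
  final DateTime date;
  Participation(this.peerMentorId, this.chapterId, this.date);
}

/// Counts each peer mentor once per period, regardless of how many
/// chapters or activities they appear in.
int countUniqueParticipants(
  List<Participation> rows,
  DateTime periodStart,
  DateTime periodEnd,
) {
  final ids = <String>{};
  for (final r in rows) {
    final inPeriod = !r.date.isBefore(periodStart) && r.date.isBefore(periodEnd);
    if (inPeriod) ids.add(r.peerMentorId);
  }
  return ids.length;
}

/// Returns the IDs of peer mentors that appear under more than one chapter,
/// i.e. the candidates to list in a deduplication warning.
Set<String> multiChapterMentors(List<Participation> rows) {
  final chaptersByMentor = <String, Set<String>>{};
  for (final r in rows) {
    chaptersByMentor.putIfAbsent(r.peerMentorId, () => <String>{}).add(r.chapterId);
  }
  return {
    for (final e in chaptersByMentor.entries)
      if (e.value.length > 1) e.key
  };
}
```

A fixture with one mentor registered in two chapters should therefore contribute 1, not 2, to the period total, and that mentor's ID should appear in the warning set.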