Implement Duplicate Registration Detection in Aggregation

epic-bufdir-report-export-core-backend-task-004 — Add duplicate detection logic to the aggregation service that identifies activities where the same peer mentor registered the same contact on the same date with overlapping time windows. Flag these records in the aggregation output with a duplicate_warning boolean and include both record IDs so the coordinator can review. Duplicate detection must not block export — it only annotates the payload for downstream consumers.

high priority · medium complexity · backend · pending · backend specialist · Tier 2

Acceptance Criteria

After the activity aggregation query completes, a duplicate detection pass is executed on the in-memory result set (or as an additional SQL query) before the `BufdirPayload` is constructed

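Two activity records are classified as duplicates if they share the same peer_mentor_id, the same contact_id, and the same activity_date, AND their time windows overlap, i.e., each record's start_time is before the other's end_time (`a.start_time < b.end_time && b.start_time < a.end_time`). Windows that only touch, such as 09:00–10:00 followed by 10:00–11:00, do not overlap.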

Each detected duplicate pair produces a `BufdirDuplicateFlag` instance with: original_record_id, duplicate_record_id, peer_mentor_id, contact_id, date, and an overlap_description string (e.g., '09:00–10:00 overlaps with 09:30–10:30')
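
As an illustration, the flag described above could be modelled as an immutable Dart class. This is a sketch only: the field names follow the acceptance criterion, but the types (String IDs, a `DateTime` date) and the `toJson` method are assumptions, not taken from the existing codebase.

```dart
/// Hypothetical shape for the flag. Field names follow the acceptance
/// criteria; the types (String IDs, DateTime date) are assumptions.
class BufdirDuplicateFlag {
  const BufdirDuplicateFlag({
    required this.originalRecordId,
    required this.duplicateRecordId,
    required this.peerMentorId,
    required this.contactId,
    required this.date,
    required this.overlapDescription,
  });

  final String originalRecordId;
  final String duplicateRecordId;
  final String peerMentorId;
  final String contactId; // internal ID only, never a name or personal number
  final DateTime date;
  final String overlapDescription;

  /// Serialises with the snake_case keys used in the task description.
  /// The date is emitted as YYYY-MM-DD (assumed format).
  Map<String, Object> toJson() => {
        'original_record_id': originalRecordId,
        'duplicate_record_id': duplicateRecordId,
        'peer_mentor_id': peerMentorId,
        'contact_id': contactId,
        'date': date.toIso8601String().substring(0, 10),
        'overlap_description': overlapDescription,
      };
}
```

Keeping the class free of any contact details beyond the internal ID matches the security requirement that no PII is carried in the flag object.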

All detected `BufdirDuplicateFlag` instances are attached to the `BufdirPayload.duplicate_warnings` list

Duplicate detection does NOT modify total_sessions or total_minutes in the `BufdirPeerMentorRecord` — both records are counted in the totals and the flag is informational only

If no duplicates are found, `BufdirPayload.duplicate_warnings` is an empty list (not null)

Activities without a contact_id (group activities or anonymous contacts) are excluded from duplicate detection — they cannot be compared by contact identity

Duplicate detection adds no more than 500ms to the total aggregation time for a chapter-level scope with up to 500 records

The `BufdirPayload` is still returned successfully even if the duplicate detection logic throws an internal error — in that case, `duplicate_warnings` is set to an empty list and the error is logged; export is never blocked
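
The non-blocking guarantee could be enforced with a small wrapper along the following lines. This is a sketch under assumptions: `runNonBlocking` and its `onError` callback are hypothetical names, and the real service may log through a different mechanism.

```dart
/// Runs [detect] and falls back to an empty list on any failure, so that a
/// bug in duplicate detection can never block the export.
/// `runNonBlocking` and `onError` are hypothetical names for illustration.
List<T> runNonBlocking<T>(
  List<T> Function() detect, {
  void Function(Object error, StackTrace stack)? onError,
}) {
  try {
    return detect();
  } catch (error, stack) {
    // A bare `catch` traps both Exceptions and Errors (e.g. StateError,
    // TypeError), which is what "any internal error" requires here.
    onError?.call(error, stack);
    return <T>[];
  }
}
```

In the aggregation service this might be called with a closure that invokes the detection function, with the result assigned to `duplicate_warnings` and the `onError` callback forwarding to the service's logger.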

Unit tests cover: zero duplicates, one duplicate pair, multiple independent duplicate pairs, three records where A overlaps B and B overlaps C (all three records flagged, via two flags: A-B and B-C), and activities with null contact_id (excluded from detection)

Technical Requirements

frameworks

Dart (latest null-safe)

supabase_flutter

flutter_test + mocktail

apis

Supabase PostgREST API (activities table with start_time, end_time, contact_id fields required)

No additional external APIs required

data models

BufdirDuplicateFlag

BufdirPayload (duplicate_warnings field)

BufdirPeerMentorRecord

Activity (must include contact_id, start_time, end_time fields)

performance requirements

Duplicate detection adds under 500ms for up to 500 activity records at chapter scope

Use an efficient O(n log n) algorithm (sort by peer_mentor_id + contact_id + date, then linear scan for overlaps) rather than O(n²) brute-force comparison

For large orgs, consider pushing detection to a SQL window function in the aggregation RPC to avoid loading all raw records into Dart memory

security requirements

Duplicate detection operates only on records already returned by the authenticated aggregation query — no additional data access

contact_id values in BufdirDuplicateFlag are internal IDs only; no PII (names, personal numbers) is included in the flag object

Execution Context

Execution Tier

Tier 2

Tier 2 - 518 tasks

Can start after Tier 1 completes

Implementation Notes

Implement the duplicate detection as a pure Dart function `List<BufdirDuplicateFlag> detectDuplicates(List<Activity> activities)` with no Supabase calls and no side effects, which makes it trivially testable. The algorithm: (1) filter out activities with null contact_id; (2) sort by (peer_mentor_id, contact_id, activity_date, start_time); (3) scan the sorted list, comparing each record with the earlier records that share its (peer_mentor_id, contact_id, date) key and have not yet ended; comparing only immediately adjacent records is not sufficient, because a long record can overlap several later records that do not overlap each other; (4) check time overlap with the condition `a.start_time < b.end_time && b.start_time < a.end_time`. Wrap the call to this function in a try/catch in the aggregation service and fall back to an empty list on any error, so that the non-blocking guarantee is enforced at the call site.
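
The steps above can be sketched as follows. This is a minimal illustration, not the service's actual code: the `Activity` and `BufdirDuplicateFlag` classes are reduced, hypothetical stand-ins (the flag carries only the two record IDs for brevity), `activityDate` is assumed to be an ISO date string, and `_key` is a helper introduced here. The scan keeps a list of earlier records in the same group that have not yet ended, so a long record spanning several later ones is still caught.

```dart
// Hypothetical minimal models; the real classes carry more fields.
class Activity {
  const Activity(this.id, this.peerMentorId, this.contactId,
      this.activityDate, this.startTime, this.endTime);
  final String id;
  final String peerMentorId;
  final String? contactId; // null for group or anonymous activities
  final String activityDate; // ISO date such as '2024-03-05' (assumed)
  final DateTime startTime;
  final DateTime endTime;
}

class BufdirDuplicateFlag {
  const BufdirDuplicateFlag(this.originalRecordId, this.duplicateRecordId);
  final String originalRecordId;
  final String duplicateRecordId;
}

// Group key: same peer mentor, same contact, same date.
String _key(Activity a) =>
    '${a.peerMentorId}|${a.contactId}|${a.activityDate}';

List<BufdirDuplicateFlag> detectDuplicates(List<Activity> activities) {
  // (1) Drop activities that cannot be compared by contact identity.
  final sorted = activities.where((a) => a.contactId != null).toList()
    // (2) Sort by group key, then by start time within each group.
    ..sort((a, b) {
      final byKey = _key(a).compareTo(_key(b));
      return byKey != 0 ? byKey : a.startTime.compareTo(b.startTime);
    });

  final flags = <BufdirDuplicateFlag>[];
  var open = <Activity>[]; // earlier records in this group that may still overlap
  String? currentKey;

  for (final b in sorted) {
    final key = _key(b);
    if (key != currentKey) {
      currentKey = key;
      open = [];
    }
    // (3) A record that ended at or before b starts cannot overlap b,
    // nor any later record, because the group is sorted by start time.
    open.removeWhere((a) => !b.startTime.isBefore(a.endTime));
    // (4) Remaining records satisfy b.start < a.end; check a.start < b.end.
    for (final a in open) {
      if (a.startTime.isBefore(b.endTime)) {
        flags.add(BufdirDuplicateFlag(a.id, b.id));
      }
    }
    open.add(b);
  }
  return flags;
}
```

Sorting dominates the cost, and the scan only revisits records that are still open, so the running time is O(n log n) plus the number of flags produced. For 09:00–12:00, 09:30–10:00 and 11:00–11:30 on the same key, the first record is paired with each of the other two, a case an adjacent-only comparison would miss.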

For the overlap_description string, format it as `'{aStart}–{aEnd} overlaps with {bStart}–{bEnd}'` using 24-hour HH:mm notation for consistency with Norwegian conventions. Add a TODO comment for a future enhancement: push this detection to a SQL window function once the activity volume at national scope warrants it.
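
A possible formatter for this string is sketched below. The function names `describeOverlap` and `_hhmm` are hypothetical, and the sketch assumes the `DateTime` values are already in the time zone that should be displayed.

```dart
/// Formats a time of day as zero-padded 24-hour HH:mm.
String _hhmm(DateTime t) =>
    '${t.hour.toString().padLeft(2, '0')}:'
    '${t.minute.toString().padLeft(2, '0')}';

/// Builds the overlap_description string for two time windows.
/// The range separator is an en dash (U+2013), matching the example in the
/// acceptance criteria.
String describeOverlap(
  DateTime aStart,
  DateTime aEnd,
  DateTime bStart,
  DateTime bEnd,
) {
  return '${_hhmm(aStart)}\u2013${_hhmm(aEnd)} overlaps with '
      '${_hhmm(bStart)}\u2013${_hhmm(bEnd)}';
}
```

Formatting by hand avoids pulling in a date-formatting package for a fixed HH:mm pattern, and keeps the function pure.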

Testing Requirements

Unit tests in Dart (flutter_test): test the duplicate detection algorithm in isolation by passing a list of mock `Activity` objects and asserting the returned list of `BufdirDuplicateFlag`. Test cases must include: (1) empty input → empty flags; (2) two records, same peer mentor + contact + date, overlapping times → one flag with correct record IDs and overlap description; (3) two records, same peer mentor + contact + date, non-overlapping times (e.g., 09:00–10:00 and 11:00–12:00) → zero flags; (4) null contact_id → excluded from detection, zero flags; (5) three records A/B/C where A overlaps B and B overlaps C → two flags (A-B and B-C); (6) multiple independent duplicate pairs → one flag per pair; (7) at the aggregation service level, with the detection step stubbed to throw an internal exception → caught, empty list returned, error logged. Integration test: verify the full `aggregateActivities` call returns a payload with populated `duplicate_warnings` when seeded data contains known duplicates.

Component

Activity Aggregation Service

service high

Dependencies (1)

Implement the core Supabase query in the activity aggregation service that fetches all activity records for a given org_id and date range. Group results by peer mentor, compute total sessions and minutes per peer mentor, and attach activity type metadata. This is the baseline aggregation path for chapter-level scope before any hierarchy roll-up is applied. epic-bufdir-report-export-core-backend-task-002

Epic Risks (3)

High impact, medium probability, technical

Supabase Edge Functions have a default execution timeout. For large national-scope exports aggregating tens of thousands of activities across 1,400 chapters, the edge function may time out before completing, leaving coordinators with a failed export and no partial output.

Mitigation & Contingency

Mitigation: Optimise the aggregation SQL using pre-materialised aggregation views or RPC functions that run inside the database rather than iterating records in Deno. Profile query execution time against realistic production data volumes early. Request an elevated timeout limit from Supabase if needed. Implement progress checkpointing so the export can be resumed from the last completed aggregation batch.

Contingency: For organisations exceeding a configurable threshold (e.g. >5,000 activities), switch to an asynchronous export pattern: the edge function writes a 'pending' audit record and enqueues the job; the client polls for completion and is notified via Supabase Realtime when the file is ready.

Medium impact, medium probability, technical

Server-side PDF generation in a Deno Edge Function environment restricts library choices. Many popular PDF libraries require Node.js APIs not available in Deno, or produce large bundle sizes that exceed edge function limits. Choosing the wrong library could block the entire PDF generation path.

Mitigation & Contingency

Mitigation: Spike PDF library selection as the first task of this epic, evaluating at least two Deno-compatible options (e.g. pdf-lib, jsPDF with Deno compatibility shim). Test bundle size and basic rendering before committing to an implementation. Document the chosen library's constraints.

Contingency: If no suitable Deno-native PDF library is found, generate a well-structured HTML report from the edge function and use a headless Chromium service (e.g. Browserless, Gotenberg) for HTML-to-PDF conversion, or temporarily ship CSV-only export while the PDF path is resolved.

High impact, high probability, technical

Peer mentors affiliated with multiple chapters (a documented NHF scenario) must not be double-counted in participant totals. Incorrect deduplication logic would overreport participation figures to Bufdir, which could be discovered during audit and damage organisational credibility.

Mitigation & Contingency

Mitigation: Define and document the deduplication contract explicitly before coding: deduplication is per-person per-period, not per-activity. Build dedicated unit tests with fixtures containing the exact multi-chapter membership patterns described in NHF's documentation. Have an NHF representative validate test fixture outputs against known-good manual counts.

Contingency: If deduplication logic produces results that cannot be verified against manual counts before launch, surface a deduplication warning in the export preview listing the affected peer mentor IDs, and require explicit coordinator acknowledgement before finalising the export.
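
To make the per-person per-period contract concrete, a minimal sketch follows. The `ChapterMembership` row type and both function names are hypothetical and only illustrate the counting rule and the contingency warning list; the real data model may differ.

```dart
/// Hypothetical row: one peer mentor listed under one chapter for the
/// reporting period.
class ChapterMembership {
  const ChapterMembership(this.chapterId, this.peerMentorId);
  final String chapterId;
  final String peerMentorId;
}

/// Counts each peer mentor once per period, however many chapters list them.
int countUniquePeerMentors(Iterable<ChapterMembership> rows) =>
    rows.map((r) => r.peerMentorId).toSet().length;

/// IDs that appear under more than one chapter, e.g. for listing affected
/// peer mentors in an export-preview warning.
Set<String> multiChapterPeerMentors(Iterable<ChapterMembership> rows) {
  final chaptersByMentor = <String, Set<String>>{};
  for (final r in rows) {
    chaptersByMentor
        .putIfAbsent(r.peerMentorId, () => <String>{})
        .add(r.chapterId);
  }
  return {
    for (final e in chaptersByMentor.entries)
      if (e.value.length > 1) e.key,
  };
}
```

A fixture with one peer mentor listed under two chapters and another under one should yield a participant count of two, with only the first ID in the warning set.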
