Implement fingerprint comparison SQL function
epic-organizational-hierarchy-management-duplicate-detection-task-004 — Implement a PostgreSQL function that compares two activity records using the fingerprint fields: same participant, activity type match, date within configurable time window, and duration within similarity threshold. Function must return a similarity score and accept per-organization threshold configuration as parameters.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 2 - 518 tasks
Can start after Tier 1 completes
Implementation Notes
Implement using PL/pgSQL with explicit variable declarations for readability. Structure: (1) fetch both activity rows by PK into record variables; (2) org-isolation and participant-match guards with early RETURN 0.0; (3) compute each component as a local variable; (4) combine with weights and RETURN LEAST(total, 1.0) to cap at 1.0. Use `EXTRACT(EPOCH FROM (a.activity_date - b.activity_date)) / 3600.0` for hour difference. Use `ABS(a.duration_minutes - b.duration_minutes)` for duration delta.
Document the weight rationale in a block comment: participant match is binary (prerequisite), date proximity is the strongest signal (0.40) reflecting NHF's concern about activities on the same day, activity type is second (0.35), duration is a tiebreaker (0.25). Consider adding a `similarity_breakdown` JSONB output column to a debug variant of the function for future diagnostics.
Testing Requirements
pgTAP unit tests covering: (1) identical activities (same participant, type, date, duration) → score = 1.0; (2) same participant, different org → score = 0.0; (3) different participants → score = 0.0; (4) same participant and type but date difference equals time_window_hours exactly → date component = 0.0; (5) NULL activity_type on one side → type component = 0.0 but other components still calculated; (6) duration difference = duration_tolerance_minutes exactly → duration component = 0.0; (7) activities meeting combined threshold → `activities_exceed_similarity_threshold` returns true; (8) activities below threshold → returns false. Also test with real org config rows to validate the config lookup path in the companion function.
Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.
Mitigation & Contingency
Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.
Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.
A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.
Mitigation & Contingency
Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.
Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.
The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.
Mitigation & Contingency
Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.
Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.