epic-organizational-hierarchy-management-duplicate-detection-task-004 - Implementation Task | Likepersonsapp

high priority high complexity database pending database specialist Tier 2

Acceptance Criteria

Function signature: `calculate_activity_similarity(activity_a_id uuid, activity_b_id uuid, time_window_hours integer, duration_tolerance_minutes integer) RETURNS numeric` — returns a score in [0.0, 1.0]

Returns 0.0 immediately if activities belong to different organizations (security guard)

Returns 0.0 if participant_id fields are not identical (participant match is a hard prerequisite)

Activity type match contributes a weighted score component (suggested weight: 0.35 of total)

Date proximity within time_window_hours contributes a weighted score component (weight: 0.40); score decreases linearly from 1.0 at 0 hours difference to 0.0 at time_window_hours difference

Duration proximity within duration_tolerance_minutes contributes a weighted score component (weight: 0.25); same linear decay

Weights are defined as constants inside the function body with explanatory comments

Function is STABLE (no side effects, safe for parallel execution)

Function handles NULL fields gracefully — a NULL activity_type on either side scores 0 for that component (not an error)

Companion function `activities_exceed_similarity_threshold(a_id uuid, b_id uuid, org_id uuid) RETURNS boolean` wraps the above, reading thresholds from duplicate_detection_config for the given org

Both functions are created with `CREATE OR REPLACE` so re-running migrations is safe

Technical Requirements

frameworks

PostgreSQL 15+ PL/pgSQL

apis

Supabase RPC (to call from Flutter via `supabase.rpc()`)

duplicate_detection_config table

data models

activities

duplicate_detection_config

performance requirements

Function must execute in under 2ms per pair on production hardware (single pair comparison is O(1))

Must not perform table scans — all needed data is passed as parameters or fetched by PK

security requirements

Function must be defined with SECURITY DEFINER only if RLS bypass is required for config lookup; otherwise use SECURITY INVOKER

Organization-isolation check (activities must share the same org) must be the first operation to prevent timing attacks

Execution Context

Execution Tier

Tier 2

Tier 2 - 518 tasks

Can start after Tier 1 completes

View Full Execution Plan

Implementation Notes

Implement using PL/pgSQL with explicit variable declarations for readability. Structure: (1) fetch both activity rows by PK into record variables; (2) org-isolation and participant-match guards with early RETURN 0.0; (3) compute each component as a local variable; (4) combine with weights and RETURN LEAST(total, 1.0) to cap at 1.0. Use `EXTRACT(EPOCH FROM (a.activity_date - b.activity_date)) / 3600.0` for hour difference. Use `ABS(a.duration_minutes - b.duration_minutes)` for duration delta.

Document the weight rationale in a block comment: participant match is binary (prerequisite), date proximity is the strongest signal (0.40) reflecting NHF's concern about activities on the same day, activity type is second (0.35), duration is a tiebreaker (0.25). Consider adding a `similarity_breakdown` JSONB output column to a debug variant of the function for future diagnostics.

Testing Requirements

pgTAP unit tests covering: (1) identical activities (same participant, type, date, duration) → score = 1.0; (2) same participant, different org → score = 0.0; (3) different participants → score = 0.0; (4) same participant and type but date difference equals time_window_hours exactly → date component = 0.0; (5) NULL activity_type on one side → type component = 0.0 but other components still calculated; (6) duration difference = duration_tolerance_minutes exactly → duration component = 0.0; (7) activities meeting combined threshold → `activities_exceed_similarity_threshold` returns true; (8) activities below threshold → returns false. Also test with real org config rows to validate the config lookup path in the companion function.

Component

Duplicate Activity Detector

infrastructure high

Dependencies (2)

Create the Supabase database table for per-organization duplicate detection configuration, storing time-window thresholds (e.g., activities within X hours considered potential duplicates) and similarity thresholds. Include migration scripts and RLS policies so organizations can only read their own config. epic-organizational-hierarchy-management-duplicate-detection-task-002 Create the Supabase database table for storing flagged duplicate activity pairs, including fields for both activity IDs, similarity score, detection timestamp, review status (pending/confirmed_duplicate/false_positive), and reviewer ID. Add indexes for efficient querying by organization and review status. epic-organizational-hierarchy-management-duplicate-detection-task-003

Epic Risks (3)

medium impact high prob technical

Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.

Mitigation & Contingency

Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.

Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.

medium impact medium prob technical

A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.

Mitigation & Contingency

Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.

Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.

medium impact medium prob dependency

The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.

Mitigation & Contingency

Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.

Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.

Quick Links

All Tasks Execution Plan

Implement fingerprint comparison SQL function

Acceptance Criteria

Technical Requirements

Execution Context

Implementation Notes

Testing Requirements