epic-organizational-hierarchy-management-duplicate-detection-task-003 - Implementation Task | Likepersonsapp

high priority medium complexity database pending database specialist Tier 1

Acceptance Criteria

Table `suspected_duplicates` exists with columns: id (uuid PK), organization_id (uuid FK), activity_id_a (uuid FK → activities), activity_id_b (uuid FK → activities), similarity_score (numeric, 0.0–1.0, NOT NULL), detected_at (timestamptz, NOT NULL, default now()), review_status (text, CHECK IN ('pending','confirmed_duplicate','false_positive'), NOT NULL, default 'pending'), reviewed_by (uuid FK → users, nullable), reviewed_at (timestamptz, nullable), detection_method (text, CHECK IN ('trigger','batch','manual'), NOT NULL)

UNIQUE constraint on (activity_id_a, activity_id_b) prevents duplicate flagging of the same pair — activity_id_a must always be the lexicographically smaller UUID to avoid (A,B) and (B,A) both being stored

Composite index on (organization_id, review_status) for admin review queue queries

Index on (activity_id_a) and (activity_id_b) separately for reverse-lookup when an activity is deleted

RLS: org admins and coordinators can SELECT rows where organization_id matches their org; only admins can UPDATE review_status

reviewed_at is automatically set to now() when review_status is changed from 'pending' via a trigger

Migration is idempotent and includes rollback section as a comment

Foreign keys use ON DELETE CASCADE for activity_id_a and activity_id_b so flagged pairs are removed when activities are deleted

Technical Requirements

frameworks

Supabase (PostgreSQL 15+)

apis

Supabase RLS policies

Supabase Realtime (for admin live updates)

data models

suspected_duplicates

activities

organizations

user_roles

performance requirements

Admin review queue query (org_id + status='pending') must return in under 100ms for up to 10,000 flagged pairs

Index on (organization_id, review_status) must be a partial index where review_status='pending' to keep it small

security requirements

RLS prevents cross-organization data leakage

Only users with role='admin' in the same org can mark pairs as confirmed_duplicate or false_positive

activity_id_a < activity_id_b ordering must be enforced by a CHECK constraint to preserve pair uniqueness

Execution Context

Execution Tier

Tier 1

Tier 1 - 540 tasks

Can start after Tier 0 completes

View Full Execution Plan

Implementation Notes

Enforce the (activity_id_a < activity_id_b) canonical ordering via a CHECK constraint: `CHECK (activity_id_a < activity_id_b)`. Callers (trigger, batch function, manual insert) are responsible for sorting the pair before inserting. For the reviewed_at trigger, use a BEFORE UPDATE trigger that sets `NEW.reviewed_at = now()` when `OLD.review_status = 'pending' AND NEW.review_status != 'pending'`. Expose this table in Flutter through a `SuspectedDuplicatesRepository` with methods: `getPendingPairs(orgId)`, `reviewPair(id, status)`, and a Supabase Realtime subscription stream for the admin review queue.

Use the `detection_method` field to distinguish between trigger-detected and batch-detected pairs — this helps admins understand urgency (trigger = just happened vs batch = historical).

Testing Requirements

pgTAP tests: (1) insert a valid pair and verify defaults; (2) insert (A,B) then (B,A) — second insert must fail UNIQUE constraint; (3) insert pair with similarity_score=1.5 must fail CHECK; (4) set review_status='confirmed_duplicate' and verify reviewed_at is populated by trigger; (5) delete activity_id_a and verify the pair row is cascade-deleted; (6) verify RLS blocks org B admin from reading org A's pairs. Flutter integration test: subscribe to Supabase Realtime on suspected_duplicates for the test org and verify a new insert triggers the stream.

Component

Duplicate Activity Detector

infrastructure high

Dependencies (1)

Define the data structure for activity fingerprints used in duplicate detection: participant ID, activity date, activity type, duration, coordinator ID, and unit level. Design the database schema for storing fingerprints and suspected duplicate pairs, including configurable similarity thresholds per organization. epic-organizational-hierarchy-management-duplicate-detection-task-001

Epic Risks (3)

medium impact high prob technical

Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.

Mitigation & Contingency

Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.

Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.

medium impact medium prob technical

A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.

Mitigation & Contingency

Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.

Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.

medium impact medium prob dependency

The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.

Mitigation & Contingency

Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.

Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.

Quick Links

All Tasks Execution Plan

Create suspected duplicates table

Acceptance Criteria

Technical Requirements

Execution Context

Implementation Notes

Testing Requirements