epic-organizational-hierarchy-management-duplicate-detection-task-005 - Implementation Task | Likepersonsapp

high priority high complexity database pending database specialist Tier 3

Acceptance Criteria

Function signature: `get_cross_hierarchy_activity_candidates(target_activity_id uuid, time_window_hours integer) RETURNS TABLE(candidate_activity_id uuid, coordinator_id uuid, unit_id uuid, unit_level integer, similarity_score numeric)`

Returns only activities from the same organization as target_activity_id (org-isolated)

Returns only activities from coordinators assigned to a different organizational unit than the coordinator of the target activity

Returns only activities within the configured time window (uses the time_window_hours parameter)

Returns only activities with the same participant_id as the target activity

Includes the similarity_score from `calculate_activity_similarity()` for each candidate

Filters out pairs already present in suspected_duplicates table (already flagged — avoid re-flagging)

Filters out the target activity itself

Returns results ordered by similarity_score DESC so callers can apply threshold filtering

Handles organizations with up to 1,400 local units (NHF scale) without full-table scans — relies on indexes on activities(participant_id, organization_id, activity_date)

Function is STABLE and safe for parallel execution

Technical Requirements

frameworks

PostgreSQL 15+ PL/pgSQL / SQL

apis

calculate_activity_similarity() function

Supabase RPC

data models

activities

organization_units

unit_assignments

suspected_duplicates

duplicate_detection_config

performance requirements

For NHF-scale orgs (1,400 units, ~50,000 activities), the candidate set before similarity scoring must be narrowed by participant_id + date range index scan — not a full org-wide scan

Total function execution for a single activity must complete in under 500ms at p99

security requirements

Function must be callable only by authenticated users (not anonymous) via Supabase RPC

Organization isolation enforced inside function, not just via RLS, because function will be called from a trigger context

Execution Context

Execution Tier

Tier 3

Tier 3 - 413 tasks

Can start after Tier 2 completes

View Full Execution Plan

Implementation Notes

Use a CTE structure for clarity: (1) `target_info` CTE fetches the target activity's participant_id, org_id, activity_date, coordinator_id, and unit_id; (2) `same_unit_coordinators` CTE collects all coordinator IDs sharing the same unit_id as the target coordinator (to exclude); (3) `candidates` CTE queries activities matching participant_id, org_id, and date range while excluding same-unit coordinators and already-flagged pairs; (4) final SELECT scores each candidate via `calculate_activity_similarity()`. Ensure indexes exist: `CREATE INDEX IF NOT EXISTS idx_activities_participant_org_date ON activities(participant_id, organization_id, activity_date)`. The unit hierarchy join can use a recursive CTE if the org structure is tree-shaped, but for the initial implementation a flat join on `unit_assignments` (coordinator → unit → org) is sufficient — document this limitation and the path to a recursive upgrade.

Testing Requirements

Create a test dataset with: (1) org A with 3 units (national, regional, local) and 4 activities for the same participant registered by coordinators in different units on the same day → function must return all 3 cross-unit candidates; (2) same participant, same unit coordinator → must NOT be returned (same-unit activities are not cross-hierarchy duplicates); (3) activities outside the time window → must NOT be returned; (4) pair already in suspected_duplicates → must NOT be returned; (5) activity from org B → must NOT be returned even if participant_id matches. Performance test: insert 10,000 activities for a large org and assert the function returns in under 500ms for a single call.

Component

Duplicate Activity Detector

infrastructure high

Dependencies (1)

Implement a PostgreSQL function that compares two activity records using the fingerprint fields: same participant, activity type match, date within configurable time window, and duration within similarity threshold. Function must return a similarity score and accept per-organization threshold configuration as parameters. epic-organizational-hierarchy-management-duplicate-detection-task-004

Epic Risks (3)

medium impact high prob technical

Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.

Mitigation & Contingency

Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.

Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.

medium impact medium prob technical

A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.

Mitigation & Contingency

Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.

Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.

medium impact medium prob dependency

The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.

Mitigation & Contingency

Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.

Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.

Quick Links

All Tasks Execution Plan

Build cross-hierarchy activity query

Acceptance Criteria

Technical Requirements

Execution Context

Implementation Notes

Testing Requirements