epic-organizational-hierarchy-management-duplicate-detection-task-001 - Implementation Task | Likepersonsapp

high priority medium complexity database pending database specialist Tier 0

Acceptance Criteria

A Supabase migration file defines the activity_fingerprints table with columns: id (uuid, PK), activity_id (uuid, FK → activities), participant_id (uuid, nullable FK → users), activity_date (date, not null), activity_type (text, not null), duration_minutes (integer, not null), coordinator_id (uuid, FK → users), unit_id (uuid, FK → hierarchy_nodes), organization_id (uuid, FK → organizations, not null), fingerprint_hash (text, not null), created_at (timestamptz, default now())

A Supabase migration file defines the suspected_duplicate_pairs table with columns: id (uuid, PK), activity_id_a (uuid, FK → activities), activity_id_b (uuid, FK → activities), similarity_score (float, 0.0–1.0), matched_fields (jsonb, array of field names that matched), status (text, CHECK IN ('pending', 'confirmed_duplicate', 'dismissed')), resolved_by (uuid, nullable FK → users), resolved_at (timestamptz), organization_id (uuid, FK → organizations, not null), created_at (timestamptz, default now())

A Supabase migration file defines the duplicate_detection_config table with columns: id (uuid, PK), organization_id (uuid, FK → organizations, unique, not null), similarity_threshold (float, default 0.85, CHECK 0.0–1.0), enabled_fields (jsonb, array of field names to include in fingerprint), auto_flag_threshold (float, default 0.95), created_at (timestamptz), updated_at (timestamptz)

RLS policies are defined for all three tables: users can only read/write rows where organization_id matches their JWT organization claim

All three tables have appropriate indexes: activity_fingerprints(organization_id, activity_date, activity_type), suspected_duplicate_pairs(organization_id, status), duplicate_detection_config(organization_id)

Dart model classes are defined for ActivityFingerprint, SuspectedDuplicatePair, and DuplicateDetectionConfig with fromJson/toJson factory methods

The fingerprint_hash column is designed to store a deterministic hash of the configurable fields (SHA-256 of a canonical JSON string) — the hashing algorithm and field order are documented in a code comment

Migration files are tested: applying the migration to a clean Supabase test project succeeds without errors and rollback SQL is provided

Technical Requirements

frameworks

Flutter

Riverpod

apis

Supabase REST API

data models

ActivityFingerprint

SuspectedDuplicatePair

DuplicateDetectionConfig

Activity

HierarchyNode

Organization

performance requirements

activity_fingerprints table must support efficient lookups by (organization_id, activity_date, activity_type) — composite index required

fingerprint_hash column enables O(1) exact-match duplicate detection via index scan — ensure index on (organization_id, fingerprint_hash)

suspected_duplicate_pairs table partitioned or indexed by organization_id and status to support paginated pending-review queries efficiently

security requirements

RLS policy on all three tables: SELECT, INSERT, UPDATE, DELETE restricted to rows where organization_id = (SELECT organization_id FROM user_profiles WHERE id = auth.uid())

resolved_by column must be validated server-side — client cannot set it to an arbitrary user ID; use a Supabase Edge Function or trigger to set resolved_by = auth.uid() on UPDATE

fingerprint_hash must be computed server-side (Supabase trigger or Edge Function) to prevent client-side manipulation of duplicate detection logic

participant_id must not appear in any API response visible to users without coordinator or admin role — apply column-level RLS or filter in the repository layer

Execution Context

Execution Tier

Tier 0

Tier 0 - 440 tasks

View Full Execution Plan

Implementation Notes

Place all three migration SQL files in supabase/migrations/ with timestamp prefix (e.g., 20260329_activity_fingerprints.sql). Use UUIDs generated by gen_random_uuid() for all PKs. For the fingerprint hash, compute it as a SHA-256 of a UTF-8 encoded canonical JSON string of the enabled fields in alphabetical key order — this makes the hash deterministic regardless of field insertion order. Store the hash computation as a Supabase trigger function (BEFORE INSERT/UPDATE on activities) so that fingerprints are always in sync without requiring client-side logic.

The enabled_fields JSONB column in duplicate_detection_config allows each organization to include only the fields that are meaningful for their duplicate detection rules — for example, NHF might weight unit_id heavily while HLF might weight participant_id. Design the schema to be extensible: future fields (e.g., location, event_title) can be added to the fingerprint without a migration by simply adding them to enabled_fields. Document the schema decisions in a schema-decisions.md file adjacent to the migration files, covering: why JSONB for matched_fields (flexibility), why float for similarity_score (future fuzzy matching), and why organization-level thresholds (multi-tenant requirement from NHF's duplicate detection use case in the requirements doc).

Testing Requirements

Database migration tests: apply migration to a Supabase test project and assert all three tables exist with correct column types and constraints using a Dart test script or Supabase SQL query. RLS tests: authenticate as user from org A, attempt INSERT into activity_fingerprints with organization_id of org B, assert 403/RLS error returned. Dart unit tests: ActivityFingerprint.fromJson() correctly parses a fixture JSON map; toJson() produces the expected map; fingerprint hash is deterministic (same inputs always produce same hash). Unit test for hash canonicalization: field order in the JSON string used for hashing is always alphabetical to prevent hash collisions from field reordering.

Component

Duplicate Activity Detector

infrastructure high

Epic Risks (3)

medium impact high prob technical

Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.

Mitigation & Contingency

Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.

Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.

medium impact medium prob technical

A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.

Mitigation & Contingency

Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.

Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.

medium impact medium prob dependency

The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.

Mitigation & Contingency

Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.

Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.

Quick Links

All Tasks Execution Plan

Design activity fingerprint schema

Acceptance Criteria

Technical Requirements

Execution Context

Implementation Notes

Testing Requirements