Document detection system and threshold tuning guide

epic-organizational-hierarchy-management-duplicate-detection-task-012 — Write technical documentation covering the duplicate detection architecture, fingerprint algorithm, configurable threshold parameters, and how to tune sensitivity per organization. Include a guide for admins on interpreting similarity scores, recommended default thresholds based on NHF's multi-chapter structure, and an explanation of the flag-for-review (not auto-delete) policy rationale.

Medium priority · Low complexity · Documentation · Pending · Documentor · Tier 6

Acceptance Criteria

A markdown document `docs/duplicate-detection-system.md` is created covering: system overview, architecture diagram (Mermaid or ASCII), fingerprint algorithm description

Document explains all configurable threshold parameters in DuplicateDetectionConfig with: parameter name, data type, valid range, default value, and effect on sensitivity
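
As an illustration of the parameter table this criterion asks for, the sketch below models one possible shape for the configuration in Python. The class name `ThresholdConfigSketch`, the field names, and every range and default are hypothetical placeholders, not the actual `DuplicateDetectionConfig` fields. The real names and values must be taken from the Dart model and the migration files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdConfigSketch:
    """Hypothetical stand-in for DuplicateDetectionConfig; every field is an assumption."""

    org_id: str = "example-org-uuid"        # placeholder, never a production org ID
    similarity_threshold: float = 0.85      # assumed valid range 0.0-1.0; lower flags more pairs
    date_window_days: int = 1               # assumed valid range 0-30; wider compares more pairs
    duration_tolerance_minutes: int = 15    # assumed valid range 0-240

    def __post_init__(self) -> None:
        # Reject out-of-range values so a typo cannot silently disable
        # detection or flood reviewers with suspected duplicates.
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be between 0.0 and 1.0")
        if not 0 <= self.date_window_days <= 30:
            raise ValueError("date_window_days must be between 0 and 30")
        if not 0 <= self.duration_tolerance_minutes <= 240:
            raise ValueError("duration_tolerance_minutes must be between 0 and 240")
```

Documenting each parameter with its range next to the validation rule makes it easier for a reviewer to confirm that the written guide and the enforced limits agree.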

Document includes a threshold tuning guide with 3 recommended profiles: Strict (high sensitivity, meaning a lower similarity threshold, suitable for small single-chapter orgs), Balanced (default, suitable for NHF's 1,400-chapter structure), Permissive (low sensitivity, meaning a higher similarity threshold, for orgs with legitimately similar recurring activities)
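
The three profiles can be pictured as named threshold presets. The sketch below is a minimal illustration. The numeric values are assumptions chosen only to show the ordering (Strict flags the most pairs, Permissive the fewest) and are not the agreed NHF defaults.

```python
# Hypothetical threshold values; the real defaults must come from the seed/migration files.
PROFILE_THRESHOLDS = {
    "strict": 0.70,      # high sensitivity: flags more pairs
    "balanced": 0.85,    # assumed default
    "permissive": 0.95,  # low sensitivity: flags only near-identical pairs
}


def is_flagged(score: float, profile: str) -> bool:
    """Return True when a similarity score meets or exceeds the profile's threshold."""
    return score >= PROFILE_THRESHOLDS[profile]
```

Under these assumed values, a pair scoring 0.80 would be flagged for review only by the Strict profile.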

Document includes a section explaining why the system flags for review rather than auto-deleting, referencing Bufdir audit requirements and the risk of false positives in NHF's multi-chapter environment

Admin guide section explains how to interpret similarity scores (0.0–1.0 scale) with concrete examples: a score of 0.95 means near-identical title, date, and duration; a score of 0.7 means the same event type on a different day
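
One way to present scores to non-technical admins is to translate them into plain-language bands. The sketch below assumes two band edges taken from the examples in this criterion (0.95 and 0.7). The function name and the label wording are hypothetical.

```python
def describe_score(score: float) -> str:
    """Map a similarity score to an admin-facing label (band edges are assumptions)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= 0.95:
        return "near-identical: title, date, and duration all match closely"
    if score >= 0.70:
        return "similar: likely the same event type, but details differ"
    return "weak match: probably unrelated activities"
```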

Document includes a step-by-step guide for an admin to update org-specific thresholds via the app's admin settings screen

Document is written in English and is accessible to non-technical admin users (avoid raw SQL, explain acronyms)

Document is reviewed by at least one team member before merging (PR comment or approval required)

Technical Requirements

frameworks

Markdown

Mermaid (for architecture diagrams if supported in docs tooling)

data models

DuplicateDetectionConfig

suspected_duplicates

ActivityFingerprint

security requirements

Document must not include real database credentials, service role keys, or production org IDs

Examples must use placeholder data (e.g., org_id: 'example-org-uuid')

Execution Context

Execution Tier

Tier 6 - 158 tasks

Can start after Tier 5 completes

Implementation Notes

Write documentation from the implementer's perspective — you have just completed tasks 003–011 so you understand the system deeply. Focus the admin guide on the NHF use case: NHF has 1,400 chapters and 12 national associations, so activities registered at different chapters for the same event type (e.g., monthly meetings) are common and should NOT be flagged as duplicates unless titles, dates, and durations are very close. The recommended Balanced profile threshold should reflect this. Include a FAQ section addressing common admin questions: 'Why does the system show so many suspected duplicates?' (threshold too low) and 'An obvious duplicate was not detected' (threshold too high).
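
To make the NHF guidance concrete, the sketch below combines title, date, and duration closeness into a single score. It is not the implemented fingerprint algorithm: the weights, the one-week date decay, and the use of `difflib.SequenceMatcher` for title similarity are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Activity:
    title: str
    day: date
    duration_minutes: int


def similarity(a: Activity, b: Activity) -> float:
    """Weighted similarity in the range 0.0-1.0 (weights are hypothetical)."""
    title_sim = SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio()
    # Date closeness decays linearly to 0 once the activities are a week apart.
    day_gap = abs((a.day - b.day).days)
    date_sim = max(0.0, 1.0 - day_gap / 7)
    # Duration closeness is relative to the longer activity; the 1 avoids division by zero.
    longest = max(a.duration_minutes, b.duration_minutes, 1)
    duration_sim = 1.0 - abs(a.duration_minutes - b.duration_minutes) / longest
    return 0.4 * title_sim + 0.4 * date_sim + 0.2 * duration_sim


# Two chapters holding the same monthly meeting a month apart.
january = Activity("Monthly meeting", date(2025, 1, 14), 90)
february = Activity("Monthly meeting", date(2025, 2, 11), 90)
recurring_score = similarity(january, february)
```

With these assumed weights, identical titles alone cannot push a pair past a Balanced-style threshold. The dates must also be close, which matches the intent that recurring meetings at different chapters are not flagged.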

Keep the tone practical and approachable.

Testing Requirements

Documentation review checklist: (1) All threshold parameter names match the actual DuplicateDetectionConfig Dart model fields and DB column names, (2) Default values in documentation match the values in the Supabase seed/migration files, (3) Architecture diagram accurately reflects the implemented trigger → fingerprint → comparison → flag pipeline, (4) Admin guide steps are tested by a non-developer team member performing a threshold update on staging and confirming the instructions are accurate.
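
Checklist items (1) and (2) can be partly automated. The sketch below compares the defaults written in the documentation against the seeded values and reports every difference. The function name and the dictionary inputs are hypothetical. How the values are extracted from the markdown file and the migration files is left out.

```python
def find_default_mismatches(documented: dict, seeded: dict) -> list[str]:
    """List parameters whose documented default differs from the seeded one."""
    problems = []
    for name in sorted(documented.keys() | seeded.keys()):
        if name not in seeded:
            problems.append(f"{name}: documented but missing from seed data")
        elif name not in documented:
            problems.append(f"{name}: in seed data but undocumented")
        elif documented[name] != seeded[name]:
            problems.append(
                f"{name}: documented {documented[name]!r}, seeded {seeded[name]!r}"
            )
    return problems
```

An empty result means the names and defaults agree. Any entry in the list is a documentation fix to make before the review in criterion 8.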

Component

Duplicate Activity Detector

infrastructure high

Dependencies (1)

Write integration tests covering the full duplicate detection pipeline: trigger fires on new activity insert, cross-hierarchy comparison identifies matching fingerprints, suspected duplicate record is created, RLS policies correctly scope data per organization, and resolution updates are persisted. Use Supabase local dev environment for test execution. epic-organizational-hierarchy-management-duplicate-detection-task-011

Epic Risks (3)

Medium impact · High probability · Technical

Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.

Mitigation & Contingency

Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.
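
The conservative starting point described in this mitigation (exact peer mentor, same date, same activity type) can be sketched as a deterministic hash. The function name, the normalization rules, and the choice of SHA-256 are assumptions. The implemented `ActivityFingerprint` may be built differently.

```python
import hashlib
from datetime import date


def exact_fingerprint(peer_mentor_id: str, day: date, activity_type: str) -> str:
    """Hash the three exact-match fields into one comparable string (hypothetical scheme)."""
    parts = [
        peer_mentor_id.strip().lower(),
        day.isoformat(),
        activity_type.strip().lower(),
    ]
    # The unit separator keeps field boundaries unambiguous when fields are joined.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

Two registrations with equal fingerprints are high-confidence candidates. Looser fuzzy matching can be layered on later without changing this exact-match step.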

Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.
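
The automatic adjustment in this contingency could work along the following lines. This is a sketch under stated assumptions: the minimum sample size, the dismissal-rate limit, the step size, and the ceiling are all hypothetical tuning values, not agreed policy.

```python
def adjusted_threshold(
    base: float,
    dismissed: int,
    total: int,
    *,
    min_samples: int = 20,
    max_dismissal_rate: float = 0.5,
    step: float = 0.05,
    ceiling: float = 0.99,
) -> float:
    """Raise the threshold one step when reviewers dismiss most flags for an activity type."""
    if total < min_samples:
        return base  # too little evidence to change anything
    if dismissed / total <= max_dismissal_rate:
        return base  # reviewers mostly confirm the flags
    # Rounding keeps the stored value tidy; the ceiling stops detection being switched off.
    return min(ceiling, round(base + step, 2))
```

Keeping a ceiling below 1.0 means an activity type with many dismissals still surfaces exact matches for review.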

Medium impact · Medium probability · Technical

A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.

Mitigation & Contingency

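Mitigation: Implement the trigger as a deferred constraint trigger (DEFERRABLE INITIALLY DEFERRED, which fires at the end of the transaction, just before commit, rather than after each statement) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function. Only the asynchronous option completely decouples detection from the registration write path; a deferred trigger still runs inside the registering transaction.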

Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.

Medium impact · Medium probability · Dependency

The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.

Mitigation & Contingency

Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.

Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.
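
Silent mode amounts to always recording a suspected duplicate while making the coordinator notification conditional. The sketch below illustrates that split with in-memory lists. The class and attribute names are hypothetical, and a real implementation would write to `suspected_duplicates` and call the notification service instead.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionSink:
    """Collects detection results; notifies only when silent mode is off."""

    silent_mode: bool = True
    flagged: list = field(default_factory=list)
    notified: list = field(default_factory=list)

    def record(self, pair_id: str, score: float) -> None:
        # Always keep the flag so false-positive rates can be measured later.
        self.flagged.append((pair_id, score))
        if not self.silent_mode:
            self.notified.append(pair_id)
```

Because the flag is stored in both modes, the data gathered during the silent period stays available for threshold calibration once notifications are activated.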
