Build scheduled Edge Function for batch detection
epic-organizational-hierarchy-management-duplicate-detection-task-007 — Implement a Supabase Edge Function that runs on a configurable schedule to perform batch duplicate detection across existing activities. This handles cases the trigger may miss (e.g., retroactive imports) and allows re-evaluation when configuration thresholds change. Includes rate limiting and organization-scoped processing.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 5 - 253 tasks
Can start after Tier 4 completes
Implementation Notes
Structure the Edge Function as: (1) JWT validation middleware; (2) parameter parsing and validation; (3) org list resolution (single org or all active orgs from config); (4) per-org processing loop with try/catch that populates the errors array; (5) structured JSON response. For cursor-based pagination, store `{ org_id, last_processed_activity_id, updated_at }` in `batch_detection_cursors`. Use the Supabase service role client initialized with `createClient(SUPABASE_URL, SERVICE_ROLE_KEY)` — store these as Edge Function secrets, never hardcoded. For the pg_cron job, add a migration: `SELECT cron.schedule('batch-duplicate-detection', '0 2 * * *', $$SELECT net.http_post(url := 'https://
Use `since_hours: 26` (slightly more than 24h) in the cron job to provide overlap and catch any activities that fell in the gap between trigger firings. Document the overlap strategy in a code comment.
Testing Requirements
Unit tests (Deno test runner): (1) mock Supabase client and verify correct RPC calls are made for a single org; (2) dry_run=true returns count without insert calls; (3) invalid org_id UUID returns HTTP 400; (4) missing/invalid Authorization header returns HTTP 401; (5) RPC error for one org is caught and added to errors array without stopping other orgs. Integration tests against local Supabase CLI: (1) seed 20 cross-hierarchy near-duplicate activities and run the function → verify 10 pairs inserted (unique pairs); (2) run again immediately → 0 new pairs inserted (idempotency); (3) change similarity_threshold in config and run again → verify previously-missed pairs are now flagged with detection_method='batch'; (4) simulate org with 600 activities → verify cursor is written and only 500 are processed. End-to-end: trigger pg_cron manually and verify the full pipeline completes and sends a Supabase Realtime event to the admin Flutter client.
Fingerprint-based similarity matching may produce high false-positive rates for common activity types (e.g., weekly group sessions with the same participants), causing alert fatigue among coordinators and undermining trust in the detection system.
Mitigation & Contingency
Mitigation: Start with conservative, high-confidence thresholds (exact peer mentor match + same date + same activity type) before adding looser fuzzy matching. Allow NHF administrators to tune thresholds based on observed false-positive rates. Log all detection decisions for retrospective threshold calibration.
Contingency: Introduce a snooze mechanism allowing coordinators to dismiss false positives for a configurable period. Track dismissal rates per activity type and automatically raise the similarity threshold for activity types with high dismissal rates.
A database trigger on the activities insert path adds synchronous overhead to every activity registration. For HLF peer mentors with 380 annual registrations or coordinators doing bulk proxy registration, this could create perceptible latency or lock contention.
Mitigation & Contingency
Mitigation: Implement the trigger as a DEFERRED constraint trigger (fires after the transaction commits) or replace it with a LISTEN/NOTIFY pattern that queues detection work asynchronously via an Edge Function, completely decoupling detection from the registration write path.
Contingency: Disable the synchronous trigger entirely and rely solely on the scheduled Edge Function for batch detection. Accept a detection delay of up to the scheduling interval (e.g., 15 minutes) in exchange for zero impact on registration latency.
The duplicate detection logic must be validated and approved by NHF before go-live, including agreement on threshold values and the review workflow. NHF stakeholder availability for sign-off may delay this epic's release independently of technical readiness.
Mitigation & Contingency
Mitigation: Gate the feature behind the NHF-specific feature flag so technical deployment can proceed independently of business approval. Involve an NHF administrator in threshold calibration sessions during QA, reducing the formal sign-off surface to policy and workflow rather than technical details.
Contingency: Release the detection system in 'silent mode' — flagging duplicates internally without surfacing notifications to coordinators — until NHF approves the workflow. Use the silent period to collect real data on false-positive rates and refine thresholds before activating notifications.