Performance validation: sub-500ms query benchmark
epic-duplicate-activity-detection-foundation-task-012 — Execute query performance benchmarks against the check_activity_duplicates RPC on a dataset representative of NHF's 1,400 chapters (seed test data with realistic volume). Confirm EXPLAIN ANALYZE shows index scan on the composite index. Document p95 latency results and confirm sub-500ms threshold is met. If threshold is missed, profile and apply additional query optimizations (partial indexes, statistics updates). Write benchmark script for CI regression detection.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 4 - 323 tasks
Can start after Tier 3 completes
Implementation Notes
Seed data must mirror NHF's real distribution: 1,400 chapters, each with ~5–15 peer mentors, and each peer mentor with ~50–200 historical activities, to stress the index at production scale. Use PostgreSQL's `generate_series` or a Dart seed script calling Supabase bulk insert. For EXPLAIN ANALYZE, connect via `dart:io` Process to `psql` or use the `postgres` Dart package directly (bypassing Supabase client) to capture raw query plans — the Supabase REST client does not expose EXPLAIN output. Key pitfall: Supabase's connection pooler (PgBouncer in transaction mode) can add ~10–30ms overhead per call; benchmark against the direct PostgreSQL connection string (port 5432) to isolate true query latency from pooler overhead, and document both numbers.
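As a sketch, the seeding step can be done entirely in SQL with `generate_series`; the table and column names below (`chapters`, `peer_mentors`, `activities`) are assumptions to be mapped onto the real schema:

```sql
-- Seed 1,400 chapters, ~5–15 mentors each, ~50–200 activities per mentor.
-- Table/column names are assumed; adjust to the actual schema.
INSERT INTO chapters (id, name)
SELECT gen_random_uuid(), 'Chapter ' || g
FROM generate_series(1, 1400) AS g;

INSERT INTO peer_mentors (id, chapter_id)
SELECT gen_random_uuid(), c.id
FROM chapters c,
     generate_series(1, 5 + floor(random() * 11)::int);  -- volatile arg: re-rolled per chapter

INSERT INTO activities (id, peer_mentor_id, activity_date, is_deleted)
SELECT gen_random_uuid(), m.id,
       current_date - (random() * 365)::int,
       false
FROM peer_mentors m,
     generate_series(1, 50 + floor(random() * 151)::int);
```

Set-returning functions in the FROM clause are implicitly LATERAL in PostgreSQL, so the volatile `random()` bound is re-evaluated per outer row, giving the per-chapter and per-mentor variation the distribution calls for.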
For the composite index, confirm its leading columns match the RPC's equality predicates (equality columns first, any range column last); the textual order of conditions in the WHERE clause itself does not influence the planner. If p95 fails, first try a partial index filtering on `is_deleted = false` before more invasive rewrites.
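One possible shape for that partial index (column names are assumptions, not confirmed by the schema):

```sql
-- Partial index over live rows only. The WHERE predicate must match
-- (or be implied by) the RPC's filter for the planner to use it.
CREATE INDEX CONCURRENTLY idx_activities_dup_check_live
  ON activities (peer_mentor_id, activity_date)
  WHERE is_deleted = false;
```

Because soft-deleted rows are excluded, the partial index is both smaller and more cache-friendly than a full composite index over the same columns.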
Testing Requirements
This task IS the testing artifact. The benchmark script itself constitutes the primary deliverable. It must: (1) use a Dart script or SQL migration to seed realistic volume data; (2) invoke check_activity_duplicates via Supabase RPC client in a loop (minimum 100 iterations) to produce statistically stable latency measurements; (3) capture EXPLAIN ANALYZE output from a direct PostgreSQL connection and assert index usage; (4) compute p50/p95 from measured round-trip times; (5) assert p95 < 500ms and exit non-zero if violated, enabling CI gating. A secondary integration test should verify that the benchmark script runs reproducibly (running it twice against the same seeded data produces measurements within ±20% tolerance).
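For step (3), one caveat worth building into the script: if check_activity_duplicates is PL/pgSQL, `EXPLAIN ANALYZE SELECT * FROM check_activity_duplicates(...)` reports only an opaque Function Scan node. Instead, run EXPLAIN against the RPC body's underlying query (a hypothetical shape below) and assert that the JSON plan contains an `"Index Scan"` or `"Index Only Scan"` node on the composite index:

```sql
-- Literal values are placeholders for a real mentor/date pair from the
-- seeded data; FORMAT JSON makes the index-usage assertion programmatic.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT id
FROM activities
WHERE peer_mentor_id = '00000000-0000-0000-0000-000000000000'
  AND activity_date  = current_date
  AND is_deleted     = false;
```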
The `check_activity_duplicates` RPC may not meet the 500ms target on production-scale data if the composite index is not applied correctly or if Supabase RLS evaluation adds unexpected overhead, causing the duplicate check to noticeably delay activity submission.
Mitigation & Contingency
Mitigation: Run an explicit EXPLAIN ANALYZE in development against a seeded dataset representative of a large chapter (10,000+ activities). PostgreSQL has no native index hints, so the RPC body must be written so the planner naturally chooses the composite index; verify the query plan in Supabase's SQL editor before merging.
Contingency: If the 500ms target cannot be met with the RPC approach, introduce an async post-submit check pattern where the activity is saved first and the duplicate warning is surfaced as a follow-up notification, preserving submission speed at the cost of real-time blocking UX.
RLS policies for the coordinator_duplicate_queue view must correctly scope results to the coordinator's chapters. Incorrect policies could expose duplicate records from other chapters (privacy violation) or hide legitimate duplicates (functional regression).
Mitigation & Contingency
Mitigation: Write explicit integration tests that verify RLS behaviour using at least three distinct coordinator + chapter combinations, including a peer mentor belonging to two chapters. pgTAP tests run via the Supabase CLI (`supabase test db`) can impersonate each role and assert row visibility directly.
Contingency: If RLS proves too complex for the queue view, move the chapter-scoping filter into the DuplicateQueueRepository query layer at the application level, trading database-enforced isolation for application-enforced scoping with full test coverage.
Adding the duplicate_reviewed column to the activities table and the composite index requires a migration against a live table. If the migration locks the table for an extended period, it could disrupt active coordinators submitting activities.
Mitigation & Contingency
Mitigation: Use PostgreSQL's `CREATE INDEX CONCURRENTLY` to avoid a long table lock. Add the duplicate_reviewed column with a static DEFAULT false: on PostgreSQL 11+ this is a metadata-only change, so no backfill update lock is required. Schedule the migration during a low-traffic window.
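A minimal migration sketch under those constraints (index name and columns are assumptions); note that `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so it needs its own non-transactional migration step:

```sql
-- PostgreSQL 11+ stores a static DEFAULT without rewriting the table,
-- so this ALTER takes only a brief metadata lock.
ALTER TABLE activities
  ADD COLUMN duplicate_reviewed boolean NOT NULL DEFAULT false;

-- Must run outside a transaction; builds without blocking writes.
CREATE INDEX CONCURRENTLY idx_activities_duplicate_check
  ON activities (peer_mentor_id, activity_date);
```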
Contingency: If concurrent index creation fails or takes too long, fall back to a smaller partial index scoped to roughly the last 90 days of activities — the predicate must use a fixed cutoff date, since partial-index predicates cannot call non-immutable functions such as `now()` — then expand it incrementally.