Implement generate_bufdir_report RPC Function
epic-bufdir-data-aggregation-foundation-task-007 — Write the PostgreSQL server-side generate_bufdir_report(p_org_id UUID, p_period_start DATE, p_period_end DATE, p_mapping_version INT) RPC function. The function must perform COUNT DISTINCT on participant_id (deduplication), SUM hours, and GROUP BY bufdir_category using the category mappings table — all within org-scoped RLS. Return a JSONB result set matching Bufdir's expected report schema. Include deduplication of proxy-registered activities per the is_proxy_registered flag.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 3 - 413 tasks
Can start after Tier 2 completes
Implementation Notes
Structure the function as a series of CTEs for readability: (1) `filtered_activities`: activities in the org and date range; (2) `mapped_activities`: LEFT JOIN with bufdir_category_mappings on internal_type_id and mapping_version, so that unmapped activity types survive the join; (3) `aggregated`: GROUP BY bufdir_code with COUNT(DISTINCT participant_id) and SUM(hours); (4) `final_result`: format as a JSONB array. Deduplicating proxy versus direct registrations needs no separate UNION step: COUNT(DISTINCT participant_id) within each bufdir_code counts a participant once regardless of is_proxy_registered. The flag only matters if proxy and direct hours are later reported separately; keep that as a future extension and implement the plain COUNT DISTINCT first. Build the result array with `jsonb_agg(jsonb_build_object(...))`, because the `json_*` variants return json rather than jsonb.
Handle the empty case (no activities): an aggregate over zero rows yields NULL, so wrap it in COALESCE and return an empty JSON array `'[]'::jsonb`, not NULL.
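The CTE structure above could be sketched roughly as follows. This is a minimal sketch, not the final implementation: the `activity_date` and `hours` columns and the output key names are assumptions, while `organization_id` and `activity_type_id` follow the column names used in the index mitigation further down. The fourth step (`final_result`) is folded into the final SELECT.

```sql
-- Sketch only. Assumed columns on activities: activity_date (date), hours (numeric).
-- Assumed output keys: bufdir_code, participant_count, total_hours.
CREATE OR REPLACE FUNCTION generate_bufdir_report(
    p_org_id          uuid,
    p_period_start    date,
    p_period_end      date,
    p_mapping_version int
)
RETURNS jsonb
LANGUAGE sql
STABLE
SECURITY INVOKER  -- run with the caller's privileges so RLS policies still apply
AS $$
    WITH filtered_activities AS (
        -- (1) activities in the org and reporting period
        SELECT a.participant_id, a.activity_type_id, a.hours
        FROM activities a
        WHERE a.organization_id = p_org_id
          AND a.activity_date BETWEEN p_period_start AND p_period_end
    ),
    mapped_activities AS (
        -- (2) LEFT JOIN keeps activity types that have no mapping row
        SELECT f.participant_id,
               f.hours,
               COALESCE(m.bufdir_code, 'unmapped') AS bufdir_code
        FROM filtered_activities f
        LEFT JOIN bufdir_category_mappings m
               ON m.internal_type_id = f.activity_type_id
              AND m.mapping_version  = p_mapping_version
    ),
    aggregated AS (
        -- (3) COUNT(DISTINCT ...) counts a participant once per category,
        --     whether registered directly, by proxy, or both
        SELECT bufdir_code,
               COUNT(DISTINCT participant_id) AS participant_count,
               SUM(hours)                     AS total_hours
        FROM mapped_activities
        GROUP BY bufdir_code
    )
    -- (4) jsonb_agg over zero rows is NULL, so fall back to an empty array
    SELECT COALESCE(
        jsonb_agg(
            jsonb_build_object(
                'bufdir_code',       bufdir_code,
                'participant_count', participant_count,
                'total_hours',       total_hours
            )
            ORDER BY bufdir_code
        ),
        '[]'::jsonb
    )
    FROM aggregated;
$$;
```

One design point to settle: with the LEFT JOIN, a p_mapping_version that has no rows makes every activity land in 'unmapped' rather than producing an empty array. If an empty array is required in that case, the function would need an explicit check that the version exists before aggregating.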
Testing Requirements
SQL unit tests using pgTAP or manual test scripts: (1) insert known test activities and verify COUNT DISTINCT matches the expected unique participant count, (2) verify proxy-registered activities do not inflate participant_count for participants who also have direct registrations, (3) verify activities outside the date range are excluded, (4) verify unmapped activity types appear in the 'unmapped' category, (5) verify a user from org B cannot retrieve org A's data by passing org A's p_org_id. Performance test: load 50,000 rows for a test org and run EXPLAIN ANALYZE on the function's underlying query to confirm index usage; EXPLAIN on the function call itself shows only an opaque function node, not the plan of the query inside it. Test with a p_mapping_version that has no entries: the function must return an empty array, not an error.
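A pgTAP script for cases (1) to (3) might look like the sketch below. The fixture UUIDs, the integer type ID, the `activity_date` and `hours` columns, and the `bufdir_code`, `participant_count` and `total_hours` output keys are all assumptions; the real tables may also have further NOT NULL columns that the inserts would need to fill.

```sql
BEGIN;
SELECT plan(3);

-- Fixture: one mapping row and three activities for a single participant.
INSERT INTO bufdir_category_mappings (internal_type_id, mapping_version, bufdir_code)
VALUES (1, 1, 'A1');

INSERT INTO activities
    (organization_id, participant_id, activity_type_id, activity_date, hours, is_proxy_registered)
VALUES
    -- direct registration, inside the period
    ('00000000-0000-0000-0000-00000000000a', '00000000-0000-0000-0000-000000000001', 1, '2024-03-01', 2.0, false),
    -- proxy registration for the same participant, inside the period
    ('00000000-0000-0000-0000-00000000000a', '00000000-0000-0000-0000-000000000001', 1, '2024-04-01', 1.5, true),
    -- outside the period, must be ignored
    ('00000000-0000-0000-0000-00000000000a', '00000000-0000-0000-0000-000000000001', 1, '2023-12-31', 10.0, false);

-- (1) + (2): proxy and direct rows for one participant count once.
SELECT is(
    (SELECT (elem->>'participant_count')::int
     FROM jsonb_array_elements(
              generate_bufdir_report('00000000-0000-0000-0000-00000000000a',
                                     '2024-01-01', '2024-12-31', 1)) AS elem
     WHERE elem->>'bufdir_code' = 'A1'),
    1,
    'participant with proxy and direct registrations is counted once'
);

-- (3): the 2023 row is excluded, so hours are 2.0 + 1.5.
SELECT is(
    (SELECT (elem->>'total_hours')::numeric
     FROM jsonb_array_elements(
              generate_bufdir_report('00000000-0000-0000-0000-00000000000a',
                                     '2024-01-01', '2024-12-31', 1)) AS elem
     WHERE elem->>'bufdir_code' = 'A1'),
    3.5::numeric,
    'activities outside the date range are excluded'
);

-- Empty case: an org with no activities yields an empty array, not NULL.
SELECT is(
    generate_bufdir_report('00000000-0000-0000-0000-00000000000b',
                           '2024-01-01', '2024-12-31', 1),
    '[]'::jsonb,
    'org without activities returns an empty JSON array'
);

SELECT * FROM finish();
ROLLBACK;
```

The cross-org case (5) has to run as an authenticated non-superuser role, since a superuser session bypasses RLS and would not exercise the policy.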
Supabase RLS policies are not enforced when the function is invoked with the service_role key, because that role bypasses RLS, and they can likewise be skipped inside a SECURITY DEFINER function owned by a privileged role. Org-scoping predicates that rely on RLS alone are then silently ignored, which could lead to cross-org data exposure in production without any obvious error.
Mitigation & Contingency
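Mitigation: Invoke all RPCs using the anon/authenticated key rather than service_role, filter explicitly on p_org_id inside the RPC body, and, as a secondary control, verify inside the RPC that the caller identified by auth.uid() belongs to that organisation. Note that auth.uid() returns the user's ID, not an org ID, so it cannot be compared or cast to org_id directly. Include automated cross-org leakage tests in the CI pipeline from day one.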
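One way to express the secondary control is a small guard function that the RPC calls before aggregating. This is a sketch under assumptions: the `org_memberships(user_id, org_id)` table and the `assert_caller_in_org` name are hypothetical, and the project's real membership model may differ.

```sql
-- Hypothetical membership table: org_memberships(user_id uuid, org_id uuid).
CREATE OR REPLACE FUNCTION assert_caller_in_org(p_org_id uuid)
RETURNS void
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    -- auth.uid() is NULL when there is no user JWT (for example service_role),
    -- so such callers fail the check as well.
    IF NOT EXISTS (
        SELECT 1
        FROM org_memberships m
        WHERE m.user_id = auth.uid()
          AND m.org_id  = p_org_id
    ) THEN
        RAISE EXCEPTION 'caller is not a member of organisation %', p_org_id
            USING ERRCODE = '42501';  -- insufficient_privilege
    END IF;
END;
$$;
```

Inside a PL/pgSQL version of the RPC this would be invoked as `PERFORM assert_caller_in_org(p_org_id);` before the aggregation query, so the check holds even if RLS is bypassed.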
Contingency: If RLS bypass is discovered post-deployment, immediately revoke service_role usage in all aggregation paths and hotfix with explicit org_id parameters passed as function arguments validated server-side.
Bufdir may update its official reporting category taxonomy between the mapping configuration being defined and the annual submission deadline. If the ActivityCategoryMappingConfig is compiled as a static Dart constant, it cannot be updated without an app release, potentially causing mapping failures that block submission.
Mitigation & Contingency
Mitigation: Store the mapping as a remote-configurable table (bufdir_category_mappings) in Supabase with a version field rather than as a hardcoded Dart constant. Fetch the current mapping at aggregation time so updates can be pushed without a new app release.
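A versioned mapping table along these lines would allow a new taxonomy to be pushed as data. The column types and the primary key are assumptions; only the table name and the internal_type_id, mapping_version and bufdir_code columns come from this task.

```sql
-- Sketch: one row per internal activity type per mapping version.
CREATE TABLE IF NOT EXISTS bufdir_category_mappings (
    internal_type_id integer NOT NULL,  -- assumed type
    mapping_version  integer NOT NULL,
    bufdir_code      text    NOT NULL,
    PRIMARY KEY (internal_type_id, mapping_version)
);

-- Publishing an updated taxonomy means inserting rows under a new version;
-- rows for earlier versions stay untouched so old reports remain reproducible.
INSERT INTO bufdir_category_mappings (internal_type_id, mapping_version, bufdir_code)
VALUES (1, 2, 'A1'),
       (2, 2, 'B3');

-- The client can resolve the current version at aggregation time.
SELECT max(mapping_version) AS current_version
FROM bufdir_category_mappings;
```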
Contingency: If a mapping mismatch is detected during an active reporting cycle, coordinators can be temporarily directed to the manual Excel fallback while an emergency mapping update is pushed to the Supabase table.
For large organisations like NHF with 1,400 local chapters and potentially tens of thousands of activity records per reporting period, the Supabase RPC aggregation query may exceed the default PostgREST statement timeout, causing the aggregation to fail with a 503 error.
Mitigation & Contingency
Mitigation: Add composite indexes on (organization_id, created_at) and (organization_id, activity_type_id) to the activities table before writing the RPC. Profile the query plan against a realistic fixture of 50,000 records during development and increase the statement_timeout setting for the RPC role if needed.
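The index and timeout steps could look like the following sketch. The index names, the sample UUID and the 30-second value are placeholders; if the RPC filters on a different date column than created_at, the first index should cover that column instead.

```sql
-- Composite indexes matching the org + date and org + type access paths.
CREATE INDEX IF NOT EXISTS activities_org_created_idx
    ON activities (organization_id, created_at);

CREATE INDEX IF NOT EXISTS activities_org_type_idx
    ON activities (organization_id, activity_type_id);

-- Profile the filter query directly; EXPLAIN on the function call
-- would not show the plan of the query inside the function.
EXPLAIN (ANALYZE, BUFFERS)
SELECT participant_id, activity_type_id
FROM activities
WHERE organization_id = '00000000-0000-0000-0000-00000000000a'
  AND created_at >= DATE '2024-01-01'
  AND created_at <  DATE '2025-01-01';

-- Raise the timeout for the role that PostgREST uses for signed-in callers.
ALTER ROLE authenticated SET statement_timeout = '30s';

-- Ask PostgREST to reload its configuration so the new setting is picked up.
NOTIFY pgrst, 'reload config';
```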
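Contingency: Implement a chunked aggregation fallback: split the period into monthly sub-ranges, aggregate each chunk, and merge the results client-side in Dart before assembling the final payload. Hours can simply be summed across chunks, but distinct participant counts cannot, because a participant active in two months would be counted twice. Each chunk therefore has to return the distinct (participant_id, bufdir_code) pairs, which are unioned client-side before counting.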