critical priority high complexity database pending database specialist Tier 3

Acceptance Criteria

Function signature: generate_bufdir_report(p_org_id UUID, p_period_start DATE, p_period_end DATE, p_mapping_version INT) RETURNS JSONB
Returns JSONB array where each element has: bufdir_code (TEXT), participant_count (INT, COUNT DISTINCT participant_id), total_hours (NUMERIC), activity_count (INT)
Only activities with activity_date BETWEEN p_period_start AND p_period_end are included
Only activities belonging to p_org_id are included (enforced by both the WHERE clause and active RLS context)
Proxy-registered activities (is_proxy_registered = true) are included in hour totals but participant_id from proxy activities is deduplicated against directly registered activities — a participant is counted at most once per bufdir_category per period
Function uses bufdir_category_mappings to translate internal_type_id to bufdir_code using p_mapping_version
Activity types with no mapping in bufdir_category_mappings for the given version are grouped under an 'unmapped' category with a warning flag in the result
Function executes within 10 seconds for an org with 50,000 activities in the period
Function is SECURITY DEFINER with a restricted search_path to prevent privilege escalation
Calling the function with a p_org_id the authenticated user does not belong to returns an empty result (RLS enforcement), not an error

Technical Requirements

frameworks
PostgreSQL (Supabase)
PL/pgSQL
apis
Supabase RPC endpoint (POST /rest/v1/rpc/generate_bufdir_report)
data models
activities
bufdir_category_mappings
participants (referenced via participant_id FK)
performance requirements
Query plan must use index scans on (org_id, activity_date) and (internal_type_id, mapping_version)
COUNT DISTINCT must not perform a full table scan — ensure participant_id is indexed
Execution time under 10 seconds for 50,000 activity rows in the reporting period
Use a CTE for the JOIN with bufdir_category_mappings to allow the planner to materialize the small lookup table
security requirements
SECURITY DEFINER with SET search_path = public to prevent search_path injection
GRANT EXECUTE to authenticated role only — not to anon
Function must validate p_org_id is a well-formed UUID and p_period_start ≤ p_period_end, raising an exception with a safe message on invalid input
No dynamic SQL (EXECUTE) to prevent SQL injection
Result must not include PII (participant names, contact details) — only counts and aggregates

Execution Context

Execution Tier
Tier 3

Tier 3 - 413 tasks

Can start after Tier 2 completes

Implementation Notes

Structure the function as a series of CTEs for readability: (1) `filtered_activities` — activities in org + date range, (2) `mapped_activities` — JOIN with bufdir_category_mappings on internal_type_id + mapping_version, (3) `aggregated` — GROUP BY bufdir_code with COUNT DISTINCT participant_id and SUM hours, (4) `final_result` — format as JSONB array. For deduplication of proxy vs direct: use a UNION approach where you first collect all (participant_id, bufdir_code) pairs regardless of is_proxy_registered, then COUNT DISTINCT — this naturally deduplicates. The is_proxy_registered flag is only relevant if you want to report separately on proxy vs direct hours (keep this as a future extension but implement COUNT DISTINCT naively first). Return JSONB with `json_agg(json_build_object(...))` for the result array.

Handle the NULL case (no activities) by returning an empty JSON array `'[]'::jsonb` not NULL.

Testing Requirements

SQL unit tests using pgTAP or manual test scripts: (1) insert known test activities and verify COUNT DISTINCT matches expected unique participant count, (2) verify proxy-registered activities do not inflate participant_count for participants who also have direct registrations, (3) verify activities outside the date range are excluded, (4) verify unmapped activity types appear in 'unmapped' category, (5) verify a user from org B cannot retrieve org A's data by passing org A's p_org_id. Performance test: load 50,000 rows for a test org and EXPLAIN ANALYZE the function to confirm index usage. Test with p_mapping_version that has no entries — function must return empty array, not error.

Component
Supabase Aggregation RPC Functions
infrastructure high
Epic Risks (3)
high impact medium prob security

Supabase RLS policies may not propagate correctly into RPC function execution context, causing org-scoping predicates to be silently ignored when the function is invoked with service_role key. This could lead to cross-org data exposure in production without any obvious error.

Mitigation & Contingency

Mitigation: Invoke all RPCs using the anon/authenticated key rather than service_role, write explicit WHERE org_id = auth.uid()::org_id predicates inside the RPC body as a secondary control, and include automated cross-org leakage tests in the CI pipeline from day one.

Contingency: If RLS bypass is discovered post-deployment, immediately revoke service_role usage in all aggregation paths and hotfix with explicit org_id parameters passed as function arguments validated server-side.

high impact medium prob dependency

Bufdir may update its official reporting category taxonomy between the mapping configuration being defined and the annual submission deadline. If the ActivityCategoryMappingConfig is compiled as a static Dart constant, it cannot be updated without an app release, potentially causing mapping failures that block submission.

Mitigation & Contingency

Mitigation: Store the mapping as a remote-configurable table (bufdir_category_mappings) in Supabase with a version field rather than as a hardcoded Dart constant. Fetch the current mapping at aggregation time so updates can be pushed without a new app release.

Contingency: If a mapping mismatch is detected during an active reporting cycle, coordinators can be temporarily directed to the manual Excel fallback while an emergency mapping update is pushed to the Supabase table.

high impact low prob technical

For large organisations like NHF with 1,400 local chapters and potentially tens of thousands of activity records per reporting period, the Supabase RPC aggregation query may exceed the default PostgREST statement timeout, causing the aggregation to fail with a 503 error.

Mitigation & Contingency

Mitigation: Add partial indexes on (organization_id, created_at) and (organization_id, activity_type_id) to the activities table before writing the RPC. Profile the query plan against a realistic fixture of 50,000 records during development and increase the statement_timeout setting for the RPC role if needed.

Contingency: Implement chunked aggregation fallback: split the period into monthly sub-ranges and aggregate each chunk client-side, merging results with UNION-style Dart logic before assembling the final payload.