Structured Error Logging and Missed Auto-Pause Monitoring
epic-certification-management-automation-task-005 — Add structured JSON error logging for all failure paths in the cron function, including service call failures, database errors, and unexpected exceptions. Implement a monitoring query that scans for certifications that expired but were not auto-paused, and log a warning report so the operations team can detect and remediate missed transitions.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 4 - 323 tasks
Can start after Tier 3 completes
Implementation Notes
Introduce a structured logger helper (e.g. `logEvent(severity, code, message, context)`) at the top of the cron module that serialises to JSON and writes to stdout — Supabase Edge Functions stream stdout to the platform log aggregator. Use a UUID v4 generated once per cron invocation as `cron_run_id` so all log lines from a single run can be correlated. For the monitoring query, use a single Supabase RPC call (e.g.
`rpc('get_missed_auto_pauses')`) rather than a raw query from the Edge Function — this keeps the SQL in a versioned migration file and avoids string interpolation. Insert the audit row at the very end of the cron, after all processing, using an upsert keyed on `run_id` so a partial re-run does not create duplicates. Keep the `resolved` column for the ops team to manually acknowledge remediations without deleting rows. Do not throw from the monitoring section — wrap in try/catch and log a WARN if the monitoring query itself fails, so a broken audit query does not mask the primary cron outcome.
Testing Requirements
Write unit tests using Deno's built-in test runner (or Jest if the Edge Function project uses it). Test 1: mock a Supabase client that throws a PostgREST error on the service call; assert the emitted log object matches the expected JSON schema (all required fields present, severity=ERROR, error_code populated). Test 2: seed an in-memory dataset with two expired-but-not-paused certifications; run the monitoring query function; assert the returned object contains missed_pause_count=2 and the correct IDs. Test 3: simulate a clean run with no missed transitions; assert no WARN log is emitted and cron_audit_log row has missed_pause_count=0.
Test 4: inject an unexpected runtime exception; assert top-level catch produces an ERROR log with stack_trace field present. Achieve 100% branch coverage on the logging and monitoring modules.
Supabase Edge Functions can have cold-start latency that causes the nightly cron to time out when processing large cohorts of expiring certifications, resulting in partial reminder dispatches.
Mitigation & Contingency
Mitigation: Batch the cron processing in chunks of 50 mentors per iteration. Use pagination with a cursor to resume processing if the function is re-invoked. Keep total invocation time well under the Edge Function timeout limit.
Contingency: If timeouts occur in production, split the cron into two separate functions: one for reminders and one for auto-pauses, each with its own schedule offset to reduce peak load.
Certification BLoC covers three distinct workflows (view, renew, enrol) which may lead to an overly complex state machine that is hard to test and maintain, particularly when error states from multiple concurrent operations need to be differentiated in the UI.
Mitigation & Contingency
Mitigation: Use separate sealed state classes per workflow (CertificationViewState, RenewalState, EnrolmentState) composed into a single BLoC state wrapper. Follow the existing BLoC patterns established in the codebase for consistency.
Contingency: If the BLoC grows too complex, split into two BLoCs: CertificationBLoC (view/load) and CertificationActionBLoC (mutations), connected via a shared stream.