High priority · Low complexity · Backend · Pending · Backend specialist · Tier 4

Acceptance Criteria

All error paths in the cron function emit structured JSON log entries with fields: timestamp (ISO 8601), severity (ERROR|WARN|INFO), cron_run_id (UUID), error_code, message, context object (affected certification_id, peer_mentor_id if applicable), and stack_trace for unexpected exceptions
Service call failures (Supabase RPC errors, network timeouts) are caught and logged with the error response body and HTTP status code included in the context field
Database errors include the failed SQL operation name and relevant record identifiers in the log context
A monitoring query runs at the end of each cron execution, selecting all certifications where expiry_date < now() AND status != 'paused' AND no idempotency record exists, and returning the count and IDs
If the monitoring query returns one or more missed transitions, a WARN-level log entry is written with the list of affected certification IDs and a remediation instruction string
The monitoring report is written to a dedicated Supabase table (cron_audit_log) with columns: run_id, run_at, missed_pause_count, missed_ids (jsonb), resolved (boolean default false)
Unexpected exceptions (non-Supabase errors) are caught by a top-level try/catch, logged at ERROR severity with full stack trace, and do not crash the cron process silently
Log output is valid JSON on every line — no mixed plain-text lines that would break log aggregation parsers
Unit test verifies that a simulated service failure produces a log entry matching the expected JSON schema

Technical Requirements

Frameworks
Supabase Edge Functions (Deno/TypeScript)
Supabase CLI
APIs
Supabase PostgREST RPC
Supabase cron_audit_log table insert
Data Models
certifications
certification_idempotency_log
cron_audit_log
Performance Requirements
Monitoring query must complete within 2 seconds on datasets up to 10,000 certifications — add index on (expiry_date, status) if not present
Logging must be non-blocking — errors in the logging layer itself must never prevent the cron from completing its primary work
Total overhead of error logging and monitoring query must not exceed 500ms per cron run
Security Requirements
Log entries must never include full PII (names, national identity numbers) — only system IDs (certification_id, peer_mentor_id UUIDs)
cron_audit_log table must have Row Level Security enabled, readable only by service_role and ops-admin role
Stack traces written to logs must be sanitised to remove any embedded secret values or connection strings
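One way to meet the stack-trace sanitisation requirement is a small redaction pass applied before the trace is logged. This is a minimal sketch; the regex patterns are assumptions about what secrets look like in this project and should be extended to match the real formats in use:

```typescript
// Redact likely secret material from a stack trace before it is logged.
// Patterns are illustrative assumptions; tune them to the project's secrets.
const SECRET_PATTERNS: RegExp[] = [
  /postgres(?:ql)?:\/\/\S+/gi,                           // connection strings
  /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,  // JWT-shaped tokens
  /(?:api[_-]?key|secret|password)=\S+/gi,               // key=value credentials
];

function sanitiseStackTrace(stack: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    stack,
  );
}
```

Running every trace through this function before it reaches `stack_trace` keeps the PII and secrets requirements enforced in one place.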

Execution Context

Execution Tier
Tier 4

Tier 4 - 323 tasks

Can start after Tier 3 completes

Implementation Notes

Introduce a structured logger helper (e.g. `logEvent(severity, code, message, context)`) at the top of the cron module that serialises to JSON and writes to stdout; Supabase Edge Functions stream stdout to the platform log aggregator. Use a UUID v4 generated once per cron invocation as `cron_run_id` so all log lines from a single run can be correlated. For the monitoring query, use a single Supabase RPC call (e.g. `rpc('get_missed_auto_pauses')`) rather than a raw query from the Edge Function; this keeps the SQL in a versioned migration file and avoids string interpolation. Insert the audit row at the very end of the cron, after all processing, using an upsert keyed on `run_id` so a partial re-run does not create duplicates. Keep the `resolved` column for the ops team to manually acknowledge remediations without deleting rows. Do not throw from the monitoring section: wrap it in try/catch and log a WARN if the monitoring query itself fails, so a broken audit query does not mask the primary cron outcome.

Testing Requirements

Write unit tests using Deno's built-in test runner (or Jest if the Edge Function project uses it).

Test 1: mock a Supabase client that throws a PostgREST error on the service call; assert the emitted log object matches the expected JSON schema (all required fields present, severity=ERROR, error_code populated).
Test 2: seed an in-memory dataset with two expired-but-not-paused certifications; run the monitoring query function; assert the returned object contains missed_pause_count=2 and the correct IDs.
Test 3: simulate a clean run with no missed transitions; assert no WARN log is emitted and the cron_audit_log row has missed_pause_count=0.
Test 4: inject an unexpected runtime exception; assert the top-level catch produces an ERROR log with the stack_trace field present.

Achieve 100% branch coverage on the logging and monitoring modules.
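Test 1 might be structured like the sketch below. The mock shape, RPC name, and error-code values are assumptions about the module under test; in the real suite this body would sit inside a `Deno.test(...)` callback:

```typescript
// Sketch of Test 1: a mocked service call fails and the caller must emit
// a log entry matching the expected schema.
type LogEntry = Record<string, unknown>;
const emitted: LogEntry[] = [];

// Mock Supabase client whose RPC rejects with a PostgREST-style error.
const mockClient = {
  rpc: async (_fn: string) => {
    throw { message: "permission denied", code: "PGRST301", status: 403 };
  },
};

async function runServiceCall(client: typeof mockClient): Promise<void> {
  try {
    await client.rpc("auto_pause_expired_certifications"); // hypothetical RPC name
  } catch (err: any) {
    // The real cron would call logEvent(...) here; the test captures the
    // same object shape so the schema can be asserted on.
    emitted.push({
      timestamp: new Date().toISOString(),
      severity: "ERROR",
      cron_run_id: "test-run",
      error_code: "RPC_FAILURE",
      message: err.message,
      context: { http_status: err.status, response_body: err },
    });
  }
}
```

The test then awaits `runServiceCall(mockClient)` and asserts the required fields are present on `emitted[0]`.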

Component
Certification Expiry Nightly Cron Job
Infrastructure · Medium
Epic Risks (2)
Medium impact · Low probability · Technical

Supabase Edge Functions can have cold-start latency that causes the nightly cron to time out when processing large cohorts of expiring certifications, resulting in partial reminder dispatches.

Mitigation & Contingency

Mitigation: Batch the cron processing in chunks of 50 mentors per iteration. Use pagination with a cursor to resume processing if the function is re-invoked. Keep total invocation time well under the Edge Function timeout limit.
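The chunk-and-cursor mitigation could be sketched as follows. The page-fetch helper here is an in-memory stand-in (an assumption, so the sketch is self-contained); the real function would issue a ranged Supabase select ordered by id, e.g. `.gt("id", cursor).order("id").limit(CHUNK_SIZE)`:

```typescript
// Process expiring certifications in chunks of 50, tracking a cursor so a
// re-invoked function can resume where a timed-out run stopped.
const CHUNK_SIZE = 50;

interface Cert { id: string; expiry_date: string; }

// In-memory stand-in for a ranged select; assumes `all` is sorted by id.
function fetchPage(all: Cert[], cursor: string | null): Cert[] {
  const start = cursor === null ? 0 : all.findIndex((c) => c.id > cursor);
  if (start === -1) return [];
  return all.slice(start, start + CHUNK_SIZE);
}

function processAll(all: Cert[], process: (c: Cert) => void): number {
  let cursor: string | null = null;
  let handled = 0;
  while (true) {
    const page = fetchPage(all, cursor);
    if (page.length === 0) break;
    page.forEach(process);
    handled += page.length;
    cursor = page[page.length - 1].id; // persisted, this is the resume point
  }
  return handled;
}
```

Persisting the cursor (for example in the cron_audit_log row) between invocations is what makes the re-invocation resumable.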

Contingency: If timeouts occur in production, split the cron into two separate functions: one for reminders and one for auto-pauses, each with its own schedule offset to reduce peak load.

Low impact · Medium probability · Technical

The Certification BLoC covers three distinct workflows (view, renew, enrol), which may lead to an overly complex state machine that is hard to test and maintain, particularly when error states from multiple concurrent operations must be differentiated in the UI.

Mitigation & Contingency

Mitigation: Use separate sealed state classes per workflow (CertificationViewState, RenewalState, EnrolmentState) composed into a single BLoC state wrapper. Follow the existing BLoC patterns established in the codebase for consistency.

Contingency: If the BLoC grows too complex, split into two BLoCs: CertificationBLoC (view/load) and CertificationActionBLoC (mutations), connected via a shared stream.