Implement edge function error handling and execution logging
epic-certificate-expiry-notifications-orchestration-services-task-016 — Add structured error handling to the expiry check edge function: catch downstream service failures, log execution summaries (mentor counts per tier, notification counts dispatched, suppressions applied, errors encountered) to a persistent execution log table, and emit an alert if the function exits with a non-zero error count. Ensure partial failures do not abort the entire run.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 4 - 323 tasks
Can start after Tier 3 completes
Implementation Notes
Use the try/catch/finally pattern: initialise an `executionContext` object at function start, populate it as stages complete, and write the log row in a `finally` block so the INSERT is always attempted, whatever errors occurred earlier. Wrap that INSERT in its own try/catch so a failed log write is reported but never crashes the function. Use `performance.now()` (available in Deno) to measure elapsed time, and round the result to an integer before storing it as `duration_ms`, since `performance.now()` returns fractional milliseconds. Keep the error accumulator as an array of typed objects, `Array<{ tier: string; mentorId: string; code: string; message: string }>`, and sanitise the `message` field to strip any PII before recording it. For the alert mechanism, a `console.error(JSON.stringify({ alert: 'expiry-check-errors', errorCount, executionId }))` line is sufficient initially: Supabase log alerting can be configured in the dashboard to trigger on `error`-level log lines from this function.
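A minimal sketch of this pattern, assuming tier processing and the log INSERT are injected as functions so both can be mocked in tests. `RunDeps`, `TierResult`, `processTier`, `writeLog`, `sanitise` and the `"*"` mentor ID used for whole-tier failures are hypothetical names, and the regex scrubber is only a placeholder for whatever PII rules the project adopts.

```typescript
// Sketch of the try/catch/finally execution-logging pattern.
// All names are hypothetical; the real function would inject Supabase-backed implementations.

export type ExecutionError = { tier: string; mentorId: string; code: string; message: string };

export interface ExecutionContext {
  executionId: string;
  mentorCountsByTier: Record<string, number>;
  notificationsDispatched: number;
  suppressionsApplied: number;
  errors: ExecutionError[];
  durationMs: number;
}

export interface TierResult {
  mentorCount: number;
  dispatched: number;
  suppressed: number;
  errors: ExecutionError[]; // per-mentor failures already caught inside the tier
}

export interface RunDeps {
  tiers: string[];
  processTier: (tier: string) => Promise<TierResult>;
  writeLog: (ctx: ExecutionContext) => Promise<void>;
}

// Placeholder PII scrubber: masks e-mail addresses and runs of 6+ digits.
export function sanitise(message: string): string {
  return message
    .replace(/[^\s@]+@[^\s@]+\.[^\s@]+/g, "[email]")
    .replace(/\d{6,}/g, "[number]");
}

export async function runExpiryCheck(executionId: string, deps: RunDeps): Promise<ExecutionContext> {
  const start = performance.now();
  const ctx: ExecutionContext = {
    executionId,
    mentorCountsByTier: {},
    notificationsDispatched: 0,
    suppressionsApplied: 0,
    errors: [],
    durationMs: 0,
  };
  try {
    for (const tier of deps.tiers) {
      try {
        const result = await deps.processTier(tier);
        ctx.mentorCountsByTier[tier] = result.mentorCount;
        ctx.notificationsDispatched += result.dispatched;
        ctx.suppressionsApplied += result.suppressed;
        ctx.errors.push(...result.errors.map((e) => ({ ...e, message: sanitise(e.message) })));
      } catch (err) {
        // A whole-tier failure is recorded, then the loop continues with the next tier.
        ctx.errors.push({ tier, mentorId: "*", code: "TIER_FAILED", message: sanitise(String(err)) });
      }
    }
  } finally {
    // Round up and floor at 1 so duration_ms is always a positive integer.
    ctx.durationMs = Math.max(1, Math.ceil(performance.now() - start));
    try {
      await deps.writeLog(ctx);
    } catch (logErr) {
      // Best-effort logging: a failed INSERT is reported, never rethrown.
      console.error(JSON.stringify({ event: "execution-log-write-failed", executionId, detail: sanitise(String(logErr)) }));
    }
    if (ctx.errors.length > 0) {
      console.error(JSON.stringify({ alert: "expiry-check-errors", errorCount: ctx.errors.length, executionId }));
    }
  }
  return ctx;
}
```

Injecting `writeLog` keeps the `finally` block testable: a test can make it throw and assert that `runExpiryCheck` still resolves.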
The 90-day retention purge, which deletes `expiry_check_execution_log` rows older than 90 days, can be implemented as a second `cron.schedule` entry in the same migration file, scheduled to run monthly.
Testing Requirements
Write unit and integration tests using the Deno test runner with a mocked Supabase client. Required scenarios: (1) a fully successful run produces a log row with `error_count = 0` and accurate tier counts; (2) a single downstream failure increments `error_count` to 1 and appends a structured error object to the `errors` array without aborting processing of the other tiers; (3) a complete tier failure (all mentors in a tier fail) is recorded in the log but does not prevent the other tiers from processing; (4) a log write failure (Supabase INSERT rejected) is caught and logged with `console.error` without crashing the function (best-effort logging); (5) the `duration_ms` field is a positive integer. Additionally, perform a manual monitoring check: after a staging cron run, query `expiry_check_execution_log` and confirm that the row was inserted with accurate counts.
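A framework-agnostic sketch for the log-write-failure scenario. By default, supabase-js v2 query builders resolve to an object carrying an `error` field when a request is rejected rather than throwing, so the writer below checks that field as well as catching exceptions. `LogClient`, `writeExecutionLog` and `mockClient` are hypothetical names, and the mock models only the `from(...).insert(...)` call; in the suite each check would sit inside a `Deno.test` callback.

```typescript
// Hypothetical minimal shape of the part of the Supabase client the log writer uses.
type InsertResult = { error: { message: string } | null };

interface LogClient {
  from(table: string): { insert(row: Record<string, unknown>): PromiseLike<InsertResult> };
}

// Best-effort log writer: handles both a returned `error` and a thrown exception.
export async function writeExecutionLog(client: LogClient, row: Record<string, unknown>): Promise<boolean> {
  try {
    const { error } = await client.from("expiry_check_execution_log").insert(row);
    if (error) {
      console.error(JSON.stringify({ event: "execution-log-write-failed", detail: error.message }));
      return false;
    }
    return true;
  } catch (err) {
    console.error(JSON.stringify({ event: "execution-log-write-failed", detail: String(err) }));
    return false;
  }
}

// Mock client that records inserted rows, returns an error, or rejects outright.
export function mockClient(mode: "ok" | "error" | "throw") {
  const rows: Record<string, unknown>[] = [];
  const client: LogClient = {
    from: (_table: string) => ({
      insert: (row: Record<string, unknown>): Promise<InsertResult> => {
        if (mode === "throw") return Promise.reject(new Error("network failure"));
        if (mode === "error") return Promise.resolve({ error: { message: "insert rejected by policy" } });
        rows.push(row);
        return Promise.resolve({ error: null });
      },
    }),
  };
  return { client, rows };
}
```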
If the daily edge function runs more than once in a 24-hour window due to a Supabase scheduling anomaly or manual re-trigger, the orchestrator could dispatch duplicate push notifications to the same mentor and coordinator for the same threshold, eroding user trust.
Mitigation & Contingency
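Mitigation: Implement idempotency at the notification record level using a unique constraint on (mentor_id, threshold_days, certification_id). Rather than checking for an existing record and then dispatching, which leaves a race window between two overlapping runs, the orchestrator inserts the notification record first using a database-level upsert with ON CONFLICT DO NOTHING, and dispatches only if a row was actually inserted.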
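A sketch of claim-then-dispatch idempotency, where the insert itself serves as the existence check. `ClaimFn` stands in for `INSERT ... ON CONFLICT (mentor_id, threshold_days, certification_id) DO NOTHING`, returning true only when the call created the row; `dispatchOnce` and `inMemoryClaim` are hypothetical names, and the in-memory claim exists only for tests.

```typescript
export type NotificationKey = { mentorId: string; thresholdDays: number; certificationId: string };
// Resolves true only if this call inserted the record (i.e. no conflict occurred).
export type ClaimFn = (key: NotificationKey) => Promise<boolean>;
export type SendFn = (key: NotificationKey) => Promise<void>;

export async function dispatchOnce(key: NotificationKey, claim: ClaimFn, send: SendFn): Promise<"sent" | "duplicate"> {
  // Claim first: a repeated or overlapping run loses the insert and never sends.
  if (!(await claim(key))) return "duplicate";
  await send(key);
  return "sent";
}

// In-memory stand-in for the unique constraint (tests only).
export function inMemoryClaim(): ClaimFn {
  const seen = new Set<string>();
  return async (key) => {
    const id = `${key.mentorId}|${key.thresholdDays}|${key.certificationId}`;
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  };
}
```

Claiming before sending makes delivery at-most-once: if the push call fails after the claim, a rerun will not retry that notification. With supabase-js v2, the claim could plausibly be an `upsert(row, { onConflict: "mentor_id,threshold_days,certification_id", ignoreDuplicates: true }).select()` call that treats an empty result as a duplicate; this should be verified against the client version in use.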
Contingency: If duplicate notifications are reported in production, add a rate-limiting guard in the edge function that skips the dispatch (not the whole run) if a notification for the same mentor and threshold was created within the last 20 hours, and add a Supabase log alerting rule for duplicate dispatch attempts.
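A sketch of the 20-hour guard, applied per notification so only the affected dispatch is skipped and the rest of the run continues. `withinCooldown`, `guardDispatch` and the `duplicate-dispatch-attempt` log key are hypothetical; the caller is assumed to have fetched the latest `created_at` for the mentor and threshold.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// True if the previous notification is younger than the cooldown window.
// Timestamps in the future (clock skew) also count as recent.
export function withinCooldown(lastCreatedAt: Date | null, now: Date, cooldownHours = 20): boolean {
  if (lastCreatedAt === null) return false; // nothing sent yet for this mentor/threshold
  return now.getTime() - lastCreatedAt.getTime() < cooldownHours * HOUR_MS;
}

// Returns true if dispatch may proceed; otherwise emits a structured line for log alerting.
export function guardDispatch(mentorId: string, thresholdDays: number, lastCreatedAt: Date | null, now = new Date()): boolean {
  if (!withinCooldown(lastCreatedAt, now)) return true;
  console.error(JSON.stringify({ alert: "duplicate-dispatch-attempt", mentorId, thresholdDays }));
  return false;
}
```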
The mentor visibility suppressor relies on the daily edge function to detect expiry and update suppression_status. A mentor whose certificate expires at midnight may remain visible for up to 24 hours if the cron runs at a fixed time, violating HLF's requirement that expired mentors disappear promptly.
Mitigation & Contingency
Mitigation: Schedule the edge function to run at 00:05 UTC to minimise lag after midnight transitions. Additionally, the RLS policy can include a direct date comparison (certification_expiry_date < now()) as a secondary predicate that does not rely on suppression_status, providing real-time enforcement at the database level.
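The two-predicate rule can be mirrored in a small TypeScript oracle for tests, showing that an expired mentor stays hidden even while `suppression_status` is stale. `MentorRow`, `isVisible` and the `"active"`/`"suppressed"` values are hypothetical; the real enforcement lives in the RLS policy itself.

```typescript
export interface MentorRow {
  suppressionStatus: "active" | "suppressed";
  certificationExpiryDate: Date;
}

export function isVisible(row: MentorRow, now: Date): boolean {
  // Primary predicate: the flag maintained by the daily edge function.
  if (row.suppressionStatus === "suppressed") return false;
  // Secondary predicate: hidden once certification_expiry_date < now(),
  // independent of whether the cron run has updated the flag yet.
  return row.certificationExpiryDate.getTime() >= now.getTime();
}
```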
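Contingency: If the cron lag is unacceptable after launch, implement a Supabase database trigger on the certifications table that fires on UPDATE of expiry_date and calls the suppressor immediately. This reduces lag to near-zero for renewals and other edits to expiry_date; a certificate that simply passes its expiry date involves no row update, so that case still relies on the scheduled run or the RLS date predicate.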
The orchestrator needs to resolve the coordinator assigned to a specific peer mentor to dispatch coordinator-side notifications. If the assignment relationship is not normalised or is missing for some mentors, coordinator notifications will silently fail.
Mitigation & Contingency
Mitigation: Query the coordinator assignment from the existing assignments or user_roles table before dispatch. Log a structured warning (missing_coordinator_assignment: mentor_id) when no coordinator is found. Add a data quality check in the edge function that reports mentors without coordinators.
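A sketch of resolving coordinators in bulk from assignment rows fetched in a single query, emitting the `missing_coordinator_assignment` warning for each gap and returning the gaps for the data quality report. The `Assignment` shape and `resolveCoordinators` are hypothetical and would need adapting to the real assignments or user_roles schema.

```typescript
export interface Assignment { mentorId: string; coordinatorId: string }

export function resolveCoordinators(mentorIds: string[], assignments: Assignment[]) {
  // If a mentor has several assignment rows, the last one wins here; the real rule may differ.
  const byMentor = new Map<string, string>();
  for (const a of assignments) byMentor.set(a.mentorId, a.coordinatorId);

  const resolved = new Map<string, string>();
  const missing: string[] = [];
  for (const id of mentorIds) {
    const coordinatorId = byMentor.get(id);
    if (coordinatorId === undefined) {
      missing.push(id);
      console.warn(JSON.stringify({ warning: "missing_coordinator_assignment", mentor_id: id }));
    } else {
      resolved.set(id, coordinatorId);
    }
  }
  return { resolved, missing }; // `missing` feeds the data quality report
}
```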
Contingency: If coordinator assignments are missing at scale, fall back to notifying the chapter-level admin role for the mentor's chapter, and surface a data quality report to the admin dashboard showing mentors without assigned coordinators.
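A sketch of the fallback order: the assigned coordinator if present, otherwise the chapter-level admin for the mentor's chapter, otherwise `null` so the caller can record an error instead of failing silently. `coordinatorSideRecipient` and the role labels are hypothetical.

```typescript
export type Recipient = { userId: string; role: "coordinator" | "chapter_admin" };

export function coordinatorSideRecipient(
  mentor: { id: string; chapterId: string },
  coordinatorByMentor: Map<string, string>,
  adminByChapter: Map<string, string>,
): Recipient | null {
  const coordinatorId = coordinatorByMentor.get(mentor.id);
  if (coordinatorId !== undefined) return { userId: coordinatorId, role: "coordinator" };
  const adminId = adminByChapter.get(mentor.chapterId);
  if (adminId !== undefined) return { userId: adminId, role: "chapter_admin" };
  return null; // neither exists: caller should log this as an error
}
```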
The course enrollment prompt service generates deep-link URLs targeting the course administration feature. If the course administration feature changes its deep-link schema or the Dynamics portal URL structure changes, enrollment prompts will navigate to broken destinations.
Mitigation & Contingency
Mitigation: Define the deep-link contract between the certificate expiry feature and the course administration feature as a shared constant in a cross-feature navigation config. Version the deep-link schema and validate the generated URL format in unit tests.
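A sketch of a shared, versioned deep-link contract with a format check that a unit test can assert against, so a schema change fails CI rather than production. The `app://course-administration` scheme, the `/v1/courses/{id}/enroll` path and all names are placeholders, not the course administration feature's real routes.

```typescript
// Shared constants; bumping LINK_VERSION changes both the builder and the validator.
export const LINK_VERSION = 1;
export const COURSE_ADMIN_BASE = "app://course-administration";

export function buildEnrollmentLink(courseId: string): string {
  return `${COURSE_ADMIN_BASE}/v${LINK_VERSION}/courses/${encodeURIComponent(courseId)}/enroll`;
}

const ENROLLMENT_LINK_PATTERN = new RegExp(
  `^app://course-administration/v${LINK_VERSION}/courses/[^/?#]+/enroll$`,
);

export function isValidEnrollmentLink(url: string): boolean {
  return ENROLLMENT_LINK_PATTERN.test(url);
}
```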
Contingency: If the deep-link breaks in production, the course enrollment prompt service should gracefully fall back to opening the course administration feature root screen with a query parameter indicating the notification context, allowing the user to manually locate the correct course.
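A sketch of the graceful fallback: use the course deep link when it passes the format check, otherwise open the course administration root with a query parameter carrying the notification context. The URLs, the `context` parameter name and `enrollmentTarget` are hypothetical. A format check only catches malformed links; a route removed on the receiving side would also need a not-found handler there.

```typescript
const COURSE_ADMIN_ROOT = "app://course-administration/";
const ENROLLMENT_LINK_PATTERN = /^app:\/\/course-administration\/v1\/courses\/[^/?#]+\/enroll$/;

export function enrollmentTarget(
  candidate: string | null,
  notificationContext: string,
): { url: string; fallback: boolean } {
  if (candidate !== null && ENROLLMENT_LINK_PATTERN.test(candidate)) {
    return { url: candidate, fallback: false };
  }
  // Root screen plus context, so the user can locate the course manually.
  return { url: `${COURSE_ADMIN_ROOT}?context=${encodeURIComponent(notificationContext)}`, fallback: true };
}
```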