Document orchestration pipeline and ops runbook
epic-certificate-expiry-notifications-orchestration-services-task-018 — Write developer and operations documentation covering the full certificate expiry notification pipeline: architecture diagram showing the edge function, orchestrator, visibility suppressor, acknowledgement service, and enrollment prompt service; configuration reference for cron schedule, threshold constants, and FCM credentials; ops runbook for diagnosing failed runs, re-triggering missed executions, and manually reactivating suppressed mentors.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 6 - 158 tasks
Can start after Tier 5 completes
Implementation Notes
Use Mermaid flowchart syntax for the architecture diagram; it renders natively in GitHub Markdown and requires no external tooling. Structure the document in three sections: (1) Architecture Overview (diagram + component responsibilities), (2) Configuration Reference (table format: variable name, description, example value, where set), (3) Ops Runbook (numbered steps with commands). For the runbook, use admonition blocks (> **Warning:**) to highlight destructive operations. The mentor reactivation SQL snippet should wrap the update in an explicit transaction (BEGIN, then ROLLBACK or COMMIT) so operators can preview the affected rows before committing.
Cross-reference the integration test file path so developers can see working examples of each pipeline stage alongside the documentation.
Testing Requirements
Documentation testing is manual peer review. Assign a developer unfamiliar with the pipeline to follow the ops runbook from scratch against the local Supabase stack. They must successfully: (1) identify a simulated failed run from injected log output, (2) re-invoke the edge function using the provided curl command, (3) reactivate a suppressed mentor using the provided SQL snippet. Any step that requires clarification or produces an error is a defect that must be fixed before the document is accepted.
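The re-invocation in step (2) amounts to an authenticated POST to the function's HTTP endpoint. The sketch below shows the shape of that request in TypeScript, which the runbook's curl command would mirror. It assumes the Supabase CLI's default local API port (54321); the function name `certificate-expiry-check` and the key are hypothetical placeholders, not values taken from this project.

```typescript
// Shape of the HTTP request that re-triggers the edge function.
// The function slug and key used below are placeholders (assumptions).
interface InvokeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildInvokeRequest(
  baseUrl: string,
  functionName: string,
  key: string,
): InvokeRequest {
  // Edge functions are served under /functions/v1/<name>.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/functions/v1/${functionName}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${key}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}),
  };
}

const invokeReq = buildInvokeRequest(
  "http://localhost:54321/",
  "certificate-expiry-check", // hypothetical function name
  "<service-role-key>", // placeholder, never commit a real key
);
// To send it: await fetch(invokeReq.url, { method: invokeReq.method,
//   headers: invokeReq.headers, body: invokeReq.body });
```

Building the request separately from sending it lets the reviewer compare the URL and headers against the curl command before anything is executed.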
If the daily edge function runs more than once in a 24-hour window due to a Supabase scheduling anomaly or manual re-trigger, the orchestrator could dispatch duplicate push notifications to the same mentor and coordinator for the same threshold, eroding user trust.
Mitigation & Contingency
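Mitigation: Implement idempotency at the notification record level using a unique constraint on (mentor_id, threshold_days, certification_id). The orchestrator checks for an existing record before dispatching, and writes the record with INSERT ... ON CONFLICT DO NOTHING so that the constraint, not the check, is the final guard when two runs overlap. Dispatch only when the insert actually created a row.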
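The idempotency rule can be illustrated with a minimal in-memory sketch. The `NotificationLedger` class below is a hypothetical stand-in for the notification table and its unique constraint, not the real persistence layer; the field names follow the constraint columns named above.

```typescript
interface NotificationKey {
  mentorId: string;
  thresholdDays: number;
  certificationId: string;
}

// In-memory stand-in (assumption) for the notification table
// with a unique constraint on the three key columns.
class NotificationLedger {
  private readonly seen = new Set<string>();

  // Mirrors INSERT ... ON CONFLICT DO NOTHING:
  // returns true only when a new record was created.
  insertIfAbsent(key: NotificationKey): boolean {
    const k = JSON.stringify([
      key.mentorId,
      key.thresholdDays,
      key.certificationId,
    ]);
    if (this.seen.has(k)) return false;
    this.seen.add(k);
    return true;
  }
}

// Sends only when the record was newly created, so a second run is a no-op.
function dispatchOnce(
  ledger: NotificationLedger,
  key: NotificationKey,
  send: (key: NotificationKey) => void,
): boolean {
  if (!ledger.insertIfAbsent(key)) return false;
  send(key);
  return true;
}

const ledger = new NotificationLedger();
const sentKeys: NotificationKey[] = [];
const sampleKey: NotificationKey = {
  mentorId: "mentor-1",
  thresholdDays: 30,
  certificationId: "cert-9",
};
const firstRun = dispatchOnce(ledger, sampleKey, (k) => sentKeys.push(k));
const secondRun = dispatchOnce(ledger, sampleKey, (k) => sentKeys.push(k));
```

Here `firstRun` dispatches and `secondRun` does not, which is the behaviour a duplicate cron execution or manual re-trigger should produce.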
Contingency: If duplicate notifications are reported in production, add a rate-limiting guard in the edge function that aborts if a notification for the same mentor and threshold was created within the last 20 hours, and add an alerting rule to Supabase logs for duplicate dispatch attempts.
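A minimal sketch of the 20-hour guard follows. The function name and the way the last notification timestamp is obtained are assumptions; only the 20-hour window comes from the text above.

```typescript
// 20 hours in milliseconds, the window named in the contingency.
const GUARD_WINDOW_MS = 20 * 60 * 60 * 1000;

// Returns true when a notification for the same mentor and threshold
// was created recently enough that a new dispatch should be blocked.
// A record timestamped in the future (clock skew) also counts as recent.
function isWithinGuardWindow(
  lastCreatedAt: Date | null,
  now: Date,
  windowMs: number = GUARD_WINDOW_MS,
): boolean {
  if (lastCreatedAt === null) return false; // nothing sent yet
  return now.getTime() - lastCreatedAt.getTime() < windowMs;
}

const guardNow = new Date("2024-01-02T00:05:00Z");
const blocked = isWithinGuardWindow(new Date("2024-01-01T12:00:00Z"), guardNow);
const allowed = isWithinGuardWindow(new Date("2024-01-01T00:05:00Z"), guardNow);
```

A window shorter than 24 hours leaves room for the next day's scheduled run, which arrives 24 hours after the previous one, while still blocking a repeat execution on the same day.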
The mentor visibility suppressor relies on the daily edge function to detect expiry and update suppression_status. A mentor whose certificate expires at midnight may remain visible for up to 24 hours if the cron runs at a fixed time, violating HLF's requirement that expired mentors disappear promptly.
Mitigation & Contingency
Mitigation: Schedule the edge function to run at 00:05 UTC to minimise lag after midnight transitions. Additionally, the RLS policy can include a direct date comparison (certification_expiry_date < now()) as a secondary predicate that does not rely on suppression_status, providing real-time enforcement at the database level.
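The two parts of this mitigation can be sketched as follows. The cron expression is the standard five-field form for 00:05 every day. The visibility check is a TypeScript mirror of the logic only; real enforcement would live in the RLS policy itself, and the status values `"active"` and `"suppressed"` are assumed for illustration.

```typescript
// Five-field cron: minute 5, hour 0, every day (interpreted as UTC).
const DAILY_CRON_UTC = "5 0 * * *";

interface MentorRow {
  suppressionStatus: "active" | "suppressed"; // value names are assumptions
  certificationExpiryDate: Date;
}

// Visible only if not suppressed AND not expired. The date comparison
// is the secondary predicate: it holds even when suppression_status
// has not yet been updated by the daily run.
function isMentorVisible(row: MentorRow, now: Date): boolean {
  const notSuppressed = row.suppressionStatus !== "suppressed";
  // Expired means certification_expiry_date < now().
  const notExpired = row.certificationExpiryDate.getTime() >= now.getTime();
  return notSuppressed && notExpired;
}

const staleRow: MentorRow = {
  suppressionStatus: "active", // cron has not run yet
  certificationExpiryDate: new Date("2024-06-01T00:00:00Z"),
};
const visibleBefore = isMentorVisible(staleRow, new Date("2024-05-31T23:59:00Z"));
const visibleAfter = isMentorVisible(staleRow, new Date("2024-06-01T00:01:00Z"));
```

In this example the mentor drops out of view one minute after midnight even though the status column is still stale, which is the gap the secondary predicate is meant to close.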
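Contingency: If the cron lag is unacceptable after launch, implement a Supabase database trigger on the certifications table that fires on UPDATE of expiry_date and calls the suppressor immediately, reducing lag to near-zero whenever the stored date changes (renewals and corrections). A certificate that simply passes its expiry date produces no UPDATE, so that case still depends on the scheduled run or the RLS date predicate.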
The orchestrator needs to resolve the coordinator assigned to a specific peer mentor to dispatch coordinator-side notifications. If the assignment relationship is not normalised or is missing for some mentors, coordinator notifications will silently fail.
Mitigation & Contingency
Mitigation: Query the coordinator assignment from the existing assignments or user_roles table before dispatch. Log a structured warning (missing_coordinator_assignment: mentor_id) when no coordinator is found. Add a data quality check in the edge function that reports mentors without coordinators.
Contingency: If coordinator assignments are missing at scale, fall back to notifying the chapter-level admin role for the mentor's chapter, and surface a data quality report to the admin dashboard showing mentors without assigned coordinators.
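The lookup, the structured warning, and the chapter-admin fallback can be sketched together. The maps below are hypothetical stand-ins for the assignments and role tables; only the `missing_coordinator_assignment` event name and `mentor_id` field come from the text above.

```typescript
interface Recipient {
  userId: string;
  role: "coordinator" | "chapter_admin";
}

interface WarningLog {
  event: "missing_coordinator_assignment";
  mentor_id: string;
}

// Resolves who receives the coordinator-side notification.
// Falls back to the chapter admin and logs a warning when no
// coordinator is assigned; returns null if neither exists.
function resolveRecipient(
  mentorId: string,
  coordinatorByMentor: Map<string, string>,
  chapterByMentor: Map<string, string>,
  adminByChapter: Map<string, string>,
  warn: (entry: WarningLog) => void,
): Recipient | null {
  const coordinatorId = coordinatorByMentor.get(mentorId);
  if (coordinatorId !== undefined) {
    return { userId: coordinatorId, role: "coordinator" };
  }
  warn({ event: "missing_coordinator_assignment", mentor_id: mentorId });
  const chapterId = chapterByMentor.get(mentorId);
  const adminId =
    chapterId === undefined ? undefined : adminByChapter.get(chapterId);
  return adminId === undefined
    ? null
    : { userId: adminId, role: "chapter_admin" };
}

const warnings: WarningLog[] = [];
const coordinators = new Map([["mentor-1", "coord-1"]]);
const chapters = new Map([["mentor-1", "ch-a"], ["mentor-2", "ch-a"]]);
const admins = new Map([["ch-a", "admin-a"]]);
const logWarning = (w: WarningLog): void => {
  warnings.push(w);
};

const assigned = resolveRecipient("mentor-1", coordinators, chapters, admins, logWarning);
const fallback = resolveRecipient("mentor-2", coordinators, chapters, admins, logWarning);
```

Collecting the warnings in one place also gives the data quality report a ready-made list of mentors without coordinators.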
The course enrollment prompt service generates deep-link URLs targeting the course administration feature. If the course administration feature changes its deep-link schema or the Dynamics portal URL structure changes, enrollment prompts will navigate to broken destinations.
Mitigation & Contingency
Mitigation: Define the deep-link contract between the certificate expiry feature and the course administration feature as a shared constant in a cross-feature navigation config. Version the deep-link schema and validate the generated URL format in unit tests.
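A minimal sketch of such a shared contract is shown below. The scheme, path, and parameter names (`app://course-admin/enroll`, `v`, `courseId`) are purely illustrative assumptions; the actual contract would be agreed between the two features.

```typescript
// Hypothetical cross-feature contract: all values are illustrative.
const COURSE_DEEP_LINK = {
  version: 1,
  base: "app://course-admin/enroll",
} as const;

// Builds a versioned deep link for a course.
function buildEnrollmentLink(courseId: string): string {
  const id = encodeURIComponent(courseId);
  return `${COURSE_DEEP_LINK.base}?v=${COURSE_DEEP_LINK.version}&courseId=${id}`;
}

// Format check suitable for a unit test: the link must carry the
// current schema version and an id made of letters, digits, _ or -.
const ENROLLMENT_LINK_PATTERN = new RegExp(
  `^app://course-admin/enroll\\?v=${COURSE_DEEP_LINK.version}&courseId=[A-Za-z0-9_-]+$`,
);

function isValidEnrollmentLink(url: string): boolean {
  return ENROLLMENT_LINK_PATTERN.test(url);
}

const enrollmentLink = buildEnrollmentLink("course-42");
```

Keeping the version in the URL means a schema change on the course administration side shows up as a failing format test rather than a broken navigation in production.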
Contingency: If the deep-link breaks in production, the course enrollment prompt service should gracefully fall back to opening the course administration feature root screen with a query parameter indicating the notification context, allowing the user to manually locate the correct course.
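The fallback can be sketched as a small resolver. The root URL `app://course-admin` and the `source` query parameter are hypothetical names chosen for illustration, and the validator is passed in so the sketch does not assume any particular link format.

```typescript
// Hypothetical root screen of the course administration feature.
const COURSE_ADMIN_ROOT = "app://course-admin";

// Returns the deep link when it passes validation; otherwise the
// feature root with a query parameter carrying the notification context.
function resolveDestination(
  deepLink: string,
  isValid: (url: string) => boolean,
  context: string,
): string {
  if (isValid(deepLink)) return deepLink;
  return `${COURSE_ADMIN_ROOT}?source=${encodeURIComponent(context)}`;
}

// Stand-in validator for the example: accepts only the enroll path.
const acceptsEnrollPath = (url: string): boolean =>
  url.startsWith("app://course-admin/enroll?");

const goodDestination = resolveDestination(
  "app://course-admin/enroll?v=1&courseId=course-42",
  acceptsEnrollPath,
  "certificate_expiry",
);
const fallbackDestination = resolveDestination(
  "app://old-scheme/course/42",
  acceptsEnrollPath,
  "certificate_expiry",
);
```

With the context parameter present, the root screen can show a hint about why the user arrived there, even though the specific course has to be located manually.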