Add structured logging and error handling to scheduler
epic-scenario-push-engagement-core-engine-task-013 — Instrument the Scenario Edge Function Scheduler with structured JSON logging: log each scheduler run with total active mentors iterated, total evaluations dispatched, total suppressed, and any per-mentor errors encountered. Ensure individual mentor evaluation failures are caught and logged without aborting the entire batch run.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 9 - 22 tasks
Can start after Tier 8 completes
Implementation Notes
Import and instantiate `StructuredLogger` from the shared logger module built in task-010, passing `{ component: 'scenario-scheduler', invocation_id: crypto.randomUUID() }` as context at function startup; this context is merged into every log entry automatically. Maintain a `BatchStats` object `{ totalActive: 0, evaluated: 0, dispatched: 0, suppressed: 0, errors: 0, pagesProcessed: 0 }` and update it synchronously after each mentor evaluation result. Wrap the per-mentor trigger engine call in `try/catch` so that one mentor's failure is logged and counted without aborting the batch. Classify each caught error with `instanceof` checks or by inspecting the HTTP response status code returned by the trigger engine. Declare `error_type` as a string union type in TypeScript and handle it exhaustively so that the compiler flags any unhandled case.
Log the 'scheduler_run_completed' entry inside a `finally` block so that it fires even if the timeout guard triggers an early return; use a flag to distinguish normal completion from partial completion.
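The notes above can be sketched roughly as follows. This is a minimal illustration, not the task's actual implementation: the `ErrorType` members, `classifyError`, `isRetryable`, `runBatch`, the `LogFn` signature, and the `partial` and `retryable` fields are all hypothetical names chosen for the example, and the real `StructuredLogger` from task-010 may expose a different interface. Pagination is omitted, so `pagesProcessed` stays at 0 here.

```typescript
// Hypothetical error categories; the real union should list whatever the
// trigger engine can actually produce.
type ErrorType = "trigger_engine_http" | "timeout" | "validation" | "unknown";

interface BatchStats {
  totalActive: number;
  evaluated: number;
  dispatched: number;
  suppressed: number;
  errors: number;
  pagesProcessed: number;
}

type EvaluationOutcome = "dispatched" | "suppressed";

// Stand-in for the StructuredLogger from task-010 (assumed signature).
type LogFn = (event: string, fields: Record<string, unknown>) => void;

// Classify a caught value by HTTP status, error name, or instanceof.
function classifyError(err: unknown): ErrorType {
  const status = (err as { status?: unknown } | null)?.status;
  if (typeof status === "number") return "trigger_engine_http";
  if (err instanceof Error && err.name === "AbortError") return "timeout";
  if (err instanceof TypeError) return "validation";
  return "unknown";
}

// Exhaustive switch: adding a member to ErrorType without a case here
// becomes a compile error because of the `never` assignment.
function isRetryable(errorType: ErrorType): boolean {
  switch (errorType) {
    case "trigger_engine_http":
    case "timeout":
      return true;
    case "validation":
    case "unknown":
      return false;
    default: {
      const unreachable: never = errorType;
      return unreachable;
    }
  }
}

async function runBatch(
  mentorIds: string[],
  evaluate: (mentorId: string) => Promise<EvaluationOutcome>,
  log: LogFn,
  deadlineMs: number,
  now: () => number = Date.now,
): Promise<BatchStats> {
  const stats: BatchStats = {
    totalActive: mentorIds.length,
    evaluated: 0,
    dispatched: 0,
    suppressed: 0,
    errors: 0,
    pagesProcessed: 0,
  };
  let completed = false; // flag: stays false if the timeout guard returns early
  try {
    for (const mentorId of mentorIds) {
      if (now() >= deadlineMs) return stats; // timeout guard
      try {
        const outcome = await evaluate(mentorId);
        stats.evaluated += 1;
        if (outcome === "dispatched") stats.dispatched += 1;
        else stats.suppressed += 1;
      } catch (err) {
        // One mentor failing must not abort the batch.
        const errorType = classifyError(err);
        stats.errors += 1;
        log("mentor_evaluation_error", {
          mentor_id: mentorId, // opaque ID only, no names or contact details
          error_type: errorType,
          retryable: isRetryable(errorType),
        });
      }
    }
    completed = true;
    return stats;
  } finally {
    // Runs on normal completion, early return, and thrown errors alike.
    log("scheduler_run_completed", { ...stats, partial: !completed });
  }
}
```

Passing the clock and the logger in as parameters is a choice made for this sketch so that the timeout guard and the emitted entries can be exercised in unit tests without real timers or stdout capture.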
Testing Requirements
Unit tests for scheduler logging:
(1) 'scheduler_run_started' is emitted at invocation with the correct fields.
(2) 'scheduler_run_completed' is emitted at the end with accurate counts after processing a mock mentor list.
(3) A per-mentor error is caught and logged as 'mentor_evaluation_error', and the batch continues to the next mentor.
(4) The timeout guard emits 'scheduler_run_partial' with the correct remaining count.
(5) A top-level failure emits 'scheduler_run_failed'.
Verify that all log entries are valid NDJSON by parsing captured stdout in tests, that the 'component' field is present in every entry, and that no PII appears in any log output.
Reuse the StructuredLogger unit tests from task-010 to confirm that the logger is reused, not reimplemented.
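The NDJSON, 'component', and PII checks could be built on small helpers like the ones below. This is a sketch under assumptions: `parseNdjson`, `linesMissingField`, and `linesContainingEmail` are hypothetical helper names, the captured stdout is assumed to be available to the test as a single string, and the email pattern is only one illustrative PII heuristic, not a complete detector.

```typescript
// Parse captured stdout as NDJSON: one JSON object per non-empty line.
// JSON.parse throws a SyntaxError on a malformed line, failing the test.
function parseNdjson(captured: string): Array<Record<string, unknown>> {
  return captured
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line, index) => {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        throw new Error(`line ${index + 1} is not a JSON object`);
      }
      return parsed as Record<string, unknown>;
    });
}

// Return the 1-based positions of entries that lack the given field.
function linesMissingField(
  entries: Array<Record<string, unknown>>,
  field: string,
): number[] {
  const missing: number[] = [];
  entries.forEach((entry, index) => {
    if (!(field in entry)) missing.push(index + 1);
  });
  return missing;
}

// Illustrative PII heuristic (assumption): flag anything shaped like an email.
const EMAIL_PATTERN = /[^\s@"]+@[^\s@"]+\.[^\s@"]+/;

// Return the 1-based positions of non-empty lines that contain an email-like string.
function linesContainingEmail(captured: string): number[] {
  const flagged: number[] = [];
  captured
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .forEach((line, index) => {
      if (EMAIL_PATTERN.test(line)) flagged.push(index + 1);
    });
  return flagged;
}
```

A test would then assert that `parseNdjson` does not throw and that both `linesMissingField(entries, "component")` and `linesContainingEmail(captured)` return empty arrays. Further PII patterns (phone numbers, names) would need their own checks.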
The scenario-edge-function-scheduler must evaluate all active peer mentors within the 30-second Supabase Edge Function timeout. For large organisations, a sequential evaluation loop may exceed this limit, causing partial runs and missed notifications.
Mitigation & Contingency
Mitigation: Design the trigger engine to batch mentor evaluations using database-side SQL queries (bulk inactivity check via a single query rather than per-mentor calls), and add a performance test against 500 mentors during development. Document the evaluated mentor count per scenario type in scenario-evaluation-config to allow selective scenario execution per run.
Contingency: If single-run execution is insufficient, split evaluation into per-scenario-type scheduled functions (inactivity check, milestone check, expiry check) on separate cron schedules, dividing the computational load across multiple invocations.
A race condition between concurrent scheduler invocations or retried cron triggers could cause the same scenario notification to be dispatched multiple times to a mentor, severely degrading trust in the feature.
Mitigation & Contingency
Mitigation: Implement cooldown enforcement using a database-level upsert with a unique constraint on (user_id, scenario_type, cooldown_window_start) so that a second invocation within the same window is rejected at the persistence layer rather than the application layer.
Contingency: Add an idempotency key derived from (user_id, scenario_type, evaluation_date) to the notification record insert; if a duplicate key violation is caught, log it as a warning and skip dispatch without error.
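The contingency above might look like the following sketch. It assumes the notification table has a unique constraint on the idempotency key and that the insert call reports failures as an object with a `code` field carrying the PostgreSQL SQLSTATE (23505 is `unique_violation`). `InsertFn`, `recordThenDispatch`, the `duplicate_notification_skipped` event name, and the UTC calendar-day granularity of the key are hypothetical choices for illustration.

```typescript
// PostgreSQL SQLSTATE code for unique_violation.
const UNIQUE_VIOLATION = "23505";

interface NotificationRecord {
  user_id: string;
  scenario_type: string;
  idempotency_key: string;
}

// Assumed result shape of the database insert: `error` is null on success.
type InsertResult = { error: { code: string; message: string } | null };
type InsertFn = (record: NotificationRecord) => Promise<InsertResult>;
type WarnFn = (event: string, fields: Record<string, unknown>) => void;

// Derive the key from (user_id, scenario_type, evaluation_date).
// The date is reduced to a UTC calendar day (YYYY-MM-DD), an assumption here.
function idempotencyKey(
  userId: string,
  scenarioType: string,
  evaluationDate: Date,
): string {
  const day = evaluationDate.toISOString().slice(0, 10);
  return `${userId}:${scenarioType}:${day}`;
}

// Insert the notification record first; dispatch only if the insert succeeds.
// A duplicate key means another invocation already claimed this notification.
async function recordThenDispatch(
  insert: InsertFn,
  dispatch: (key: string) => Promise<void>,
  warn: WarnFn,
  userId: string,
  scenarioType: string,
  evaluationDate: Date,
): Promise<"dispatched" | "skipped_duplicate"> {
  const key = idempotencyKey(userId, scenarioType, evaluationDate);
  const { error } = await insert({
    user_id: userId,
    scenario_type: scenarioType,
    idempotency_key: key,
  });
  if (error) {
    if (error.code === UNIQUE_VIOLATION) {
      // Log as a warning and skip dispatch without raising an error.
      warn("duplicate_notification_skipped", { scenario_type: scenarioType });
      return "skipped_duplicate";
    }
    throw new Error(`notification insert failed: ${error.message}`);
  }
  await dispatch(key);
  return "dispatched";
}
```

Inserting before dispatching means the database decides which of two racing invocations wins. The trade-off is that a dispatch failure after a successful insert leaves a record with no delivered notification, so that case would need its own retry or cleanup path.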
The trigger engine queries peer mentor activity history across potentially multiple organisations and chapters. RLS policies configured for app-user roles may block these queries if the Edge Function does not actually run with the service role, and query performance may degrade on large activity tables.
Mitigation & Contingency
Mitigation: Confirm the Edge Function runs with the Supabase service role key (bypassing RLS) and add composite indexes on (user_id, activity_date) to the activity tables before implementing the inactivity detection query.
Contingency: If service-role access is restricted by organisational policy, implement a dedicated database function (SECURITY DEFINER) that performs the inactivity aggregation and is callable by the Edge Function with limited scope.