Sync Scheduler concurrency guard and dead letter queue

epic-external-system-integration-configuration-backend-infrastructure-task-011 — Implement concurrency protection to prevent overlapping sync runs for the same integration. Use a database-level advisory lock or unique constraint on (org_id, integration_type, status='running') to ensure at-most-one active run. For permanently failed jobs, insert records into a dead_letter_queue table and emit a Postgres notification so the health monitor can surface the failure in the admin dashboard.

high priority medium complexity backend pending backend specialist Tier 4

Acceptance Criteria

A unique partial index exists on sync_run_log(org_id, integration_type) WHERE status = 'running', preventing two concurrent rows with status='running' for the same (org, integration) pair

Attempting to start a sync run when one is already running returns a structured error { code: 'SYNC_ALREADY_RUNNING', message: '...' } without inserting a duplicate row

dead_letter_queue table exists with columns: id (uuid PK), org_id (uuid NOT NULL), integration_type (text NOT NULL), failed_run_id (uuid REFERENCES sync_run_log), failure_reason (text), created_at (timestamptz DEFAULT now()), acknowledged_at (timestamptz), acknowledged_by (uuid)

When a sync run fails after exhausting its retry policy, a row is inserted into dead_letter_queue referencing the failed run log row

A NOTIFY 'dead_letter_inserted' event is emitted after each dead_letter_queue insert, carrying the payload { orgId, integrationType }

RLS on dead_letter_queue allows org admins to SELECT their org's rows; INSERT is service-role only

A function acknowledgeDLQEntry(dlqId, actorUserId) marks the entry as acknowledged and records who cleared it

If a dead letter entry already exists for the same (org_id, integration_type) and is unacknowledged, a duplicate is not inserted — the existing entry is updated with the latest failure reason and a new NOTIFY is emitted

Integration test confirms that two simultaneous Edge Function invocations for the same org/integration result in exactly one active run log row

Technical Requirements

frameworks

Supabase PostgreSQL 15

Supabase Edge Functions (Deno)

Supabase Realtime

apis

Supabase Edge Functions (Deno)

Supabase Realtime (LISTEN/NOTIFY)

data models

bufdir_export_audit_log

performance requirements

Concurrency check (index lookup) must add under 5ms overhead to sync job startup

Dead letter queue insert must complete within 100ms including the NOTIFY emit

security requirements

Dead letter queue INSERT restricted to service role only — mobile clients cannot directly write failure records

NOTIFY payload must contain only non-sensitive identifiers (org_id, integration_type) — no credentials or PII

acknowledgeDLQEntry validates the actor has org-admin role before updating the row

RLS on dead_letter_queue prevents cross-org reads

Execution Context

Execution Tier

Tier 4

Tier 4 - 323 tasks

Can start after Tier 3 completes

Integration Task

Handles integration between different epics or system components. Requires coordination across multiple development streams.

View Full Execution Plan

Implementation Notes

Prefer a unique partial index over pg_advisory_lock for the concurrency guard — advisory locks require careful release logic and can leak if the Edge Function crashes. The partial index approach is simpler and enforced at the DB level without application-side lock management. For the dead letter upsert, use INSERT ... ON CONFLICT (org_id, integration_type) WHERE acknowledged_at IS NULL DO UPDATE SET failure_reason = EXCLUDED.failure_reason, failed_run_id = EXCLUDED.failed_run_id to implement the deduplication requirement.

The NOTIFY payload is picked up by the health monitor via pg_listen; ensure the health monitor subscribes to the 'dead_letter_inserted' channel on startup. Keep retry policy configuration (max retries, backoff) in the integration_config table added in task-007 so it is per-integration tunable.

Testing Requirements

Integration tests using a local Supabase instance: (1) concurrent Edge Function calls for same org/integration — assert only one run log row has status='running'; (2) dead letter insert triggers NOTIFY with correct payload; (3) duplicate dead letter entries for same (org, integration) are upserted not duplicated; (4) acknowledgeDLQEntry rejects callers without org-admin role; (5) RLS test confirms org-B admin cannot read org-A dead letter entries. Unit tests cover the concurrency guard helper function with mocked DB responses. Test the retry exhaustion path by simulating max retries reached.

Component

Sync Scheduler

service medium

Dependencies (2)

Design and implement the sync_run_log database table and associated repository layer for the Sync Scheduler. The table tracks job start time, end time, status, records processed, records failed, error message, and triggered_by (scheduler vs manual). Implement write operations for status transitions and read operations for the admin dashboard to display last run results per integration. epic-external-system-integration-configuration-backend-infrastructure-task-010 Build the core job trigger engine that executes on a pg_cron or Supabase scheduled Edge Function invocation. The engine reads the due integrations from the cron config reader, creates export job records in the sync_run_log table with status 'pending', invokes the Integration Edge Function for each due integration, and updates job status to 'running', 'completed', or 'failed' based on the response. epic-external-system-integration-configuration-backend-infrastructure-task-008

Epic Risks (3)

medium impact medium prob technical

Supabase Edge Functions have cold start latency that can cause the first sync invocation after idle periods to fail or timeout when the external API has a short connection window, leading to missed scheduled syncs that go undetected.

Mitigation & Contingency

Mitigation: Configure Edge Function memory and implement a warm-up ping mechanism before heavy sync invocations. Set generous timeout values on the external API calls. Log all cold-start incidents for monitoring.

Contingency: If cold starts cause consistent sync failures, migrate the sync scheduler to a persistent Supabase cron job that pre-warms the function 30 seconds before the scheduled sync time.

high impact low prob technical

The sync scheduler must execute jobs at predictable times for financial reporting accuracy. Drift in cron execution timing (due to Supabase infrastructure delays) could cause syncs to run at wrong times, leading to missing data in accounting exports or duplicate exports across reporting periods.

Mitigation & Contingency

Mitigation: Implement idempotency keys based on integration ID + scheduled period, so re-runs of a delayed sync cannot create duplicate exports. Log actual execution timestamps vs scheduled timestamps and alert on drift exceeding 5 minutes.

Contingency: If scheduler reliability is insufficient, integrate with a dedicated cron service (e.g., pg_cron on Supabase) for millisecond-precise scheduling, replacing the application-level scheduler.

high impact medium prob integration

Aggressive health monitoring ping frequency could trigger rate limiting on external APIs (especially Xledger and Dynamics), causing legitimate export calls to fail after the monitor exhausts the API's request quota.

Mitigation & Contingency

Mitigation: Use lightweight health check endpoints (HEAD requests or vendor-specific ping/status endpoints) rather than data requests. Set health check frequency to once per 15 minutes minimum. Implement exponential backoff after consecutive failures.

Contingency: If rate limiting occurs, disable active health monitoring for the affected integration type and switch to passive health detection (mark unhealthy only when a scheduled sync fails).

Quick Links

All Tasks Execution Plan