Health status aggregation and persistence

epic-external-system-integration-configuration-backend-infrastructure-task-014 — Implement the integration_health_status table and repository for storing and querying health check results. The table holds the latest status per (org_id, integration_type) pair plus a rolling 24-hour history for trend display. Implement queries that return current health status for all integrations of an org (used by the admin dashboard) and aggregate health across all orgs (used by global admins). Include RLS policies ensuring org isolation.

high priority medium complexity database pending database specialist Tier 3

Acceptance Criteria

integration_health_status table has a unique constraint on (org_id, integration_type) and serves as the single current-status record per pair

integration_health_history table stores historical records with columns: id, org_id, integration_type, status, latency_ms, checked_at; indexed on (org_id, integration_type, checked_at DESC)

A Postgres trigger on integration_health_status AFTER UPDATE inserts a copy of the new row into integration_health_history automatically

getOrgHealthStatus(orgId) RPC returns all current status rows for the org, joined with integration display names

getGlobalHealthSummary() RPC (service-role only) returns count of healthy/degraded/unreachable integrations grouped by integration_type across all orgs

getHealthTrend(orgId, integrationType, hours) RPC returns up to (hours × 4) history rows (at 15-min polling intervals) ordered by checked_at DESC

RLS on integration_health_status: org admin can SELECT own org rows; global admin role can SELECT all rows

RLS on integration_health_history mirrors the same policies as integration_health_status

History records older than 30 days are deleted by a scheduled pg_cron cleanup job

An index on integration_health_history(org_id, integration_type, checked_at DESC) supports trend queries within 100ms for 96 data points

Technical Requirements

frameworks

Supabase PostgreSQL 15

Supabase Edge Functions (Deno)

apis

Supabase REST API

Supabase Edge Functions (Deno)

data models

bufdir_export_audit_log

performance requirements

getOrgHealthStatus must return in under 150ms for orgs with up to 10 integrations

getHealthTrend for 24 hours (96 rows) must return in under 100ms

History cleanup pg_cron job must not lock the table during normal operations — use DELETE ... WHERE checked_at < now() - interval '30 days' LIMIT 1000 in a loop

security requirements

Global admin RPC getGlobalHealthSummary must validate caller has global_admin role in JWT claims before executing

RLS policies prevent org-A admin from reading org-B health data

INSERT and UPDATE on integration_health_status restricted to service role — no direct mobile client writes

History table inherits the same RLS restrictions as the current-status table

Execution Context

Execution Tier

Tier 3

Tier 3 - 413 tasks

Can start after Tier 2 completes

Integration Task

Handles integration between different epics or system components. Requires coordination across multiple development streams.

View Full Execution Plan

Implementation Notes

The trigger-based history copy is the most reliable approach — it guarantees history is always written even if the application code path crashes after the upsert. Implement the trigger as a BEFORE INSERT OR UPDATE trigger that also prunes rows beyond the 96-row cap for the same (org_id, integration_type) pair to bound table growth. For the global admin summary, implement as a Postgres RPC function with SECURITY DEFINER and explicit role check on current_setting('request.jwt.claims') rather than relying on RLS alone, since RLS for SELECT ALL requires a bypass policy. Store the display name for each integration_type in a separate integration_types lookup table (seeded at migration time) so the getOrgHealthStatus join always has a human-readable label even for integrations with no current status row.

Testing Requirements

Unit tests with mocked Supabase client: (1) getOrgHealthStatus returns correct rows for the given orgId; (2) getHealthTrend respects the hours parameter and returns rows ordered DESC; (3) getGlobalHealthSummary rejects callers without global_admin role. Integration tests against local Supabase: (1) trigger correctly copies updated row to history table; (2) RLS blocks cross-org SELECT; (3) pg_cron cleanup deletes rows older than 30 days without deleting recent rows; (4) unique constraint on (org_id, integration_type) prevents duplicate current-status rows. Test with 97 history rows — confirm only 96 returned for 24-hour trend.

Component

Integration Health Monitor

service medium

Dependencies (1)

Build the core health check framework for the Integration Health Monitor. Define a HealthCheck interface with methods check(orgId, integrationConfig): HealthResult that each adapter-specific check must implement. Implement the test runner that invokes checks in parallel across all configured integrations for an org, collects results, and writes them to the integration_health_status table with timestamp, status (healthy/degraded/unreachable), and latency_ms. epic-external-system-integration-configuration-backend-infrastructure-task-012

Epic Risks (3)

medium impact medium prob technical

Supabase Edge Functions have cold start latency that can cause the first sync invocation after idle periods to fail or timeout when the external API has a short connection window, leading to missed scheduled syncs that go undetected.

Mitigation & Contingency

Mitigation: Configure Edge Function memory and implement a warm-up ping mechanism before heavy sync invocations. Set generous timeout values on the external API calls. Log all cold-start incidents for monitoring.

Contingency: If cold starts cause consistent sync failures, migrate the sync scheduler to a persistent Supabase cron job that pre-warms the function 30 seconds before the scheduled sync time.

high impact low prob technical

The sync scheduler must execute jobs at predictable times for financial reporting accuracy. Drift in cron execution timing (due to Supabase infrastructure delays) could cause syncs to run at wrong times, leading to missing data in accounting exports or duplicate exports across reporting periods.

Mitigation & Contingency

Mitigation: Implement idempotency keys based on integration ID + scheduled period, so re-runs of a delayed sync cannot create duplicate exports. Log actual execution timestamps vs scheduled timestamps and alert on drift exceeding 5 minutes.

Contingency: If scheduler reliability is insufficient, integrate with a dedicated cron service (e.g., pg_cron on Supabase) for millisecond-precise scheduling, replacing the application-level scheduler.

high impact medium prob integration

Aggressive health monitoring ping frequency could trigger rate limiting on external APIs (especially Xledger and Dynamics), causing legitimate export calls to fail after the monitor exhausts the API's request quota.

Mitigation & Contingency

Mitigation: Use lightweight health check endpoints (HEAD requests or vendor-specific ping/status endpoints) rather than data requests. Set health check frequency to once per 15 minutes minimum. Implement exponential backoff after consecutive failures.

Contingency: If rate limiting occurs, disable active health monitoring for the affected integration type and switch to passive health detection (mark unhealthy only when a scheduled sync fails).

Quick Links

All Tasks Execution Plan