Per-integration health check implementations

epic-external-system-integration-configuration-backend-infrastructure-task-013

Implement concrete health check classes for each supported integration target: Xledger (authenticate + list accounts endpoint), Dynamics (OAuth token refresh + GET /organizations), Cornerstone (ping auth endpoint), Consio (list endpoint), and Bufdir (schema validation endpoint). Each check uses the Vault credential access layer, performs a lightweight read-only probe, and measures round-trip latency. Checks must complete within a 10-second timeout.

high priority, high complexity, integration, pending, integration specialist, Tier 3

Acceptance Criteria

XledgerHealthCheck.check() authenticates using stored API key from Vault, calls the list-accounts endpoint, and returns status='healthy' with measured latency_ms on HTTP 200; status='degraded' on 4xx; status='unreachable' on network error or timeout

DynamicsHealthCheck.check() performs an OAuth2 client-credentials token refresh using stored Azure AD credentials from Vault, then calls GET /organizations with the new token; token is NOT persisted — it is ephemeral within the check execution

CornerstoneHealthCheck.check() POSTs to the auth endpoint with stored credentials, validates a 200 response, and returns the result; no data is read beyond the auth response

ConsioHealthCheck.check() calls a lightweight list endpoint with stored API credentials and validates the response schema matches expectations; any schema mismatch returns status='degraded' with a descriptive error

BufdirHealthCheck.check() calls the schema validation endpoint, confirms the response matches the current bufdir_column_schema version stored in the database, and returns 'degraded' if the remote schema version differs

All five checks fetch credentials exclusively via the Vault credential access layer — no hardcoded secrets, no environment variables accessed directly in check classes

All checks respect the 10-second AbortController timeout passed by the framework

Latency measurement uses performance.now() bracketing the HTTP call only — Vault lookup time is excluded from latency_ms

Each check is independently testable by injecting a mock fetch function

If the Vault lookup fails (credential not configured), the check returns status='unreachable' with error='credentials not configured' — it does not throw
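The criteria above can be sketched for one adapter as follows. This is a minimal illustration, not the actual implementation: the `CredentialVault` interface, the secret name `xledger_api_key`, the `/accounts` path, the `Authorization` header format, and the extra `signal` parameter are all assumptions made for the example. The criteria only specify behaviour for HTTP 200 and 4xx, so treating every other status code as 'degraded' is also an assumption.

```typescript
type HealthStatus = "healthy" | "degraded" | "unreachable";

interface HealthResult {
  status: HealthStatus;
  latency_ms?: number;
  error?: string;
}

// Only the parts of fetch this sketch needs, so a mock is trivial to inject.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string>; signal?: AbortSignal },
) => Promise<{ status: number }>;

// Hypothetical Vault credential access layer; null means "not configured".
interface CredentialVault {
  getSecret(orgId: string, name: string): Promise<string | null>;
}

class XledgerHealthCheck {
  private readonly vault: CredentialVault;
  private readonly fetchFn: FetchLike;
  private readonly now: () => number;

  constructor(
    vault: CredentialVault,
    fetchFn: FetchLike,
    now: () => number = () => performance.now(),
  ) {
    this.vault = vault;
    this.fetchFn = fetchFn;
    this.now = now;
  }

  async check(
    orgId: string,
    config: { baseUrl: string },
    signal?: AbortSignal,
  ): Promise<HealthResult> {
    let apiKey: string | null;
    try {
      apiKey = await this.vault.getSecret(orgId, "xledger_api_key");
    } catch {
      apiKey = null; // a failed lookup is reported, never thrown
    }
    if (!apiKey) {
      return { status: "unreachable", error: "credentials not configured" };
    }

    // The clock starts after the Vault lookup, so latency_ms covers HTTP only.
    const start = this.now();
    try {
      const res = await this.fetchFn(`${config.baseUrl}/accounts`, {
        headers: { Authorization: `token ${apiKey}` }, // assumed header format
        signal,
      });
      const latency_ms = Math.round(this.now() - start);
      if (res.status === 200) return { status: "healthy", latency_ms };
      // 4xx per the criteria; other non-200 codes are an assumption here.
      return { status: "degraded", latency_ms, error: `HTTP ${res.status}` };
    } catch {
      // Covers network failures and an aborted (timed-out) request.
      return { status: "unreachable", error: "network error or timeout" };
    }
  }
}
```

Because both `fetchFn` and the clock are constructor arguments, a unit test can pass a stub for each and assert on an exact `latency_ms` value.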

Technical Requirements

frameworks

Supabase Edge Functions (Deno)

apis

Xledger REST API

Microsoft Dynamics 365 REST API

Bufdir Reporting API

data models

bufdir_column_schema

performance requirements

Each check must complete within 10 seconds including Vault credential retrieval

HTTP probe requests must use read-only, minimal-payload endpoints (list/ping only — no write operations)

Checks should not cache credentials between invocations to avoid stale credential issues
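One way to enforce a 10-second budget that includes Vault retrieval is to start the deadline before any work begins and race it against the whole check. The helper below is a hedged sketch: `runWithDeadline` and its parameters are hypothetical names, and in the actual framework the AbortController is created by the test runner rather than by the check.

```typescript
const CHECK_TIMEOUT_MS = 10_000;

// Runs `work` against a deadline. The signal should be forwarded to fetch so
// an in-flight request is cancelled when the deadline fires.
async function runWithDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  onTimeout: () => T,
  timeoutMs: number = CHECK_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      resolve(onTimeout());
    }, timeoutMs);
  });

  try {
    // The race covers everything inside work(), including the Vault lookup.
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer after a fast check
  }
}
```

A caller would wrap the full check, for example `runWithDeadline((signal) => adapter.check(orgId, config, signal), () => ({ status: "unreachable" }))`, where the three-argument `check` is itself an assumption of this sketch.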

security requirements

Xledger API key stored in Vault; never in the mobile client or in any environment variable accessible to the client

Dynamics Azure AD client secret stored server-side only; OAuth token obtained at check time and discarded immediately after

Bufdir organisation credentials in Vault; report data never transmitted during a health check — only schema metadata

All external HTTP calls over TLS only; certificate validation must not be disabled

Error messages returned to the caller must not include raw credential values or full stack traces
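The last requirement can be approached by building the caller-facing error from the message only (never the stack) and masking any known secret values. The function below is an illustrative sketch; `toSafeErrorMessage` and the 200-character limit are assumptions, not part of the specification.

```typescript
// Builds a caller-safe error string: message only, known secrets masked,
// and length capped so a large upstream payload cannot leak through.
function toSafeErrorMessage(
  err: unknown,
  secrets: string[],
  maxLength = 200,
): string {
  // Use the message, not err.stack, so no stack trace reaches the caller.
  let safe = err instanceof Error ? err.message : String(err);
  for (const secret of secrets) {
    if (secret.length > 0) {
      safe = safe.split(secret).join("[redacted]");
    }
  }
  return safe.length > maxLength ? safe.slice(0, maxLength) + "..." : safe;
}
```

Masking is a second line of defence only; the simpler rule is to never interpolate credential values into error text in the first place.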

Execution Context

Execution Tier

Tier 3

Tier 3 - 413 tasks

Can start after Tier 2 completes

Integration Task

Handles integration between different epics or system components. Requires coordination across multiple development streams.

Implementation Notes

Structure each adapter as a class implementing the HealthCheck interface defined in task-012. Use a factory function getHealthCheckAdapter(integrationType: string): HealthCheck that maps integration type strings to adapter instances — this keeps the test runner agnostic of adapter implementations. For Dynamics, the OAuth token refresh endpoint is lightweight and suitable as a proxy for overall connectivity. For Bufdir, compare the remote schema version header or response field against the latest bufdir_column_schema.version in the database — a version mismatch indicates a breaking change that would cause actual syncs to fail, making it a valid 'degraded' signal.
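The factory described above might look like the following sketch. The `HealthCheck` shape here is a stand-in for the interface from task-012, the lowercase type strings are assumed, and `PlaceholderCheck` is a hypothetical stub used only to keep the example self-contained.

```typescript
interface HealthResult {
  status: "healthy" | "degraded" | "unreachable";
  latency_ms?: number;
  error?: string;
}

// Stand-in for the task-012 interface; assumed to be async here.
interface HealthCheck {
  check(orgId: string, integrationConfig: Record<string, unknown>): Promise<HealthResult>;
}

// Hypothetical stub so the sketch runs without the real adapters.
class PlaceholderCheck implements HealthCheck {
  readonly target: string;
  constructor(target: string) {
    this.target = target;
  }
  async check(): Promise<HealthResult> {
    return { status: "unreachable", error: `${this.target} check not implemented` };
  }
}

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const adapterFactories = new Map<string, () => HealthCheck>([
  ["xledger", () => new PlaceholderCheck("xledger")],
  ["dynamics", () => new PlaceholderCheck("dynamics")],
  ["cornerstone", () => new PlaceholderCheck("cornerstone")],
  ["consio", () => new PlaceholderCheck("consio")],
  ["bufdir", () => new PlaceholderCheck("bufdir")],
]);

function getHealthCheckAdapter(integrationType: string): HealthCheck {
  const create = adapterFactories.get(integrationType.toLowerCase());
  if (!create) {
    throw new Error(`Unsupported integration type: ${integrationType}`);
  }
  return create();
}
```

The test runner then depends only on `getHealthCheckAdapter` and the `HealthCheck` interface, so adding a sixth integration means registering one more entry.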

Wrap every external fetch in a try/catch regardless of the AbortController timeout, as some network errors throw before the timeout fires. Log only integration_type, org_id, and status at INFO level — never log credential values or response bodies.
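Both points can be illustrated with two small helpers. This is a sketch under assumptions: `guardedFetch` and `toLogEntry` are hypothetical names, and the log entry shape is one possible structured format rather than an established convention of the codebase.

```typescript
type HealthStatus = "healthy" | "degraded" | "unreachable";

type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ status: number }>;

// Returns null on any failure: DNS/TLS/connection errors can be thrown
// immediately, long before the AbortController timeout would fire.
async function guardedFetch(
  fetchFn: FetchLike,
  url: string,
  signal?: AbortSignal,
): Promise<{ status: number } | null> {
  try {
    return await fetchFn(url, { signal });
  } catch {
    return null;
  }
}

interface CheckLogEntry {
  level: "INFO";
  integration_type: string;
  org_id: string;
  status: HealthStatus;
}

// Accepting only the three allowed fields makes it impossible to pass a
// credential or response body into the log by accident.
function toLogEntry(
  integrationType: string,
  orgId: string,
  status: HealthStatus,
): CheckLogEntry {
  return { level: "INFO", integration_type: integrationType, org_id: orgId, status };
}
```

A check would then emit something like `console.log(JSON.stringify(toLogEntry("xledger", orgId, result.status)))`.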

Testing Requirements

Unit tests for each of the 5 check classes using injected mock fetch:

(1) successful response → 'healthy' with correct latency_ms

(2) HTTP 4xx → 'degraded'

(3) network error / fetch throws → 'unreachable'

(4) timeout (AbortController signal) → 'unreachable'

(5) Vault credential lookup failure → 'unreachable' with 'credentials not configured'

Integration tests against sandbox/staging environments for each external system where available.

DynamicsHealthCheck: verify token is not written to any table after the check.

BufdirHealthCheck: test with mismatched schema version returning 'degraded'.

Total: at least 25 unit test cases across all 5 adapters.
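The 25-case minimum follows from five scenarios applied to five adapters, which suggests a table-driven layout. The sketch below only builds the mock fetch helpers and the test matrix; the helper names are hypothetical, and in a Deno project each matrix entry would presumably be registered as its own test case.

```typescript
type HealthStatus = "healthy" | "degraded" | "unreachable";
type MockFetch = (url: string) => Promise<{ status: number }>;

// Mock builders: one that answers with a status, one that always throws.
const respondWith = (status: number): MockFetch => async () => ({ status });
const failWith = (error: Error): MockFetch => async () => {
  throw error;
};

// Mimics the error fetch raises when its AbortSignal fires.
const abortError = Object.assign(new Error("The operation was aborted"), {
  name: "AbortError",
});

interface Scenario {
  name: string;
  fetch: MockFetch;
  vaultConfigured: boolean;
  expected: HealthStatus;
  expectedError?: string;
}

const scenarios: Scenario[] = [
  { name: "successful response", fetch: respondWith(200), vaultConfigured: true, expected: "healthy" },
  { name: "HTTP 4xx", fetch: respondWith(403), vaultConfigured: true, expected: "degraded" },
  { name: "network error", fetch: failWith(new TypeError("network")), vaultConfigured: true, expected: "unreachable" },
  { name: "timeout", fetch: failWith(abortError), vaultConfigured: true, expected: "unreachable" },
  {
    name: "Vault lookup failure",
    fetch: respondWith(200),
    vaultConfigured: false,
    expected: "unreachable",
    expectedError: "credentials not configured",
  },
];

const adapters = ["xledger", "dynamics", "cornerstone", "consio", "bufdir"];

// 5 adapters x 5 scenarios = 25 entries, one per required unit test.
const testMatrix = adapters.flatMap((adapter) =>
  scenarios.map((scenario) => ({ adapter, ...scenario })),
);
```

Adapter-specific cases, such as the Bufdir schema version mismatch, would sit on top of this shared matrix.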

Component

Integration Health Monitor

service medium

Dependencies (1)

Build the core health check framework for the Integration Health Monitor. Define a HealthCheck interface with a method check(orgId, integrationConfig): HealthResult that each adapter-specific check must implement. Implement the test runner that invokes checks in parallel across all configured integrations for an org, collects results, and writes them to the integration_health_status table with timestamp, status (healthy/degraded/unreachable), and latency_ms.

epic-external-system-integration-configuration-backend-infrastructure-task-012
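For orientation, the dependency might be shaped roughly as below. This is a hedged sketch, not the task-012 design: `check` is assumed to return a Promise, the row columns other than status and latency_ms are guessed names, and persistence is reduced to an injected `saveRows` callback instead of a real write to integration_health_status.

```typescript
type HealthStatus = "healthy" | "degraded" | "unreachable";

interface HealthResult {
  status: HealthStatus;
  latency_ms: number | null;
  error?: string;
}

interface HealthCheck {
  check(orgId: string, integrationConfig: Record<string, unknown>): Promise<HealthResult>;
}

interface ConfiguredIntegration {
  type: string;
  config: Record<string, unknown>;
  adapter: HealthCheck;
}

// Assumed row shape for integration_health_status.
interface HealthStatusRow {
  org_id: string;
  integration_type: string;
  status: HealthStatus;
  latency_ms: number | null;
  checked_at: string;
}

async function runHealthChecks(
  orgId: string,
  integrations: ConfiguredIntegration[],
  saveRows: (rows: HealthStatusRow[]) => Promise<void>,
  now: () => Date = () => new Date(),
): Promise<HealthStatusRow[]> {
  // allSettled runs all checks in parallel and never rejects, so one
  // misbehaving adapter cannot hide the results of the others.
  const settled = await Promise.allSettled(
    integrations.map((i) => i.adapter.check(orgId, i.config)),
  );
  const checkedAt = now().toISOString();

  const rows = settled.map((outcome, index): HealthStatusRow => {
    const result: HealthResult =
      outcome.status === "fulfilled"
        ? outcome.value
        : { status: "unreachable", latency_ms: null };
    return {
      org_id: orgId,
      integration_type: integrations[index].type,
      status: result.status,
      latency_ms: result.latency_ms,
      checked_at: checkedAt,
    };
  });

  await saveRows(rows);
  return rows;
}
```

Adapters are expected to return a result rather than throw, so the rejected branch is only a safety net.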

Epic Risks (3)

medium impact, medium probability, technical

Supabase Edge Functions have cold-start latency that can cause the first sync invocation after an idle period to fail or time out when the external API has a short connection window, leading to missed scheduled syncs that go undetected.

Mitigation & Contingency

Mitigation: Configure Edge Function memory and implement a warm-up ping mechanism before heavy sync invocations. Set generous timeout values on the external API calls. Log all cold-start incidents for monitoring.

Contingency: If cold starts cause consistent sync failures, migrate the sync scheduler to a persistent Supabase cron job that pre-warms the function 30 seconds before the scheduled sync time.

high impact, low probability, technical

The sync scheduler must execute jobs at predictable times for financial reporting accuracy. Drift in cron execution timing (due to Supabase infrastructure delays) could cause syncs to run at wrong times, leading to missing data in accounting exports or duplicate exports across reporting periods.

Mitigation & Contingency

Mitigation: Implement idempotency keys based on integration ID + scheduled period, so re-runs of a delayed sync cannot create duplicate exports. Log actual execution timestamps vs scheduled timestamps and alert on drift exceeding 5 minutes.
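A minimal sketch of this mitigation follows. The key format, the in-memory `ExportLedger`, and the function names are assumptions for illustration; a production version would more likely rely on a unique constraint in the database than on process memory.

```typescript
// Key = integration ID + start of the scheduled period (ISO 8601, UTC).
function idempotencyKey(integrationId: string, periodStart: Date): string {
  return `${integrationId}:${periodStart.toISOString()}`;
}

// In-memory stand-in for a persistent store of completed exports.
class ExportLedger {
  private readonly claimed = new Set<string>();

  // Returns true the first time a key is seen, false on any re-run.
  tryClaim(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }
}

const MAX_DRIFT_MS = 5 * 60 * 1000; // the 5-minute alert threshold

function driftExceedsThreshold(
  scheduledAt: Date,
  executedAt: Date,
  maxDriftMs: number = MAX_DRIFT_MS,
): boolean {
  return Math.abs(executedAt.getTime() - scheduledAt.getTime()) > maxDriftMs;
}
```

Because the key is derived from the scheduled period rather than the actual execution time, a sync that runs late still maps to the same key as its on-time counterpart.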

Contingency: If scheduler reliability is insufficient, replace the application-level scheduler with a dedicated cron service (e.g., pg_cron on Supabase) for more predictable, database-level scheduling.

high impact, medium probability, integration

Aggressive health monitoring ping frequency could trigger rate limiting on external APIs (especially Xledger and Dynamics), causing legitimate export calls to fail after the monitor exhausts the API's request quota.

Mitigation & Contingency

Mitigation: Use lightweight health check endpoints (HEAD requests or vendor-specific ping/status endpoints) rather than data requests. Run health checks no more often than once every 15 minutes. Implement exponential backoff after consecutive failures.
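The interval and backoff rule can be expressed as a small pure function. This is a sketch under assumptions: doubling per consecutive failure and the 4-hour cap are illustrative choices, since the mitigation only states the 15-minute floor and that backoff is exponential.

```typescript
const BASE_INTERVAL_MS = 15 * 60 * 1000; // never check more often than this
const MAX_INTERVAL_MS = 4 * 60 * 60 * 1000; // assumed cap of 4 hours

// Delay before the next health check, doubling with each consecutive failure.
function nextCheckDelayMs(consecutiveFailures: number): number {
  const failures = Math.max(0, Math.floor(consecutiveFailures));
  return Math.min(BASE_INTERVAL_MS * 2 ** failures, MAX_INTERVAL_MS);
}
```

With these constants the delay is 15 minutes when healthy, 30 minutes after one failure, 60 minutes after two, and stays at 4 hours from the fourth consecutive failure onward.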

Contingency: If rate limiting occurs, disable active health monitoring for the affected integration type and switch to passive health detection (mark unhealthy only when a scheduled sync fails).
