Per-integration health check implementations
epic-external-system-integration-configuration-backend-infrastructure-task-013 — Implement concrete health check classes for each supported integration target: Xledger (authenticate + list accounts endpoint), Dynamics (OAuth token refresh + GET organisations), Cornerstone (ping auth endpoint), Consio (list endpoint), and Bufdir (schema validation endpoint). Each check uses the Vault credential access layer, performs a lightweight read-only probe, and measures round-trip latency. Checks must complete within a 10-second timeout.
Acceptance Criteria
Technical Requirements
Execution Context
Tier 3 - 413 tasks
Can start after Tier 2 completes
Handles integration between different epics or system components. Requires coordination across multiple development streams.
Implementation Notes
Structure each adapter as a class implementing the HealthCheck interface defined in task-012. Use a factory function getHealthCheckAdapter(integrationType: string): HealthCheck that maps integration type strings to adapter instances — this keeps the test runner agnostic of adapter implementations. For Dynamics, the OAuth token refresh endpoint is lightweight and suitable as a proxy for overall connectivity. For Bufdir, compare the remote schema version header or response field against the latest bufdir_column_schema.version in the database — a version mismatch indicates a breaking change that would cause actual syncs to fail, making it a valid 'degraded' signal.
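The factory described above could be sketched as follows. This is a minimal illustration, not the task-012 contract: the HealthCheck and HealthCheckResult shapes, the StubHealthCheck placeholder, and the lowercase integration type keys are all assumptions made for the example. Only the getHealthCheckAdapter signature and the three status values come from this task.

```typescript
// Status values come from the task text; the result and interface shapes
// are assumptions, since the real HealthCheck interface lives in task-012.
type HealthStatus = "healthy" | "degraded" | "unreachable";

interface HealthCheckResult {
  status: HealthStatus;
  latency_ms: number;
  message?: string;
}

interface HealthCheck {
  check(orgId: string): Promise<HealthCheckResult>;
}

// Hypothetical placeholder standing in for the five concrete adapter classes.
class StubHealthCheck implements HealthCheck {
  readonly target: string;

  constructor(target: string) {
    this.target = target;
  }

  async check(_orgId: string): Promise<HealthCheckResult> {
    return { status: "healthy", latency_ms: 0 };
  }
}

// A Map avoids accidental hits on inherited object properties
// such as "constructor" or "toString".
const adapterFactories = new Map<string, () => HealthCheck>([
  ["xledger", () => new StubHealthCheck("xledger")],
  ["dynamics", () => new StubHealthCheck("dynamics")],
  ["cornerstone", () => new StubHealthCheck("cornerstone")],
  ["consio", () => new StubHealthCheck("consio")],
  ["bufdir", () => new StubHealthCheck("bufdir")],
]);

function getHealthCheckAdapter(integrationType: string): HealthCheck {
  const create = adapterFactories.get(integrationType.toLowerCase());
  if (!create) {
    throw new Error(`Unsupported integration type: ${integrationType}`);
  }
  return create();
}
```

The test runner then only needs the integration type string and the HealthCheck interface; swapping a stub for a real adapter class does not change the caller.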
Wrap every external fetch in a try/catch regardless of the AbortController timeout, as some network errors throw before the timeout fires. Log only integration_type, org_id, and status at INFO level — never log credential values or response bodies.
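A minimal sketch of the timeout and error handling described above, assuming an injected fetch-like function so the probe can be unit tested. The names probe, runAndLog, and FetchLike are hypothetical, and mapping every non-2xx response (including 5xx) to 'degraded' is an assumption; the task text only specifies the 4xx case.

```typescript
type HealthStatus = "healthy" | "degraded" | "unreachable";

interface HealthCheckResult {
  status: HealthStatus;
  latency_ms: number;
  message?: string;
}

// Narrow fetch shape (assumption) so tests can inject a simple mock.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

const TIMEOUT_MS = 10_000; // 10-second limit from the task description

async function probe(
  url: string,
  fetchFn: FetchLike,
  timeoutMs: number = TIMEOUT_MS,
): Promise<HealthCheckResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetchFn(url, { signal: controller.signal });
    const latency_ms = Date.now() - started;
    // Assumption: any non-2xx status is reported as 'degraded'.
    return res.ok
      ? { status: "healthy", latency_ms }
      : { status: "degraded", latency_ms, message: `HTTP ${res.status}` };
  } catch {
    // Covers both the abort triggered by the timer and network errors
    // that throw before the timer fires.
    return {
      status: "unreachable",
      latency_ms: Date.now() - started,
      message: controller.signal.aborted ? "timeout" : "network error",
    };
  } finally {
    clearTimeout(timer);
  }
}

async function runAndLog(
  integrationType: string,
  orgId: string,
  url: string,
  fetchFn: FetchLike,
  timeoutMs: number = TIMEOUT_MS,
): Promise<HealthCheckResult> {
  const result = await probe(url, fetchFn, timeoutMs);
  // Only these three fields are logged: no credentials, no response body.
  console.info(
    JSON.stringify({
      integration_type: integrationType,
      org_id: orgId,
      status: result.status,
    }),
  );
  return result;
}
```

Because the catch block checks controller.signal.aborted, a timeout and a plain network failure both map to 'unreachable' but stay distinguishable in the message.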
Testing Requirements
Unit tests for each of the 5 check classes using injected mock fetch:
(1) successful response → 'healthy' with correct latency_ms
(2) HTTP 4xx → 'degraded'
(3) network error / fetch throws → 'unreachable'
(4) timeout (AbortController signal) → 'unreachable'
(5) Vault credential lookup failure → 'unreachable' with 'credentials not configured'
Integration tests against sandbox/staging environments for each external system where available.
DynamicsHealthCheck: verify token is not written to any table after the check.
BufdirHealthCheck: test with mismatched schema version returning 'degraded'.
Total: at least 25 unit test cases across all 5 adapters.
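The Bufdir schema version case lends itself to a table-driven test against a pure comparison helper. The sketch below assumes a hypothetical classifySchemaVersion function and treats a missing remote version as 'degraded'; neither the helper name nor the missing-version rule is specified in this task.

```typescript
type SchemaStatus = "healthy" | "degraded";

// Hypothetical helper: compares the version reported by Bufdir with the
// latest bufdir_column_schema.version stored locally.
function classifySchemaVersion(
  remoteVersion: string | null,
  localVersion: string,
): SchemaStatus {
  // Assumption: a response without any version is treated as degraded.
  if (remoteVersion === null) return "degraded";
  return remoteVersion.trim() === localVersion.trim() ? "healthy" : "degraded";
}

// Each row: [remote version, local version, expected status].
const schemaCases: Array<[string | null, string, SchemaStatus]> = [
  ["3", "3", "healthy"],
  [" 3 ", "3", "healthy"],
  ["4", "3", "degraded"],
  [null, "3", "degraded"],
];

for (const [remote, local, expected] of schemaCases) {
  const actual = classifySchemaVersion(remote, local);
  if (actual !== expected) {
    throw new Error(`remote=${remote} local=${local}: got ${actual}, expected ${expected}`);
  }
}
```

Keeping the comparison free of I/O means the mismatch case can be covered without a mock fetch, leaving the adapter-level tests to focus on transport behaviour.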
Supabase Edge Functions have cold start latency that can cause the first sync invocation after idle periods to fail or time out when the external API has a short connection window, leading to missed scheduled syncs that go undetected.
Mitigation & Contingency
Mitigation: Configure Edge Function memory and implement a warm-up ping mechanism before heavy sync invocations. Set generous timeout values on the external API calls. Log all cold-start incidents for monitoring.
Contingency: If cold starts cause consistent sync failures, migrate the sync scheduler to a persistent Supabase cron job that pre-warms the function 30 seconds before the scheduled sync time.
The sync scheduler must execute jobs at predictable times for financial reporting accuracy. Drift in cron execution timing (due to Supabase infrastructure delays) could cause syncs to run at wrong times, leading to missing data in accounting exports or duplicate exports across reporting periods.
Mitigation & Contingency
Mitigation: Implement idempotency keys based on integration ID + scheduled period, so re-runs of a delayed sync cannot create duplicate exports. Log actual execution timestamps vs scheduled timestamps and alert on drift exceeding 5 minutes.
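One way the idempotency key and drift check could look is sketched below. The key format, the in-memory ExportLedger class, and the function names are assumptions for illustration; a real implementation would more likely enforce the key with a unique constraint in the database.

```typescript
const DRIFT_ALERT_MS = 5 * 60 * 1000; // 5-minute threshold from the mitigation

// Key built from integration ID + scheduled period start (format is an assumption).
function idempotencyKey(integrationId: string, periodStart: Date): string {
  return `${integrationId}:${periodStart.toISOString()}`;
}

// Positive when the job ran late, negative when it ran early.
function driftMs(scheduledAt: Date, executedAt: Date): number {
  return executedAt.getTime() - scheduledAt.getTime();
}

function shouldAlertOnDrift(scheduledAt: Date, executedAt: Date): boolean {
  return Math.abs(driftMs(scheduledAt, executedAt)) > DRIFT_ALERT_MS;
}

// Hypothetical in-memory stand-in for a table with a unique key column.
class ExportLedger {
  private readonly claimed = new Set<string>();

  // Returns true only for the first claim of a given key.
  tryClaim(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }
}

const ledger = new ExportLedger();
const period = new Date("2024-01-01T00:00:00Z");
const key = idempotencyKey("int-1", period);

const firstRun = ledger.tryClaim(key); // first run: export proceeds
const delayedRerun = ledger.tryClaim(key); // delayed re-run: export is skipped
```

Because the key depends only on the integration and the scheduled period, a sync that runs late still maps to the same key and cannot produce a second export for that period.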
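Contingency: If scheduler reliability is insufficient, replace the application-level scheduler with a dedicated cron service (e.g., pg_cron on Supabase) that runs jobs from inside the database. Note that pg_cron schedules at cron-expression granularity, not millisecond precision, so drift logging should stay in place after the switch.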
Aggressive health monitoring ping frequency could trigger rate limiting on external APIs (especially Xledger and Dynamics), causing legitimate export calls to fail after the monitor exhausts the API's request quota.
Mitigation & Contingency
Mitigation: Use lightweight health check endpoints (HEAD requests or vendor-specific ping/status endpoints) rather than data requests. Run health checks no more often than once every 15 minutes. Implement exponential backoff after consecutive failures.
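The backoff rule could be sketched as a pure function of the consecutive failure count. The 15-minute base interval comes from the mitigation above; the doubling factor, the 4-hour cap, and the name nextCheckDelayMs are assumptions.

```typescript
const BASE_INTERVAL_MS = 15 * 60 * 1000; // 15-minute base interval
const MAX_INTERVAL_MS = 4 * 60 * 60 * 1000; // assumed cap of 4 hours

// Delay before the next health check: doubles with each consecutive
// failure and resets to the base interval once a check succeeds
// (that is, when consecutiveFailures is back to 0).
function nextCheckDelayMs(consecutiveFailures: number): number {
  // Clamp the exponent so negative or very large counts stay well defined.
  const exponent = Math.min(Math.max(Math.floor(consecutiveFailures), 0), 10);
  return Math.min(BASE_INTERVAL_MS * 2 ** exponent, MAX_INTERVAL_MS);
}

// Delays in minutes for 0 to 5 consecutive failures.
const delaysInMinutes = [0, 1, 2, 3, 4, 5].map(
  (failures) => nextCheckDelayMs(failures) / 60_000,
);
console.log(delaysInMinutes); // [15, 30, 60, 120, 240, 240]
```

With these values a failing integration is probed at most once every 4 hours after four consecutive failures, which keeps the monitor's share of the vendor's request quota small while the outage lasts.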
Contingency: If rate limiting occurs, disable active health monitoring for the affected integration type and switch to passive health detection (mark unhealthy only when a scheduled sync fails).