Service Layer | High complexity | Shared Component | Backend
Dependencies: 2 | Dependents: 1 | Entities: 0 | Integrations: 0

Description

Ensures that unique participant counts are accurate by detecting and excluding double-counted records arising from coordinator proxy registrations and bulk registrations. Prevents the same participant from being counted multiple times across overlapping registrations within the same reporting period.

Feature: Bufdir Data Aggregation

participant-deduplication-service

Summaries

Accurate participant counts are the foundation of credible Bufdir reporting: inflated or duplicated numbers risk funding disputes and compliance failures. The Participant Deduplication Service addresses the specific problem created by coordinator proxy registrations and bulk sign-ups, where the same individual can appear in several overlapping registration records. By ensuring each unique participant is counted exactly once per reporting period, this component protects the organization's credibility with Bufdir, prevents overstated impact metrics, and builds the data trust that supports long-term funding relationships. As a shared service, it applies the same deduplication rules to every reporting workflow, so no workflow needs its own implementation.

This is a shared, high-complexity component consumed by the Bufdir Aggregation Service, so its completion is a hard prerequisite for aggregation integration testing. The deduplication logic must handle edge cases in proxy and bulk registrations that may not be fully documented; plan a requirements-clarification sprint with domain experts before development starts. The flagAmbiguousDuplicates() and resolveDeduplicationConflict() interfaces imply a manual resolution workflow; if a conflict-resolution UI is in scope, this requires coordination with the frontend team. The test strategy must include realistic data scenarios with overlapping registrations.
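The manual workflow implied by flagAmbiguousDuplicates() and resolveDeduplicationConflict() could take roughly the following shape. This is a minimal TypeScript sketch under stated assumptions, not the component's implementation: the helper names flagAmbiguous and resolveConflict, the Conflict fields, the resolution values "merge" and "keep-separate", and the Map standing in for persistent conflict storage are all hypothetical.

```typescript
// Hypothetical resolution values; the real options are not documented.
type Resolution = "merge" | "keep-separate";

// Hypothetical conflict row; the actual conflict schema is not documented.
interface Conflict {
  conflictId: string;
  orgId: string;
  recordIds: string[];
  status: "open" | "resolved";
  resolution?: Resolution;
}

// In-memory stand-in for persistent conflict storage. It is passed in
// explicitly so the functions below keep no state of their own between calls.
type ConflictStore = Map<string, Conflict>;

// Records each candidate group of record IDs as an open conflict. The ID is
// derived from the org and the sorted record IDs, so flagging the same group
// again returns the existing row instead of creating a duplicate.
function flagAmbiguous(store: ConflictStore, orgId: string, candidates: string[][]): Conflict[] {
  const flagged: Conflict[] = [];
  for (const recordIds of candidates) {
    const sorted = [...recordIds].sort();
    const conflictId = `${orgId}:${sorted.join(",")}`;
    let conflict = store.get(conflictId);
    if (!conflict) {
      conflict = { conflictId, orgId, recordIds: sorted, status: "open" };
      store.set(conflictId, conflict);
    }
    flagged.push(conflict);
  }
  return flagged;
}

// Applies a reviewer's decision. Repeating the same decision leaves the stored
// values unchanged; changing an already-recorded decision is rejected.
function resolveConflict(store: ConflictStore, conflictId: string, resolution: Resolution): Conflict {
  const conflict = store.get(conflictId);
  if (!conflict) throw new Error(`Unknown conflict: ${conflictId}`);
  if (conflict.status === "resolved" && conflict.resolution !== resolution) {
    throw new Error(`Conflict ${conflictId} is already resolved as ${conflict.resolution}`);
  }
  const resolved: Conflict = { ...conflict, status: "resolved", resolution };
  store.set(conflictId, resolved);
  return resolved;
}
```

Deriving the conflict ID from the organization and the sorted record IDs means a rerun of aggregation does not create duplicate review items. Whether a reviewer may later change a decision is a product question; the sketch simply rejects it.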

Because the component is shared, a regression reaches every downstream consumer at once, so prioritize test coverage accordingly.

The service operates on raw registration records provided by the AggregationQueryBuilder and applies set-based deduplication logic, scoped per organization via MultiOrgDataIsolator. The core deduplication algorithm must handle three overlap cases: direct-vs-proxy (the same participant registered both by a coordinator and by themselves), direct-vs-bulk (a participant in a bulk import who also has an individual registration), and proxy-vs-bulk (a participant registered by a coordinator who also appears in a bulk import). The detectProxyOverlap() method is the most algorithmically sensitive: it should use deterministic identifier matching rather than probabilistic matching, so that repeated aggregation runs over the same data produce the same results. The flagAmbiguousDuplicates() interface surfaces low-confidence cases for human review, which implies a conflicts table in the database schema.
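The three overlap cases can be illustrated with a deterministic grouping step. This TypeScript sketch is an illustration, not the component's detectProxyOverlap(): the RegistrationRecord shape, the classifyOverlaps helper, and the assumption that every record carries a stable participantId shared across direct, proxy, and bulk registrations are all hypothetical.

```typescript
// Hypothetical record shape; the real registration schema is not documented.
type Source = "direct" | "proxy" | "bulk";

interface RegistrationRecord {
  recordId: string;
  participantId: string; // assumed stable identifier shared across all sources
  orgId: string;
  periodId: string;
  source: Source;
}

type OverlapCase = "direct-vs-proxy" | "direct-vs-bulk" | "proxy-vs-bulk";

interface Overlap {
  participantId: string;
  overlapCase: OverlapCase;
  recordIds: string[];
}

// The three overlap cases expressed as pairs of registration sources.
const CASES: [Source, Source, OverlapCase][] = [
  ["direct", "proxy", "direct-vs-proxy"],
  ["direct", "bulk", "direct-vs-bulk"],
  ["proxy", "bulk", "proxy-vs-bulk"],
];

// Groups records by exact (orgId, periodId, participantId) and reports each
// overlap case present in a group. Only exact identifier equality is used,
// and group keys and record IDs are sorted, so the same input always yields
// the same output regardless of record order.
function classifyOverlaps(records: RegistrationRecord[]): Overlap[] {
  const groups = new Map<string, RegistrationRecord[]>();
  for (const r of records) {
    const key = JSON.stringify([r.orgId, r.periodId, r.participantId]);
    const group = groups.get(key);
    if (group) group.push(r);
    else groups.set(key, [r]);
  }

  const overlaps: Overlap[] = [];
  for (const key of Array.from(groups.keys()).sort()) {
    const group = groups.get(key)!;
    for (const [a, b, overlapCase] of CASES) {
      const involved = group.filter((r) => r.source === a || r.source === b);
      const hasA = involved.some((r) => r.source === a);
      const hasB = involved.some((r) => r.source === b);
      if (hasA && hasB) {
        overlaps.push({
          participantId: group[0].participantId,
          overlapCase,
          recordIds: involved.map((r) => r.recordId).sort(),
        });
      }
    }
  }
  return overlaps;
}
```

Because the organization is part of the grouping key, the same participantId in two organizations never produces an overlap. Records that lack a shared identifier are never grouped here; routing such cases to flagAmbiguousDuplicates() is one possible design, not something the source specifies.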

getDeduplicationReport() should be idempotent and cacheable. Because the component is shared, it must be strictly stateless between calls so that data from one organization cannot contaminate results for another.
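One way to reconcile "cacheable" with "stateless" is to keep report generation a pure function and have the caller own the cache, keyed by organization and period. A minimal TypeScript sketch, assuming a hypothetical record and report shape; buildReport and getReportCached are illustrative names, not the component's API.

```typescript
// Hypothetical shapes; the real report fields are not documented.
interface ReportRecord {
  participantId: string;
  orgId: string;
  periodId: string;
}

interface DeduplicationReport {
  orgId: string;
  periodId: string;
  rawRecordCount: number;
  uniqueParticipantCount: number;
  duplicatesRemoved: number;
}

// Pure function: the output depends only on the arguments, so repeated calls
// with the same inputs return equal reports (idempotent).
function buildReport(records: readonly ReportRecord[], orgId: string, periodId: string): DeduplicationReport {
  // Scope to the requested org and period first so no other org's rows are counted.
  const scoped = records.filter((r) => r.orgId === orgId && r.periodId === periodId);
  const unique = new Set(scoped.map((r) => r.participantId));
  return {
    orgId,
    periodId,
    rawRecordCount: scoped.length,
    uniqueParticipantCount: unique.size,
    duplicatesRemoved: scoped.length - unique.size,
  };
}

// The cache is owned by the caller and passed in rather than held in module
// state. Keys always include orgId, so one org's entry can never satisfy
// another org's lookup.
function getReportCached(
  cache: Map<string, DeduplicationReport>,
  records: readonly ReportRecord[],
  orgId: string,
  periodId: string,
): DeduplicationReport {
  const key = JSON.stringify([orgId, periodId]);
  const hit = cache.get(key);
  if (hit) return hit;
  const report = buildReport(records, orgId, periodId);
  cache.set(key, report);
  return report;
}
```

A cache entry is only valid while the underlying records and conflict resolutions for that organization and period are unchanged, so resolving a conflict should invalidate the matching key.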

Responsibilities

  • Identify participants registered more than once across direct, proxy, and bulk registrations
  • Deduplicate participant sets per activity type and reporting period
  • Flag edge cases where deduplication confidence is low
  • Produce a clean unique-participant count per Bufdir reporting category
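The last two responsibilities reduce to counting distinct participant identifiers per reporting period and category. A minimal TypeScript sketch, assuming records already carry a canonical participantId after overlap resolution; CountableRecord, uniqueCountsByCategory, and the categoryOf mapping are hypothetical, since the mapping from activity types to Bufdir reporting categories is not documented here.

```typescript
// Hypothetical record shape; participantId is assumed to be the canonical
// identifier left after overlapping registrations have been resolved.
interface CountableRecord {
  participantId: string;
  periodId: string;
  activityType: string;
}

// Counts distinct participants per (reporting period, reporting category).
// categoryOf is a hypothetical mapping from activity type to a Bufdir
// reporting category; pass (a) => a to count per activity type instead.
// Keys are "periodId|category", assuming neither value contains "|".
function uniqueCountsByCategory(
  records: readonly CountableRecord[],
  categoryOf: (activityType: string) => string,
): Map<string, number> {
  const participants = new Map<string, Set<string>>();
  for (const r of records) {
    const key = `${r.periodId}|${categoryOf(r.activityType)}`;
    let set = participants.get(key);
    if (!set) {
      set = new Set<string>();
      participants.set(key, set);
    }
    // A Set ignores repeat registrations by the same participant.
    set.add(r.participantId);
  }
  const counts = new Map<string, number>();
  participants.forEach((set, key) => counts.set(key, set.size));
  return counts;
}
```

Mapping to the category before deduplicating matters: a participant who attended two activity types that fall under the same category is counted once for that category, but once per activity type when counting by activity type.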

Interfaces

deduplicateParticipants(rawRecords, orgId)
getUniqueParticipantCount(orgId, periodId, activityType)
detectProxyOverlap(directRegistrations, bulkRegistrations)
flagAmbiguousDuplicates(conflicts)
resolveDeduplicationConflict(conflictId, resolution)
getDeduplicationReport(orgId, periodId)

Relationships

Dependencies (2)

Components this component depends on

Dependents (1)

Components that depend on this component

API Contract

REST base path: /api/v1/participant-deduplication (6 endpoints)
GET /api/v1/participant-deduplication
GET /api/v1/participant-deduplication/:id
POST /api/v1/participant-deduplication
PUT /api/v1/participant-deduplication/:id
DELETE /api/v1/participant-deduplication/:id
GET /api/v1/participant-deduplication/unique-count
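The unique-count endpoint corresponds to getUniqueParticipantCount(orgId, periodId, activityType). The endpoint list does not show its query parameters, so this TypeScript sketch assumes they use the same names as the interface; uniqueCountUrl and the example host are hypothetical. Check the full contract before relying on it.

```typescript
// Builds a request URL for the unique-count endpoint. The query parameter
// names are assumed to mirror getUniqueParticipantCount(orgId, periodId,
// activityType); the published contract may use different names.
function uniqueCountUrl(baseUrl: string, orgId: string, periodId: string, activityType: string): string {
  const params: [string, string][] = [
    ["orgId", orgId],
    ["periodId", periodId],
    ["activityType", activityType],
  ];
  // Percent-encode each name and value so spaces and reserved characters are safe.
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  // Strip trailing slashes from the base so the path is not doubled.
  return `${baseUrl.replace(/\/+$/, "")}/api/v1/participant-deduplication/unique-count?${query}`;
}
```

For example, uniqueCountUrl("https://api.example.org/", "org 1", "2024-H1", "peer-support") yields a GET URL ending in unique-count?orgId=org%201&periodId=2024-H1&activityType=peer-support.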