Cross-Site Semantic Consistency in OMOP Networks

What cross-site semantic consistency means

In a federated OMOP network, cross-site semantic consistency refers to whether sites apply the same clinical concept IDs to clinically equivalent patient populations. It is the difference between shared vocabulary and shared meaning.

Two sites can use the same concept ID, follow all OMOP mapping conventions correctly, and still define that concept differently in clinical practice. One site may require a confirmatory test before coding a diagnosis. Another may code on clinical presentation alone. A third may apply a stricter or looser threshold. The vocabulary is the same. The patients captured under it are not.

Cross-site semantic consistency exists when the populations represented by a concept ID at each site are clinically equivalent. Measuring it requires looking beyond structural data quality into the distributional and contextual properties of how each site applies each concept.

Why semantic interoperability in OMOP is harder than it looks

OMOP achieves vocabulary interoperability, a genuine and important form of standardization. Across a network, concept IDs carry the same formal definition. The challenge is that formal definitions do not fully determine clinical coding practice.

Clinical coding is a human activity shaped by local factors: the EHR interface a clinician uses, the documentation requirements of the institution, the coding policies enforced by the billing department, the specialty distribution of the treating team, and the clinical thresholds applied in that institution's practice culture. These factors produce systematic variation in what gets coded under a given concept ID even when every site is following OMOP conventions correctly.

Semantic interoperability in healthcare data is not achieved by vocabulary mapping alone. It requires evidence that sites are interpreting and applying shared concepts in a consistent way. That evidence has to be generated and documented.

How clinical concept mapping variation manifests

Clinical concept mapping variation is not random. It tends to be systematic, institution-specific, and reproducible. Common patterns include:

Diagnostic confirmation thresholds: some sites require objective evidence before coding; others code on clinical judgment. This produces consistent differences in the patient populations captured under diagnostic concepts.
Source code distribution: the specific billing and clinical codes that map to a given OMOP concept vary across sites, reflecting local coding vocabulary and documentation workflows. Sites that predominantly use different source codes under the same concept ID may be capturing different clinical realities.
Co-occurrence patterns: the conditions that co-occur with a concept at a site reflect the clinical context in which that concept is being applied. Divergent co-occurrence profiles suggest the concept is being used differently.
Specialty mix: which clinical specialties are involved in diagnosis and treatment reflects the patient population being captured. A concept predominantly coded by cardiologists at one site and by primary care physicians at another may represent systematically different patients.

The six signals Vercori measures

Vercori is designed to evaluate cross-site semantic consistency across six quantifiable dimensions for each concept in a study:

Source code distribution

How patients are distributed across the source codes that map to this concept at each site.
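The text does not say which statistic Vercori uses for this dimension. One plausible approach assumes each site reports patient counts per source code for a concept. The Jensen-Shannon divergence (log base 2, so bounded between 0 and 1) then compares the two sites' distributions. This is a hypothetical sketch: the function names and the placeholder source codes `SRC_1` and `SRC_2` are illustrative.

```python
import math


def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Convert patient counts per source code into proportions (assumes a nonzero total)."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def jensen_shannon(counts_a: dict[str, int], counts_b: dict[str, int]) -> float:
    """Jensen-Shannon divergence (log base 2) between two sites' source-code
    distributions for one concept: 0 = identical, 1 = no codes in common."""
    p, q = normalize(counts_a), normalize(counts_b)
    codes = set(p) | set(q)
    # Midpoint distribution M = (P + Q) / 2 over the union of source codes.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in codes}

    def kl_to_m(dist: dict[str, float]) -> float:
        # KL(dist || M); codes with zero mass contribute nothing.
        return sum(v * math.log2(v / m[c]) for c, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical patient counts for one OMOP concept at two sites.
site_a = {"SRC_1": 800, "SRC_2": 200}
site_b = {"SRC_1": 300, "SRC_2": 700}
print(round(jensen_shannon(site_a, site_b), 3))
```

Because the score is symmetric and bounded, it can be compared across concepts with very different patient volumes.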

Co-occurrence patterns

Which conditions co-occur with this concept and whether that pattern is consistent across sites.
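One simple way to quantify this is to take each site's k most frequent co-occurring concepts and compute the Jaccard similarity of the two sets. This is a hypothetical sketch, not Vercori's documented method. The choice of k and the integer concept IDs are illustrative assumptions.

```python
def top_k(cooccurrence: dict[int, int], k: int) -> set[int]:
    """The k concept IDs that co-occur most often with the index concept.
    Ties are broken by concept ID so the result is deterministic."""
    ranked = sorted(cooccurrence.items(), key=lambda item: (-item[1], item[0]))
    return {concept_id for concept_id, _ in ranked[:k]}


def cooccurrence_overlap(site_a: dict[int, int], site_b: dict[int, int], k: int = 20) -> float:
    """Jaccard similarity of the two sites' top-k co-occurring concepts:
    1.0 = same set, 0.0 = no overlap. A divergence can be taken as 1 - overlap."""
    a, b = top_k(site_a, k), top_k(site_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical patient counts per co-occurring concept ID at two sites.
site_a = {101: 420, 102: 310, 103: 95, 104: 40}
site_b = {101: 390, 105: 280, 102: 150, 106: 60}
print(cooccurrence_overlap(site_a, site_b, k=3))  # 0.5
```

A fixed top-k set ignores how strongly each concept co-occurs. A weighted comparison would capture that, at the cost of more sensitivity to differences in site size.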

Measurement availability

Whether relevant measurement data is present and comparable, reflecting diagnostic confirmation practice.
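Here is a minimal, hypothetical sketch. Assume each site reports two numbers for a concept: how many coded patients it has, and how many of them have at least one relevant measurement (for example, a confirmatory lab result). The availability rate and the gap between two sites then follow directly. The class and field names are illustrative, not part of any published Vercori interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementSummary:
    """Aggregate counts one site could report for one concept (hypothetical fields)."""
    patients_with_concept: int      # patients coded under the concept
    patients_with_measurement: int  # of those, patients with >= 1 relevant measurement

    @property
    def availability(self) -> float:
        """Share of coded patients who have the relevant measurement."""
        return self.patients_with_measurement / self.patients_with_concept


def availability_gap(a: MeasurementSummary, b: MeasurementSummary) -> float:
    """Absolute difference in availability rates, in [0, 1]."""
    return abs(a.availability - b.availability)


site_a = MeasurementSummary(patients_with_concept=1000, patients_with_measurement=900)
site_b = MeasurementSummary(patients_with_concept=2000, patients_with_measurement=1000)
print(f"{availability_gap(site_a, site_b):.2f}")  # 0.40
```

A large gap is consistent with different confirmation practice. It could also reflect an incomplete lab feed at one site, which is one reason divergent findings go to a reviewer rather than being resolved automatically.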

Demographic profile

Whether the age, gender, and broader demographic composition of patients coded under this concept is consistent across sites.
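Demographic comparisons can be made from per-site summary statistics alone, which suits a design where no patient records leave the site. A common choice in observational research is the standardized mean difference (SMD). The sketch below computes it for a continuous variable such as age and for a binary proportion such as the share of female patients. Whether Vercori uses SMD is an assumption, and the numbers are made up.

```python
import math


def smd_continuous(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference using the pooled SD sqrt((sd_a^2 + sd_b^2) / 2)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


def smd_binary(p_a: float, p_b: float) -> float:
    """Standardized difference for a proportion (undefined if both are 0 or both are 1)."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd


# Hypothetical site summaries for one concept.
print(smd_continuous(mean_a=62.0, sd_a=10.0, mean_b=57.0, sd_b=10.0))  # 0.5
print(round(smd_binary(p_a=0.48, p_b=0.55), 3))
```

A frequently used rule of thumb treats an absolute SMD above 0.1 as a meaningful imbalance. Any threshold Vercori applies is not stated here.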

Drug co-prescription

Whether the drugs prescribed to patients coded under this concept reflect a consistent patient population across sites.
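One hypothetical way to represent co-prescription is as a vector of prescription rates: the share of coded patients exposed to each drug concept. Two sites' vectors can then be compared with cosine similarity. This is an illustrative choice, not Vercori's documented metric, and the drug concept IDs are placeholders.

```python
import math


def cosine_similarity(rates_a: dict[int, float], rates_b: dict[int, float]) -> float:
    """Cosine similarity of two co-prescription rate vectors keyed by drug concept ID.
    Drugs absent at one site count as rate 0. Returns 0.0 if either vector is all zero."""
    drugs = set(rates_a) | set(rates_b)
    dot = sum(rates_a.get(d, 0.0) * rates_b.get(d, 0.0) for d in drugs)
    norm_a = math.sqrt(sum(v * v for v in rates_a.values()))
    norm_b = math.sqrt(sum(v * v for v in rates_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical share of coded patients receiving each drug concept.
site_a = {201: 0.62, 202: 0.35, 203: 0.10}
site_b = {201: 0.40, 202: 0.05, 204: 0.30}
print(round(cosine_similarity(site_a, site_b), 3))
```

Cosine similarity ignores overall scale. A site where every drug is prescribed at half the rate of another still scores 1.0, so in practice this measure would likely be paired with a comparison of absolute rates.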

Specialty mix

Whether the clinical specialties involved in diagnosis and treatment are comparable across sites.
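Specialty mix can be summarized as each site's share of coded patients by treating specialty. Two sites can then be compared with total variation distance: half the sum of the absolute differences in shares. In the OMOP CDM, specialty could plausibly be drawn from the PROVIDER table's `specialty_concept_id`. The sourcing, the metric, and the labels below are all assumptions for illustration.

```python
def total_variation(shares_a: dict[str, float], shares_b: dict[str, float]) -> float:
    """Total variation distance between two specialty-share distributions:
    0 = identical mix, 1 = no specialty in common. Each input should sum to 1."""
    specialties = set(shares_a) | set(shares_b)
    return 0.5 * sum(abs(shares_a.get(s, 0.0) - shares_b.get(s, 0.0)) for s in specialties)


# Hypothetical numbers echoing the cardiology versus primary care example.
site_a = {"Cardiology": 0.70, "Primary care": 0.25, "Endocrinology": 0.05}
site_b = {"Cardiology": 0.20, "Primary care": 0.75, "Endocrinology": 0.05}
print(round(total_variation(site_a, site_b), 2))  # 0.5
```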

Each dimension is scored, and the weighted scores are combined into a composite divergence score. That score classifies each concept as consistent, divergent with a documented explanation, or divergent and requiring clinical review before the study proceeds.
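The weights, thresholds, and exact classification rule are not given in the text. The sketch below works under stated assumptions:

- each dimension's divergence is first mapped onto [0, 1];
- the composite is the weighted mean of those divergences;
- classification combines the composite with whether an explanation has been documented.

All names and numeric values are hypothetical.

```python
from enum import Enum


class Status(Enum):
    CONSISTENT = "consistent"
    DIVERGENT_EXPLAINED = "divergent, explanation documented"
    NEEDS_REVIEW = "divergent, clinical review required"


# Hypothetical weights (they sum to 1.0); not Vercori's published values.
WEIGHTS = {
    "source_codes": 0.25,
    "cooccurrence": 0.20,
    "measurement": 0.15,
    "demographics": 0.15,
    "drugs": 0.15,
    "specialty": 0.10,
}


def composite_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-dimension divergences, each assumed to lie in [0, 1].
    Renormalizes over the dimensions actually provided."""
    total_weight = sum(weights[d] for d in scores)
    return sum(weights[d] * s for d, s in scores.items()) / total_weight


def classify(score: float, has_explanation: bool,
             consistent_below: float = 0.10, review_above: float = 0.30) -> Status:
    """Hypothetical rule: low scores pass; moderate scores pass only with a
    documented explanation; everything else goes to clinical review."""
    if score < consistent_below:
        return Status.CONSISTENT
    if score <= review_above and has_explanation:
        return Status.DIVERGENT_EXPLAINED
    return Status.NEEDS_REVIEW


score = composite_score({"source_codes": 0.19, "demographics": 0.25, "specialty": 0.50})
print(round(score, 3), classify(score, has_explanation=True).value)
```

Bounded measures such as Jensen-Shannon divergence or total variation distance fit this scheme directly. Similarities become divergences as 1 - similarity. Unbounded measures such as an SMD would need capping, for example `min(abs(smd), 1.0)`.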

How Vercori is designed to generate cross-site consistency evidence

Each site in the network runs a local analysis inside its own OMOP environment. The output is a semantic fingerprint: a statistical representation of how the site defines each concept across the six measurement dimensions. No patient records are transmitted. No identifiable data leaves any institution.
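To make the idea of a fingerprint concrete, here is a hypothetical sketch of one component: patient counts per source code for a single concept, computed from rows the site queries locally. For a condition concept, such rows might come from the OMOP CONDITION_OCCURRENCE table (`person_id` and `condition_source_value`, filtered on `condition_concept_id`). Only aggregate counts leave the function, and small cells are suppressed. The threshold of 5, the placeholder concept ID, and the output shape are assumptions, not a documented Vercori format.

```python
import json
from collections import defaultdict
from collections.abc import Iterable

MIN_CELL_COUNT = 5  # hypothetical small-cell suppression threshold


def source_code_fingerprint(rows: Iterable[tuple[int, str]],
                            min_cell: int = MIN_CELL_COUNT) -> dict[str, int]:
    """Count distinct patients per source code from locally queried
    (person_id, source_code) rows, dropping codes with fewer than min_cell patients."""
    patients = defaultdict(set)
    for person_id, source_code in rows:
        patients[source_code].add(person_id)
    return {code: len(ids) for code, ids in patients.items() if len(ids) >= min_cell}


def build_fingerprint(site_id: str, concept_id: int, rows: Iterable[tuple[int, str]]) -> str:
    """Package the aggregate for transmission; no person_id values are included."""
    payload = {
        "site": site_id,
        "concept_id": concept_id,
        "source_codes": source_code_fingerprint(rows),
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical local rows: SRC_1 has 6 patients (kept), SRC_2 has 3 (suppressed).
rows = [(pid, "SRC_1") for pid in range(1, 7)] + [(pid, "SRC_2") for pid in range(1, 4)]
rows.append((1, "SRC_1"))  # a repeat record for the same patient is counted once
print(build_fingerprint("site_a", 12345, rows))
```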

Vercori is designed to compare fingerprints across all sites in the network and generate a concept-level consistency report. Qualified reviewers assess each divergent finding and record their decisions in a tamper-evident audit log. The report is built to document which concepts are consistent, which diverge, the nature and magnitude of each divergence, and what was decided about it.
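The text does not say how the audit log is made tamper-evident. A common technique is a hash chain, built here on Python's standard `hashlib`. Each entry's SHA-256 digest covers the previous entry's digest, so editing any past decision breaks verification from that point on. The record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of reviewer decisions (illustrative only)."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        record = dict(record)  # shallow copy so later caller-side edits don't alter the log
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every digest in order; False if any entry was altered."""
        prev = GENESIS
        for record, stored in self.entries:
            if entry_hash(prev, record) != stored:
                return False
            prev = stored
        return True


log = AuditLog()
log.append({"concept": "exposure", "reviewer": "reviewer_1", "decision": "accept, explained"})
log.append({"concept": "outcome", "reviewer": "reviewer_2", "decision": "restrict definition"})
print(log.verify())  # True
log.entries[0][0]["decision"] = "accept"  # simulate tampering with a past decision
print(log.verify())  # False
```

A chain like this detects in-place edits. It does not detect deletion of the newest entries or wholesale replacement of the log. Anchoring the latest digest somewhere the log's operator cannot rewrite, for example alongside the study submission, closes that gap.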

That report is designed to be attached to a study submission, providing the documentation needed to answer the question regulators are increasingly asking: how do you know your sites were measuring the same clinical thing?

What cross-site semantic consistency evaluation is designed to produce

A Vercori evaluation is built to serve three purposes. First, to identify problems early, before data is locked and analysis begins, at the point where protocol adjustments are still possible. Second, to produce documentation for regulatory submission: a reviewer-signed, tamper-evident record of the cross-site consistency assessment for every concept in the study. Third, to generate a network-level picture of how sites define clinical concepts, which is itself a research contribution that can support publication and advance the field.

Common Questions

Cross-site semantic consistency: frequently asked questions

Is cross-site semantic consistency the same as semantic interoperability?

They are related but not identical. Semantic interoperability generally refers to whether systems can exchange data and interpret it correctly. Cross-site semantic consistency in the federated RWE context refers specifically to whether different sites are applying clinical concept IDs to equivalent patient populations. Vocabulary interoperability is a necessary condition but not a sufficient one.

How large does a divergence need to be to matter analytically?

That depends on the study. A divergence that shifts the mean age or comorbidity burden of a concept's patient population by 15 percent can materially affect an effect estimate in a study powered for a specific outcome. Vercori is designed to report the magnitude of divergence and its likely analytical impact so that study teams can make informed protocol decisions, not just flag that a divergence exists.

What happens if a concept is flagged as divergent?

The report is designed to document the nature and magnitude of the divergence. Qualified reviewers assess whether the divergence is explainable and acceptable, requires a protocol adjustment such as excluding a site or restricting a concept definition, or needs further clinical investigation. That decision is recorded in the audit log. The report does not mandate a specific action; it provides the information needed to make one.

Does running a cross-site consistency evaluation require special technical infrastructure?

Each site needs to be able to run a local query against its OMOP instance and transmit the resulting fingerprint. The technical requirements are designed to be modest. Vercori is built for deployment in existing OMOP network environments without requiring new data infrastructure at participating sites.

Can the consistency evaluation be run on a subset of concepts?

Yes. Vercori is designed to evaluate the specific concepts driving a study rather than an entire OMOP instance. A focused evaluation of the primary exposure, outcome, and key covariate concepts is typically the right scope for a study-level consistency assessment.

The layer of data quality no one was measuring.

Built for the way federated networks actually operate.

Looking for pilot partners.

Measure what your sites actually mean.
