Cross-Site Semantic Consistency

The layer of data quality no one was measuring.

Semantic interoperability in OMOP networks requires more than shared vocabulary. It requires verifying that sites define clinical concepts the same way in practice. Until now, there has been no practical way to do that.

What cross-site semantic consistency means

In a federated OMOP network, semantic consistency refers to whether sites in the network apply clinical concept IDs to the same patient populations. It is the difference between shared vocabulary and shared meaning.

Two sites can use the same concept ID, follow all OMOP mapping conventions correctly, and still define that concept differently in clinical practice. One site may require a confirmatory test before coding a diagnosis. Another may code on clinical presentation alone. A third may apply a stricter or looser threshold. The vocabulary is the same. The patients captured under it are not.

Cross-site semantic consistency exists when the populations represented by a concept ID at each site are clinically equivalent. Measuring it requires looking beyond structural data quality into the distributional and contextual properties of how each site applies each concept.

Why semantic interoperability in OMOP is harder than it looks

OMOP achieves vocabulary interoperability, a genuine and important form of standardization. Across a network, concept IDs carry the same formal definition. The challenge is that formal definitions do not fully determine clinical coding practice.

Clinical coding is a human activity shaped by local factors: the EHR interface a clinician uses, the documentation requirements of the institution, the coding policies enforced by the billing department, the specialty distribution of the treating team, and the clinical thresholds applied in that institution's practice culture. These factors produce systematic variation in what gets coded under a given concept ID even when every site is following OMOP conventions correctly.

Semantic interoperability in healthcare data is not achieved by vocabulary mapping alone. It requires evidence that sites are interpreting and applying shared concepts in a consistent way. That evidence has to be generated and documented.

How clinical concept mapping variation manifests

Clinical concept mapping variation is not random. It tends to be systematic, institution-specific, and reproducible.

The six signals Vercori measures

Vercori is designed to evaluate cross-site semantic consistency across six quantifiable dimensions for each concept in a study:

01

Source code distribution

How patients are distributed across the source codes that map to this concept at each site.

02

Co-occurrence patterns

Which conditions co-occur with this concept and whether that pattern is consistent across sites.

03

Measurement availability

Whether relevant measurement data is present and comparable, reflecting diagnostic confirmation practice.

04

Demographic profile

Whether the age, gender, and broader demographic composition of patients coded under this concept is consistent across sites.

05

Drug co-prescription

Whether treatment patterns reflect a consistent patient population across sites.

06

Specialty mix

Whether the clinical specialties involved in diagnosis and treatment are comparable.

Each dimension is scored and weighted. The composite divergence score classifies each concept as consistent, divergent with a documented explanation, or divergent and requiring clinical review before the study proceeds.
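The scoring and classification step described above can be sketched in a few lines of Python. This is a minimal illustration, not Vercori's actual method: the weights, the 0.30 threshold, the label strings, and the function names are all hypothetical, and each per-dimension score is assumed to be a divergence value between 0 and 1.

```python
from typing import Dict

# Hypothetical weights for the six dimensions; the real weighting is not published here.
WEIGHTS: Dict[str, float] = {
    "source_code_distribution": 0.25,
    "co_occurrence": 0.20,
    "measurement_availability": 0.20,
    "demographic_profile": 0.15,
    "drug_co_prescription": 0.10,
    "specialty_mix": 0.10,
}

# Hypothetical cut-off above which a concept counts as divergent.
DIVERGENCE_THRESHOLD = 0.30


def composite_score(dimension_scores: Dict[str, float]) -> float:
    """Weighted mean of per-dimension divergence scores, each assumed to lie in [0, 1]."""
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS) / total_weight


def classify(score: float, has_documented_explanation: bool) -> str:
    """Map a composite score onto the three classes named in the text."""
    if score < DIVERGENCE_THRESHOLD:
        return "consistent"
    if has_documented_explanation:
        return "divergent_explained"
    return "divergent_needs_review"
```

Under these assumed weights, a concept that diverges strongly on source code distribution but only moderately elsewhere still crosses the threshold, and it stays in the review class until an explanation is documented.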

How Vercori is designed to generate cross-site consistency evidence

Each site in the network runs a local analysis inside its own OMOP environment. The output is a semantic fingerprint: a statistical representation of how the site defines each concept across the six measurement dimensions. No patient records are transmitted. No identifiable data leaves any institution.
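To make the idea of a fingerprint concrete, here is a hedged Python sketch of how a site might reduce local rows to aggregate statistics before anything is shared. The field names, the two dimensions shown (age bands and measurement availability), and the JSON layout are assumptions for illustration; the real fingerprint format is not described here.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def proportions(counts: Counter) -> Dict[str, float]:
    """Convert raw category counts into proportions that sum to roughly 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: round(n / total, 4) for key, n in sorted(counts.items())}


def build_fingerprint(
    site_id: str,
    concept_id: int,
    rows: Iterable[Tuple[str, bool]],
) -> str:
    """Summarize local (age_band, has_measurement) rows as a JSON fingerprint.

    Only aggregates are emitted; no row-level or person-level values appear
    in the output. The layout is hypothetical.
    """
    age_bands: Counter = Counter()
    with_measurement = 0
    n_records = 0
    for age_band, has_measurement in rows:
        age_bands[age_band] += 1
        with_measurement += int(has_measurement)
        n_records += 1
    fingerprint = {
        "site_id": site_id,
        "concept_id": concept_id,
        "n_records": n_records,
        "age_band_distribution": proportions(age_bands),
        "measurement_availability": (
            round(with_measurement / n_records, 4) if n_records else None
        ),
    }
    return json.dumps(fingerprint, sort_keys=True)
```

A call such as build_fingerprint("site_a", 201826, local_rows) would return a JSON string of proportions only, where 201826 is used purely as a placeholder concept ID and local_rows is whatever the site's own query produced.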

Vercori is designed to compare fingerprints across all sites in the network and generate a concept-level consistency report. Qualified reviewers assess each divergent finding and record their decisions in a tamper-evident audit log. The report is built to document which concepts are consistent, which diverge, the nature and magnitude of each divergence, and what was decided about it.
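One common way to compare categorical distributions from different sites is the Jensen-Shannon divergence. The sketch below assumes that choice for illustration; the source does not say which statistic Vercori uses, and the site names and specialty categories are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Distribution = Dict[str, float]


def js_divergence(p: Distribution, q: Distribution) -> float:
    """Jensen-Shannon divergence with base-2 logs, bounded between 0 and 1."""
    divergence = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture of the two distributions
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


def pairwise_divergence(
    site_distributions: Dict[str, Distribution],
) -> List[Tuple[str, str, float]]:
    """Score every pair of sites for one concept and one dimension."""
    return [
        (a, b, js_divergence(site_distributions[a], site_distributions[b]))
        for a, b in combinations(sorted(site_distributions), 2)
    ]


# Hypothetical specialty-mix proportions for one concept at three sites.
specialty_mix = {
    "site_a": {"cardiology": 0.7, "primary_care": 0.3},
    "site_b": {"cardiology": 0.7, "primary_care": 0.3},
    "site_c": {"cardiology": 0.1, "primary_care": 0.9},
}
for site_x, site_y, score in pairwise_divergence(specialty_mix):
    print(f"{site_x} vs {site_y}: {score:.3f}")
```

Because the measure is symmetric and bounded, identical distributions score 0 and fully disjoint ones score 1, which makes the per-dimension values easy to combine into a single composite.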

That report is designed to be attached to a study submission, providing the documentation needed to answer the question regulators are increasingly asking: how do you know your sites were measuring the same clinical thing?

What cross-site semantic consistency evaluation is designed to produce

A Vercori evaluation is built to serve three purposes. First, to identify problems early, before data is locked and analysis begins, at the point where protocol adjustments are still possible. Second, to produce documentation for regulatory submission: a reviewer-signed, tamper-evident record of the cross-site consistency assessment for every concept in the study. Third, to generate a network-level picture of how sites define clinical concepts, which is itself a research contribution that can support publication and advance the field.

How Vercori Works

Built for the way federated networks actually operate.

Each institution runs its own local analysis. Vercori receives per-site, per-concept fingerprints, never patient data, never source codes. The platform is designed to:

1

Score every concept by comparing how it is actually recorded across sites and quantifying divergence at the individual concept level.

2

Flag every concept with unresolved divergence before it reaches your analysis, holding results until a qualified reviewer has documented a resolution.

3

Record the full chain, including classifications, reviewer decisions, resolution rationales, and gating actions, in a tamper-evident log packaged for regulatory submission.
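A tamper-evident log is often built as a hash chain, where each entry commits to the one before it. The Python sketch below shows that general technique under stated assumptions; it is not a description of Vercori's log format, and the payload fields are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a payload, chaining it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append(
        {
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": _entry_hash(prev_hash, payload),
        }
    )


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict[str, Any]] = []
append_entry(audit_log, {"concept_id": 201826, "classification": "divergent_needs_review"})
append_entry(audit_log, {"concept_id": 201826, "reviewer": "reviewer_1", "decision": "restrict definition"})
print(verify(audit_log))
```

A chain like this only makes tampering detectable. Making it hard to rewrite the whole chain would need something more, such as signing or externally anchoring the latest hash, which is outside this sketch.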

Common Questions

Cross-site semantic consistency: frequently asked questions

Is cross-site semantic consistency the same as semantic interoperability?

They are related but not identical. Semantic interoperability generally refers to whether systems can exchange data and interpret it correctly. Cross-site semantic consistency in the federated RWE context refers specifically to whether different sites are applying clinical concept IDs to equivalent patient populations. Vocabulary interoperability is a necessary condition but not a sufficient one.

How large does a divergence need to be to matter analytically?

That depends on the study. A divergence that shifts a concept's patient population by 15 percent in age or comorbidity burden can materially affect an effect estimate in a study powered for a specific outcome. Vercori is designed not just to flag that a divergence exists but to report its magnitude and likely analytical impact, so that study teams can make informed protocol decisions.

What happens if a concept is flagged as divergent?

The report is designed to document the nature and magnitude of the divergence. Qualified reviewers assess whether the divergence is explainable and acceptable, requires a protocol adjustment such as excluding a site or restricting a concept definition, or needs further clinical investigation. That decision is recorded in the audit log. The report does not mandate a specific action; it provides the information needed to make one.

Does running a cross-site consistency evaluation require special technical infrastructure?

Each site needs to be able to run a local query against its OMOP instance and transmit the resulting fingerprint. The technical requirements are designed to be modest. Vercori is built for deployment in existing OMOP network environments without requiring new data infrastructure at participating sites.
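As a rough picture of what a local query might look like, the sketch below counts distinct patients per source code for one concept, using the standard OMOP condition_occurrence columns. It runs against an in-memory SQLite table with made-up rows so it is self-contained; a real site would point the same kind of query at its own OMOP database, and the exact queries Vercori uses are not specified here.

```python
import sqlite3
from typing import Dict

# Counts distinct patients per source code for a single standard concept.
QUERY = """
SELECT condition_source_value, COUNT(DISTINCT person_id) AS n_patients
FROM condition_occurrence
WHERE condition_concept_id = ?
GROUP BY condition_source_value
ORDER BY condition_source_value
"""


def source_code_counts(conn: sqlite3.Connection, concept_id: int) -> Dict[str, int]:
    """Run the aggregate query locally and return {source_code: patient_count}."""
    return {row[0]: row[1] for row in conn.execute(QUERY, (concept_id,))}


def demo_connection() -> sqlite3.Connection:
    """Build a tiny stand-in for condition_occurrence with fabricated demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER, condition_source_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
        [
            (1, 201826, "E11.9"),
            (1, 201826, "E11.9"),   # repeat visit, same patient
            (2, 201826, "E11.9"),
            (3, 201826, "E11.65"),
            (4, 316866, "I10"),     # different concept, excluded by the filter
        ],
    )
    return conn


counts = source_code_counts(demo_connection(), 201826)
print(counts)
```

The concept IDs and source codes are placeholders. The counts stay on site and would only feed the local fingerprint computation.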

Can the consistency evaluation be run on a subset of concepts?

Yes. Vercori is designed to evaluate the specific concepts driving a study rather than an entire OMOP instance. A focused evaluation of the primary exposure, outcome, and key covariate concepts is typically the right scope for a study-level consistency assessment.

Pilot Program

Looking for pilot partners.

Vercori is in active pilot development. We are working with a small number of founding partners to build and validate the platform against real OMOP network use cases. If you run multi-site OMOP studies, operate a network site, or advise pharma sponsors on real-world evidence, we want to hear from you. Pilot studies are scoped individually based on network size and use case.


Measure what your sites actually mean.

Vercori is designed to generate cross-site semantic consistency evidence for every concept in your study. No patient data leaves any site.
