OMOP Data Consistency

OMOP standardizes structure.
Not meaning.

In federated research networks, sites can share identical concept IDs and still measure entirely different patient populations. Here is why that happens and what it means for your study.

What OMOP data consistency actually means

The OMOP Common Data Model gives federated research networks a shared language. Sites map their local data to standardized concept IDs, enabling multi-site studies that would otherwise require years of custom integration work. This is a genuine and important achievement.

But there is a distinction that gets overlooked. OMOP standardizes the structure of data. It standardizes vocabulary mapping. What it does not standardize is the clinical meaning behind each concept ID at each site.

OMOP data consistency, in the sense that matters for federated studies, refers to whether sites are actually measuring the same clinical reality when they apply the same concept ID. That is a different question from whether their data is structurally correct.

A concept ID can be applied perfectly according to OMOP conventions at every site in a network, pass every data quality check, and still represent meaningfully different patient populations across those sites.

Why cross-site OMOP variation is common

Clinical coding practice varies across institutions for reasons that have nothing to do with data quality in the traditional sense. Hospitals differ in their diagnostic confirmation requirements, their coding policies, their EHR configurations, and the clinical thresholds their staff apply when assigning diagnoses.

Consider a concept like heart failure with reduced ejection fraction. One hospital may require an echocardiogram confirming an ejection fraction at or below 40 percent before the code is applied. Another may code on clinical judgment without a confirmatory test. A third may use a different EF threshold. All three use the same concept ID. All three have complete, conformant, structurally valid OMOP data.

The patients captured under that concept ID at each site are not the same patients. When you combine those sites in a federated study, you are combining three different populations under a single label.
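To make the example concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the patient panel, the three policy functions, and the alternative threshold of 50 percent used for the third site are illustrative assumptions, not data or rules from any real institution. Applying all three policies to the same panel isolates the effect of coding practice alone.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Patient:
    ef_percent: Optional[float]  # measured ejection fraction; None if no echo on record
    clinical_suspicion: bool     # clinician judged the presentation consistent with HFrEF


# Hypothetical coding policies, one per site. All three emit the same concept ID.
def site_a_policy(p: Patient) -> bool:
    # Requires an echocardiogram confirming EF at or below 40 percent.
    return p.ef_percent is not None and p.ef_percent <= 40


def site_b_policy(p: Patient) -> bool:
    # Codes on clinical judgment; no confirmatory test required.
    return p.clinical_suspicion


def site_c_policy(p: Patient) -> bool:
    # Requires an echo but uses a different threshold (below 50, an assumed value).
    return p.ef_percent is not None and p.ef_percent < 50


# A small illustrative panel, identified by list position.
PATIENTS: List[Patient] = [
    Patient(30, True),
    Patient(38, True),
    Patient(45, True),
    Patient(48, False),
    Patient(None, True),
    Patient(55, False),
]


def coded_cohort(policy: Callable[[Patient], bool]) -> List[int]:
    """Return the indices of patients who would receive the code under a policy."""
    return [i for i, p in enumerate(PATIENTS) if policy(p)]


cohort_a = coded_cohort(site_a_policy)  # [0, 1]
cohort_b = coded_cohort(site_b_policy)  # [0, 1, 2, 4]
cohort_c = coded_cohort(site_c_policy)  # [0, 1, 2, 3]
```

Each cohort is internally valid, and each would be stored under the same concept ID. Only two of the six patients are coded at all three sites.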

What standard data quality tools do and do not check

The OHDSI Data Quality Dashboard is an important and widely used tool. It checks three categories of data quality within a single site: conformance, completeness, and plausibility.

These checks are within-site. The DQD does not compare one site against another. It does not evaluate whether two sites that both pass their individual DQD checks are actually measuring the same thing. That is outside its scope by design.

Cross-site OMOP data consistency requires a separate layer of evaluation, one that looks across the network rather than within each node.

The impact on federated real-world evidence

Federated studies combine data across sites to achieve the statistical power needed for credible real-world evidence. The assumption underlying that combination is that sites are semantically equivalent: that the same concept ID at Site A represents the same clinical population as at Site B.

When that assumption is wrong, the combined dataset reflects a mixture of populations, not a single coherent one. The study may produce a result that is statistically clean, peer-reviewed, and FDA-submitted while resting on an equivalence between sites that never actually held.

The problem is not always visible in the results. A silent bias in population composition can shift an effect estimate in ways that are plausible enough to pass clinical review. It may only surface in a post-submission data quality challenge, a failed replication, or a regulatory question that the study team cannot answer.
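A short worked example shows how a mixture of populations can move an estimate without looking wrong. The sketch below assumes, purely for illustration, that a treatment lowers event risk by 10 percentage points in patients who truly have the condition and has no effect in patients who carry the code without it. The site sizes and the share of confirmed patients per site are made-up numbers.

```python
from typing import Dict, Tuple

# Assumed effects (risk difference, treated minus untreated). Illustrative only.
EFFECT_CONFIRMED = -0.10  # patients who truly have the condition
EFFECT_OTHER = 0.0        # patients coded with the concept who do not


def expected_risk_difference(share_confirmed: float) -> float:
    """Expected effect in a cohort that mixes confirmed and other patients."""
    return share_confirmed * EFFECT_CONFIRMED + (1 - share_confirmed) * EFFECT_OTHER


# Hypothetical sites: (cohort size, share of coded patients who are confirmed).
SITES: Dict[str, Tuple[int, float]] = {
    "A": (1000, 1.0),  # strict confirmation requirement
    "B": (1000, 0.6),  # coding on clinical judgment
    "C": (1000, 0.8),  # looser threshold
}

per_site = {name: expected_risk_difference(share) for name, (_, share) in SITES.items()}

# Size-weighted pooled estimate across the network.
total_n = sum(n for n, _ in SITES.values())
pooled = sum(n * per_site[name] for name, (n, _) in SITES.items()) / total_n

# per_site is roughly {"A": -0.10, "B": -0.06, "C": -0.08}; pooled is roughly -0.08.
```

Under these assumptions the pooled estimate understates the effect in confirmed patients by about a fifth, yet it is still a plausible number that would raise no flags on its own.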

What a cross-site consistency evaluation looks like

Evaluating OMOP data consistency across sites requires comparing how each site defines each concept in practice, not just whether it uses the right concept ID. That comparison needs to look across multiple dimensions.

Vercori is designed to run this evaluation using semantic fingerprints generated locally at each site. No patient data leaves any institution. The fingerprints are compared centrally and the output is a concept-level consistency report documenting where sites align and where they diverge, along with the magnitude and likely analytical impact of each divergence.

The output is designed to be attached to a study submission. It answers the question regulators are increasingly asking: how do you know your sites were measuring the same thing?
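The general pattern of a local fingerprint followed by a central comparison can be sketched as follows. This is not Vercori's implementation: the page does not specify what a fingerprint contains or how divergence is measured. The sketch assumes a single dimension (age band of coded patients), a fingerprint made of aggregate proportions only, and Jensen-Shannon divergence as the comparison metric. The names `AGE_BANDS`, `local_fingerprint`, and `js_divergence` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

# Assumed fingerprint dimension: age band of patients carrying the concept.
AGE_BANDS = ["<40", "40-64", "65-79", "80+"]


def local_fingerprint(age_bands_of_coded_patients: Sequence[str]) -> List[float]:
    """Run at the site. Reduces patient-level values to aggregate proportions."""
    counts = Counter(age_bands_of_coded_patients)
    total = sum(counts[band] for band in AGE_BANDS)
    if total == 0:
        raise ValueError("no coded patients to fingerprint")
    return [counts[band] / total for band in AGE_BANDS]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with zero probability contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Each site computes its fingerprint locally; only the proportions are shared.
site_a = local_fingerprint(["65-79"] * 6 + ["80+"] * 4)
site_b = local_fingerprint(["40-64"] * 5 + ["65-79"] * 3 + ["80+"] * 2)

# The central comparison sees proportions, never patient rows.
divergence = js_divergence(site_a, site_b)
```

A real evaluation would repeat this per concept and per dimension, and would need an agreed threshold for what counts as meaningful divergence. Both are left open here.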

OMOP data consistency and regulatory expectations

FDA guidance on real-world data and real-world evidence has increasingly emphasized consistency across sites as a component of data reliability. Structural correctness within a site is necessary but not sufficient. Demonstrating that sites in a federated network were semantically aligned is becoming part of what a defensible RWE submission requires.

Vercori is built to produce documentation for that purpose: a tamper-evident, reviewer-signed record of cross-site consistency assessment for every concept in a study, designed to attach to a submission package or reference in a methods section.

How Vercori Works

Built for the way federated networks actually operate.

Each institution runs its own local analysis. Vercori receives per-site, per-concept fingerprints, never patient data, never source codes. The platform is designed to:

1. Score every concept by comparing how it is actually recorded across sites and quantifying divergence at the individual concept level.

2. Flag every concept with unresolved divergence before it reaches your analysis, holding results until a qualified reviewer has documented a resolution.

3. Record the full chain, including classifications, reviewer decisions, resolution rationales, and gating actions, in a tamper-evident log packaged for regulatory submission.
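One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below combines that with a simple release gate. It is an illustration of the general technique, not a description of how Vercori stores or gates results. The event names, the field names, and the concept ID `1001` are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a payload, chaining it to the last entry in the log."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)})


def verify_log(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


def is_released(log: List[Dict[str, Any]], concept_id: int) -> bool:
    """Gate: a concept is released only if its latest event is 'consistent' or 'resolved'."""
    latest = None
    for entry in log:
        if entry["payload"]["concept_id"] == concept_id:
            latest = entry["payload"]["event"]
    return latest in {"consistent", "resolved"}


log: List[Dict[str, Any]] = []
append_entry(log, {"concept_id": 1001, "event": "flagged", "detail": "divergence above threshold"})
held = not is_released(log, 1001)  # True: results are held while the flag is open

append_entry(log, {"concept_id": 1001, "event": "resolved",
                   "reviewer": "reviewer-01", "rationale": "sites restricted to echo-confirmed cases"})
released = is_released(log, 1001)  # True: a documented resolution lifts the hold
```

A hash chain only detects changes. Proving who made a reviewer decision would additionally require signatures, which this sketch omits.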

Common Questions

OMOP data consistency: frequently asked questions

Can a site have perfect OMOP data quality and still create cross-site inconsistency?

Yes. The Data Quality Dashboard evaluates data quality within a single site. A site can score perfectly on every DQD check and still define a concept differently than every other site in the network. Cross-site consistency is a separate dimension of data quality that requires a separate evaluation.

Does this problem affect all OMOP concepts equally?

No. Some concepts are highly standardized in clinical practice and show little cross-site variation. Others, particularly chronic conditions, complex diagnoses, and conditions where clinical judgment plays a large role in coding, tend to show more variation. A concept-level evaluation identifies which concepts in a specific study carry meaningful cross-site risk.

Is this a problem with OMOP specifically, or with federated research in general?

It is a feature of federated research generally. Any multi-site study that combines data coded by different institutions faces this challenge. OMOP reduces it by standardizing vocabulary, but it does not eliminate it because vocabulary standardization does not determine clinical coding practice.

How is Vercori designed to evaluate consistency without accessing patient data?

Each site generates a statistical fingerprint of its local OMOP data for the concepts under study. The fingerprint is designed to capture distributional information across six dimensions without containing any patient-level records. Only the fingerprint is transmitted. Vercori then compares fingerprints across sites to identify divergence.

What is the target turnaround for a cross-site consistency evaluation?

The timeline is scoped individually with each pilot partner based on the number of sites, the number of concepts under evaluation, and the complexity of any divergence findings that require clinical review.

Pilot Program

Looking for pilot partners.

Vercori is in active pilot development. We are working with a small number of founding partners to build and validate the platform against real OMOP network use cases. If you run multi-site OMOP studies, operate a network site, or advise pharma sponsors on real-world evidence, we want to hear from you. Pilot studies are scoped individually based on network size and use case.

Find out if your sites are measuring the same thing.

Vercori is designed to evaluate cross-site OMOP consistency before your study runs. No patient data leaves any site.
