Distributed clinical data analysis builds statistical power by combining sites. The assumption underneath that combination is that sites are equivalent. When it is wrong, the study does not fail visibly. It produces a number that is wrong.
Federated real-world evidence studies distribute analysis across multiple healthcare sites rather than centralizing patient data. Each site analyzes its own data locally. Results or statistical summaries are aggregated centrally. This architecture enables studies that would otherwise be impossible: datasets large enough to detect rare outcomes, diverse enough to support subgroup analysis, and assembled without the legal and logistical barriers of a central data repository.
For FDA submissions, federated RWE offers a path to generating evidence at scale from routine clinical data. For sponsors, it reduces data transfer risk and accelerates study timelines. For networks like OHDSI, it is the core methodology enabling international collaboration.
The benefits are real. So is the risk that most federated study designs do not address.
Every federated study rests on an assumption: that the same concept ID at Site A represents the same clinical population as at Site B. Call it the equivalence assumption. It is the foundation on which the statistical combination of sites is built.
If Site A and Site B are measuring different patients under the same label, combining their data does not increase statistical power on a shared population. It combines two different populations under a single analysis while treating them as one.
Most federated study designs document that sites passed structural data quality checks. Almost none document whether sites were semantically equivalent. These are different questions with different answers.
The problem is structural. When data is collected and coded independently at each site, local coding practice governs what ends up in the dataset. Those practices vary in ways that are systematic, institution-specific, and largely invisible to researchers working with aggregated results.
The variation is not random noise. It reflects real differences in clinical culture, EHR configuration, documentation workflows, and the judgment calls clinicians make at the point of care. A site with a strong cardiology program may document and code cardiovascular concepts more completely and specifically than a generalist community hospital. A site with a particular EHR vendor may have different default coding pathways than a site on a different system.
These differences do not show up in structural data quality checks because those checks evaluate whether data is complete, conformant, and plausible within a site. They do not evaluate whether one site's complete, conformant, plausible dataset is measuring the same clinical phenomenon as another site's.
A federated RWE study that fails due to cross-site semantic inconsistency does not fail obviously. It produces results. Those results may be statistically significant, clinically plausible, and internally consistent. They may pass peer review. They may be submitted to FDA.
The failure mode is that the effect estimate reflects a mixture of populations rather than a single defined one. If one site's patients are systematically sicker, or younger, or coded under a stricter diagnostic threshold, the combined result reflects that compositional difference rather than the clinical reality the study intended to measure.
The most dangerous version is when cross-site variation partially cancels out, producing an effect estimate that appears stable across sensitivity analyses while masking the underlying inconsistency. The study looks robust. The problem is invisible.
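A small arithmetic sketch makes the mixture problem concrete. The figures are invented for illustration and come from no real study: two hypothetical sites record the same concept, but Site B applies a stricter diagnostic threshold, so the patients it codes are sicker on average.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate counts a site might report for one concept (hypothetical)."""
    name: str
    patients: int  # patients carrying the concept at this site
    events: int    # outcome events observed among those patients


def event_rate(site: SiteSummary) -> float:
    """Outcome rate within a single site."""
    return site.events / site.patients


def pooled_rate(sites: list[SiteSummary]) -> float:
    """Naive pooled rate: treats every site as the same population."""
    return sum(s.events for s in sites) / sum(s.patients for s in sites)


# Invented numbers: Site A codes the concept broadly, Site B narrowly.
site_a = SiteSummary("A", patients=8000, events=400)  # 5% event rate
site_b = SiteSummary("B", patients=2000, events=400)  # 20% event rate

pooled = pooled_rate([site_a, site_b])  # 800 / 10000 = 8%

for site in (site_a, site_b):
    print(f"Site {site.name}: {event_rate(site):.1%}")
print(f"Pooled: {pooled:.1%}")
```

The pooled 8% describes neither site. It is a weighted blend whose value depends on how many patients each site contributed, which is a property of the network rather than of the disease.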
Documenting that sites passed the Data Quality Dashboard is necessary. It is not sufficient for a credible federated study. What is also needed is a concept-level evaluation of whether sites are semantically equivalent on the specific concepts driving the study.
That evaluation needs to operate across six dimensions for each concept.
Vercori is designed to evaluate all six dimensions for each concept in a study, generate a divergence score, and produce a documented report of findings with reviewer decisions recorded in a tamper-evident audit log.
FDA guidance on real-world evidence has progressively emphasized that data reliability requires more than structural correctness. The agency expects sponsors to demonstrate that data from multiple sources is fit for the specific purpose of the study, which includes demonstrating consistency in how key concepts are defined and applied across sites.
Vercori is built to produce the documentation needed to demonstrate that consistency directly: a site-by-site, concept-by-concept assessment designed to be attached to a submission package or referenced in a study methods section.
Each institution runs its own local analysis. Vercori receives per-site, per-concept fingerprints, never patient data, never source codes. The platform is designed to:
Score every concept by comparing how it is actually recorded across sites and quantifying divergence at the individual concept level.
Flag every concept with unresolved divergence before it reaches your analysis, holding results until a qualified reviewer has documented a resolution.
Record the full chain, including classifications, reviewer decisions, resolution rationales, and gating actions, in a tamper-evident log packaged for regulatory submission.
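To illustrate what scoring and gating at the concept level could look like, here is a minimal sketch. It is not Vercori's scoring method, which this page does not specify. It assumes each fingerprint contains a distribution of records by care setting, uses Jensen-Shannon divergence as a stand-in metric, and applies an arbitrary threshold of 0.1. The names `js_divergence` and `gate` are hypothetical.

```python
import math


def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw category counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        # Kullback-Leibler divergence from the midpoint distribution;
        # zero-probability categories contribute nothing.
        return sum(
            a[k] * math.log2(a[k] / mid[k]) for k in a if a[k] > 0.0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)


def gate(score: float, threshold: float = 0.1, resolved: bool = False) -> str:
    """Hold a concept until its divergence is low or a reviewer resolves it."""
    return "release" if score <= threshold or resolved else "hold"


# Invented fingerprints for one concept at two sites (aggregate counts only).
site_a = normalize({"inpatient": 700, "outpatient": 250, "emergency": 50})
site_b = normalize({"inpatient": 150, "outpatient": 800, "emergency": 50})

score = js_divergence(site_a, site_b)
print(f"divergence={score:.3f} -> {gate(score)}")
print(f"after review -> {gate(score, resolved=True)}")
```

A bounded, symmetric metric is convenient here because scores can be compared across concepts, but the choice of metric and threshold is a design decision that would need validation against real network data.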
Cross-site semantic inconsistency affects any study that combines data from multiple independently coded sites. The magnitude of risk varies by concept, therapeutic area, and how much coding practice varies across the specific sites in a network. A concept-level evaluation identifies which concepts in a specific study carry meaningful cross-site risk rather than applying a blanket assessment.
OHDSI network membership means sites have mapped their data to OMOP and meet certain data quality standards. It does not mean sites code concepts identically. Cross-site semantic variation has been documented within OHDSI networks in published research. Shared network membership reduces structural heterogeneity, not semantic heterogeneity.
Standard methods like site-stratification and mixed-effects models can account for some forms of cross-site heterogeneity. They cannot correct for the underlying problem if the source of heterogeneity is unknown or unmeasured. Knowing which concepts diverge, and by how much, is a prerequisite for making informed methodological choices.
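A conventional first check for cross-site heterogeneity is Cochran's Q with the I² statistic from meta-analysis. The sketch below applies it to invented per-site effect estimates (for example, log hazard ratios) and their standard errors. The function name `cochran_q_i2` is hypothetical.

```python
def cochran_q_i2(
    estimates: list[float], std_errors: list[float]
) -> tuple[float, float, float]:
    """Return (fixed-effect pooled estimate, Cochran's Q, I-squared)."""
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need matching estimates and standard errors, n >= 2")

    # Inverse-variance weights: more precise sites count for more.
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    # Q sums the weighted squared deviations of each site from the pooled value.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I-squared: share of total variation attributed to between-site
    # heterogeneity rather than chance, floored at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# Invented log hazard ratios from three sites with equal precision.
pooled, q, i2 = cochran_q_i2([0.1, 0.5, 0.9], [0.1, 0.1, 0.1])
print(f"pooled={pooled:.2f}  Q={q:.1f}  I2={i2:.1%}")
```

A high I² says the sites disagree. It does not say which concept definitions drive the disagreement, which is the gap a concept-level evaluation is meant to close.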
Vercori is designed to run before the study analysis, during the data characterization phase. Each site would generate a semantic fingerprint of its local OMOP data for the study concepts. Fingerprints are compared centrally. The resulting consistency report is intended to inform protocol decisions before data is locked and analysis begins.
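As a rough sketch of what a locally generated fingerprint might contain, the example below reduces a site's rows to aggregate proportions for one concept. The fingerprint fields, the `care_setting` attribute, the concept ID, and the small-cell suppression threshold are all assumptions made for illustration. The page does not describe the actual fingerprint format.

```python
from collections import Counter

MIN_CELL = 5  # hypothetical small-cell suppression threshold


def build_fingerprint(rows: list[dict], concept_id: int) -> dict:
    """Summarize one concept as care-setting proportions; no row leaves the site."""
    counts = Counter(
        r["care_setting"] for r in rows if r["concept_id"] == concept_id
    )
    # Drop categories with too few records to reduce re-identification risk.
    kept = {k: n for k, n in counts.items() if n >= MIN_CELL}
    total = sum(kept.values())
    shares = {k: n / total for k, n in sorted(kept.items())} if total else {}
    return {
        "concept_id": concept_id,
        "n_records": total,  # records remaining after suppression
        "care_setting_share": shares,
    }


# Invented local rows; 1001 and 2002 are placeholder concept IDs.
local_rows = (
    [{"concept_id": 1001, "care_setting": "inpatient"}] * 6
    + [{"concept_id": 1001, "care_setting": "outpatient"}] * 14
    + [{"concept_id": 1001, "care_setting": "emergency"}] * 2
    + [{"concept_id": 2002, "care_setting": "inpatient"}] * 9
)

fingerprint = build_fingerprint(local_rows, concept_id=1001)
print(fingerprint)
```

Only the summary dictionary would be sent for central comparison. The two emergency records fall below the suppression threshold and are excluded from the shares.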
The report is designed to classify each concept and document the nature and magnitude of the divergence. Qualified reviewers record a decision: whether the divergence is explainable and acceptable, requires a protocol adjustment, or needs additional clinical review. Those decisions and their rationale are recorded in the audit log and included in the final report.
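One common way to make a decision log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below shows that general technique with Python's standard library. It is an assumption about how such a log could work, not a description of Vercori's implementation, and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], payload: dict) -> None:
    """Add a reviewer decision, chaining it to the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)}
    )


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {
    "concept_id": 1001,  # placeholder concept ID
    "decision": "explainable_and_acceptable",
    "rationale": "Sites differ in referral mix; documented in protocol.",
})
append_entry(audit_log, {
    "concept_id": 2002,  # placeholder concept ID
    "decision": "requires_protocol_adjustment",
    "rationale": "Restrict cohort definition to inpatient diagnoses.",
})
print(verify(audit_log))

# Rewriting an earlier decision invalidates the chain.
audit_log[0]["payload"]["decision"] = "needs_clinical_review"
print(verify(audit_log))
```

A hash chain detects after-the-fact edits but does not prevent them. A production log would also need the latest hash anchored somewhere the editor cannot rewrite, such as a signed export.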
Vercori is in active pilot development. We are working with a small number of founding partners to build and validate the platform against real OMOP network use cases. If you run multi-site OMOP studies, operate a network site, or advise pharma sponsors on real-world evidence, we want to hear from you. Pilot studies are scoped individually based on network size and use case.
Vercori is designed to evaluate cross-site semantic consistency before your study runs. Timeline is scoped individually with each pilot partner.