How do you know your sites were measuring the same thing? Structural data quality checks answer a different question. Here is what regulatory-ready RWE submissions now require.
FDA's framework for real-world evidence has matured substantially. The agency's guidance documents on RWE for drug and biological products and for medical devices have progressively raised the standard for what sponsors need to demonstrate about their data before it can support a regulatory decision.
The core question the agency asks is whether the real-world data used in a study is fit for purpose: whether it is reliable, relevant to the study question, and of sufficient quality to support the conclusions being drawn. For multi-site federated studies, that question has a dimension that most study teams have not been systematically addressing.
For a single-site study, fitness for purpose is largely a question of within-site data quality: completeness, accuracy, temporal coverage, and relevance to the population of interest. Standard data quality tools address this well.
For a federated study combining data from multiple sites, fitness for purpose requires an additional demonstration: that the data from different sites is measuring the same clinical phenomenon. A dataset that is complete and accurate at every site but semantically inconsistent across sites is not fit for purpose for a study that treats those sites as equivalent.
FDA guidance expects sponsors to assess whether data from multiple contributing sites is consistent in how key concepts are defined and applied. That assessment needs to be documented and available for review.
Most federated RWE study submissions document data quality at the site level. The typical approach is to run the OHDSI Data Quality Dashboard at each site, report that all sites passed, and include summary DQD results in the study documentation.
This documents within-site structural quality. It does not document cross-site semantic consistency. The question of whether sites were defining and applying key concepts in the same way is typically either not addressed or addressed with a qualitative statement that assumes rather than demonstrates equivalence.
As regulatory scrutiny of RWE submissions has increased, this gap has become more visible. Reviewers who ask how sponsors know their sites were measuring the same thing increasingly expect a documented answer, not an assumption.
A regulatory-ready data quality documentation package for a federated RWE study should address both layers of quality:
DQD results for each site, confirming completeness, conformance, and plausibility of the OMOP data at each contributing institution.
A concept-level evaluation of whether sites are applying key study concepts to equivalent patient populations, covering exposure, outcome, and primary covariate concepts.
Documentation of any concepts found to diverge across sites, the nature and magnitude of the divergence, the qualified reviewer's assessment, and the decision made about how to handle it.
A tamper-evident record of the consistency evaluation, reviewer identities, decision timestamps, and rationale, traceable back to the specific divergence findings it addresses.
Reference to the cross-site consistency evaluation in the study protocol and statistical analysis plan, including the criteria used to assess acceptability of divergent findings.
Vercori is designed to generate items two through four in the documentation package above. The evaluation is intended to run before the study analysis, during the data characterization phase. Each site generates a semantic fingerprint of its local OMOP data for the study concepts. No patient data leaves any site.
Vercori is built to compare the fingerprints from all sites along six measurement dimensions for each concept: source code distribution, co-occurrence patterns, measurement availability, demographic profile, drug co-prescription, and specialty mix. The platform is designed to assign each concept a divergence score and a classification. Qualified reviewers assess divergent findings and record their decisions.
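To make the idea of a per-concept divergence score concrete, here is a minimal sketch. It is not Vercori's scoring method, which this article does not specify. It assumes each site's fingerprint is a set of categorical distributions keyed by dimension, and it swaps in Jensen-Shannon divergence as an illustrative metric. The function names, the dimension keys, and the 0.1 threshold are all hypothetical.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two
    categorical distributions given as {category: proportion} dicts."""
    keys = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL divergence of a from the mixture; zero-mass terms contribute 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def concept_divergence(fingerprints):
    """fingerprints: {site: {dimension: {category: proportion}}}.
    Returns the largest pairwise divergence seen for each dimension."""
    scores = {}
    for a, b in combinations(sorted(fingerprints), 2):
        shared = fingerprints[a].keys() & fingerprints[b].keys()
        for dim in shared:
            d = js_divergence(fingerprints[a][dim], fingerprints[b][dim])
            scores[dim] = max(scores.get(dim, 0.0), d)
    return scores


def classify(scores, threshold=0.1):
    """Hypothetical rule: flag the concept if any dimension exceeds
    the threshold. The threshold value is an assumption."""
    worst = max(scores.values(), default=0.0)
    return "divergent" if worst > threshold else "consistent"


# Illustrative aggregate fingerprints for one concept at two sites.
fingerprints = {
    "site_a": {
        "specialty_mix": {"cardiology": 0.5, "primary_care": 0.5},
        "demographic_profile": {"under_65": 0.7, "65_plus": 0.3},
    },
    "site_b": {
        "specialty_mix": {"cardiology": 0.5, "primary_care": 0.5},
        "demographic_profile": {"under_65": 0.2, "65_plus": 0.8},
    },
}
scores = concept_divergence(fingerprints)
print(scores, classify(scores))
```

In this sketch only aggregate proportions are compared, which is consistent with the principle that no patient data leaves a site. A real evaluation would also need acceptability criteria defined in the protocol rather than a hard-coded threshold.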
The output is designed to be a concept-level consistency report with a complete tamper-evident audit log, built to attach to a regulatory submission package or reference in a study methods section. The goal is to answer the question directly: which concepts were evaluated, what was found, what was decided about it, and by whom.
Three converging factors make cross-site semantic consistency documentation more important today than it was five years ago.
First, FDA's guidance has moved in this direction in concrete terms. FDA's July 2024 final guidance on assessing electronic health records and medical claims data for drug and biological products, and its December 2025 final guidance on real-world evidence for medical devices, both address data quality assurance requirements that go beyond structural correctness. The medical devices guidance specifically names assessment of completeness, accuracy, and consistency across sites and over time as part of what FDA recommends sponsors document. The direction of travel across both guidances is the same: demonstrating that data is fit for the specific regulatory purpose, which for multi-site studies includes addressing how consistently key concepts are applied across sites.
Second, the scale of federated networks has grown. Studies that combine data from ten or twenty sites across diverse institution types carry substantially more cross-site semantic risk than smaller studies of similar institutions. The scope of the problem has grown alongside the ambition of the studies.
Third, the cost of finding a problem late is significant. A post-submission data quality challenge that requires re-analysis, additional site evaluation, or protocol revision can delay approval by months and cost substantially more than addressing the consistency question before submission.
One element of regulatory-ready RWE documentation that is frequently underweighted is the audit trail. It is not sufficient to produce a consistency assessment. The assessment needs to be traceable: who evaluated each finding, when the evaluation occurred, what decision was made, and what the rationale was.
Vercori's reviewer decision workflow is designed to produce exactly this. Each divergence finding is intended to be reviewed by a qualified person whose identity is recorded. The decision and its rationale are designed to be time-stamped and written to a tamper-evident log that is included in the final report. The goal is a complete, verifiable chain from raw divergence data to reviewer decision to documented rationale.
That is what "regulatory-ready" means in the context of a cross-site consistency evaluation. Not just that the assessment was performed, but that there is a documented, verifiable record of who assessed it and what they decided.
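One common way to make a decision log tamper-evident is a hash chain, in which each entry's hash covers both its own content and the hash of the previous entry. The sketch below illustrates that general technique only. It is an assumption, not a description of how Vercori's log is implemented, and the field and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_decision(log, concept, reviewer, decision, rationale, timestamp):
    """Append a reviewer decision, chained to the previous entry's hash."""
    record = {
        "concept": concept,
        "reviewer": reviewer,
        "decision": decision,
        "rationale": rationale,
        "timestamp": timestamp,  # ISO 8601 string supplied by the caller
    }
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify(log):
    """Recompute the chain; any edited, removed, or reordered entry
    breaks the link to the entries that follow it."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this shows that recorded decisions were not altered after the fact. It does not by itself prove who made them, so a production system would typically add signatures or an external anchor for the final hash.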
Each institution runs its own local analysis. Vercori receives per-site, per-concept fingerprints, never patient data, never source codes. The platform is designed to:
Score every concept by comparing how it is actually recorded across sites and quantifying divergence at the individual concept level.
Flag every concept with unresolved divergence before it reaches your analysis, holding results until a qualified reviewer has documented a resolution.
Record the full chain, including classifications, reviewer decisions, resolution rationales, and gating actions, in a tamper-evident log packaged for regulatory submission.
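The gating step described above can be sketched as a simple check that refuses to release concepts for analysis while any divergent finding lacks a documented resolution. This is an illustrative sketch under assumed names (`Finding`, `blocked_concepts`, `release_for_analysis`), not the platform's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    concept: str
    classification: str              # "consistent" or "divergent"
    reviewer: Optional[str] = None   # identity of the qualified reviewer
    resolution: Optional[str] = None # documented decision and rationale


def blocked_concepts(findings: List[Finding]) -> List[str]:
    """Concepts that are divergent and lack a reviewer or a resolution."""
    return [
        f.concept
        for f in findings
        if f.classification == "divergent"
        and not (f.reviewer and f.resolution)
    ]


def release_for_analysis(findings: List[Finding]) -> List[str]:
    """Return the concept list only when nothing is still blocked."""
    blocked = blocked_concepts(findings)
    if blocked:
        raise RuntimeError("Unresolved divergence: " + ", ".join(blocked))
    return [f.concept for f in findings]
```

Raising an error rather than silently dropping blocked concepts keeps the gate visible: the analysis cannot start until each divergent finding has a named reviewer and a recorded resolution.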
FDA guidance does not prescribe a specific methodology for demonstrating cross-site consistency. It does expect sponsors to demonstrate that their data is fit for purpose, which for multi-site studies includes addressing whether sites define key concepts consistently. The specific documentation required depends on the study, the data sources, and the regulatory context. The trend in FDA feedback has been toward expecting more, not less, on this question.
Before data is locked and analysis begins. The consistency evaluation is part of the data characterization phase. Finding a divergent concept after the analysis is complete means either accepting the limitation, adding a caveat to the findings, or re-running the analysis with protocol adjustments. Finding it before means fixing it before it matters.
The protocol should specify that a cross-site semantic consistency evaluation will be conducted before analysis begins, identify which concepts will be evaluated, and define the criteria for acceptable divergence. A Vercori report is designed to document that the evaluation was performed as specified and to record the findings and reviewer decisions. The two documents together provide the complete picture regulators need.
The report is designed to document the divergence, its magnitude, the reviewer assessment, and the decision made. If the decision is to proceed despite the divergence, the rationale and any protocol adjustments are documented. If the decision is to exclude a site or restrict a concept definition, that is also documented. The audit trail is designed to cover the full decision process, including decisions to accept known limitations with documented justification.
Yes. The report is designed for that purpose. It is intended to be formatted as a standalone document that can be attached to a submission package, referenced in a clinical study report, or provided in response to a data quality question from a reviewer. The tamper-evident audit log and reviewer signatures are designed to support its use as documentary evidence in a regulatory context.
Vercori is in active pilot development. We are working with a small number of founding partners to build and validate the platform against real OMOP network use cases. If you run multi-site OMOP studies, operate a network site, or advise pharma sponsors on real-world evidence, we want to hear from you. Pilot studies are scoped individually based on network size and use case.
Vercori is designed to produce submission-ready cross-site consistency documentation. Timeline is scoped individually with each pilot partner based on network size and study complexity.