Real-World Evidence Regulatory Readiness: Data Consistency Documentation

The regulatory landscape for real-world evidence

FDA's framework for real-world evidence has matured substantially. The agency's guidance documents on RWE for drug and biological products and for medical devices have progressively raised the standard for what sponsors need to demonstrate about their data before it can support a regulatory decision.

The core question the agency asks is whether the real-world data used in a study is fit for purpose: whether it is reliable, relevant to the study question, and of sufficient quality to support the conclusions being drawn. For multi-site federated studies, that question has a dimension that most study teams have not been systematically addressing.

What "fit for purpose" means across sites

For a single-site study, fitness for purpose is largely a question of within-site data quality: completeness, accuracy, temporal coverage, and relevance to the population of interest. Standard data quality tools address this well.

For a federated study combining data from multiple sites, fitness for purpose requires an additional demonstration: that the data from different sites is measuring the same clinical phenomenon. A dataset that is complete and accurate at every site but semantically inconsistent across sites is not fit for purpose for a study that treats those sites as equivalent.

FDA guidance expects sponsors to assess whether data from multiple contributing sites is consistent in how key concepts are defined and applied. That assessment needs to be documented and available for review.

The documentation gap in current practice

Most federated RWE study submissions document data quality at the site level. The typical approach is to run the OHDSI Data Quality Dashboard at each site, report that all sites passed, and include summary DQD results in the study documentation.

This documents within-site structural quality. It does not document cross-site semantic consistency. The question of whether sites were defining and applying key concepts in the same way is typically either not addressed or addressed with a qualitative statement that assumes rather than demonstrates equivalence.

As regulatory scrutiny of RWE submissions has increased, this gap has become more visible. Reviewers who ask how sponsors know their sites were measuring the same thing increasingly expect a documented answer, not an assumption.

What a complete RWE data quality documentation package includes

A regulatory-ready data quality documentation package for a federated RWE study should address both layers of quality:

Within-site structural quality

DQD results for each site, confirming completeness, conformance, and plausibility of the OMOP data at each contributing institution.

Cross-site semantic consistency assessment

A concept-level evaluation of whether sites are applying key study concepts to equivalent patient populations, covering exposure, outcome, and primary covariate concepts.

Divergence findings and reviewer decisions

Documentation of any concepts found to diverge across sites, the nature and magnitude of the divergence, the qualified reviewer's assessment, and the decision made about how to handle it.

Audit trail

A tamper-evident record of the consistency evaluation, reviewer identities, decision timestamps, and rationale, traceable back to the specific divergence findings it addresses.

Protocol documentation of consistency evaluation

Reference to the cross-site consistency evaluation in the study protocol and statistical analysis plan, including the criteria used to assess acceptability of divergent findings.

What Vercori is designed to produce for regulatory submissions

Vercori is designed to generate the second, third, and fourth items in the documentation package above: the cross-site semantic consistency assessment, the divergence findings and reviewer decisions, and the audit trail. The evaluation is intended to run before the study analysis, during the data characterization phase. Each site generates a semantic fingerprint of its local OMOP data for the study concepts. No patient data leaves any site.
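To make the idea of a fingerprint concrete, here is a minimal sketch of how a site might summarize one concept as aggregate proportions only. It is an illustration under stated assumptions, not Vercori's implementation: the function name build_fingerprint, the output field names, the min_cell_count small-cell suppression rule, and the concept ID in the usage line are all hypothetical.

```python
from collections import Counter

def build_fingerprint(concept_id, source_codes, min_cell_count=5):
    """Summarize one concept's local records as aggregate proportions only.

    Hypothetical sketch: the field names and the suppression rule are
    assumptions made for illustration.
    """
    counts = Counter(source_codes)
    # Drop rare codes so that small cells never leave the site.
    kept = {code: n for code, n in counts.items() if n >= min_cell_count}
    total = sum(kept.values())
    distribution = {code: n / total for code, n in kept.items()} if total else {}
    return {
        "concept_id": concept_id,
        "record_count": total,  # counts only the records that survived suppression
        "source_code_distribution": distribution,
    }

# Illustrative input: the ICD-10-CM source codes recorded locally for one concept.
local_codes = ["E11.9"] * 80 + ["E11.65"] * 20 + ["E13.9"] * 2
fingerprint = build_fingerprint(201826, local_codes)
```

Only the returned dictionary would be shared in this sketch. The row-level list of codes stays at the site, and the two E13.9 records are suppressed before anything is exported.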

Vercori is built to compare the fingerprints from all sites along six measurement dimensions for each concept: source code distribution, co-occurrence patterns, measurement availability, demographic profile, drug co-prescription, and specialty mix. The comparison is designed to assign each concept a divergence score and a classification. Qualified reviewers assess divergent findings and record their decisions.
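The text does not specify how the divergence score is computed. As one plausible sketch, the example below scores each dimension with the Jensen-Shannon divergence between two sites' distributions and summarizes the concept by its worst dimension. The choice of metric, the worst-dimension rule, the function names, and the sample values are all assumptions made for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    keys = set(p) | set(q)
    # Mixture distribution: the midpoint of p and q over all observed keys.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def concept_divergence(site_a, site_b):
    """Score each shared dimension, then take the worst one as the concept score."""
    shared = site_a.keys() & site_b.keys()
    per_dimension = {dim: js_divergence(site_a[dim], site_b[dim]) for dim in shared}
    return per_dimension, max(per_dimension.values())

# Two of the six dimensions, with made-up proportions for two sites.
site_a = {
    "source_code_distribution": {"E11.9": 0.8, "E11.65": 0.2},
    "specialty_mix": {"endocrinology": 0.5, "primary_care": 0.5},
}
site_b = {
    "source_code_distribution": {"E11.9": 0.3, "E11.65": 0.7},
    "specialty_mix": {"endocrinology": 0.5, "primary_care": 0.5},
}
per_dimension, score = concept_divergence(site_a, site_b)
```

In this example the specialty mix is identical at both sites, so the concept score is driven entirely by the difference in source code distribution. Taking the maximum rather than an average keeps a single badly divergent dimension from being diluted by five consistent ones, at the cost of being more sensitive to noise.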

The output is designed to be a concept-level consistency report with a complete tamper-evident audit log, built to attach to a regulatory submission package or reference in a study methods section. The goal is to answer the question directly: which concepts were evaluated, what was found, what was decided about it, and by whom.

Why this matters now

Three converging factors make cross-site semantic consistency documentation more important today than it was five years ago.

First, FDA's guidance has moved in this direction in concrete terms. FDA's July 2024 final guidance on assessing electronic health records and medical claims data for drug and biological products, and its December 2025 final guidance on real-world evidence for medical devices, both address data quality assurance requirements that go beyond structural correctness. The medical devices guidance specifically names assessment of completeness, accuracy, and consistency across sites and over time as part of what FDA recommends sponsors document. The direction of travel across both guidances is the same: demonstrating that data is fit for the specific regulatory purpose, which for multi-site studies includes addressing how consistently key concepts are applied across sites.

Second, the scale of federated networks has grown. Studies that combine data from ten or twenty sites across diverse institution types carry substantially more cross-site semantic risk than smaller studies of similar institutions. The scope of the problem has grown alongside the ambition of the studies.

Third, the cost of finding a problem late is significant. A post-submission data quality challenge that requires re-analysis, additional site evaluation, or protocol revision can delay approval by months and cost substantially more than addressing the consistency question before submission.

The audit trail requirement

One element of regulatory-ready RWE documentation that is frequently underweighted is the audit trail. It is not sufficient to produce a consistency assessment. The assessment needs to be traceable: who evaluated each finding, when the evaluation occurred, what decision was made, and what the rationale was.

Vercori's reviewer decision workflow is designed to produce exactly this. Each divergence finding is intended to be reviewed by a qualified person whose identity is recorded. The decision and its rationale are designed to be time-stamped and written to a tamper-evident log that is included in the final report. The goal is a complete, verifiable chain from raw divergence data to reviewer decision to documented rationale.

That is what "regulatory-ready" means in the context of a cross-site consistency evaluation. Not just that the assessment was performed, but that there is a documented, verifiable record of who assessed it and what they decided.
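One common way to make a decision log tamper-evident is a hash chain, in which each entry's hash covers both its own content and the hash of the entry before it. The sketch below illustrates that general technique. It is not a description of Vercori's log format, and the record fields, reviewer identifier, and timestamps are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, record):
    # Canonical JSON so the same record always produces the same hash.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_entry(log, record):
    """Append a reviewer decision, chaining it to the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "entry_hash": _digest(prev_hash, record),
    })

def verify(log):
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["entry_hash"]
    return True

log = []
append_entry(log, {
    "finding": "outcome concept: source code distribution diverges",
    "reviewer": "reviewer-001",
    "decision": "restrict concept definition",
    "rationale": "One site codes the outcome under a broader source code.",
    "timestamp": "2025-01-15T10:30:00Z",
})
append_entry(log, {
    "finding": "covariate concept: specialty mix diverges",
    "reviewer": "reviewer-001",
    "decision": "accept with documented limitation",
    "rationale": "Difference reflects referral patterns, not definition.",
    "timestamp": "2025-01-15T11:05:00Z",
})

intact = verify(log)                       # True: the chain is consistent
log[0]["record"]["decision"] = "accept"    # simulate an after-the-fact edit
still_valid = verify(log)                  # False: the edit is detected
```

A chain like this only detects edits if the latest hash is also recorded somewhere the editor cannot change, for example in the signed final report. Otherwise someone with write access could rewrite an entry and recompute every later hash.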

Common Questions

RWE regulatory readiness: frequently asked questions

Does FDA require cross-site semantic consistency documentation for all RWE submissions?

FDA guidance does not prescribe a specific methodology for demonstrating cross-site consistency. It does expect sponsors to demonstrate that their data is fit for purpose, which for multi-site studies includes addressing whether sites define key concepts consistently. The specific documentation required depends on the study, the data sources, and the regulatory context. The trend in FDA feedback has been toward expecting more, not less, on this question.

At what stage of study development should cross-site consistency be evaluated?

Before data is locked and analysis begins. The consistency evaluation is part of the data characterization phase. Finding a divergent concept after the analysis is complete means either accepting the limitation, adding a caveat to the findings, or re-running the analysis with protocol adjustments. Finding it before means fixing it before it matters.

How does cross-site semantic consistency documentation relate to a study protocol?

The protocol should specify that a cross-site semantic consistency evaluation will be conducted before analysis begins, identify which concepts will be evaluated, and define the criteria for acceptable divergence. A Vercori report is designed to document that the evaluation was performed as specified and to record the findings and reviewer decisions. The two documents together provide the complete picture regulators need.
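Pre-specified criteria are easiest to audit when they are written down in a form that can be checked mechanically. The sketch below shows one way a protocol's acceptance criteria could be applied to observed divergence scores. The Criterion structure, the status labels, the concept names, and the threshold values are illustrative assumptions, not recommended values or a Vercori format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Criterion:
    concept: str           # study concept named in the protocol
    role: str              # "exposure", "outcome", or "covariate"
    max_divergence: float  # pre-specified acceptance threshold

def evaluate(criteria, observed_scores):
    """Compare observed divergence scores against the protocol's criteria."""
    findings = []
    for c in criteria:
        score = observed_scores.get(c.concept)
        if score is None:
            status = "not_evaluated"    # protocol deviation: concept was skipped
        elif score <= c.max_divergence:
            status = "within_criteria"
        else:
            status = "needs_review"     # route to a qualified reviewer
        findings.append({
            "concept": c.concept,
            "role": c.role,
            "score": score,
            "status": status,
        })
    return findings

# Hypothetical protocol criteria and observed scores.
protocol_criteria = [
    Criterion("type_2_diabetes", "outcome", 0.10),
    Criterion("metformin_exposure", "exposure", 0.10),
    Criterion("hba1c_measurement", "covariate", 0.20),
]
observed = {"type_2_diabetes": 0.19, "metformin_exposure": 0.04}
findings = evaluate(protocol_criteria, observed)
```

Here the outcome concept exceeds its threshold and goes to review, the exposure concept passes, and the covariate is flagged because the protocol named it but no score was produced. Fixing the thresholds before the scores are seen is what lets the report show that the evaluation was performed as specified.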

What if the consistency evaluation identifies a divergent concept that cannot be resolved?

The report is designed to document the divergence, its magnitude, the reviewer assessment, and the decision made. If the decision is to proceed despite the divergence, the rationale and any protocol adjustments are documented. If the decision is to exclude a site or restrict a concept definition, that is also documented. The audit trail is designed to cover the full decision process, including decisions to accept known limitations with documented justification.

Is a Vercori report designed to be shareable with FDA directly?

Yes. The report is designed for that purpose. It is intended to be formatted as a standalone document that can be attached to a submission package, referenced in a clinical study report, or provided in response to a data quality question from a reviewer. The tamper-evident audit log and reviewer signatures are designed to support its use as documentary evidence in a regulatory context.
