Missing Data refers to all data points in clinical research that are specified in the study protocol but are not present in the final dataset. This issue affects nearly every clinical trial and has far-reaching consequences for statistical validity, internal validity, and regulatory assessment of study results. A systematic approach to missing data is therefore a central requirement of Good Clinical Practice (GCP) and the relevant ICH guidelines. Even with every preventive measure in place, no study can rule out missing data entirely; what is critical is a planned, transparent approach to handling them.
Causes and Classification of Missing Data
Missing data arise for a variety of reasons that differ significantly in their statistical implications. The literature distinguishes three fundamental mechanisms:
- MCAR (Missing Completely At Random): The absence of a value is completely random and independent of all observed and unobserved variables. Example: A measuring device fails technically without any connection to the patient’s health status. MCAR data theoretically allow unbiased analysis of complete cases, but are rare in clinical practice, empirically difficult to demonstrate, and therefore must not be used as a default assumption.
- MAR (Missing At Random): The absence depends on observed, but not on unobserved, variables. Example: Older patients discontinue more frequently, but age is documented in the dataset. Under the MAR assumption, methods such as Multiple Imputation or a mixed model for repeated measures (MMRM) provide unbiased estimates, provided the model is correctly specified and includes the observed variables that predict missingness.
- MNAR (Missing Not At Random): The absence depends on the missing value itself, that is, on unobserved quantities. Example: Patients with the most severe pain discontinue the study, so their pain scores are never recorded. This mechanism is the most difficult to address because it cannot be verified from the observed data; it requires sensitivity analyses, including worst-case scenarios.
The distinction between these mechanisms is not merely academic: it directly determines which statistical method is regulatorily acceptable and which sensitivity analyses are expected.
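The practical difference between the three mechanisms can be illustrated with a small simulation. The following is a minimal sketch, not a validated analysis: the outcome model (a pain score that rises with age), the missingness probabilities, and all function names are assumptions chosen for illustration. It compares the complete-case mean with a simple age-adjusted estimate (a regression fitted on complete cases and averaged over all patients), which stands in here for methods that condition on observed covariates; it is not Multiple Imputation or MMRM.

```python
import math
import random
from statistics import fmean


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_mechanisms(n=50_000, seed=0):
    """Compare estimates of the mean outcome under MCAR, MAR and MNAR."""
    rng = random.Random(seed)
    age = [rng.gauss(60.0, 10.0) for _ in range(n)]  # always observed
    # Hypothetical outcome: pain score that increases with age.
    pain = [50.0 + 0.5 * (a - 60.0) + rng.gauss(0.0, 10.0) for a in age]

    # Probability that the outcome is missing (illustrative assumptions).
    missing_prob = {
        "MCAR": lambda a, y: 0.3,                              # unrelated to anything
        "MAR": lambda a, y: logistic((a - 60.0) / 5.0 - 1.0),  # observed age only
        "MNAR": lambda a, y: logistic((y - 50.0) / 5.0 - 1.0), # the missing value itself
    }

    out = {"true_mean": fmean(pain)}
    for name, prob in missing_prob.items():
        seen = [(a, y) for a, y in zip(age, pain) if rng.random() >= prob(a, y)]
        seen_age = [a for a, _ in seen]
        seen_pain = [y for _, y in seen]
        intercept, slope = fit_line(seen_age, seen_pain)
        out[name] = {
            # Mean of the observed outcomes only.
            "complete_case": fmean(seen_pain),
            # Regression on complete cases, predicted for every patient.
            "age_adjusted": fmean(intercept + slope * a for a in age),
        }
    return out


if __name__ == "__main__":
    for key, value in simulate_mechanisms().items():
        print(key, value)
```

In this setup the complete-case mean stays close to the true mean under MCAR, is biased under MAR but recovered by the age adjustment, and remains biased under MNAR even after adjustment, which is why MNAR calls for sensitivity analyses rather than a single model.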
Impact on Clinical Trials
Missing data can bias the results of a clinical trial on multiple levels. First, they reduce the effective sample size and thus the statistical power of the study, which can lead to false negative results. Second, depending on the missing data mechanism, they can introduce systematic bias into the efficacy or safety analysis. Third, they affect the interpretability of the results: regulators such as the EMA and BfArM assess the completeness of the primary endpoint as a central quality criterion of a marketing authorization application.
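The loss of power from a reduced effective sample size can be approximated with a short calculation. The sketch below assumes a two-arm trial with a continuous endpoint, a two-sided z-test under the normal approximation, and dropout that only removes patients without introducing bias (i.e. MCAR); the function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_arm, effect_size, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n_per_arm / 2.0)
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def power_after_dropout(n_planned_per_arm, dropout_rate, effect_size, alpha=0.05):
    """Power when only a fraction (1 - dropout_rate) of patients is evaluable."""
    n_effective = n_planned_per_arm * (1.0 - dropout_rate)
    return approx_power(n_effective, effect_size, alpha)


if __name__ == "__main__":
    # Hypothetical trial: 64 patients per arm, standardized effect 0.5.
    print(round(approx_power(64, 0.5), 3))
    print(round(power_after_dropout(64, 0.20, 0.5), 3))
```

With these illustrative inputs, a trial planned for roughly 80% power drops to roughly 72% when 20% of the primary endpoint values are missing, before any bias from the missingness mechanism is considered.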
Missing data are particularly critical for the primary efficacy endpoint. A high proportion of missing values (typically above 10–20%) can lead to requirements for additional sensitivity analyses or even rejection of the application, even with a formally positive study result. The EMA Guideline on Missing Data from 2010 establishes the regulatory framework for this and explicitly requires sponsors to specify a priori how they will handle missing data—including justified assumptions about the underlying mechanism and presentation of planned sensitivity scenarios.
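One simple way to illustrate a sensitivity scenario is to bound the treatment effect for a binary responder endpoint by imputing all missing outcomes in the least and most favourable way. This is a sketch under stated assumptions, not a method prescribed by the guideline: the function name, the argument layout, and the example counts are hypothetical.

```python
def extreme_case_bounds(resp_trt, miss_trt, n_trt, resp_ctl, miss_ctl, n_ctl):
    """Difference in response rates (treatment minus control) under three scenarios.

    n_* is the number of randomized patients per arm, including those
    with a missing outcome; resp_* counts observed responders.
    """
    complete_case = resp_trt / (n_trt - miss_trt) - resp_ctl / (n_ctl - miss_ctl)
    # Worst case: missing treatment patients are non-responders,
    # missing control patients are responders.
    worst_case = resp_trt / n_trt - (resp_ctl + miss_ctl) / n_ctl
    # Best case: the reverse assignment.
    best_case = (resp_trt + miss_trt) / n_trt - resp_ctl / n_ctl
    return {
        "complete_case": complete_case,
        "worst_case": worst_case,
        "best_case": best_case,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 randomized per arm, 10 missing in each arm.
    bounds = extreme_case_bounds(60, 10, 100, 45, 10, 100)
    for scenario, diff in bounds.items():
        print(f"{scenario}: {diff:.3f}")
```

If the conclusion holds even in the worst case, it is robust to any assumption about the missing values; if it does not, more refined sensitivity analyses are needed to judge how plausible the unfavourable scenarios are.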
Prevention Strategies and Study Design
The most effective approach to missing data is their prevention. Various measures can be implemented during study design to minimize the proportion of missing data:
- Selection of a realistic, patient-friendly visit schedule with as few mandatory visits as possible
- Use of remote or decentralized study formats (Decentralised Clinical Trials) to reduce discontinuations due to travel burden
- Proactive patient retention management through regular contact points and motivating communication
- Use of electronic patient-reported outcomes (ePRO) for daily measurements that can partially compensate for missed visits
- Early identification of discontinuation risks in risk-based monitoring and targeted countermeasures at the investigational site level
Relevance for Clinical Trials
Missing data are among the most frequent points of criticism from regulatory authorities when reviewing clinical trial data. A well-conceived missing data plan that documents mechanism assumptions, imputation methods, and sensitivity analyses in advance is now standard for every Phase III trial and a critical quality criterion of the Statistical Analysis Plan. Full-service CROs such as mediconomics support sponsors in developing a robust missing data strategy that meets regulatory requirements and ensures the integrity of study results.
Frequently Asked Questions (FAQ)
How much missing data is acceptable in a clinical trial?
There is no fixed regulatory threshold. As a rule of thumb: more than 10% missing data for the primary endpoint requires explanation, more than 20% is considered critical and requires robust sensitivity analyses. The EMA evaluates this on a case-by-case basis, taking into account the therapeutic area, study duration, and type of endpoint.
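The rule of thumb above can be turned into a simple check during data review. The sketch below is illustrative only: the thresholds are the informal 10% and 20% values from the text, not regulatory limits, and the function names and category labels are hypothetical.

```python
import math


def missing_fraction(values):
    """Share of planned measurements that are None or NaN."""
    if not values:
        raise ValueError("no planned measurements")
    missing = sum(
        1
        for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def rule_of_thumb_flag(fraction):
    """Map a missing fraction to the informal categories from the text."""
    if fraction > 0.20:
        return "critical"
    if fraction > 0.10:
        return "requires explanation"
    return "below rule-of-thumb thresholds"


if __name__ == "__main__":
    # Hypothetical primary endpoint values for 100 randomized patients.
    endpoint = [1.0] * 85 + [None] * 10 + [float("nan")] * 5
    share = missing_fraction(endpoint)
    print(f"{share:.0%} missing: {rule_of_thumb_flag(share)}")
```

Such a flag is only a starting point for discussion; the case-by-case assessment described above still applies.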
Must the missing data plan be defined in advance?
Yes. The strategy for handling missing data, including assumptions about the mechanism, the imputation method, and the sensitivity analyses, must be documented in the Statistical Analysis Plan before database lock. Regulators classify post hoc decisions as potentially results-driven and evaluate them critically.
What is the difference between missing data and protocol deviations?
Missing data refer to measurements not collected, while protocol deviations represent violations of the study protocol (e.g., incorrect dosing, non-compliance with time windows). Both can lead to exclusion from the per-protocol population, but are treated separately from a statistical and regulatory perspective. Missing primary endpoint data typically affect the intent-to-treat analysis, not only the per-protocol evaluation.
Regulatory References
- EMA Guideline on Missing Data in Confirmatory Clinical Trials (EMA/CPMP/EWP/1776/99 Rev. 1, 2010)
- ICH E9(R1) – Addendum on Estimands and Sensitivity Analysis in Clinical Trials (2019)
- ICH E6(R3) – Good Clinical Practice (draft 2023, finalized 2025): Data completeness and data quality
- EU Regulation No. 536/2014 (CTR): Documentation of protocol deviations and data gaps
- National Research Council: The Prevention and Treatment of Missing Data in Clinical Trials (2010), report commissioned by the FDA