Imputation - mediconomics.com

In clinical research, imputation refers to the statistical procedure used to replace missing data values with plausible estimates. Since measurements are frequently missing in clinical trials despite careful planning—due to factors such as study discontinuation, missed visits, or technical errors—a proper imputation strategy is essential for the integrity of the study results. The choice of method directly influences the validity of the primary efficacy analysis and, consequently, the approvability of a medicinal product or medical device.

Causes of Missing Data in Clinical Trials

Missing data arise for various reasons, which differ significantly in their clinical significance and statistical consequences. In principle, statistical theory distinguishes between three mechanisms that are difficult to separate in practice:

MCAR (Missing Completely At Random): The absence is entirely random and independent of all observed and unobserved variables. Example: technical failure of a measuring device at a study center. In principle, this mechanism allows for a complete case analysis without bias, but it is rarely demonstrable in practice.
MAR (Missing At Random): Conditional on the observed data, the absence does not depend on unobserved values. Example: older patients drop out more frequently, and age is recorded for every patient. Multiple Imputation and MMRM are statistically valid under this assumption.
MNAR (Missing Not At Random): The absence depends on the missing value itself. Example: patients with severe side effects drop out without fully reporting the side effect. This mechanism is the most difficult to treat and requires sensitivity analyses under various MNAR assumptions.

The assumed mechanism directly influences the choice of imputation method. Because the mechanism cannot be verified from the observed data alone, the assumption must be specified and justified in advance in the Statistical Analysis Plan. A retrospective decision is considered potentially bias-inducing.
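The practical difference between the three mechanisms can be illustrated with a small simulation. The following is a minimal sketch on invented data, not a recommended analysis: the age range, the outcome model, the dropout probabilities, and all function names are assumptions made for illustration. It compares the complete-case mean with an age-adjusted estimate under each mechanism.

```python
import random
import statistics

random.seed(42)

# Hypothetical data, invented for illustration: age is always observed,
# and the outcome rises with age (true mean is roughly 30).
n = 20000
ages = [random.uniform(40, 80) for _ in range(n)]
outcomes = [0.5 * age + random.gauss(0, 5) for age in ages]
true_mean = statistics.fmean(outcomes)


def observe(keep_probability):
    """Return the (age, outcome) pairs that remain observed.

    keep_probability(age, outcome) is the chance that a value is recorded.
    """
    return [(a, y) for a, y in zip(ages, outcomes)
            if random.random() < keep_probability(a, y)]


def complete_case_mean(pairs):
    """Mean outcome among patients with an observed value."""
    return statistics.fmean([y for _, y in pairs])


def age_adjusted_mean(pairs):
    """Fit outcome ~ age on the observed pairs by least squares, then
    average the fitted values over all patients, including dropouts."""
    xs = [a for a, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return statistics.fmean([intercept + slope * a for a in ages])


# MCAR: every value has the same 70% chance of being observed.
mcar = observe(lambda age, y: 0.7)
# MAR: dropout depends only on age, which is observed for everyone.
mar = observe(lambda age, y: 0.9 if age < 60 else 0.5)
# MNAR: high outcome values are themselves more likely to be missing.
mnar = observe(lambda age, y: 0.9 if y < 30 else 0.5)

mcar_cc = complete_case_mean(mcar)
mar_cc = complete_case_mean(mar)
mar_adj = age_adjusted_mean(mar)
mnar_cc = complete_case_mean(mnar)
mnar_adj = age_adjusted_mean(mnar)

print(f"true mean                {true_mean:.2f}")
print(f"MCAR  complete case      {mcar_cc:.2f}")
print(f"MAR   complete case      {mar_cc:.2f}")
print(f"MAR   adjusted for age   {mar_adj:.2f}")
print(f"MNAR  complete case      {mnar_cc:.2f}")
print(f"MNAR  adjusted for age   {mnar_adj:.2f}")
```

In this setup the complete-case mean stays close to the true mean under MCAR, while under MAR it is biased and the age adjustment recovers the true mean. Under MNAR the complete-case mean is biased and adjusting for observed variables is not guaranteed to remove the bias, which is why sensitivity analyses are needed.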

Common Imputation Methods

Biostatistics offers a wide range of imputation procedures, which are used depending on the data situation, study design, and regulatory requirements. Each method presupposes certain assumptions about the missing data mechanism:

Last Observation Carried Forward (LOCF): The last available measurement of a patient is adopted for all subsequent missing time points. The method is simple, but it can bias the estimate when the disease course is not stable, and it understates uncertainty because imputed values are treated as if they had been observed. It is often no longer accepted as a primary procedure in modern marketing authorization applications.
Baseline Observation Carried Forward (BOCF): The baseline value is used as a substitute for missing values. It is often used as a conservative sensitivity analysis, as it assumes that patients with missing values show no change from baseline, that is, no treatment benefit.
Multiple Imputation (MI): Each missing value is replaced several times with draws from an imputation model (for example a regression model), yielding multiple completed datasets (typically 20–100). Each completed dataset is analyzed separately, and the results are then combined according to Rubin’s Rules. Multiple Imputation is considered one of the most statistically robust procedures under the MAR assumption and is widely accepted by regulatory authorities.
Mixed Models for Repeated Measures (MMRM): A likelihood-based model that utilizes all available data without explicit imputation and is valid under the MAR assumption. It is frequently used as the primary analysis in psychiatric, neurological, and other indications with longitudinal data.
Hot-Deck Imputation: Missing values are replaced by observed values from similar subjects within the same dataset. The method preserves distributional properties but is limited with small sample sizes.
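The mechanics of LOCF, BOCF, and the pooling step of Multiple Imputation can be sketched in a few lines. This is an illustrative sketch only: the function names, the use of None to mark a missing visit, and the example values are assumptions, and the pooling function covers only the point estimate and standard error from Rubin’s Rules, not the associated degrees of freedom.

```python
import math
import statistics


def locf(visits):
    """Carry the last observed value forward; None marks a missing visit.

    Values missing before the first observation stay None.
    """
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(visits):
    """Replace every missing value with the baseline (first) value."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances.

    Returns the pooled estimate and its standard error, where the total
    variance is: within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # mean within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical symptom scores at baseline and four follow-up visits.
scores = [24, 20, 17, None, None]
print(locf(scores))  # [24, 20, 17, 17, 17]
print(bocf(scores))  # [24, 20, 17, 24, 24]

# Hypothetical treatment-effect estimates from three imputed datasets.
estimate, std_error = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, round(std_error, 3))  # 2.0 1.354
```

The example shows why the two carry-forward methods can disagree: LOCF freezes the patient at the last improvement seen, while BOCF discards that improvement entirely. The between-imputation term in the pooling step is what lets Multiple Imputation reflect the uncertainty about the missing values, which single-value methods ignore.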

Estimands and the New ICH E9(R1) Perspective

The ICH E9(R1) addendum on estimands and sensitivity analyses has fundamentally changed the regulatory discussion regarding missing data. The focus is no longer primarily on the imputation method itself, but on the precise definition of the estimand—the scientific question that the study is intended to answer.

Depending on the estimand strategy, missing values are treated differently: The so-called “treatment policy strategy” considers all observed data regardless of whether the patient continued the treatment. The “hypothetical strategy” asks what the result would have been if the intercurrent event, such as treatment discontinuation, had not occurred. The definition of the estimand directly dictates which imputation method is statistically coherent. Since the introduction of the addendum, the EMA and national authorities such as the BfArM explicitly expect this logical consistency in marketing authorization applications.

Relevance for Clinical Trials

Inadequate handling of missing data can significantly jeopardize the internal validity of a study and lead to biased efficacy or safety statements. Regulatory authorities such as the EMA and the BfArM evaluate the imputation strategy as a critical element of the statistical integrity of a marketing authorization application. Full-service CROs like mediconomics support sponsors in statistical study planning, the definition of suitable estimands, and the development of a regulatorily robust imputation strategy that is anchored in the study protocol and the Statistical Analysis Plan from the outset.

Frequently Asked Questions (FAQ)

Which imputation method is preferred by regulators?

There is no universally preferred method. The EMA and ICH require a method that fits the respective estimand and the missing data assumption. Multiple Imputation and MMRM are currently considered the most scientifically robust procedures, but they must be supported by sensitivity analyses. LOCF is no longer acceptable as a primary procedure in many modern submissions.

Must the imputation strategy be determined before the start of the study?

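Yes. The principal features of the imputation strategy should be laid down in the study protocol before the study starts, and the details must be defined and justified in the Statistical Analysis Plan, which is finalized before database lock and unblinding. Subsequent changes are considered potentially bias-inducing and must be documented and justified as amendments to the protocol or the Statistical Analysis Plan. Regulators view unannounced strategy adjustments critically, especially if they influence the primary study result.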

What is the difference between primary imputation and sensitivity analysis?

Primary imputation defines the main analysis and is determined in advance. Sensitivity analyses test whether the results remain stable under other plausible missing data assumptions. Typical scenarios include BOCF as a conservative alternative or tipping-point analyses, which show how far the assumptions about the missing values must depart from those of the primary analysis before the conclusion would shift.
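The idea of a tipping-point analysis can be sketched as follows. This is a deliberately simplified illustration on invented data: the scores, the grid step, the critical value of 1.96, and the function names are all assumptions. Missing values in the treatment arm are filled with a single value (the observed arm mean plus a penalty delta), which understates variance; a real analysis would typically combine the delta adjustment with Multiple Imputation.

```python
import math
import statistics

# Hypothetical change-from-baseline scores (higher = better); invented data.
treatment_observed = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1, 4.9, 6.6, 5.8, 6.0]
treatment_missing = 4  # treated patients without a final value
control = [3.9, 4.5, 3.2, 5.0, 4.1, 3.7, 4.8, 4.0, 3.5, 4.4, 4.2, 3.8, 4.6, 3.6]


def z_statistic(delta):
    """Welch-type z statistic for treatment minus control after imputing
    each missing treated value as (observed treatment mean + delta).

    delta = 0 mimics a MAR-like assumption; delta < 0 assumes that
    dropouts did worse than comparable patients who stayed.
    """
    imputed_value = statistics.fmean(treatment_observed) + delta
    treated = treatment_observed + [imputed_value] * treatment_missing
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return diff / se


def tipping_point(step=0.25, limit=-10.0, critical=1.96):
    """Return the first delta on the grid at which the result is no
    longer significant, or None if it never tips within the range."""
    delta = 0.0
    while delta >= limit:
        if z_statistic(delta) < critical:
            return delta
        delta -= step
    return None


tp = tipping_point()
print(f"z at delta = 0: {z_statistic(0.0):.2f}")
print(f"tipping point:  delta = {tp}")
```

The clinical question is then whether a penalty of that size is plausible. If the conclusion only tips under a delta that clinicians consider unrealistic, the primary result can be regarded as robust to departures from the MAR assumption.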

Regulatory References

ICH E9(R1) – Addendum on Estimands and Sensitivity Analysis in Clinical Trials (2019)
EMA Guideline on Missing Data in Confirmatory Clinical Trials (EMA/CPMP/EWP/1776/99 Rev. 1, 2010)
ICH E6(R3) – Good Clinical Practice (draft 2023, final 2025): Requirements for data completeness and quality
EU Regulation No. 536/2014 (CTR): Documentation obligations for protocol deviations and data completeness
National Research Council: The Prevention and Treatment of Missing Data in Clinical Trials (2010) – report commissioned by the FDA, international reference for comparative submissions