Confidence Interval - mediconomics.com

A confidence interval (CI) is a range of values, calculated from the study data, that is expected to contain an unknown population parameter (for example, a mean blood pressure reduction or a hazard ratio) at a predefined confidence level. In clinical trials, the CI is typically reported together with the point estimate because it conveys both the direction and the precision of the effect estimate. For sponsors, CROs, and regulatory authorities, the CI is therefore often more informative than an isolated p-value.

What a Confidence Interval Indicates in Clinical Trials

A 95% CI is constructed such that, in a large number of conceptually repeated studies with identical design, approximately 95% of these studies would yield a calculated interval that covers the true parameter value. Importantly, this does not mean that the true value lies “with 95% probability” within this particular interval; in classical frequentist statistics, the parameter is considered fixed, while the interval is random. In practice, however, the CI is used as a measure of uncertainty and serves to interpret clinical relevance.
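The repeated-study interpretation can be checked with a small simulation. The following is a minimal sketch, not a validated procedure: the true mean, standard deviation, sample size, and number of simulated studies are arbitrary assumptions, and the interval uses the normal quantile (about 1.96) rather than the t quantile, so the empirical coverage is expected to land close to, and typically slightly below, 95%.

```python
import random
from statistics import NormalDist, mean, stdev


def coverage_of_95ci(true_mean=10.0, sd=2.0, n=50, n_studies=2000, seed=1):
    """Return the fraction of simulated studies whose 95% CI covers true_mean."""
    rng = random.Random(seed)           # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(0.975)     # two-sided 95% normal quantile
    covered = 0
    for _ in range(n_studies):
        # One simulated study: n independent normal observations
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = mean(sample)
        se = stdev(sample) / n ** 0.5   # standard error of the mean
        if m - z * se <= true_mean <= m + z * se:
            covered += 1
    return covered / n_studies


coverage = coverage_of_95ci()
print(f"Empirical coverage over repeated studies: {coverage:.3f}")
```

Each simulated interval either covers the fixed true mean or it does not; the 95% refers to the long-run proportion of intervals that do.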

For relative measures such as odds ratio or hazard ratio, the CI is frequently calculated on the log scale and subsequently back-transformed. This results in intervals that are usually asymmetric. For continuous endpoints (e.g., change in a score), CIs are often approximately symmetric when normality assumptions are plausible or large sample sizes are available.
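As an illustration of the log-scale approach, the sketch below computes a Wald-type interval for an odds ratio from a 2x2 table. The function name, the table layout, and the example counts are assumptions made for this illustration; the Wald interval is only one of several possible methods and can behave poorly with small or zero cell counts.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Wald CI for an odds ratio, computed on the log scale.

    Assumed layout: a/b = events/non-events under treatment,
                    c/d = events/non-events under control.
    All four cells must be greater than zero.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log((a * d) / (b * c))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # The interval is symmetric on the log scale ...
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    # ... and becomes asymmetric after back-transformation
    return math.exp(log_or), lower, upper


# Hypothetical counts, for illustration only
or_hat, lower, upper = odds_ratio_ci(15, 85, 30, 70)
print(f"OR = {or_hat:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"distance to lower bound: {or_hat - lower:.2f}")
print(f"distance to upper bound: {upper - or_hat:.2f}")
```

On the original scale, the upper bound lies farther from the point estimate than the lower bound, which is the asymmetry described above.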

Relationship to p-Value and Significance

For many standard tests, the following applies: if a two-sided 95% confidence interval does not contain the null value (e.g., difference = 0 or ratio = 1), this corresponds to a p-value less than 0.05. However, the CI additionally provides information on how large the effect can plausibly be: a narrow CI indicates a precise estimate; a wide CI indicates high uncertainty, for instance due to small sample size, high variability, or rare events. Particularly for safety endpoints, wide CIs are common and must be transparently addressed in the benefit-risk assessment.
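The correspondence between the two-sided 95% CI and p < 0.05 can be illustrated for a simple normal-approximation (z) test. This is a sketch under assumed inputs: the estimates and the standard error are hypothetical, and the exact correspondence holds only when the test and the interval are built from the same statistic.

```python
from statistics import NormalDist


def z_test_and_ci(estimate, se, null_value=0.0, level=0.95):
    """Two-sided z-test p-value and matching CI for an estimate with known SE."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(0.5 + level / 2)
    z_stat = (estimate - null_value) / se
    p_value = 2 * (1 - nd.cdf(abs(z_stat)))
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return p_value, ci


# Hypothetical treatment differences with the same standard error
for est in (-5.0, -1.5):
    p, (lo, hi) = z_test_and_ci(est, se=2.0)
    excludes_null = not (lo <= 0.0 <= hi)
    print(f"estimate {est:+.1f}: p = {p:.3f}, "
          f"CI ({lo:.2f}, {hi:.2f}), excludes 0: {excludes_null}")
```

In both cases, "CI excludes 0" and "p < 0.05" agree, while the interval additionally shows the range of effect sizes compatible with the data.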

In regulatory dossiers and clinical study reports, both the p-value and the CI are therefore frequently tabulated. For internal decision-making processes (go/no-go, dose selection, study continuation), CIs are especially useful because they illustrate the range of potential effects.

Typical Applications in Superiority, Non-Inferiority, and Equivalence Trials

In superiority trials, the CI is used to assess whether the data are compatible with a clinically relevant benefit. In non-inferiority trials, the comparison with the non-inferiority margin is decisive: the least favorable plausible effect (the lower or upper CI bound, depending on the effect measure and its direction) must not cross the margin. Equivalence trials typically require that the entire CI lies within a predefined equivalence range.
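The three decision rules can be written down compactly. The sketch below assumes a specific sign convention that is not taken from the text: the effect is a difference (new treatment minus control), larger values are better, and the margin is given as a positive number. For other effect measures or directions, the relevant bound and the comparison flip accordingly.

```python
def is_superior(ci_lower, null_value=0.0):
    """Superiority: the whole CI lies above the null value."""
    return ci_lower > null_value


def is_non_inferior(ci_lower, margin):
    """Non-inferiority: the least favorable bound stays above -margin."""
    return ci_lower > -margin


def is_equivalent(ci_lower, ci_upper, margin):
    """Equivalence: the entire CI lies inside (-margin, +margin)."""
    return -margin < ci_lower and ci_upper < margin


# Hypothetical intervals for a difference in response, margin = 3.0
for ci in [(-1.8, 2.4), (-3.5, 1.0), (0.4, 2.9)]:
    lo, hi = ci
    print(ci,
          "superior:", is_superior(lo),
          "non-inferior:", is_non_inferior(lo, 3.0),
          "equivalent:", is_equivalent(lo, hi, 3.0))
```

The confidence level used for each rule (for example, one-sided versus two-sided) is a design decision that belongs in the protocol and is not fixed by this sketch.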

This logic is closely linked to the planned testing strategy (choice of alpha, one- or two-sided testing, hierarchical test procedures). Changes to evaluation windows, analysis populations (ITT/FAS vs. PP), or model assumptions can alter CIs and should be documented consistently in the statistical analysis plan (SAP) and the clinical study report (CSR).

Interpretation Pitfalls and Practical Guidance

A common misinterpretation is equating “not significant” with “no effect.” A wide CI that includes the null value may still be compatible with a clinically relevant effect and therefore does not rule one out. Conversely, a result with a very narrow CI may be statistically significant while the interval contains only small differences of limited clinical relevance. Therefore, CIs should always be discussed in the context of the minimal clinically important difference (MCID), the endpoint definition, and measurement accuracy.
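The two situations described above can be made concrete by comparing the CI bounds with both the null value and a minimal clinically important difference. The function below is an illustrative sketch: its name, the wording of the returned text, and the example numbers are assumptions, and it presumes that larger values mean more benefit and that the threshold lies above the null value.

```python
def interpret_ci(ci_lower, ci_upper, mcid, null_value=0.0):
    """Describe a CI relative to the null value and the MCID.

    Assumption: larger values mean more benefit and mcid > null_value.
    """
    significant = not (ci_lower <= null_value <= ci_upper)
    if ci_lower >= mcid:
        relevance = "only clinically relevant effects are compatible with the data"
    elif ci_upper < mcid:
        relevance = "a clinically relevant benefit is excluded"
    else:
        relevance = "a clinically relevant benefit is neither shown nor excluded"
    label = "significant" if significant else "not significant"
    return f"{label}; {relevance}"


# Wide interval that includes the null value (hypothetical numbers)
print(interpret_ci(-1.0, 9.0, mcid=5.0))
# Narrow interval that excludes the null value but stays below the MCID
print(interpret_ci(0.5, 1.5, mcid=5.0))
```

The first case is "not significant" without being evidence of no effect; the second is "significant" without supporting a clinically relevant benefit.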

Further pitfalls include multiple comparisons and data-driven subgroups. Without appropriate adjustment, reported CIs may convey an overly optimistic picture of precision. In practice, sensitivity analyses and robust variance estimators are therefore employed to examine the stability of CIs.
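One simple way to see the effect of multiplicity on interval width is a Bonferroni adjustment, in which each of k intervals is built at level 1 - alpha/k. This is a sketch of one possible adjustment only, chosen here for its simplicity; the estimate, standard error, and number of comparisons are hypothetical, and other procedures may be more appropriate in a given trial.

```python
from statistics import NormalDist


def normal_ci(estimate, se, alpha=0.05):
    """Unadjusted two-sided normal-approximation CI."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se


def bonferroni_ci(estimate, se, n_comparisons, family_alpha=0.05):
    """CI at level 1 - family_alpha / n_comparisons (Bonferroni)."""
    return normal_ci(estimate, se, alpha=family_alpha / n_comparisons)


# Hypothetical subgroup estimate, one of four subgroup comparisons
unadjusted = normal_ci(2.0, 0.9)
adjusted = bonferroni_ci(2.0, 0.9, n_comparisons=4)
print(f"unadjusted 95% CI: ({unadjusted[0]:.2f}, {unadjusted[1]:.2f})")
print(f"Bonferroni-adjusted CI: ({adjusted[0]:.2f}, {adjusted[1]:.2f})")
```

In this example the unadjusted interval excludes zero while the adjusted interval does not, which shows how unadjusted intervals can overstate precision when several comparisons are made.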

For project teams, it is also relevant how confidence intervals are interpreted in conjunction with protocol deviations and missing data. For example, if a model makes an assumption about missing follow-up, the CI can become considerably narrower or wider without the point estimate changing substantially. Therefore, sponsors should specify in the SAP which imputation or modeling approaches are primary and which serve as sensitivity analyses.

A practical tip for medical writing: do not only state whether the CI excludes the null value, but also explain which effect sizes are plausibly supported or ruled out by the data. This facilitates the discussion of clinical relevance, particularly when the study was powered to detect a specific minimal difference.

In adaptive designs or trials with interim analyses, confidence intervals are frequently aligned with the alpha-spending strategy. In such cases, the conventional 95% level may be replaced by a different (often higher) confidence level at each analysis in order to control the overall type I error rate. These details should be presented consistently in the protocol and the CSR to ensure traceability for regulatory authorities.
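The link between the nominal alpha at a given analysis and the confidence level can be sketched as follows. The nominal alpha values used here are hypothetical placeholders; in a real trial they follow from the pre-specified alpha-spending function and are not derived by this code. The sketch simply builds a normal-approximation interval at level 1 - nominal alpha.

```python
from statistics import NormalDist


def ci_at_nominal_alpha(estimate, se, nominal_alpha):
    """Two-sided CI whose level matches the nominal alpha of one analysis."""
    z = NormalDist().inv_cdf(1 - nominal_alpha / 2)
    level = 1 - nominal_alpha
    return estimate - z * se, estimate + z * se, level


# Hypothetical interim and final analyses (values for illustration only)
analyses = [
    ("interim", 3.0, 1.2, 0.005),
    ("final", 2.6, 0.8, 0.048),
]
for name, estimate, se, alpha in analyses:
    lo, hi, level = ci_at_nominal_alpha(estimate, se, alpha)
    print(f"{name}: {level:.1%} CI ({lo:.2f}, {hi:.2f})")
```

The interim interval is wider than a conventional 95% interval for the same estimate because it is built at a stricter nominal level.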

Regulatory Context and Reporting Practice

Regulatory authorities and notified bodies expect a transparent presentation of effect sizes and uncertainties. This applies to both medicinal product trials under Regulation (EU) No. 536/2014 and clinical investigations of medical devices under Regulation (EU) 2017/745 (MDR). Within the framework of ICH E9 (statistical principles) and ICH E6(R3) (GCP modernization), transparent analysis and reporting chains are central, including the question of which confidence levels are used and how sensitivity analyses address uncertainty.

FAQ

Why Is a 95% Confidence Interval Typically Used?

95% is a historically established conventional level that aligns well with a two-sided alpha of 0.05. Depending on the context (e.g., interim analyses or multiple endpoints), other levels may be appropriate.

Can a Confidence Interval Be “Incorrect”?

Yes, in the sense that the interval is only as valid as the model assumptions behind it. If these are violated (e.g., strong deviation from distributional assumptions, informative censoring), the CI may misstate the uncertainty, typically by understating it. In such cases, alternative models or robust methods are indicated.

How Are CIs Used for Clinical Relevance?

The CI boundaries are compared with predefined thresholds (e.g., non-inferiority margin or clinically relevant difference). This reveals which effect sizes are compatible with the data.

Regulatory References: ICH E9 (Statistical Principles for Clinical Trials), ICH E6(R3) Guideline for Good Clinical Practice, Regulation (EU) No. 536/2014 (Clinical Trials Regulation), Regulation (EU) 2017/745 (MDR).