
Blinded Sample Size Re-estimation for Continuous Endpoints - Part 1

  • Writer: Andrew Yan
  • Sep 5, 2025
  • 3 min read

Updated: Dec 6, 2025

Sample size re-estimation (SSR) is often performed in clinical trials to address uncertainty in design-stage assumptions (e.g., effect size). Unblinded SSR is straightforward but frequently subject to regulatory scrutiny. Blinded SSR is generally more acceptable to regulators, but it typically focuses on re-estimating nuisance parameters such as the overall variance, response rate, or event rate, while assuming that the treatment effect is known. That assumption is questionable, particularly in superiority trials. In this series, I will evaluate the statistical issues in blinded SSR to further enhance understanding of its inherent limitations. It's important to note that any sample size increase based on re-estimation of the treatment effect, whether blinded or unblinded, can potentially inflate the type I error probability of the final analysis. The discussion in this series focuses solely on the re-estimation itself.

Consider a parallel-group study with two equally sized groups (treatment and control) and a continuous endpoint. Suppose the data in the two groups follow normal distributions with means µ₁ and µ₂ and a common variance σ², i.e., 𝑁(µ₁, σ²) for the treatment group and 𝑁(µ₂, σ²) for the control group. The combined data from both groups can then be viewed as a random sample from a two-component Gaussian mixture model with a known mixture proportion. Specifically, let 𝑋 be an observation from the combined data; then

$$X \sim \omega\, N(\mu_1, \sigma^2) + (1 - \omega)\, N(\mu_2, \sigma^2),$$

where 𝜔 is the mixture proportion (i.e., 𝜔 = 1/2). Let 𝛿 = µ₁ - µ₂; then the variance of the mixture distribution is

$$\operatorname{Var}(X) = \sigma^2 + \omega(1 - \omega)\,\delta^2 = \sigma^2 + \frac{\delta^2}{4}. \tag{1}$$

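The mixture variance can be checked with the law of total variance. As a sketch, let 𝐺 be the (blinded) group indicator, a symbol introduced here only for illustration, with P(𝐺 = 1) = 𝜔 for treatment and P(𝐺 = 2) = 1 - 𝜔 for control:

```latex
\begin{aligned}
\operatorname{Var}(X)
  &= \mathrm{E}\big[\operatorname{Var}(X \mid G)\big]
     + \operatorname{Var}\big(\mathrm{E}[X \mid G]\big) \\
  &= \sigma^2 + \omega(1 - \omega)\,(\mu_1 - \mu_2)^2 \\
  &= \sigma^2 + \frac{\delta^2}{4}
     \qquad \text{when } \omega = 1/2 .
\end{aligned}
```

The first term is the common within-group variance; the second is the variance of a two-point distribution that takes the value µ₁ with probability 𝜔 and µ₂ with probability 1 - 𝜔.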
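Eq. (1) implies that, if either σ² or 𝛿² can be reliably obtained from historical studies, then the other can be estimated from the sample variance 𝑆² of the combined data. That is,

$$\hat{\delta}^2 = 4\,(S^2 - \sigma_0^2) \tag{2}$$

and

$$\hat{\sigma}^2 = S^2 - \frac{\delta_0^2}{4}, \tag{3}$$

with variance

$$\operatorname{Var}(\hat{\delta}^2) = 16\,\operatorname{Var}(S^2)$$

and

$$\operatorname{Var}(\hat{\sigma}^2) = \operatorname{Var}(S^2), \qquad \operatorname{Var}(S^2) \approx \frac{2\sigma^4 + \sigma^2\delta^2}{n} \ \text{for large } n,$$

respectively, where σ₀² and 𝛿₀² are the historical estimates of σ² and 𝛿² (treated as fixed constants), and 𝑛 denotes the combined sample size. It should be noted that the right-hand sides of Eq. (2) and Eq. (3) can be negative; in that case, the corresponding estimate is set to zero. Eq. (2) multiplies any error in σ₀² by 4, whereas Eq. (3) divides any error in 𝛿₀² by 4. Since 𝛿² ≤ σ² generally holds in clinical trials, the following patterns are expected: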


  • Bias: The estimator for 𝛿² in Eq. (2) is highly sensitive to misspecification of σ₀², whereas the estimator for σ² in Eq. (3) is less sensitive to misspecification of 𝛿₀².

  • Variability: The estimator for 𝛿² is substantially more unstable than the estimator for σ².
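Both patterns can be read off with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the estimators are 4(𝑆² - σ₀²) for 𝛿² and 𝑆² - 𝛿₀²/4 for σ² (truncated at zero), and the standard deviations use a large-sample approximation to Var(𝑆²) under the equal-weight mixture, which is an assumption of this example rather than a formula taken from the post.

```python
import math


def blinded_estimates(s2, sigma0, delta0):
    """Blinded estimates of delta^2 and sigma^2 from the pooled sample variance s2.

    sigma0 and delta0 are the historical (assumed) values; negative
    estimates are truncated at zero.
    """
    delta_sq_hat = max(0.0, 4.0 * (s2 - sigma0**2))
    sigma_sq_hat = max(0.0, s2 - delta0**2 / 4.0)
    return delta_sq_hat, sigma_sq_hat


def untruncated_bias(sigma, delta, sigma0, delta0):
    """Bias of each estimator before truncation, using E[S^2] = sigma^2 + delta^2 / 4."""
    bias_delta_sq = 4.0 * (sigma**2 - sigma0**2)
    bias_sigma_sq = (delta**2 - delta0**2) / 4.0
    return bias_delta_sq, bias_sigma_sq


def approx_sd(n, sigma, delta):
    """Large-sample SDs of the two estimators before truncation (assumed approximation)."""
    sd_s2 = math.sqrt((2.0 * sigma**4 + sigma**2 * delta**2) / n)
    return 4.0 * sd_s2, sd_s2


if __name__ == "__main__":
    # Illustrative inputs only: true sigma = 1 and delta = 0.5,
    # with both historical values understated by 10%.
    print(untruncated_bias(1.0, 0.5, sigma0=0.9, delta0=0.45))
    print(approx_sd(130, 1.0, 0.5))
```

With the same relative misspecification, the bias in the estimate of 𝛿² is 4(σ² - σ₀²), which is measured against a small target 𝛿², while the bias in the estimate of σ² is (𝛿² - 𝛿₀²)/4, which is measured against a larger target σ². The standard deviation of the first estimator is four times that of the second.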


Simulations can help shed more light on these patterns. For simplicity, we assume 𝛿 > 0 and a moderate (true) effect size of 𝛿/σ = 0.5 (it suffices to assume that 𝛿 = 0.5 and σ = 1). We consider a total sample size of 𝑛 = 172, which is expected to provide approximately 90% power (based on a two-sample t-test and a two-sided significance level of 0.05) for the trial. A blinded SSR is planned when approximately 75% of study subjects (𝑛 ≈ 130) have completed the trial. The assumed σ₀ ranges from 0.90 to 1.10, and the assumed 𝛿₀ from 0.45 to 0.55. Simulation results (based on 1000 replications) are shown in the following two tables, including the mean (Mean) and standard deviation (SD) of the estimated parameter values.
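A simulation along these lines can be sketched with the Python standard library. This is not the code behind the tables: the function name, the seed, the fixed 1:1 allocation of the 130 interim subjects, and the grid step of 0.05 for σ₀ (0.025 for 𝛿₀) are all assumptions made for illustration.

```python
import random
import statistics


def simulate_blinded_ssr(sigma0, delta0, n=130, delta=0.5, sigma=1.0,
                         reps=1000, seed=2025):
    """Return (mean, SD) of the truncated blinded estimates over `reps` trials.

    Each replicate pools n observations without group labels: half from
    N(delta, sigma^2) and half from N(0, sigma^2) (assumed fixed 1:1 allocation).
    """
    rng = random.Random(seed)
    half = n // 2
    delta_sq_hats, sigma_sq_hats = [], []
    for _ in range(reps):
        pooled = [rng.gauss(delta, sigma) for _ in range(half)]
        pooled += [rng.gauss(0.0, sigma) for _ in range(n - half)]
        s2 = statistics.variance(pooled)  # blinded (pooled) sample variance
        delta_sq_hats.append(max(0.0, 4.0 * (s2 - sigma0**2)))
        sigma_sq_hats.append(max(0.0, s2 - delta0**2 / 4.0))
    return {
        "delta_sq": (statistics.mean(delta_sq_hats), statistics.stdev(delta_sq_hats)),
        "sigma_sq": (statistics.mean(sigma_sq_hats), statistics.stdev(sigma_sq_hats)),
    }


if __name__ == "__main__":
    # Hypothetical grid; the post states only the end points of each range.
    grid = [(0.90, 0.450), (0.95, 0.475), (1.00, 0.500), (1.05, 0.525), (1.10, 0.550)]
    for sigma0, delta0 in grid:
        res = simulate_blinded_ssr(sigma0, delta0)
        print(sigma0, delta0, res["delta_sq"], res["sigma_sq"])
```

The estimate of 𝛿² depends only on σ₀ and the estimate of σ² only on 𝛿₀, so pairing the two grids in one loop is just a convenience.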



As expected, the estimator for 𝛿² in Eq. (2) is highly sensitive to misspecification of σ₀. The apparent bias when σ₀ = 1 likely reflects small-sample effects, in particular the truncation of negative estimates at zero; larger samples are required to demonstrate consistency of the estimator. This estimator also exhibits substantial variability. On the other hand, the estimator for σ² in Eq. (3) appears robust against misspecification of 𝛿₀ (at least within the range considered here) and is markedly more precise.
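One way to see how the sample size drives the behaviour at σ₀ = 1 is a rough normal approximation, which is an assumption of this sketch and not part of the original analysis. If the untruncated estimate of 𝛿² is treated as Normal(m, s²), the mean after truncation at zero is m·Φ(m/s) + s·φ(m/s), which approaches m as s shrinks with growing 𝑛. The function names below are hypothetical.

```python
import math
from statistics import NormalDist


def truncated_mean(mu, sd):
    """E[max(0, Y)] for Y ~ Normal(mu, sd^2)."""
    z = mu / sd
    std = NormalDist()
    return mu * std.cdf(z) + sd * std.pdf(z)


def approx_mean_delta_sq_hat(n, sigma=1.0, delta=0.5, sigma0=1.0):
    """Approximate mean of the truncated Eq. (2) estimator at combined sample size n.

    Assumes S^2 is roughly normal with mean sigma^2 + delta^2 / 4 and
    large-sample variance (2 sigma^4 + sigma^2 delta^2) / n.
    """
    mu = 4.0 * (sigma**2 + delta**2 / 4.0 - sigma0**2)
    sd = 4.0 * math.sqrt((2.0 * sigma**4 + sigma**2 * delta**2) / n)
    return truncated_mean(mu, sd)


if __name__ == "__main__":
    # Sample sizes chosen for illustration; the true delta^2 is 0.25.
    for n in (130, 520, 2080, 8320):
        print(n, round(approx_mean_delta_sq_hat(n), 4))
```

Under this approximation, the truncated mean exceeds 𝛿² = 0.25 at 𝑛 = 130 and moves toward 0.25 as 𝑛 increases, in line with the small-sample reading above.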

This creates a practical dilemma for blinded SSR: historical variance σ₀² is often readily available, but the poor performance of the estimator for 𝛿² in Eq. (2) makes it unattractive; conversely, the estimator for σ² in Eq. (3) is appealing, yet historical information 𝛿₀² is seldom reliable. A natural question follows: can we construct an estimator that does not rely on historical data? This will be addressed in Part 2 of this series.


 
 
 
