
A Cautionary Note on Correlation Analysis Using Post-Treatment Predictors

  • Writer: Andrew Yan
  • Aug 23, 2024

Updated: May 4

When a clinical trial demonstrates positive efficacy results, there is often interest in conducting various post-hoc analyses, including exploring correlations between specific biomarkers and efficacy endpoints. Although controversial, using post-treatment biomarker data (e.g., change from baseline) in these analyses is a common practice endorsed by many medical professionals.

A statistical issue with such analyses is that correlations observed using post-treatment biomarker data are conditional on the specific study treatment and therefore may not apply in future trials with different treatments. This post presents a simple example to illustrate this phenomenon in statistics.

Assume Ⅹ and Υ are jointly normal random variables with 𝐸(Ⅹ) = 𝐸(Υ) = 0, 𝑉𝑎𝑟(Ⅹ) = 𝑉𝑎𝑟(Υ) = 1, and a nontrivial correlation ρ (0 < ρ < 1). Then the distribution of Υ given Ⅹ = 𝑥, denoted by Υ|Ⅹ = 𝑥, is normal with 𝐸(Υ|Ⅹ = 𝑥) = ρ𝑥 and 𝑉𝑎𝑟(Υ|Ⅹ = 𝑥) = 1 − ρ². Let

    𝑍 = (Ⅹ − ρΥ) / √(1 − ρ²)        (1)

then it is straightforward to verify the following covariances:

  • 𝐶𝑜𝑣(Ⅹ, 𝑍) = √(1 − ρ²)

  • 𝐶𝑜𝑣(Υ, 𝑍) = 0

  • 𝐶𝑜𝑣(Υ, 𝑍|Ⅹ = 𝑥) = −ρ√(1 − ρ²)
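
The three covariances can be checked by hand. The sketch below assumes that Equation (1) takes the form Z = (X − ρY)/√(1 − ρ²), which is the linear combination of X and Y consistent with the values listed; it uses only Var(X) = Var(Y) = 1, Cov(X, Y) = ρ, and Var(Y | X = x) = 1 − ρ².

```latex
\begin{align*}
\mathrm{Cov}(X, Z)
  &= \frac{\mathrm{Var}(X) - \rho\,\mathrm{Cov}(X, Y)}{\sqrt{1-\rho^2}}
   = \frac{1 - \rho^2}{\sqrt{1-\rho^2}}
   = \sqrt{1-\rho^2}, \\
\mathrm{Cov}(Y, Z)
  &= \frac{\mathrm{Cov}(Y, X) - \rho\,\mathrm{Var}(Y)}{\sqrt{1-\rho^2}}
   = \frac{\rho - \rho}{\sqrt{1-\rho^2}}
   = 0, \\
\mathrm{Cov}(Y, Z \mid X = x)
  &= \frac{-\rho\,\mathrm{Var}(Y \mid X = x)}{\sqrt{1-\rho^2}}
   = \frac{-\rho\,(1-\rho^2)}{\sqrt{1-\rho^2}}
   = -\rho\sqrt{1-\rho^2}.
\end{align*}
```

In the last line, X is fixed at x, so the only random part of Z is the term −ρY/√(1 − ρ²).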

These results imply that:

  • Ⅹ is correlated with both Υ and 𝑍.

  • Υ and 𝑍 are (marginally) uncorrelated.

  • Υ and 𝑍 are conditionally correlated given Ⅹ = 𝑥.

We may consider Ⅹ as the treatment effect in a study, Υ as the post-treatment biomarker change, and 𝑍 as the efficacy response. Under the above assumptions, Υ and 𝑍 are uncorrelated, although they are conditionally correlated given the specific treatment effect Ⅹ = 𝑥 in the study.
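
A small simulation can make the contrast between the marginal and conditional covariance concrete. This is a minimal sketch rather than a definitive analysis: it assumes Equation (1) has the form Z = (X − ρY)/√(1 − ρ²), and the function names, ρ = 0.6, the sample size, and the seed are illustrative choices. Under joint normality the conditional covariance given X = x does not depend on x, so it is estimated here as the covariance of the residuals of Y and Z after a least-squares fit on X.

```python
import numpy as np


def simulate(rho, n=200_000, seed=0):
    """Draw standard bivariate normal (X, Y) with correlation rho and build Z."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Assumed form of Equation (1); chosen to match the stated covariances.
    z = (x - rho * y) / np.sqrt(1.0 - rho**2)
    return x, y, z


def residualize(v, x):
    """Return the residual of v after a straight-line least-squares fit on x."""
    slope, intercept = np.polyfit(x, v, 1)
    return v - (slope * x + intercept)


rho = 0.6
x, y, z = simulate(rho)

cov_xz = np.cov(x, z)[0, 1]                     # theory: sqrt(1 - rho^2)
marginal_cov_yz = np.cov(y, z)[0, 1]            # theory: 0
cond_cov_yz = np.cov(residualize(y, x),
                     residualize(z, x))[0, 1]   # theory: -rho * sqrt(1 - rho^2)

print(f"Cov(X, Z)       ~ {cov_xz:.3f}")
print(f"Cov(Y, Z)       ~ {marginal_cov_yz:.3f}")
print(f"Cov(Y, Z | X)   ~ {cond_cov_yz:.3f}")
```

With ρ = 0.6 the theoretical values are 0.8, 0, and −0.48; the simulated estimates should land close to these, up to sampling noise.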

From a slightly different perspective, Equation (1) can be viewed as a linear regression model with two correlated predictors, Ⅹ and Υ. The pairwise correlation between 𝑍 and Ⅹ is √(1 − ρ²), and since Υ and 𝑍 are marginally uncorrelated, it is tempting to conclude that Υ is redundant once Ⅹ is in the model. That conclusion does not hold. For two predictors, the squared multiple correlation between the response 𝑍 and (Ⅹ, Υ) is R² = (r²(𝑍,Ⅹ) + r²(𝑍,Υ) − 2ρ·r(𝑍,Ⅹ)·r(𝑍,Υ)) / (1 − ρ²); with r(𝑍,Ⅹ) = √(1 − ρ²) and r(𝑍,Υ) = 0 this gives R² = 1, which exceeds the squared pairwise correlation 1 − ρ² whenever ρ ≠ 0. In other words, Υ carries no marginal information about 𝑍, yet it improves the prediction of 𝑍 once Ⅹ is in the model, purely through its correlation with Ⅹ (a "suppressor" effect). The correlation between Ⅹ and Υ can also lead to very unstable parameter estimates for Ⅹ and Υ themselves, an issue known as "variance inflation" due to collinearity in regression analysis. These are among the reasons why adjustment for post-treatment covariates is strongly discouraged in regulatory guidelines.
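
The variance inflation mentioned above can be quantified. As a minimal sketch (the function name and the ρ values are illustrative), the variance inflation factor (VIF) of each predictor is the corresponding diagonal entry of the inverse of the predictor correlation matrix. With two predictors of correlation ρ, it reduces to 1/(1 − ρ²).

```python
import numpy as np


def variance_inflation_factors(corr_matrix):
    """VIF per predictor: diagonal of the inverse predictor correlation matrix."""
    corr = np.asarray(corr_matrix, dtype=float)
    return np.diag(np.linalg.inv(corr))


for rho in (0.3, 0.6, 0.9, 0.99):
    vif = variance_inflation_factors([[1.0, rho], [rho, 1.0]])
    closed_form = 1.0 / (1.0 - rho**2)
    print(f"rho = {rho:4.2f}   VIF = {vif[0]:7.2f}   1/(1 - rho^2) = {closed_form:7.2f}")
```

The factor grows without bound as ρ approaches 1, which is the sense in which a strongly treatment-driven biomarker makes the coefficients of Ⅹ and Υ hard to estimate separately.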






 
 
 


© 2025 by Andrew Yan
