A Cautionary Note on Correlation Analysis Using Post-Treatment Predictors

Andrew Yan
Aug 23, 2024

Updated: May 4

When a clinical trial demonstrates positive efficacy results, there is often interest in conducting various post-hoc analyses, including exploring correlations between specific biomarkers and efficacy endpoints. Although controversial, using post-treatment biomarker data (e.g., change from baseline) in these analyses is a common practice endorsed by many medical professionals.

A statistical issue with such analyses is that correlations observed using post-treatment biomarker data are conditional on the specific study treatment and therefore may not apply in future trials with different treatments. This post presents a simple example to illustrate this phenomenon in statistics.

Assume Ⅹ and Υ are jointly normal random variables with 𝐸(Ⅹ) = 𝐸(Υ) = 0, 𝑉𝑎𝑟(Ⅹ) = 𝑉𝑎𝑟(Υ) = 1 and a nontrivial correlation ρ (0 < ρ < 1). The distribution of Υ given Ⅹ = 𝑥, denoted by Υ|Ⅹ = 𝑥, is then normal with 𝐸(Υ|Ⅹ = 𝑥) = ρ𝑥 and 𝑉𝑎𝑟(Υ|Ⅹ = 𝑥) = 1-ρ². Let

𝑍 = (Ⅹ - ρΥ) / √(1-ρ²)     (1)

then it is straightforward to verify the following covariances:

𝐶𝑜𝑣(Ⅹ, 𝑍) = √(1-ρ²)
𝐶𝑜𝑣(Υ, 𝑍) = 0
𝐶𝑜𝑣(Υ, 𝑍|Ⅹ = 𝑥) = -ρ√(1-ρ²)

These results imply that:

Ⅹ is correlated with both Υ and 𝑍.
Υ and 𝑍 are (marginally) uncorrelated.
Υ and 𝑍 are conditionally correlated given Ⅹ = 𝑥.
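
The three covariances can also be checked numerically. The sketch below is only an illustration: it assumes 𝑍 = (Ⅹ - ρΥ) / √(1-ρ²), which is one form consistent with the covariances listed above, and the function names, the sample size, and the choice ρ = 0.6 are hypothetical. The conditional covariance is estimated as the covariance of the residuals after regressing Υ and 𝑍 on Ⅹ, which for jointly normal variables equals 𝐶𝑜𝑣(Υ, 𝑍|Ⅹ = 𝑥) for every 𝑥.

```python
import numpy as np

def simulate_xyz(rho, n=200_000, seed=0):
    """Draw (X, Y, Z) under the assumed model; names are illustrative only."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # Y | X = x ~ N(rho * x, 1 - rho^2), as stated in the text
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Assumed form of Z, chosen to match the listed covariances
    z = (x - rho * y) / np.sqrt(1.0 - rho**2)
    return x, y, z

def cov_given(a, b, given):
    """Covariance of a and b after removing the linear effect of `given`."""
    def residual(v):
        slope = np.cov(v, given)[0, 1] / np.var(given, ddof=1)
        return v - slope * given
    return np.cov(residual(a), residual(b))[0, 1]

rho = 0.6
x, y, z = simulate_xyz(rho)

cov_xz = np.cov(x, z)[0, 1]          # theory: sqrt(1 - rho^2)
cov_yz = np.cov(y, z)[0, 1]          # theory: 0
cov_yz_given_x = cov_given(y, z, x)  # theory: -rho * sqrt(1 - rho^2)

print(f"Cov(X, Z)     = {cov_xz:+.3f} (theory {np.sqrt(1 - rho**2):+.3f})")
print(f"Cov(Y, Z)     = {cov_yz:+.3f} (theory +0.000)")
print(f"Cov(Y, Z | X) = {cov_yz_given_x:+.3f} "
      f"(theory {-rho * np.sqrt(1 - rho**2):+.3f})")
```

With a large sample the estimates should sit close to the theoretical values, up to simulation noise.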

We may consider Ⅹ as the treatment effect in a study, Υ as the post-treatment biomarker change, and 𝑍 as the efficacy response. Under the above assumptions, Υ and 𝑍 are uncorrelated, although they are conditionally correlated given the specific treatment effect Ⅹ = 𝑥 in the study.

From a slightly different perspective, Equation (1) can be viewed as a linear regression model with two correlated predictors, Ⅹ and Υ. The pairwise correlation between 𝑍 and Ⅹ is √(1-ρ²), and the pairwise correlation between 𝑍 and Υ is zero, so on marginal evidence alone Υ appears to carry no information about 𝑍. Within the model, however, Υ is not redundant: its regression coefficient is 𝐶𝑜𝑣(Υ, 𝑍|Ⅹ = 𝑥) / 𝑉𝑎𝑟(Υ|Ⅹ = 𝑥) = -ρ/√(1-ρ²), which is nonzero, and because 𝑍 in Equation (1) is an exact linear combination of Ⅹ and Υ, the multiple correlation coefficient between 𝑍 and (Ⅹ, Υ) is 1 rather than √(1-ρ²). In other words, Υ helps predict 𝑍 only in combination with Ⅹ, which is the regression counterpart of the conditional correlation described above. The correlation between Ⅹ and Υ can also lead to unstable parameter estimates for Ⅹ and Υ themselves - an issue known as "variance inflation" due to collinearity in regression analysis. Considerations of this kind are among the reasons why adjustment for post-treatment covariates is discouraged in regulatory guidelines.
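
The variance inflation mentioned above can be quantified. For a regression with two predictors whose correlation is ρ, the variance inflation factor (VIF) of each predictor is 1/(1-ρ²), and more generally the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The sketch below is only an illustration: it simulates Ⅹ and Υ as defined in the text and compares the sample VIFs with this formula. The function name, sample size, and ρ values are hypothetical choices.

```python
import numpy as np

def vif_two_predictors(rho, n=100_000, seed=1):
    """Sample VIFs for X and Y with correlation rho (illustrative helper)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # Y | X = x ~ N(rho * x, 1 - rho^2), as stated in the text
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    corr = np.corrcoef(x, y)             # 2 x 2 sample correlation matrix
    return np.diag(np.linalg.inv(corr))  # VIFs = diagonal of its inverse

for rho in (0.3, 0.6, 0.9):
    vif_x, vif_y = vif_two_predictors(rho)
    print(f"rho = {rho:.1f}: VIF(X) = {vif_x:.2f}, VIF(Y) = {vif_y:.2f}, "
          f"theory = {1.0 / (1.0 - rho**2):.2f}")
```

The VIF grows without bound as ρ approaches 1, so the stronger the link between the treatment effect and the biomarker change, the less stable the individual coefficient estimates become.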
