Randomization Is Not Just About Balance
- Andrew Yan

- May 2
Randomization in clinical trials is often perceived as a tool to “balance covariates” between treatment groups. While this view is not wrong, it is incomplete and somewhat misleading. Randomization is not primarily about balance; rather, it provides a design-based foundation for valid causal inference.
Where Does the Probability Come From?
In a randomized clinical trial (RCT), probability is not introduced through modeling assumptions such as normality, nor is it fundamentally about sampling from a target population. Instead, probability is introduced by design.
Consider the potential outcomes framework. For each subject 𝑖, let
𝑌ᵢ(1): the outcome under treatment
𝑌ᵢ(0): the outcome under control
Under the sharp null hypothesis of no treatment effect for any subject, introduced by Ronald A. Fisher, 𝑌ᵢ(1) = 𝑌ᵢ(0) for every subject 𝑖.
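In this framework, the potential outcomes are treated as fixed, reflecting a finite-population view, and the only source of randomness is the treatment assignment. Because the sharp null makes the two potential outcomes equal, each subject's observed outcome is also the outcome that subject would have shown under any other assignment. As a result, the null distribution of any test statistic is induced entirely by the randomization mechanism.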
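The following Python sketch illustrates this with six hypothetical subjects. Under the sharp null, the outcome vector is the same no matter which assignment we imagine; only the treatment labels, and therefore the difference in means, change. The data and function names are illustrative assumptions.

```python
# Hypothetical data: six subjects, three of them treated.
y_obs = [5.1, 3.2, 4.8, 2.9, 6.0, 3.5]
z_obs = [1, 0, 1, 0, 1, 0]


def potential_outcomes_under_sharp_null(y):
    # Sharp null: Y_i(1) = Y_i(0), so the observed outcome fills in both columns.
    return [{0: yi, 1: yi} for yi in y]


def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


table = potential_outcomes_under_sharp_null(y_obs)

# Imagine other assignments: each subject's outcome under z is read from the
# table, so the outcomes never change; only the labels, and the statistic, do.
for z in ([1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 0]):
    y_under_z = [row[zi] for row, zi in zip(table, z)]
    print(z, y_under_z == y_obs, round(diff_in_means(y_under_z, z), 3))
```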
Randomization Creates the Reference Distribution
Because treatment assignment is randomized, we can construct a randomization distribution as follows:
Enumerate, or approximate, all possible treatment assignments consistent with the design.
Compute the test statistic (e.g., difference in means) for each assignment.
Compare the observed statistic to this distribution.
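The three steps above can be sketched in Python for the simplest design, complete randomization, in which every set of the same number of treated subjects is equally likely. The function names, the two-sided rule, and the small tolerance for floating-point ties are choices made for this illustration, not a standard API.

```python
from itertools import combinations


def diff_in_means(y, treated_idx):
    """Treated-minus-control difference in means for a given treated set."""
    treated_set = set(treated_idx)
    treated = [y[i] for i in range(len(y)) if i in treated_set]
    control = [y[i] for i in range(len(y)) if i not in treated_set]
    return sum(treated) / len(treated) - sum(control) / len(control)


def exact_randomization_test(y, z):
    """Two-sided Fisher randomization test of the sharp null of no effect,
    assuming complete randomization with sum(z) treated subjects."""
    n, n_treated = len(y), sum(z)
    observed = diff_in_means(y, [i for i in range(n) if z[i] == 1])

    # Steps 1-2: enumerate every assignment the design could have produced
    # and compute the statistic for each, holding the outcomes fixed.
    null_stats = [diff_in_means(y, idx)
                  for idx in combinations(range(n), n_treated)]

    # Step 3: p-value = share of assignments at least as extreme as observed
    # (a small tolerance guards against floating-point ties).
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in null_stats)
    return observed, extreme / len(null_stats)


obs, p = exact_randomization_test([5.1, 3.2, 4.8, 2.9, 6.0, 3.5],
                                  [1, 0, 1, 0, 1, 0])
print(f"observed difference = {obs:.2f}, exact p-value = {p:.2f}")
```

With these six hypothetical outcomes the observed difference is 2.1, and only 2 of the 20 possible assignments (the observed one and its mirror image) are at least as extreme, so the two-sided p-value is 0.1. It also shows a limit of very small trials: with only 20 assignments, no two-sided p-value below 0.1 is attainable.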
This yields exact finite-sample inference with:
no distributional assumptions,
no large-sample approximations, and
no reliance on variance estimation.
This is the foundation of Fisher's randomization test and permutation-based inference in randomized experiments. In other words, we do not need to assume a bell-shaped curve or impose any statistical model.
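When full enumeration is infeasible (a trial with 40 subjects split 20/20 already has roughly 1.4 × 10¹¹ possible assignments), the reference distribution can be approximated by re-randomizing many times. Below is a minimal sketch, again assuming complete randomization; the number of draws, the seed, and the simulated 40-subject data are arbitrary illustrative choices. Adding one to both the numerator and the denominator counts the observed assignment itself and keeps the Monte Carlo p-value valid.

```python
import random


def monte_carlo_randomization_test(y, z, draws=10_000, seed=0):
    """Approximate two-sided randomization p-value by re-randomizing `draws`
    times, assuming complete randomization with sum(z) treated subjects."""
    rng = random.Random(seed)
    n, n_treated = len(y), sum(z)

    def stat(assign):
        treated = sum(yi for yi, a in zip(y, assign) if a)
        control = sum(yi for yi, a in zip(y, assign) if not a)
        return treated / n_treated - control / (n - n_treated)

    observed = stat(z)
    labels = list(z)
    hits = 0
    for _ in range(draws):
        rng.shuffle(labels)  # a new assignment with the same number treated
        if abs(stat(labels)) >= abs(observed) - 1e-12:
            hits += 1
    # Counting the observed assignment keeps the p-value valid (>= 1/(draws+1)).
    return observed, (hits + 1) / (draws + 1)


# Hypothetical 40-subject trial, 20 per arm: too many assignments to enumerate.
data_rng = random.Random(1)
z40 = [1] * 20 + [0] * 20
y40 = [data_rng.gauss(1.0 if zi else 0.0, 1.0) for zi in z40]
print(monte_carlo_randomization_test(y40, z40))
```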
Classical Statistical Tests Are Only Approximations
Procedures such as t-tests, Wald tests, and log-rank tests are often presented as the basis for inference. In RCTs, however, these tests are better viewed as large-sample approximations to the underlying randomization-based inference. Their validity ultimately traces back to the randomization mechanism, not to assumptions such as normality or proportional hazards.
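As a rough illustration, the sketch below compares a large-sample normal approximation with a Monte Carlo randomization p-value on simulated two-arm data. To stay within Python's standard library it uses a Welch-type z statistic with normal tails as a stand-in for the t-test; the simulated data, the sample size of 100 per arm, and the effect size are hypothetical. At this sample size the two p-values typically agree closely, which is what we would expect if the classical test is approximating the randomization distribution.

```python
import math
import random
import statistics


def normal_approx_p(treated, control):
    # Welch-type z statistic; the two-sided p-value uses standard normal
    # tails, standing in for the t-test at large sample sizes.
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return math.erfc(abs(diff / se) / math.sqrt(2))


def randomization_p(treated, control, draws=5000, seed=0):
    # Monte Carlo randomization p-value under complete randomization.
    rng = random.Random(seed)
    pooled = treated + control
    n_treated = len(treated)
    observed = statistics.fmean(treated) - statistics.fmean(control)
    hits = 0
    for _ in range(draws):
        rng.shuffle(pooled)
        d = statistics.fmean(pooled[:n_treated]) - statistics.fmean(pooled[n_treated:])
        if abs(d) >= abs(observed) - 1e-12:
            hits += 1
    return (hits + 1) / (draws + 1)


# Hypothetical trial data: 100 subjects per arm.
data_rng = random.Random(1)
control = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
treated = [data_rng.gauss(0.3, 1.0) for _ in range(100)]
print(f"normal approximation p = {normal_approx_p(treated, control):.4f}")
print(f"randomization p        = {randomization_p(treated, control):.4f}")
```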
Balance Is a Consequence, Not the Goal of Randomization
Randomization is often motivated by its ability to balance covariates. This is, at best, a secondary consequence.
Balance holds in expectation, not necessarily in every realized sample.
Chance imbalances are inevitable in finite samples.
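Observed imbalances do not invalidate the statistical analysis, because the randomization distribution already includes every assignment that could have occurred, imbalanced ones among them.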
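A short simulation makes the first two points concrete. It fixes a hypothetical baseline covariate (age) for 20 subjects, re-randomizes 10 of them to treatment many times, and records the treated-minus-control difference in mean age. Averaged over randomizations the difference is close to zero, yet individual realizations can be far from it. The covariate values and the number of repetitions are illustrative assumptions.

```python
import random
import statistics


def covariate_imbalance(x, n_treated, rng):
    """Treated-minus-control difference in mean x for one random assignment."""
    labels = [1] * n_treated + [0] * (len(x) - n_treated)
    rng.shuffle(labels)
    treated = [xi for xi, a in zip(x, labels) if a]
    control = [xi for xi, a in zip(x, labels) if not a]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(42)
ages = [rng.uniform(40, 80) for _ in range(20)]  # hypothetical baseline ages
imbalances = [covariate_imbalance(ages, 10, rng) for _ in range(20_000)]

print("average imbalance over randomizations:", round(statistics.fmean(imbalances), 2))
print("largest imbalance in a single trial:", round(max(map(abs, imbalances)), 2))
```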
In RCTs, covariate adjustment (e.g., ANCOVA) is therefore not required for validity; it is used primarily to improve statistical efficiency (FDA, 2023).
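The efficiency point can be illustrated by simulation. The sketch below assumes a strongly prognostic baseline covariate x and a constant treatment effect of 2.0 (both hypothetical), re-randomizes many times, and compares the unadjusted difference in means with an ANCOVA-style estimate: the coefficient on treatment in an ordinary least squares fit of y on an intercept, treatment, and x, computed here via the Frisch-Waugh-Lovell theorem to avoid external libraries. Both estimates should center near the true effect, while the adjusted one should vary much less.

```python
import random
import statistics


def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi]
    control = [yi for yi, zi in zip(y, z) if not zi]
    return statistics.fmean(treated) - statistics.fmean(control)


def residualize(v, x):
    """Residuals from an OLS fit of v on an intercept and x."""
    mx, mv = statistics.fmean(x), statistics.fmean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def ancova_effect(y, z, x):
    """Coefficient on z in the OLS fit y ~ 1 + z + x (Frisch-Waugh-Lovell)."""
    rz = residualize(z, x)
    return sum(a * b for a, b in zip(rz, y)) / sum(a * a for a in rz)


rng = random.Random(7)
n, tau = 60, 2.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # prognostic covariate
y0 = [3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]    # control potential outcomes
y1 = [yi + tau for yi in y0]                         # constant effect (assumed)

unadj, adj = [], []
for _ in range(4000):
    z = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(z)
    y = [b if zi else a for a, b, zi in zip(y0, y1, z)]
    unadj.append(diff_in_means(y, z))
    adj.append(ancova_effect(y, z, x))

for name, est in (("unadjusted", unadj), ("ANCOVA", adj)):
    print(f"{name:>10}: mean {statistics.fmean(est):.3f}, "
          f"SD {statistics.stdev(est):.3f}")
```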
Protection Against the Unknown
A central strength of randomization is that it protects against unmeasured confounding and model misspecification. Statistical adjustment can only address what is measured and appropriately modeled, while randomization requires neither. In other words, you can only adjust for what you know and measure. Randomization helps protect against what you do not know.
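The following simulation sketches this protection. A prognostic factor u affects the outcome but is never recorded. Under randomized allocation the difference in means centers on the true effect; under a hypothetical allocation rule in which subjects with higher u are more likely to be treated, the same estimator is biased, and no adjustment can fix it because u is unobserved. All data-generating choices are assumptions made for illustration.

```python
import random
import statistics


def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi]
    control = [yi for yi, zi in zip(y, z) if not zi]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(3)
n, tau = 200, 1.0
u = [rng.gauss(0.0, 1.0) for _ in range(n)]         # unmeasured prognostic factor
y0 = [2.0 * ui + rng.gauss(0.0, 1.0) for ui in u]   # control potential outcomes
y1 = [yi + tau for yi in y0]                        # constant effect (assumed)


def randomized():
    z = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(z)
    return z


def confounded():
    # Hypothetical non-random allocation: subjects with higher u are more
    # likely to be treated, but u is never recorded.
    return [1 if ui + rng.gauss(0.0, 1.0) > 0 else 0 for ui in u]


def average_estimate(allocate, reps=2000):
    """Average difference-in-means estimate over repeated allocations."""
    estimates = []
    for _ in range(reps):
        z = allocate()
        y = [b if zi else a for a, b, zi in zip(y0, y1, z)]
        estimates.append(diff_in_means(y, z))
    return statistics.fmean(estimates)


est_randomized = average_estimate(randomized)
est_confounded = average_estimate(confounded)
print(f"true effect {tau}, randomized {est_randomized:.2f}, "
      f"confounded {est_confounded:.2f}")
```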
Randomization vs. Modeling
It is important to distinguish between two different paradigms: randomized trials and observational studies.
Randomized Trials:
Inference justified by design
Minimal assumptions
Robust to model misspecification
Observational Studies:
Inference justified by assumptions
Requires the assumption of no unmeasured confounding
Sensitive to model specification
Statistical models are optional in randomized trials but indispensable in observational studies.
Randomization Creates the Mechanism for Hypothesis Testing
Randomization does more than reduce bias: it creates the mechanism for hypothesis testing. Under the sharp null hypothesis, the outcomes are fixed, and the test statistic varies only through the random treatment assignment. This provides a well-defined reference distribution for p-values and confidence intervals. Without randomization, such a distribution is not available without additional assumptions. In this sense, randomization provides a principled basis for assessing whether the observed treatment effect could plausibly have arisen by chance.
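One way randomization yields confidence intervals is by inverting the randomization test. The sketch below assumes a constant additive effect, 𝑌ᵢ(1) = 𝑌ᵢ(0) + τ for all 𝑖, which is a sharp null for each fixed τ: subtracting τ from the treated outcomes recovers every 𝑌ᵢ(0), so each candidate τ can be tested exactly, and the values not rejected at level α form the confidence set. The data, the grid, and the function names are hypothetical, and reporting only the smallest and largest retained values presumes the retained set is an interval.

```python
from itertools import combinations


def diff_in_means(y, treated_idx):
    treated_set = set(treated_idx)
    treated = [y[i] for i in range(len(y)) if i in treated_set]
    control = [y[i] for i in range(len(y)) if i not in treated_set]
    return sum(treated) / len(treated) - sum(control) / len(control)


def exact_p(y, z):
    """Two-sided exact randomization p-value of the sharp null of no effect
    on y, assuming complete randomization."""
    n = len(y)
    observed = diff_in_means(y, [i for i in range(n) if z[i]])
    stats = [diff_in_means(y, idx) for idx in combinations(range(n), sum(z))]
    return sum(abs(s) >= abs(observed) - 1e-12 for s in stats) / len(stats)


def randomization_ci(y, z, grid, alpha=0.05):
    """Keep every tau whose sharp null Y_i(1) = Y_i(0) + tau is not rejected."""
    kept = []
    for tau in grid:
        # Under this null, subtracting tau from treated outcomes recovers
        # every Y_i(0), which stays fixed across re-randomizations.
        y0_implied = [yi - tau * zi for yi, zi in zip(y, z)]
        if exact_p(y0_implied, z) > alpha:
            kept.append(tau)
    return min(kept), max(kept)


# Hypothetical trial: ten subjects, the first five treated.
y = [7.2, 5.9, 8.1, 6.5, 7.7, 4.8, 5.5, 4.1, 6.0, 5.2]
z = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
grid = [round(-2.0 + 0.05 * k, 2) for k in range(161)]
print("95% randomization interval:", randomization_ci(y, z, grid))
```

For these ten hypothetical outcomes the observed difference is 1.96, and the test of τ = 0 has an exact p-value of 4/252 ≈ 0.016, so zero lies outside the 95% interval.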
Practical Considerations
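In practice, randomization is often implemented with additional structure, such as blocking and/or stratification. These approaches improve finite-sample balance on the blocking or stratification factors and can therefore increase statistical efficiency. However, they do not change the fundamental role of randomization: the reference distribution is still generated by the assignment mechanism, now restricted to the assignments that the blocked or stratified design could have produced.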
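To keep the reference distribution consistent with a stratified design, re-randomization should happen within strata, holding the number treated in each stratum fixed. Below is a minimal Monte Carlo sketch, assuming two hypothetical strata (for example, two sites) with four of eight subjects treated in each. The overall difference in means is used as the test statistic for simplicity; other statistics, such as stratum-weighted differences, could be used instead.

```python
import random
import statistics


def stratified_randomization_test(y, z, strata, draws=5000, seed=0):
    """Monte Carlo randomization test that re-randomizes only within strata,
    mirroring a design that fixed the number treated in each stratum."""
    rng = random.Random(seed)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)

    def stat(assign):
        treated = [yi for yi, a in zip(y, assign) if a]
        control = [yi for yi, a in zip(y, assign) if not a]
        return statistics.fmean(treated) - statistics.fmean(control)

    observed = stat(z)
    hits = 0
    for _ in range(draws):
        assign = list(z)
        for idx in groups.values():
            labels = [z[i] for i in idx]
            rng.shuffle(labels)  # keeps this stratum's treated count fixed
            for i, label in zip(idx, labels):
                assign[i] = label
        if abs(stat(assign)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (draws + 1)


# Hypothetical data: stratum A has lower baseline outcomes than stratum B.
strata = ["A"] * 8 + ["B"] * 8
z = [1, 0] * 8
y = [13.1, 10.2, 12.8, 9.7, 13.4, 10.1, 12.9, 9.9,
     23.2, 20.3, 22.7, 19.8, 23.1, 20.0, 22.9, 20.4]
print(stratified_randomization_test(y, z, strata))
```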
Regulatory Perspectives
Regulatory agencies such as the FDA and EMA place strong emphasis on randomized evidence. This reflects the fact that:
Causal interpretation is design-based.
Assumptions are minimized.
Results are more robust and reproducible.
In conclusion, the fundamental role of randomization is to create the basis for valid causal inference.
References
FDA (2023). Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products: Guidance for Industry. U.S. Food and Drug Administration.