
Misconceptions About Linear Regression Assumptions

  • Writer: Andrew Yan
  • Apr 21
  • 2 min read

Updated: May 13

I recently came across a LinkedIn post discussing the statistical assumptions of linear regression. Because the misconceptions in that post are quite common, even among statisticians, I feel compelled to write about them. The author claimed that the validity of linear regression depends on several key assumptions, namely:


  1. Linearity: The relationship between the dependent variable Y and the independent variable(s) X must be linear.

  2. Independence: The observations in the dataset should be independent of each other.

  3. Homoscedasticity: The variance of the residuals should be constant across all levels of the independent variable(s).

  4. Normality: The residuals (errors) of the model should be normally distributed.

  5. No multicollinearity: The independent variables should not be highly correlated.


Surprisingly, every one of these assumptions, or at least the way it is interpreted, is either inaccurate or misleading. Here is why.


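  1. Linearity: This is misstated by the author. Linear regression assumes linearity in the model parameters, not necessarily a linear relationship between the dependent variable and the independent variable(s). For example, Y = β0 + β1 X + β2 X² + ε is a linear regression model even though Y is a curved function of X.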

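  2. Independence: Dependent observations can still yield valid inference if the dependence structure is properly taken into account, for instance through generalized least squares, mixed-effects models, or cluster-robust standard errors.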

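  3. Homoscedasticity: Homoscedasticity is not required for unbiasedness. Ordinary least squares (OLS) estimators remain unbiased and consistent even under heteroscedasticity, although they may no longer be efficient. What heteroscedasticity does affect is the conventional standard errors, and heteroscedasticity-consistent (sandwich) standard errors can be used in their place.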

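  4. Normality: The importance of normality is overstated here. Normality is not needed for estimation: the OLS point estimates do not depend on it. It matters for hypothesis tests and confidence intervals, and even there it is only required for exact finite-sample inference. In large samples, the usual tests and intervals remain approximately valid by the central limit theorem.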

  5. No multicollinearity: Multicollinearity does not invalidate a linear regression model, and the Gauss-Markov theorem still holds in the presence of multicollinearity, as long as it is not perfect. Although multicollinearity can inflate the variance of individual parameter estimates, it does not inherently reduce the model’s predictive performance.
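To make the linearity point concrete, here is a minimal sketch in Python with NumPy. The quadratic model, the coefficient values, the noise level, and the variable names are all assumptions chosen for illustration. Ordinary least squares recovers the coefficients of a curved relationship because the model is linear in its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Illustrative (assumed) data-generating process: Y is a curved function of x,
# but the model is linear in the parameters (beta0, beta1, beta2).
x = rng.uniform(-3.0, 3.0, size=n)
true_beta = np.array([1.0, 2.0, -0.5])

# Design matrix with columns 1, x, x^2.
X = np.column_stack([np.ones(n), x, x**2])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("true coefficients:     ", true_beta)
print("estimated coefficients:", beta_hat)
```

The same idea covers logarithms, splines, or interactions of X: any transformation is fine as long as the coefficients enter the model linearly.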

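The homoscedasticity point can be checked with a small simulation. This is a sketch under assumed settings: the error standard deviation is taken to be proportional to x, and the sample size, number of replications, and coefficients are arbitrary. It compares OLS with weighted least squares (WLS) that uses the true weights, which are of course known here only because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 2000
true_intercept, true_slope = 1.0, 2.0

# Fixed design; error standard deviation grows with x (heteroscedasticity).
x = rng.uniform(1.0, 5.0, size=n)
sd = 0.5 * x
X = np.column_stack([np.ones(n), x])

# One simulated response vector per row: shape (reps, n).
Y = true_intercept + true_slope * x + rng.normal(size=(reps, n)) * sd

# OLS for every replication at once: result has shape (2, reps).
ols = np.linalg.pinv(X) @ Y.T

# WLS with the true weights: divide each observation by its error sd.
Xw = X / sd[:, None]
wls = np.linalg.pinv(Xw) @ (Y / sd).T

ols_slopes, wls_slopes = ols[1], wls[1]
print("mean OLS slope:", ols_slopes.mean())  # close to the true slope
print("SD of OLS slope:", ols_slopes.std())
print("SD of WLS slope:", wls_slopes.std())  # smaller than the OLS SD
```

Under these assumptions the OLS slope averages out to the true value, so it is unbiased, while the WLS slope varies less across replications, which is what the loss of efficiency means in practice.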

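A similar sketch illustrates the multicollinearity point. The setup is assumed for illustration: two nearly identical predictors, unit-variance noise, and test data drawn from the same distribution as the training data. That last assumption matters, because predictions can degrade if new data break the correlation pattern seen in training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, reps = 100, 1000, 500
beta = np.array([1.0, 1.0, 1.0])  # intercept, x1, x2 (assumed values)

def design(size):
    """Design matrix with two highly correlated predictors."""
    x1 = rng.normal(size=size)
    x2 = x1 + 0.01 * rng.normal(size=size)
    return np.column_stack([np.ones(size), x1, x2])

X_train, X_test = design(n), design(m)

# Fresh noise in every replication; the noise SD is 1.
Y_train = X_train @ beta + rng.normal(size=(reps, n))
Y_test = X_test @ beta + rng.normal(size=(reps, m))

# OLS fit per replication: shape (3, reps).
beta_hat = np.linalg.pinv(X_train) @ Y_train.T

# Out-of-sample root mean squared error per replication.
pred = (X_test @ beta_hat).T
rmse = np.sqrt(np.mean((Y_test - pred) ** 2, axis=1))

coef_sd = beta_hat[1].std()                  # individual coefficient: unstable
sum_sd = (beta_hat[1] + beta_hat[2]).std()   # sum of the two: stable
mean_rmse = rmse.mean()                      # close to the noise SD of 1
print(coef_sd, sum_sd, mean_rmse)
```

In this setting the individual coefficients swing widely from sample to sample, yet their sum and the out-of-sample error stay stable.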




 
 
 

© 2026 by Andrew Yan