
Misconceptions About Linear Regression Assumptions

  • Writer: Andrew Yan
  • 3 days ago
  • 2 min read

Updated: 2 days ago

I recently came across a LinkedIn post discussing the statistical assumptions of linear regression. Because the misconceptions in that post seem to be quite common, even among statisticians, I feel strongly compelled to write about them. The author claimed that the validity of linear regression depends on several key assumptions, namely:


  1. Linearity: The relationship between the dependent variable Y and the independent variable(s) X must be linear.

  2. Independence: The observations in the dataset should be independent of each other.

  3. Homoscedasticity: The variance of the residuals should be constant across all levels of the independent variable(s).

  4. Normality: The residuals (errors) of the model should be normally distributed.

  5. No multicollinearity: The independent variables should not be highly correlated.


Surprisingly, every one of these assumptions, or at least the way it is stated here, is either inaccurate or misleading. Here is why.


  1. Linearity: This is misstated by the author. Linear regression assumes linearity in the model parameters, not necessarily a linear relationship between the dependent variable and the independent variable(s). A polynomial model such as E[Y] = β0 + β1X + β2X², for instance, is still a linear model because it is linear in the coefficients.

  2. Independence: Dependent observations can still yield valid inference if the dependence structure is properly taken into account, for example with generalized least squares or mixed-effects models.

  3. Homoscedasticity: Homoscedasticity is not required for unbiasedness. Ordinary least squares (OLS) estimators remain unbiased and consistent even under heteroscedasticity, although they are no longer efficient and the usual standard errors need a correction, such as heteroscedasticity-robust standard errors.

  4. Normality: The importance of normality is overstated here. Normality of the errors is not needed for estimation: OLS estimates are unbiased regardless of the error distribution. It is required only for exact finite-sample hypothesis tests and confidence intervals; in large samples, inference remains approximately valid without it under mild conditions.

  5. No multicollinearity: Multicollinearity (short of perfect collinearity) does not invalidate a linear regression model, and the Gauss-Markov theorem still holds in its presence. Although it can inflate the variance of parameter estimates, it does not inherently reduce the model’s predictive performance.
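Point 1 is easy to see with a quick simulation. The sketch below (Python with numpy; the data and coefficients are hypothetical) fits E[Y] = β0 + β1X + β2X², a curve that is clearly nonlinear in X, by ordinary least squares — which works precisely because the model is linear in the parameters:

```python
import numpy as np

# Hypothetical data: a nonlinear relationship between X and Y that is
# nevertheless a *linear* regression, because the mean function is
# linear in the parameters (b0, b1, b2).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.5, size=200)

# Design matrix with a squared column: nonlinear in x,
# linear in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the true values (1.0, 2.0, -3.0)
```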

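Point 3 can be checked the same way. In this hypothetical simulation the error standard deviation grows with X — a textbook violation of homoscedasticity — yet the OLS slope averaged over many replications still centers on the true value:

```python
import numpy as np

# Hypothetical Monte Carlo: heteroscedastic errors (SD proportional to x),
# but the OLS slope estimator remains unbiased for the true slope 2.0.
rng = np.random.default_rng(1)
true_slope, n, reps = 2.0, 100, 2000
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones_like(x), x])

slopes = []
for _ in range(reps):
    eps = rng.normal(scale=x, size=n)   # error SD grows with x
    y = 1.0 + true_slope * x + eps
    slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print(np.mean(slopes))  # averages to about 2.0
```

What heteroscedasticity does cost is efficiency and the validity of the usual standard-error formula, not unbiasedness.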

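And for point 5, a hypothetical pair of nearly collinear predictors shows the trade-off: the individual coefficient estimates are volatile, but their sum and the model's predictions remain well behaved:

```python
import numpy as np

# Hypothetical data with x1 and x2 almost perfectly collinear.
# Individual coefficients have inflated variance, yet fitted
# values stay accurate.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-perfect collinearity
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(scale=1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# The coefficients on x1 and x2 may individually stray from (1, 1) ...
print(beta_hat[1], beta_hat[2])
# ... but their sum and the residual error behave well.
print(beta_hat[1] + beta_hat[2])           # close to 2.0
print(np.sqrt(np.mean((y_hat - y) ** 2)))  # close to the error SD, 1.0
```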
Even more surprising is that the author lists the title “Professor of Data Science and Machine Learning” on his LinkedIn profile, a title that appears inflated or, at the very least, presented in a misleading way to attract clients to his consulting business.




 
 
 
