
The Expectation–Maximization (EM) Algorithm

  • Writer: Andrew Yan
  • May 31, 2024
  • 2 min read

Updated: Jun 1, 2024

The EM algorithm is an iterative optimization method used for parameter estimation, such as maximum likelihood estimation (MLE), in the presence of hidden (missing or latent) variables. It is one of the most influential developments in modern statistics, with applications in many areas, including missing-data problems, mixture models, and Bayesian statistics.

Consider a statistical model that generates observed variables 𝑌 and hidden variables 𝑍, with joint probability density function 𝑝(𝑦,𝑧|𝜃), where 𝜃 ∈ Θ is the unknown parameter of interest. The MLE of 𝜃 is determined by maximizing 𝓁(𝜃|𝑦) = 𝗅𝗈𝗀 𝑝(𝑦|𝜃) = 𝗅𝗈𝗀 ∫ 𝑝(𝑦,𝑧|𝜃) 𝑑𝑧, the marginal log-likelihood of the observed data 𝑌 = 𝑦. This problem, however, is often intractable due to the presence of the hidden variables 𝑍 and the integral (a sum for discrete 𝑍) inside the log function. The EM algorithm circumvents this challenge by iteratively maximizing a tight lower bound (a local approximation) of 𝓁(𝜃|𝑦), which is much easier to optimize than 𝓁(𝜃|𝑦) itself. Specifically, let 𝜃⁽ⁿ⁾ be the estimate at the 𝑛th iteration; then we have

  𝓁(𝜃|𝑦) = 𝗅𝗈𝗀 ∫ 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) [𝑝(𝑦,𝑧|𝜃) / 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾)] 𝑑𝑧   (1)
         ≥ ∫ 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) 𝗅𝗈𝗀[𝑝(𝑦,𝑧|𝜃) / 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾)] 𝑑𝑧 =: 𝙶(𝜃|𝜃⁽ⁿ⁾)   (2)
         = 𝑄(𝜃|𝜃⁽ⁿ⁾) − ∫ 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) 𝗅𝗈𝗀 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) 𝑑𝑧,   (3)

where (2) follows from Jensen's inequality (with equality at 𝜃 = 𝜃⁽ⁿ⁾, so that 𝙶(𝜃⁽ⁿ⁾|𝜃⁽ⁿ⁾) = 𝓁(𝜃⁽ⁿ⁾|𝑦)) and 𝑄(𝜃|𝜃⁽ⁿ⁾) = 𝖤[𝗅𝗈𝗀 𝑝(𝑦,𝑍|𝜃) | 𝑦, 𝜃⁽ⁿ⁾]. Since the second term in (3) does not depend on 𝜃, it follows that

  𝓁(𝜃|𝑦) − 𝓁(𝜃⁽ⁿ⁾|𝑦) ≥ 𝑄(𝜃|𝜃⁽ⁿ⁾) − 𝑄(𝜃⁽ⁿ⁾|𝜃⁽ⁿ⁾).   (4)
The function 𝙶(𝜃|𝜃⁽ⁿ⁾) is a lower bound, tight at 𝜃⁽ⁿ⁾, on the marginal log-likelihood 𝓁(𝜃|𝑦) that we are trying to maximize. Moreover, Inequality (4) implies that an improved estimate of 𝜃 at iteration (𝑛+1) can be obtained by maximizing the function 𝑄(𝜃|𝜃⁽ⁿ⁾), the conditional expectation of the complete-data log-likelihood with respect to the hidden variables 𝑍, given the observed data 𝑌 = 𝑦 and the current estimate 𝜃⁽ⁿ⁾. The improvement in 𝓁(𝜃|𝑦) is at least as large as the improvement in 𝑄(𝜃|𝜃⁽ⁿ⁾).
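To make these quantities concrete, consider a two-component Gaussian mixture (an illustrative model of my choosing, not one specified in the post): each hidden label 𝑧ᵢ ∈ {1, 2} selects the component that generates 𝑦ᵢ, and 𝜃 = (π, μ₁, μ₂, σ₁², σ₂²). The E-step reduces to computing the posterior "responsibilities"

  𝑟ᵢ = 𝑝(𝑧ᵢ = 1 | 𝑦ᵢ, 𝜃⁽ⁿ⁾) = π φ(𝑦ᵢ; μ₁, σ₁²) / [π φ(𝑦ᵢ; μ₁, σ₁²) + (1 − π) φ(𝑦ᵢ; μ₂, σ₂²)],

with φ the normal density and all parameters evaluated at 𝜃⁽ⁿ⁾, and the 𝑄-function becomes the responsibility-weighted complete-data log-likelihood

  𝑄(𝜃|𝜃⁽ⁿ⁾) = Σᵢ { 𝑟ᵢ 𝗅𝗈𝗀[π φ(𝑦ᵢ; μ₁, σ₁²)] + (1 − 𝑟ᵢ) 𝗅𝗈𝗀[(1 − π) φ(𝑦ᵢ; μ₂, σ₂²)] },

which the M-step maximizes in closed form via weighted means and variances. With this picture in mind, the EM algorithm iterates between the E-step (expectation) and the M-step (maximization) as follows: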

  • Initialization: 𝑛 = 0, 𝜃 = 𝜃⁽⁰⁾

  • E-step: compute 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) and then the "𝑄-function" 𝑄(𝜃|𝜃⁽ⁿ⁾)

  • M-step: find a new 𝜃 = 𝜃⁽ⁿ⁺¹⁾ that maximizes 𝑄(𝜃|𝜃⁽ⁿ⁾), i.e., 𝜃⁽ⁿ⁺¹⁾ maximizes 𝑄(𝜃|𝜃⁽ⁿ⁾) over 𝜃 ∈ Θ
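Putting the steps above together, here is a minimal NumPy sketch of EM for the two-component Gaussian mixture described earlier. It is an illustrative implementation under assumptions of my own choosing (random initialization, simulated data, and a simple stopping rule), not code from the original post.

import numpy as np

def normal_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x (vectorized over x).
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(y, n_iter=200, tol=1e-10, seed=0):
    # Illustrative sketch (not from the original post):
    # EM for y_i ~ pi*N(mu_1, var_1) + (1 - pi)*N(mu_2, var_2),
    # with hidden component labels z_i; theta = (pi, mu, var).
    rng = np.random.default_rng(seed)
    pi = 0.5                               # theta^(0): a crude starting point
    mu = rng.choice(y, size=2, replace=False)
    var = np.full(2, y.var())

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r_i = p(z_i = 1 | y_i, theta^(n)).
        w1 = pi * normal_pdf(y, mu[0], var[0])
        w2 = (1.0 - pi) * normal_pdf(y, mu[1], var[1])
        r = w1 / (w1 + w2)

        # Marginal log-likelihood l(theta^(n) | y); EM never decreases it.
        ll = np.log(w1 + w2).sum()
        assert ll >= prev_ll - 1e-9, "EM monotonicity violated"
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # M-step: theta^(n+1) maximizes Q(theta | theta^(n)),
        # available here in closed form as weighted moment estimates.
        pi = r.mean()
        mu = np.array([(r * y).sum() / r.sum(),
                       ((1 - r) * y).sum() / (1 - r).sum()])
        var = np.array([(r * (y - mu[0]) ** 2).sum() / r.sum(),
                        ((1 - r) * (y - mu[1]) ** 2).sum() / (1 - r).sum()])
    return pi, mu, var, ll

# Demo on simulated data: hidden z picks N(-2, 1) w.p. 0.3, else N(3, 0.25).
rng = np.random.default_rng(1)
z = rng.random(2000) < 0.3
y = np.where(z, rng.normal(-2.0, 1.0, 2000), rng.normal(3.0, 0.5, 2000))
print(em_gmm_1d(y))

The assertion inside the loop checks exactly the monotonicity property discussed next.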

The E-step and M-step repeat until convergence, as in the sketch above. The two key properties of the EM algorithm are:

  1. Monotonicity: 𝓁(𝜃⁽⁰⁾|𝑦) ≤ 𝓁(𝜃⁽¹⁾|𝑦) ≤ ... ≤ 𝓁(𝜃⁽ⁿ⁾|𝑦) ≤ 𝓁(𝜃⁽ⁿ⁺¹⁾|𝑦)

  2. Convergence: under mild regularity conditions, the sequence {𝜃⁽ⁿ⁾} converges to a stationary point of 𝓁(𝜃|𝑦), such as a local maximum; convergence to the global maximum is not guaranteed.
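Property 1 follows directly from Inequality (4): taking 𝜃 = 𝜃⁽ⁿ⁺¹⁾ gives

  𝓁(𝜃⁽ⁿ⁺¹⁾|𝑦) − 𝓁(𝜃⁽ⁿ⁾|𝑦) ≥ 𝑄(𝜃⁽ⁿ⁺¹⁾|𝜃⁽ⁿ⁾) − 𝑄(𝜃⁽ⁿ⁾|𝜃⁽ⁿ⁾) ≥ 0,

where the last inequality holds because 𝜃⁽ⁿ⁺¹⁾ maximizes 𝑄(·|𝜃⁽ⁿ⁾). In fact, a full maximization is not required: any 𝜃⁽ⁿ⁺¹⁾ with 𝑄(𝜃⁽ⁿ⁺¹⁾|𝜃⁽ⁿ⁾) ≥ 𝑄(𝜃⁽ⁿ⁾|𝜃⁽ⁿ⁾) already preserves monotonicity.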

The iterative process of the EM algorithm is illustrated in the following figure.

[Figure: the iterative process of the EM algorithm]
