The Expectation–Maximization (EM) Algorithm
- Andrew Yan

- May 31, 2024
Updated: Jun 1, 2024
The EM algorithm is an iterative optimization method used for parameter estimation, such as maximum likelihood estimation (MLE), in the presence of hidden (missing or latent) variables. It is one of the most influential developments in modern statistics and has applications in many areas, including missing-data problems, mixture models, and Bayesian statistics.
Consider a statistical model that generates observed variables Υ and hidden variables Ζ, along with a probability density function 𝑝(𝑦,𝑧|𝜃), where 𝜃 ∈ Θ is the unknown parameter(s) of interest. The MLE of 𝜃 is determined by maximizing 𝓁(𝜃|𝑦) = 𝗅𝗈𝗀𝑝(𝑦|𝜃) = 𝗅𝗈𝗀∫𝑝(𝑦,𝑧|𝜃)𝑑𝑧, the marginal log-likelihood of the observed data Υ = 𝑦. This problem, however, is often intractable due to the presence of the hidden variables Ζ and the integral (sum for discrete Ζ) inside the log function. The EM algorithm circumvents this challenge by iteratively maximizing a tight lower bound (a local approximation) of 𝓁(𝜃|𝑦), which is much easier to optimize than 𝓁(𝜃|𝑦) itself. Specifically, let 𝜃⁽ⁿ⁾ be the estimate at the 𝑛th iteration, then we have

$$
\begin{aligned}
\ell(\theta \mid y) &= \log \int p(y, z \mid \theta)\, dz && (1) \\
&= \log \int p(z \mid y, \theta^{(n)}) \, \frac{p(y, z \mid \theta)}{p(z \mid y, \theta^{(n)})}\, dz && (2) \\
&\ge \int p(z \mid y, \theta^{(n)}) \log \frac{p(y, z \mid \theta)}{p(z \mid y, \theta^{(n)})}\, dz \triangleq G(\theta \mid \theta^{(n)}) && (3) \\
&= \ell(\theta^{(n)} \mid y) + Q(\theta \mid \theta^{(n)}) - Q(\theta^{(n)} \mid \theta^{(n)}), && (4)
\end{aligned}
$$

where (3) follows from Jensen's inequality and $Q(\theta \mid \theta^{(n)}) = \mathbb{E}\!\left[\log p(y, Z \mid \theta) \,\middle|\, y, \theta^{(n)}\right]$.

The function 𝙶(𝜃|𝜃⁽ⁿ⁾) gives a lower-bound (tight at 𝜃⁽ⁿ⁾) on the marginal log-likelihood 𝓁(𝜃|𝑦) that we’re trying to maximize. Moreover, Inequality (4) implies that an improved estimate of 𝜃 at iteration (𝑛+1) can be obtained by maximizing the function 𝑄(𝜃|𝜃⁽ⁿ⁾), the conditional expectation of the complete-data log-likelihood with respect to the hidden variables Ζ, given the observed data Υ = 𝑦 and the current estimate 𝜃⁽ⁿ⁾. The improvement in 𝓁(𝜃|𝑦) is at least as much as that in 𝑄(𝜃|𝜃⁽ⁿ⁾). The EM algorithm iterates between the E-step (expectation) and the M-step (maximization) as follows:
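The lower bound and its tightness at 𝜃⁽ⁿ⁾ can be checked numerically. The sketch below uses a toy model of my own choosing (not from the text): 𝑍 ~ Bernoulli(𝜃) and 𝑌 | 𝑍 = 𝑧 ~ 𝑁(𝜇_𝑧, 1) with fixed means, so the integral over 𝑧 is a two-term sum.

```python
import math

# Toy model (illustrative assumption): z ~ Bernoulli(theta),
# y | z ~ N(mu_z, 1) with fixed means mu_0 = -1, mu_1 = 2.
MU = (-1.0, 2.0)

def joint(y, z, theta):
    """Complete-data density p(y, z | theta)."""
    prior = theta if z == 1 else 1.0 - theta
    return prior * math.exp(-0.5 * (y - MU[z]) ** 2) / math.sqrt(2 * math.pi)

def loglik(y, theta):
    """Marginal log-likelihood l(theta | y) = log sum_z p(y, z | theta)."""
    return math.log(sum(joint(y, z, theta) for z in (0, 1)))

def G(theta, theta_n, y):
    """Lower bound: sum_z p(z | y, theta_n) log[p(y, z | theta) / p(z | y, theta_n)]."""
    marg = sum(joint(y, z, theta_n) for z in (0, 1))
    total = 0.0
    for z in (0, 1):
        post = joint(y, z, theta_n) / marg  # posterior p(z | y, theta_n)
        total += post * math.log(joint(y, z, theta) / post)
    return total

y, theta_n = 0.5, 0.3
# G(theta | theta_n) <= l(theta | y) everywhere ...
for theta in (0.1, 0.3, 0.5, 0.9):
    assert G(theta, theta_n, y) <= loglik(y, theta) + 1e-12
# ... with equality at theta = theta_n (the bound is tight there)
assert abs(G(theta_n, theta_n, y) - loglik(y, theta_n)) < 1e-12
```

The equality at 𝜃 = 𝜃⁽ⁿ⁾ is exactly what makes maximizing the bound guarantee an improvement in 𝓁(𝜃|𝑦) itself.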
Initialization: 𝑛 = 0, 𝜃 = 𝜃⁽⁰⁾
E-step: compute 𝑝(𝑧|𝑦,𝜃⁽ⁿ⁾) and then the "𝑄-function" 𝑄(𝜃|𝜃⁽ⁿ⁾)
M-step: find a new 𝜃 = 𝜃⁽ⁿ⁺¹⁾ that maximizes 𝑄(𝜃|𝜃⁽ⁿ⁾), i.e., 𝜃⁽ⁿ⁺¹⁾ = argmax_{𝜃 ∈ Θ} 𝑄(𝜃|𝜃⁽ⁿ⁾)
The E-step and M-step repeat until convergence. The two key properties of the EM algorithm are:
Monotonicity: 𝓁(𝜃⁽⁰⁾|𝑦) ≤ 𝓁(𝜃⁽¹⁾|𝑦) ≤ ... ≤ 𝓁(𝜃⁽ⁿ⁾|𝑦) ≤ 𝓁(𝜃⁽ⁿ⁺¹⁾|𝑦)
Convergence: Under mild regularity conditions, the sequence converges to a stationary point of 𝓁(𝜃|𝑦), typically a local maximum; convergence to the global maximum is not guaranteed.
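The iteration above can be sketched end to end for a concrete model. The following is a minimal illustration, not a production implementation, for a two-component Gaussian mixture with unit variances, where 𝜃 = (𝜋, 𝜇₀, 𝜇₁) and both steps have closed forms; the monotonicity property is checked on the resulting log-likelihood sequence.

```python
import math
import random

def normal_pdf(y, mu):
    """Density of N(mu, 1) at y."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def loglik(ys, pi, mu0, mu1):
    """Marginal log-likelihood l(theta | y) of the observed data."""
    return sum(math.log((1 - pi) * normal_pdf(y, mu0) + pi * normal_pdf(y, mu1))
               for y in ys)

def em(ys, pi, mu0, mu1, n_iter=50):
    history = [loglik(ys, pi, mu0, mu1)]
    for _ in range(n_iter):
        # E-step: responsibilities p(z_i = 1 | y_i, theta^(n));
        # these fully determine the Q-function for this model.
        r = [pi * normal_pdf(y, mu1) /
             ((1 - pi) * normal_pdf(y, mu0) + pi * normal_pdf(y, mu1))
             for y in ys]
        # M-step: closed-form argmax of Q(theta | theta^(n))
        pi = sum(r) / len(ys)
        mu1 = sum(ri * y for ri, y in zip(r, ys)) / sum(r)
        mu0 = sum((1 - ri) * y for ri, y in zip(r, ys)) / (len(ys) - sum(r))
        history.append(loglik(ys, pi, mu0, mu1))
    return (pi, mu0, mu1), history

random.seed(0)
ys = ([random.gauss(-2, 1) for _ in range(200)] +
      [random.gauss(3, 1) for _ in range(200)])
theta, history = em(ys, pi=0.5, mu0=-1.0, mu1=1.0)
# Monotonicity: the observed-data log-likelihood never decreases across iterations
assert all(b >= a - 1e-9 for a, b in zip(history, history[1:]))
```

Note that the monotonicity guarantee applies to 𝓁(𝜃|𝑦), the observed-data log-likelihood, even though each M-step only maximizes the surrogate 𝑄(𝜃|𝜃⁽ⁿ⁾).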
The iterative process of the EM algorithm is illustrated in the following figure.
