Maximum a Posteriori (MAP) estimation is a popular method in Bayesian statistics for estimating an unknown parameter as the mode of its posterior distribution, combining the observed data with prior beliefs.
MAP Estimation
MAP estimation aims to find the mode of the posterior distribution. Given a set of observed data \( \mathbf{X} = \{x_1, x_2, \ldots, x_n\} \) and a parameter \( \theta \) to be estimated, the MAP estimate is defined as:
$$ \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(\theta \mid \mathbf{X}) $$
Using Bayes’ theorem, the posterior distribution \( P(\theta \mid \mathbf{X}) \) can be expressed as:
$$ P(\theta \mid \mathbf{X}) = \frac{P(\mathbf{X} \mid \theta) P(\theta)}{P(\mathbf{X})} $$
Since \( P(\mathbf{X}) \) is a constant with respect to \( \theta \), the MAP estimate simplifies to:
$$ \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(\mathbf{X} \mid \theta) P(\theta) $$
Thus, we need to maximize the product of the likelihood \( P(\mathbf{X} \mid \theta) \) and the prior \( P(\theta) \).
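Since \( P(\mathbf{X}) \) does not depend on \( \theta \), maximizing the unnormalized posterior gives the same answer as maximizing the normalized one. The following minimal sketch illustrates this on a grid for a Bernoulli likelihood with a Beta(2, 2) prior; the data and prior parameters are illustrative assumptions, not values from the text above.

```python
import numpy as np
from scipy.stats import beta

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical coin flips
heads, n = data.sum(), data.size

theta = np.linspace(1e-3, 1 - 1e-3, 10_001)
likelihood = theta**heads * (1 - theta)**(n - heads)   # P(X | theta)
prior = beta.pdf(theta, a=2, b=2)                      # P(theta), assumed Beta(2, 2)

unnormalized = likelihood * prior                      # numerator of Bayes' theorem
# Approximate division by the evidence P(X) via a Riemann sum over the grid.
posterior = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))

# Both curves peak at the same theta, so the argmax is unaffected by P(X).
print(theta[np.argmax(unnormalized)], theta[np.argmax(posterior)])
```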
Detailed Proof
Let’s consider the logarithm of the posterior distribution for computational convenience. The logarithm is a strictly increasing function, so the value of \( \theta \) that maximizes the posterior also maximizes the log-posterior:
$$ \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \left[ \log P(\mathbf{X} \mid \theta) + \log P(\theta) \right] $$
- Likelihood \( P(\mathbf{X} \mid \theta) \): The likelihood function is the probability of the observed data given the parameter \( \theta \). Assuming the observations are independent and identically distributed given \( \theta \), it factorizes as:
$$ P(\mathbf{X} \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta) $$
Taking the logarithm, we get:
$$ \log P(\mathbf{X} \mid \theta) = \sum_{i=1}^{n} \log P(x_i \mid \theta) $$
- Prior \( P(\theta) \): The prior distribution reflects our beliefs about \( \theta \) before observing the data. Common choices include uniform, Gaussian, or other distributions, depending on the problem context.
- Posterior Maximization: Combining the log-likelihood and the log-prior, we have: $$ \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \left( \sum_{i=1}^{n} \log P(x_i \mid \theta) + \log P(\theta) \right) $$ This equation shows that the MAP estimate balances the fit of the model to the data (through the likelihood) with the prior belief about \( \theta \).
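In practice, this maximization can be carried out numerically by minimizing the negative log-posterior. The sketch below assumes a Poisson likelihood with a Gamma prior (shape \( \alpha \), rate \( b \)) and synthetic counts, none of which come from the text above; the numerical optimum is compared with the conjugate closed form \( (\alpha + \sum_i x_i - 1)/(b + n) \).

```python
import numpy as np
from scipy.stats import poisson, gamma
from scipy.optimize import minimize_scalar

x = np.array([3, 5, 4, 6, 2, 5])   # hypothetical observed counts
alpha, b = 2.0, 1.0                # assumed Gamma prior: shape alpha, rate b

def neg_log_posterior(theta):
    log_lik = poisson.logpmf(x, mu=theta).sum()          # sum_i log P(x_i | theta)
    log_prior = gamma.logpdf(theta, a=alpha, scale=1/b)  # log P(theta)
    return -(log_lik + log_prior)

result = minimize_scalar(neg_log_posterior, bounds=(1e-6, 50), method="bounded")
print("numerical MAP:  ", result.x)
# Gamma-Poisson conjugacy gives the posterior mode (alpha + sum(x) - 1) / (b + n):
print("closed-form MAP:", (alpha + x.sum() - 1) / (b + len(x)))
```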
Example
Suppose we have a Gaussian likelihood and a Gaussian prior. The observed data \( x_1, x_2, \ldots, x_n \) are assumed to be normally distributed with mean \( \mu \) and known variance \( \sigma^2 \):
$$ P(x_i \mid \mu) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x_i - \mu)^2}{2 \sigma^2} \right) $$
If the prior distribution for \( \mu \) is also Gaussian with mean \( \mu_0 \) and variance \( \tau^2 \):
$$ P(\mu) = \frac{1}{\sqrt{2 \pi \tau^2}} \exp \left( -\frac{(\mu - \mu_0)^2}{2 \tau^2} \right) $$
The log-posterior is then:
$$ \log P(\mathbf{X} \mid \mu) + \log P(\mu) = -\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{2} \log(2 \pi \tau^2) - \frac{1}{2 \tau^2} (\mu - \mu_0)^2 $$
Ignoring constant terms and simplifying, we get:
$$ \hat{\mu}_{\text{MAP}} = \arg\max_{\mu} \left( -\frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{2 \tau^2} (\mu - \mu_0)^2 \right) $$
This can be solved by taking the derivative with respect to \( \mu \), setting it to zero, and solving for \( \mu \):
$$ \frac{\partial}{\partial \mu} \left( -\frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{2 \tau^2} (\mu - \mu_0)^2 \right) = 0 $$
$$ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) - \frac{1}{\tau^2} (\mu - \mu_0) = 0 $$
Collecting the terms in \( \mu \):
$$ \mu \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right) = \frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\mu_0}{\tau^2} $$
$$ \hat{\mu}_{\text{MAP}} = \frac{\frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\mu_0}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} $$
This result shows that the MAP estimate is a weighted average of the sample mean and the prior mean, with weights inversely proportional to their variances.
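As a quick check of the formula, the sketch below compares the closed-form weighted average with a direct numerical maximization of the log-posterior; the values of \( \sigma \), \( \tau \), \( \mu_0 \) and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma, tau, mu0 = 1.0, 2.0, 0.0                 # known sigma^2, prior N(mu0, tau^2)
x = rng.normal(loc=1.5, scale=sigma, size=20)   # simulated observations
n = x.size

# Closed form: weighted average of the sample mean and the prior mean.
map_closed = (x.sum() / sigma**2 + mu0 / tau**2) / (n / sigma**2 + 1 / tau**2)

# Negative log-posterior up to additive constants (same maximizer).
def neg_log_posterior(mu):
    return ((x - mu)**2).sum() / (2 * sigma**2) + (mu - mu0)**2 / (2 * tau**2)

map_numeric = minimize_scalar(neg_log_posterior).x
print(map_closed, map_numeric)   # the two estimates agree to optimizer tolerance
```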