Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical model. The goal of MLE is to find the parameter values that maximize the likelihood function, i.e., the values under which the observed data are most probable.
Likelihood Function
Suppose we have a set of observations \( \{x_1, x_2, \ldots, x_n\} \) that are independently and identically distributed (i.i.d.) according to some probability distribution with parameter \( \theta \). The likelihood function \( L(\theta) \) is the joint probability (or probability density) of the observed data, viewed as a function of \( \theta \); because the observations are independent, it factors into a product:
$$ L(\theta) = P(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^n P(x_i \mid \theta) $$
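As a concrete illustration, here is a minimal sketch that evaluates this product for a small set of hypothetical Bernoulli coin-flip observations. The data, the parameter values tried, and the function name `likelihood` are all assumptions made for the demo, not part of any particular library:

```python
import numpy as np

# Hypothetical data: 10 Bernoulli(theta) coin flips (1 = heads).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def likelihood(theta, x):
    """L(theta) = prod_i P(x_i | theta) for Bernoulli observations."""
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

print(likelihood(0.5, x))  # 0.5^10 ~ 0.000977
print(likelihood(0.7, x))  # ~0.00222: theta = 0.7 fits 7 heads in 10 flips better
```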
Log-Likelihood Function
In practice, it is often easier to work with the natural logarithm of the likelihood function, known as the log-likelihood function. Since the logarithm is strictly increasing, maximizing \( \ell(\theta) \) is equivalent to maximizing \( L(\theta) \), and the log turns the product into a sum:
$$ \ell(\theta) = \log L(\theta) = \log \left( \prod_{i=1}^n P(x_i \mid \theta) \right) = \sum_{i=1}^n \log P(x_i \mid \theta) $$
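Continuing the hypothetical Bernoulli sketch, the log-likelihood is just the sum of log-probabilities. One practical reason to prefer it: for large \( n \), the raw product of probabilities underflows to zero in floating point, while the sum of logs stays well-scaled.

```python
import numpy as np

# Same hypothetical coin-flip data as above.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def log_likelihood(theta, x):
    """ell(theta) = sum_i log P(x_i | theta) for Bernoulli observations."""
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

print(log_likelihood(0.7, x))  # ~ -6.109, the log of the product computed above

# Why logs? 10,000 fair coin flips each contribute a factor of 0.5,
# and the raw product of 10,000 such probabilities underflows.
n = 10_000
print(np.prod(np.full(n, 0.5)))  # 0.0 (floating-point underflow)
print(n * np.log(0.5))           # ~ -6931.5: the log-likelihood is fine
```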
Maximum Likelihood Estimate
The maximum likelihood estimate (MLE) of the parameter \( \theta \) is the value that maximizes the log-likelihood function:
$$ \hat{\theta} = \arg \max_\theta \ell(\theta) $$
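In simple models \( \hat{\theta} \) has a closed form, but in general the maximization is done numerically, typically by minimizing the negative log-likelihood. A minimal sketch using the same hypothetical Bernoulli data, for which the closed-form MLE is the sample proportion:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # hypothetical coin flips

def neg_log_likelihood(theta):
    # Minimizing -ell(theta) is equivalent to maximizing ell(theta).
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                         method="bounded")
print(result.x)   # ~0.7
print(x.mean())   # 0.7: the closed-form Bernoulli MLE is the sample proportion
```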
Example: Normal Distribution
Consider a set of i.i.d. observations \( \{x_1, x_2, \ldots, x_n\} \) from a normal distribution with unknown mean \( \mu \) and known variance \( \sigma^2 \). The probability density function is:
$$ f(x \mid \mu) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$
The log-likelihood function is:
$$ \ell(\mu) = \sum_{i=1}^n \log f(x_i \mid \mu) = -\frac{n}{2} \log (2 \pi \sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 $$
To find the MLE, we take the derivative of \( \ell(\mu) \) with respect to \( \mu \) and set it to zero:
$$ \frac{d\ell(\mu)}{d\mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 $$
This gives \( \sum_{i=1}^n x_i = n\mu \); solving for \( \mu \), the MLE is the sample mean:
$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i $$
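As a sanity check, one possible sketch fits \( \mu \) numerically on synthetic data and compares the result against the closed-form \( \bar{x} \). The true mean 3.0, the known \( \sigma = 2 \), and the sample size are assumptions for the demo:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
sigma = 2.0                                     # known standard deviation
x = rng.normal(loc=3.0, scale=sigma, size=500)  # true mu = 3.0 (assumed for demo)

def neg_log_likelihood(mu):
    # -ell(mu), dropping the constant (n/2) log(2 pi sigma^2)
    return np.sum((x - mu) ** 2) / (2 * sigma ** 2)

result = minimize_scalar(neg_log_likelihood)
print(result.x)   # numerical MLE
print(x.mean())   # closed-form MLE mu-hat = x-bar: the two agree
```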
Properties of MLE
Under certain regularity conditions, the MLE has several desirable properties (the first two are illustrated by the simulation sketch after this list):
Consistency:
The MLE converges in probability to the true parameter value as the sample size increases.
Asymptotic Normality:
The distribution of the MLE approaches a normal distribution as the sample size increases: \( \sqrt{n}(\hat{\theta} - \theta) \) converges in distribution to a zero-mean normal whose variance is the inverse of the Fisher information.
Efficiency:
Asymptotically, the MLE attains the Cramér-Rao lower bound: no consistent estimator has a smaller asymptotic variance.
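A small simulation can make the first two properties tangible. In the sketch below (synthetic normal data with assumed \( \mu = 3 \), \( \sigma = 2 \)), \( \hat{\mu} = \bar{x} \) approaches the true mean as \( n \) grows, and the standard deviation of \( \sqrt{n}\,\hat{\mu} \) over repeated samples is close to \( \sigma \), consistent with asymptotic normality:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma = 3.0, 2.0  # assumed values for the simulation

# Consistency: mu-hat = x-bar drifts toward the true mean as n grows.
for n in (10, 100, 10_000):
    sample = rng.normal(true_mu, sigma, size=n)
    print(n, sample.mean())

# Asymptotic normality: over many repeated samples of size n, the
# standard deviation of sqrt(n) * mu-hat is close to sigma.
n, reps = 1_000, 2_000
estimates = rng.normal(true_mu, sigma, size=(reps, n)).mean(axis=1)
print(np.sqrt(n) * estimates.std())  # ~2.0, i.e. ~sigma
```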