Maximum Likelihood Estimation (MLE)

What is maximum likelihood estimation?

Maximum likelihood estimation (MLE) is a method used to estimate the parameters of a statistical model by maximizing the likelihood function. The basic idea behind MLE is to find the parameter values that make the observed data most probable, given the assumed probability distribution of the model.

As a leading machine learning consultancy in Dubai, we have experience applying MLE in linear regression as well as in logistic regression, neural networks, and more complex machine learning models.

Let’s consider linear regression. In traditional linear regression, we fit a line to the data by minimizing the sum of squared residuals. In MLE-based linear regression, we approach the problem differently: we assume a probability distribution for the errors (often a normal distribution) and then seek the parameter values that maximize the likelihood of observing the actual data under those assumptions.

The likelihood function is the probability of observing the data, viewed as a function of the model parameters.

How to find maximum likelihood estimates?

Let’s consider a simple linear regression model:

Y_i = \alpha + \beta X_i + \epsilon_i

where:
Y_i is the observed response variable for observation i.
X_i is the predictor variable for observation i.
\alpha and \beta are the parameters to be estimated.
\epsilon_i is the error term, assumed to be normally distributed with mean zero and constant variance \sigma^2.
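To make the setup concrete, here is a minimal simulation of this data-generating process in Python; the parameter values (alpha = 2, beta = 3, sigma = 1.5) are arbitrary choices for illustration, not values from the model above.

import numpy as np

rng = np.random.default_rng(0)
n = 100
alpha, beta, sigma = 2.0, 3.0, 1.5      # assumed "true" parameters for the simulation
X = rng.uniform(0, 10, size=n)          # predictor values
eps = rng.normal(0, sigma, size=n)      # errors: mean zero, constant variance sigma^2
Y = alpha + beta * X + eps              # observed responses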

The likelihood contribution of a single observation i is the probability density function (PDF) of the normal distribution, with the error variance \sigma^2 treated as known for simplicity:

f(Y_i | X_i, \alpha, \beta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left( -\frac{(Y_i - \alpha - \beta X_i)^2}{2\sigma^2} \right)

Assuming the observations are independent, the likelihood function for the entire dataset of n observations is the product of the individual likelihood contributions:

L(\alpha, \beta | \mathbf{Y}, \mathbf{X}) = \prod_{i=1}^{n} f(Y_i | X_i, \alpha, \beta)

Taking the logarithm of the likelihood function (the log-likelihood) simplifies the computation and, since the logarithm is monotonically increasing, does not change the location of the maximum:

\ell(\alpha, \beta | \mathbf{Y}, \mathbf{X}) = \log L(\alpha, \beta | \mathbf{Y}, \mathbf{X}) = \sum_{i=1}^{n} \log f(Y_i | X_i, \alpha, \beta)
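Writing the sum out explicitly for the normal density makes the structure clear:

\ell(\alpha, \beta | \mathbf{Y}, \mathbf{X}) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2

The first term does not involve \alpha or \beta, so maximizing the log-likelihood is equivalent to minimizing the sum of squared residuals. This is why, under normally distributed errors, MLE yields exactly the same \hat{\alpha} and \hat{\beta} as ordinary least squares.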

Maximizing the log-likelihood function with respect to the parameters \alpha and \beta is equivalent to finding the values of \alpha and \beta that minimize the negative log-likelihood function:

\hat{\alpha}, \hat{\beta} = \arg \min_{\alpha, \beta} \left( -\ell(\alpha, \beta | \mathbf{Y}, \mathbf{X}) \right)

In practice, numerical optimization techniques such as gradient descent, the Newton-Raphson method, or other optimization algorithms are used to find the values of \alpha and \beta that minimize the negative log-likelihood function, yielding estimates of the linear regression parameters, as in the sketch below.
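As a minimal sketch (not a definitive implementation), the whole procedure can be carried out in Python with scipy.optimize.minimize; here we also estimate \sigma jointly, which the derivation above treated as fixed, and we regenerate the simulated X and Y from earlier so the snippet stands alone.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 3.0 * X + rng.normal(0, 1.5, size=100)    # same simulated data as above

def neg_log_likelihood(params, X, Y):
    """Negative log-likelihood of the normal linear model."""
    alpha, beta, log_sigma = params
    sigma = np.exp(log_sigma)             # log-parameterization keeps sigma positive
    residuals = Y - alpha - beta * X
    return -np.sum(norm.logpdf(residuals, loc=0.0, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0], args=(X, Y))
alpha_hat, beta_hat, log_sigma_hat = result.x
print(alpha_hat, beta_hat, np.exp(log_sigma_hat))   # should be close to 2, 3, 1.5

On this data, \hat{\alpha} and \hat{\beta} should agree closely with the OLS fit, consistent with the equivalence noted above.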

Why use maximum likelihood estimation?

Maximum Likelihood Estimation (MLE) is often preferred over Ordinary Least Squares (OLS) when the assumptions behind OLS are violated or when more flexibility is needed in modeling the error structure. For example, when the errors are not normally distributed or their variance is not constant across observations, OLS estimates can be inefficient and the usual standard errors misleading. In such cases, MLE offers a robust alternative by allowing more flexible error distributions and accommodating heterogeneous error structures, as the sketch below illustrates.
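For example, swapping the normal log-density for a heavier-tailed Student-t log-density requires only a small change to the sketch above; the degrees-of-freedom value below is an assumption for illustration.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import t

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 3.0 * X + rng.standard_t(df=4, size=100)  # heavy-tailed noise

def neg_log_likelihood_t(params, X, Y, df=4):
    # Same structure as before, but with Student-t errors (df assumed known)
    alpha, beta, log_scale = params
    scale = np.exp(log_scale)
    residuals = Y - alpha - beta * X
    return -np.sum(t.logpdf(residuals, df, loc=0.0, scale=scale))

result = minimize(neg_log_likelihood_t, x0=[0.0, 0.0, 0.0], args=(X, Y))
print(result.x[:2])   # alpha_hat, beta_hat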

Additionally, MLE is particularly advantageous for nonlinear models, where OLS cannot be applied directly. By maximizing the likelihood of observing the data given the model assumptions, MLE provides a versatile framework for parameter estimation that adapts to a wide range of data distributions and modeling scenarios. Researchers and practitioners therefore turn to MLE when they need more robust and flexible estimation methods, especially in complex modeling tasks where the assumptions of OLS may not hold.

Overall, Maximum Likelihood Estimation (MLE) is a powerful parameter estimation tool used not only in linear regression but also in many other models, including complex deep learning models. As Econometrics & Machine Learning consultants, we at Marketways Arabia regularly use MLE in our client projects. We hope this served as a good intro to, or refresher on, MLE.