Normal Equation

In contrast to Gradient descent, a method that solves for the parameter theta iteratively, normal equations solve for theta analytically, through calculus methods. For a simple polynomial equation(where theta is a Real Number), one obtains the derivative of the equation by theta and setting the value of theta so that the derivative equals zero.

For a multidimensional vector theta, you obtain the partial derivative for each separate value of theta to solve for the inclusive vector.

pinv(X' * X) * X' * y

// Take the inverse of X transposed & X, multiply that matrix with the transpose of X, then y
// Use pinv() instead of inv() in octave to obtain answers where the matrix is degenerate
  • Compare and contrast with Gradient Descent
    • Gradient Descent
      • Need to explicitly choose the value of alpha (learning rate) through a trial-and error process
      • requires iterations
      • works well for a large n (time complexity: O(kn^2))
      • requires feature scaling for efficient iterations
    • Normal equations
      • solving for n is slow for large n, since the time complexity of solving for an inverse matrix is high
      • time complexity of O(n^3)


Somtimes the matrix may not be invertible (These are called degenerate/singular matrices)

  • Redundant features (features that are linearly dependent to one another)
  • too many features (m < n)