ML #4: Normal Equations
Normal Equation
In contrast to gradient descent, which solves for the parameter theta iteratively, the normal equation solves for theta analytically, using calculus. For a simple polynomial cost function (where theta is a real number), you take the derivative of the cost with respect to theta, set that derivative equal to zero, and solve for theta.
For a multidimensional vector theta, you take the partial derivative with respect to each component of theta, set each one to zero, and solve for the entire vector.
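Concretely, for linear regression with design matrix X and target vector y, this is the standard least-squares derivation (sketched here for reference; the note itself doesn't write it out):

J(\theta) = \frac{1}{2m} \lVert X\theta - y \rVert^2

\nabla_\theta J = \frac{1}{m} X^\top (X\theta - y) = 0 \;\Longrightarrow\; X^\top X\,\theta = X^\top y \;\Longrightarrow\; \theta = (X^\top X)^{-1} X^\top y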
theta = pinv(X' * X) * X' * y
% Take the inverse of X' * X (X transposed times X), then multiply by X transposed and by y
% Use pinv() instead of inv() in Octave so you still get an answer when the matrix is degenerate
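As a minimal, self-contained Octave sketch of the whole computation (the toy data and variable names below are illustrative, not from the course):

% Toy training set: one feature, five examples (made-up values)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 8.1; 9.8];
m = length(y);

% Design matrix: prepend a column of ones for the intercept term theta_0
X = [ones(m, 1), x];

% Normal equation: theta = pinv(X' * X) * X' * y
theta = pinv(X' * X) * X' * y;    % roughly [0.14; 1.96] for this data

% Fitted values on the training inputs
predictions = X * theta;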
- Compare and contrast with Gradient Descent
- Gradient Descent
- Need to explicitly choose the value of alpha (learning rate) through a trial-and-error process
- requires iterations
- works well even when n is large (time complexity: O(kn^2) for k iterations)
- requires feature scaling for efficient iterations
- Normal equations
- slow when n is very large, since inverting the n x n matrix X' * X is computationally expensive
- time complexity of O(n^3)
- Rule of thumb: prefer gradient descent when n is very large; otherwise the normal equation is a convenient choice (a gradient-descent sketch follows this list)
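For contrast, here is a minimal batch gradient-descent loop in Octave on the same toy data as above (the learning rate and iteration count are arbitrary illustrative choices, not values from the course):

% Same toy data as the normal-equation sketch
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 8.1; 9.8];
m = length(y);
X = [ones(m, 1), x];

alpha = 0.05;          % learning rate, found by trial and error
num_iters = 5000;      % number of iterations
theta = zeros(2, 1);   % initial parameter guess

for i = 1:num_iters
  grad  = (1 / m) * X' * (X * theta - y);  % gradient of the least-squares cost
  theta = theta - alpha * grad;            % update all parameters simultaneously
end
% theta approaches the normal-equation solution (about [0.14; 1.96])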
Non-invertibility
Sometimes the matrix X' * X may not be invertible (such matrices are called degenerate or singular). Common causes:
- Redundant features (features that are linearly dependent on one another); see the sketch below
- Too many features (m < n, i.e., fewer training examples than features)
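A small Octave sketch of the redundant-feature case (the duplicated feature is contrived for illustration):

% Two features that are linearly dependent: x2 is exactly 2 * x1
x1 = [1; 2; 3; 4; 5];
x2 = 2 * x1;
y  = [2.1; 3.9; 6.2; 8.1; 9.8];
X  = [ones(5, 1), x1, x2];

rank(X' * X)                       % 2 rather than 3, so X' * X is singular
% inv(X' * X) would warn that the matrix is singular and give unusable values;
% pinv() still returns the minimum-norm least-squares solution
theta = pinv(X' * X) * X' * y;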