ML #4: Normal Equations
Normal Equation
In contrast to gradient descent, which solves for the parameter theta iteratively, the normal equation solves for theta analytically, using calculus. For a simple polynomial cost function (where theta is a real number), you take the derivative of the cost with respect to theta, set that derivative equal to zero, and solve for theta.
For a multidimensional vector theta, you take the partial derivative with respect to each component of theta, set each one to zero, and solve for the entire vector.
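Concretely, for linear regression with design matrix X and target vector y, this is the standard least-squares derivation (sketched here for reference; the note itself doesn't write it out):

J(\theta) = \frac{1}{2m} \lVert X\theta - y \rVert^2

\nabla_\theta J = \frac{1}{m} X^\top (X\theta - y) = 0 \;\Longrightarrow\; X^\top X\,\theta = X^\top y \;\Longrightarrow\; \theta = (X^\top X)^{-1} X^\top y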
theta = pinv(X' * X) * X' * y
% Take the inverse of X' * X (X transposed times X), then multiply by X transposed and by y
% Use pinv() instead of inv() in Octave so you still get an answer when the matrix is degenerate
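As a minimal, self-contained Octave sketch of the whole computation (the toy data and variable names below are illustrative, not from the course):

% Toy training set: one feature, five examples (made-up values)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 8.1; 9.8];
m = length(y);

% Design matrix: prepend a column of ones for the intercept term theta_0
X = [ones(m, 1), x];

% Normal equation: theta = pinv(X' * X) * X' * y
theta = pinv(X' * X) * X' * y;    % roughly [0.14; 1.96] for this data

% Fitted values on the training inputs
predictions = X * theta;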
- Compare and contrast with Gradient Descent
- Gradient Descent
- Need to explicitly choose the value of alpha (learning rate) through a trial-and-error process
- requires iterations
- works well even when n is large (time complexity: O(kn^2) for k iterations)
- requires feature scaling for efficient iterations
- Normal equations
- slow when n is very large, since inverting the n x n matrix X' * X is computationally expensive
- time complexity of O(n^3)
- Rule of thumb: prefer gradient descent when n is very large; otherwise the normal equation is a convenient choice (a gradient-descent sketch follows this list)
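For contrast, here is a minimal batch gradient-descent loop in Octave on the same toy data as above (the learning rate and iteration count are arbitrary illustrative choices, not values from the course):

% Same toy data as the normal-equation sketch
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 8.1; 9.8];
m = length(y);
X = [ones(m, 1), x];

alpha = 0.05;          % learning rate, found by trial and error
num_iters = 5000;      % number of iterations
theta = zeros(2, 1);   % initial parameter guess

for i = 1:num_iters
  grad  = (1 / m) * X' * (X * theta - y);  % gradient of the least-squares cost
  theta = theta - alpha * grad;            % update all parameters simultaneously
end
% theta approaches the normal-equation solution (about [0.14; 1.96])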
Non-invertibility
Sometimes the matrix X' * X may not be invertible (such matrices are called degenerate or singular). Common causes:
- Redundant features (features that are linearly dependent on one another); see the sketch below
- Too many features (m < n, i.e., fewer training examples than features)
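A small Octave sketch of the redundant-feature case (the duplicated feature is contrived for illustration):

% Two features that are linearly dependent: x2 is exactly 2 * x1
x1 = [1; 2; 3; 4; 5];
x2 = 2 * x1;
y  = [2.1; 3.9; 6.2; 8.1; 9.8];
X  = [ones(5, 1), x1, x2];

rank(X' * X)                       % 2 rather than 3, so X' * X is singular
% inv(X' * X) would warn that the matrix is singular and give unusable values;
% pinv() still returns the minimum-norm least-squares solution
theta = pinv(X' * X) * X' * y;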