Linear Regression with multiple variables

Multiple input variables (features) combine to predict a single output variable

Updated notation is needed:

  • n : Number of features
  • m : Number of training examples
  • x^(i) : Features of the ith training example (an n-dimensional vector)
  • x_j^(i) : Value of feature j in the ith training example

A hypothesis with n features is expressed as h_theta(x) = theta^T x, where

  • theta is the (n+1)-dimensional vector of parameters [theta_0, theta_1, ..., theta_n]
  • x is the (n+1)-dimensional vector of features [x_0, x_1, ..., x_n]
    • by convention, x_0 = 1, so that the intercept term theta_0 is preserved in the product (see the sketch below)
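
A minimal sketch of this hypothesis in Python/NumPy, assuming theta and the raw feature vector are plain 1-D arrays (all names here are illustrative):

    import numpy as np

    def hypothesis(theta, x):
        # theta: (n+1,) parameter vector [theta_0, ..., theta_n]
        # x:     (n,)   raw feature vector [x_1, ..., x_n]
        x = np.insert(x, 0, 1.0)   # prepend x_0 = 1 so theta_0 is kept
        return theta @ x           # theta^T x

    theta = np.array([1.0, 2.0, 3.0])                # theta_0, theta_1, theta_2
    print(hypothesis(theta, np.array([4.0, 5.0])))   # 1 + 2*4 + 3*5 = 24.0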

Gradient descent for multivariate regression

Similar to univariate gradient descent, but the sum inside the update now uses the multivariate hypothesis, which has one term theta_j * x_j for each feature instead of a single slope term.

The repeat-until-convergence loop is also preserved, except that each pass now updates all n+1 parameters theta_0, ..., theta_n simultaneously

  • Cost function: J(theta) = (1/2m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i))^2
  • Update rule, repeated until convergence (simultaneously for every j = 0, ..., n):
      theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i)) * x_j^(i)
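
A sketch of this loop in Python/NumPy, vectorized over all m training examples at once; the design matrix is assumed to already carry the x_0 = 1 column, and alpha and num_iters are illustrative defaults:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1000):
        # X: (m, n+1) design matrix whose first column is all ones (x_0 = 1)
        # y: (m,) observed output values
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            errors = X @ theta - y               # h_theta(x^(i)) - y^(i)
            theta -= alpha / m * (X.T @ errors)  # simultaneous update of every theta_j
        return theta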

Methods to facilitate gradient descent in multivariate regression

  • Feature Scaling
    • Make sure the features are on a similar scale when using gradient descent
    • when feature scales differ widely, gradient descent oscillates across the skewed cost contours, which leads to slow convergence
    • scale the features into approximately the range -1 <= x_j <= 1
      • rule of thumb may vary
  • Mean Normalization
    • shift the features so that they have approximately zero mean
    • combined with scaling: x_j := (x_j - mu_j) / s_j, where s_j is either the range (max - min) or the standard deviation of feature j (see the sketch below)
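
A sketch of the two methods combined, assuming X holds the raw features without the bias column (the function name and the use_std flag are illustrative):

    import numpy as np

    def mean_normalize(X, use_std=True):
        # X: (m, n) raw feature matrix, WITHOUT the x_0 = 1 column
        mu = X.mean(axis=0)
        s = X.std(axis=0) if use_std else X.max(axis=0) - X.min(axis=0)
        return (X - mu) / s, mu, s

Returning mu and s matters: new inputs at prediction time must be normalized with the same statistics that were used during training.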

Debugging Gradient Descent

  • Automatic Convergence test
    • declare convergence if the decrease in the cost function value in one iteration is less than some small, arbitrarily chosen epsilon (e.g. 10^-3)
    • this automates the check that is otherwise done by eye from a plot of J(theta) against the iteration count (a sketch follows below)
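
A minimal sketch of such a test, reusing the design-matrix conventions above; epsilon and max_iters are arbitrary illustrative values:

    import numpy as np

    def gradient_descent_auto(X, y, alpha=0.01, epsilon=1e-3, max_iters=10000):
        m = len(y)
        theta = np.zeros(X.shape[1])
        cost = lambda t: ((X @ t - y) ** 2).sum() / (2 * m)   # J(theta)
        prev = cost(theta)
        for _ in range(max_iters):
            theta -= alpha / m * (X.T @ (X @ theta - y))
            curr = cost(theta)
            if prev - curr < epsilon:   # drop in J below epsilon: converged
                break
            prev = curr
        return theta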

For a sufficiently small learning rate (alpha), the cost function value will decrease on every iteration.

  • if the learning rate is too small, convergence may be very slow
  • if it is too large, the cost function value may fail to decrease on every iteration and may even diverge
    • if the cost function value is increasing over the iterations, decrease the learning rate to restore convergence (see the sketch below)
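
One practical way to pick alpha is to run a short descent for several candidate rates, spaced roughly 3x apart, and watch whether the cost decreases; a self-contained sketch on synthetic data (all values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.normal(size=(50, 2))]   # bias column + 2 features
    y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)

    def cost_trace(alpha, num_iters=100):
        # run a short descent and record J(theta) after each iteration
        theta = np.zeros(X.shape[1])
        trace = []
        for _ in range(num_iters):
            theta -= alpha / len(y) * (X.T @ (X @ theta - y))
            trace.append(((X @ theta - y) ** 2).sum() / (2 * len(y)))
        return trace

    # keep the largest rate for which J(theta) still decreases
    for alpha in (0.001, 0.003, 0.01, 0.03, 0.1, 0.3):
        J = cost_trace(alpha)
        print(alpha, "diverging" if J[-1] > J[0] else "decreasing")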

Polynomial regression

  • add new features that are polynomial functions of the original feature (e.g. x^2, x^3, sqrt(x)) and fit them with the same linear-regression machinery
  • feature scaling and mean normalization become essential, since the ranges of polynomial features grow geometrically: if x ranges up to 10^3, then x^2 ranges up to 10^6 and x^3 up to 10^9 (see the sketch below)
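
A sketch of building polynomial features from a single raw feature and then normalizing them; the powers and sample values are chosen only to illustrate how quickly the scales blow up:

    import numpy as np

    def poly_features(x, degree=3):
        # stack x, x^2, ..., x^degree as the columns of a feature matrix
        return np.column_stack([x ** d for d in range(1, degree + 1)])

    x = np.array([1.0, 5.0, 10.0, 50.0, 100.0])   # one raw feature
    X = poly_features(x)
    print(X.max(axis=0))   # [1e2, 1e4, 1e6]: the scales grow geometrically
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
    print(X_scaled.std(axis=0))   # all ~1 after normalization and scaling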