Locally Weighted Regression
Our final method combines the advantages of parametric and non-parametric methods. The idea is to fit a regression model locally, around each query point, weighting each training example by its kernel value {$K(\mathbf{x}, \mathbf{x}_i)$}.
Locally Weighted Regression Algorithm
- Given training data {$D=\{\mathbf{x}_i,y_i\}$}, kernel function {$K(\cdot,\cdot)$} and query input {$\mathbf{x}$}
- Fit weighted regression {$\hat{\mathbf{w}}(\mathbf{x}) = \arg\min_{\mathbf{w}} \sum_{i=1}^{n} K(\mathbf{x}, \mathbf{x}_i) (\mathbf{w}^\top \mathbf{x}_i - y_{i})^2$}
- Return regression prediction {$\hat{\mathbf{w}}(\mathbf{x})^\top \mathbf{x}$}.
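The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the course's reference implementation: the Gaussian kernel, the `width` parameter, the small `ridge` term added for numerical stability, and the helper names (`gaussian_kernel`, `add_bias`, `lwr_predict`) are all assumptions introduced here. The weighted least-squares problem is solved through its normal equations, {$\mathbf{X}^\top \mathbf{K} \mathbf{X} \mathbf{w} = \mathbf{X}^\top \mathbf{K} \mathbf{y}$}, where {$\mathbf{K}$} is the diagonal matrix of kernel weights for the query.

```python
import numpy as np

def gaussian_kernel(x, X, width):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 width^2)) for every row x_i of X.

    The Gaussian form is an assumed choice; any kernel K(x, x_i) >= 0 works.
    """
    sq_dist = np.sum((X - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def add_bias(X):
    """Prepend a constant-1 column so the local model has an intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def lwr_predict(x, X, y, width=0.1, ridge=1e-8):
    """Fit a weighted linear regression around the query x and predict there."""
    k = gaussian_kernel(x, X, width)            # one weight per training example
    # Weighted normal equations: (X^T K X + ridge * I) w = X^T K y
    A = X.T @ (k[:, None] * X) + ridge * np.eye(X.shape[1])
    b = X.T @ (k * y)
    w_hat = np.linalg.solve(A, b)               # w_hat depends on the query x
    return w_hat @ x

# Usage: a new fit is solved for every query point.
t = np.linspace(0.0, 1.0, 200)
X_train = add_bias(t[:, None])
y_train = np.sin(2.0 * np.pi * t)
query = np.array([1.0, 0.25])                   # bias term, then t = 0.25
print(lwr_predict(query, X_train, y_train, width=0.05))
```

Note that, unlike ordinary linear regression, nothing is learned ahead of time: the training set must be kept around and a small linear system is solved per query.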
Note that we can do the same for classification, fitting a locally weighted logistic regression:
Locally Weighted Logistic Regression Algorithm
- Given training data {$D=\{\mathbf{x}_i,y_i\}$} with labels {$y_i \in \{-1,+1\}$}, kernel function {$K(\cdot,\cdot)$} and query input {$\mathbf{x}$}
- Fit weighted logistic regression {$\hat{\mathbf{w}}(\mathbf{x}) = \arg\min_{\mathbf{w}} \sum_{i=1}^{n} K(\mathbf{x}, \mathbf{x}_i) \log(1+\exp\{-y_i\mathbf{w}^\top \mathbf{x}_i\})$}
- Return logistic regression prediction {$\mathrm{sign}(\hat{\mathbf{w}}(\mathbf{x})^\top \mathbf{x})$}.
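A minimal sketch of the classification variant follows. The weighted logistic loss has no closed-form minimizer, so this sketch uses plain gradient descent; the optimizer, the step size `lr`, the iteration count `n_steps`, the Gaussian kernel, and the function name `lw_logistic_predict` are assumptions made for illustration, and any convex solver could be substituted.

```python
import numpy as np

def lw_logistic_predict(x, X, y, width=0.1, lr=0.5, n_steps=500):
    """Locally weighted logistic regression at query x; labels y are in {-1, +1}."""
    # Kernel weights K(x, x_i); the Gaussian form is an assumed choice.
    k = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * width ** 2))
    # Rescaling the weights does not change the argmin, but keeps the step size
    # meaningful. Assumes at least one training point has non-negligible weight.
    k = k / k.sum()
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        margins = y * (X @ w)
        # 1 / (1 + exp(margin)), written with tanh to avoid overflow
        s = 0.5 * (1.0 - np.tanh(margins / 2.0))
        # Gradient of sum_i k_i * log(1 + exp(-y_i w^T x_i)) with respect to w
        grad = -(X.T @ (k * y * s))
        w -= lr * grad
    return 1.0 if w @ x >= 0 else -1.0

# Usage: labels are +1 in the middle of the interval and -1 at both ends,
# so no single global linear boundary separates them.
t = np.linspace(0.0, 1.0, 101)
X_train = np.column_stack([np.ones_like(t), t])
y_train = np.where(np.abs(t - 0.5) < 0.25, 1.0, -1.0)
print(lw_logistic_predict(np.array([1.0, 0.5]), X_train, y_train))
print(lw_logistic_predict(np.array([1.0, 0.05]), X_train, y_train))
```

When the nearby points are perfectly separable the weights keep growing with more iterations, but the sign of the prediction, which is all the algorithm returns, is unaffected.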
The difference between regular linear regression and locally weighted linear regression can be visualized as follows:
Linear regression uses the same parameters for all queries, and the errors on all training points affect the learned linear prediction. Locally weighted regression learns a separate linear prediction for each query that is only good locally, since errors on far-away points carry little weight compared to errors on nearby ones.
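The contrast can also be checked numerically. The sketch below is a toy comparison under assumed settings (a noiseless sine curve, a Gaussian kernel, and a width of 1/16 of the x-axis range); none of these come from the lecture's own examples. It fits one global line and one local line, then compares their errors at a single query point.

```python
import numpy as np

# Toy data (an assumption for illustration): one period of a sine curve.
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.ones_like(t), t])       # bias column plus input
y = np.sin(2.0 * np.pi * t)

# Global linear regression: a single w shared by every query.
w_global, *_ = np.linalg.lstsq(X, y, rcond=None)

def local_fit(x_query, width):
    """Weighted least squares around x_query with Gaussian kernel weights."""
    k = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * width ** 2))
    # Scaling rows by sqrt(k) turns the weighted problem into ordinary
    # least squares: sum_i k_i (w^T x_i - y_i)^2 = ||sqrt(k) * (X w - y)||^2
    sk = np.sqrt(k)
    w, *_ = np.linalg.lstsq(sk[:, None] * X, sk * y, rcond=None)
    return w

x_q = np.array([1.0, 0.25])                     # query at t = 0.25, true value 1
width = 1.0 / 16                                # fraction of the x-axis range
global_err = abs(w_global @ x_q - 1.0)
local_err = abs(local_fit(x_q, width) @ x_q - 1.0)
print(f"global error: {global_err:.3f}, local error: {local_err:.3f}")
```

The global line has to compromise across the whole curve, while the local fit is effectively determined only by points near the query.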
Here are the results of using a good kernel width on our regression examples (1/32, 1/32 and 1/16 of the x-axis width, respectively):