Penalized Regression: PCA

PCA can also be reframed as a type of linear regression. Write the singular value decomposition as {$X = UDV^T$} and, for each {$i$}, denote by {$z_i = u_i D_{ii}$} the {$i$}th principal component. Consider a positive {$\lambda$} and the ridge estimate {$\hat{w}_i^{ridge}$} given by {$\hat{w}_i^{ridge} = \mathrm{argmin}_w \,(\| z_i - Xw\|^2 + \lambda \|w\|^2)$}, where all norms are, of course, {$L_2$} norms. Note what we are doing here: instead of predicting the {$y$}'s, we are predicting the projection of the data onto each right singular vector, i.e. the principal component {$z_i$}. Let {$\hat{v}_i = \hat{w}_i^{ridge} / \|\hat{w}_i^{ridge}\|$}; then {$\hat{v}_i = V_i$}. The weights that best do this, when normalized, give back the original singular vectors. (Yes, it is a bit recursive.) Note also that the resulting {$\hat{v}_i$} does not depend on {$\lambda$}: the penalty only rescales {$\hat{w}_i^{ridge}$}, and that rescaling is removed by the normalization. How would you use this approach to get sparse PCA? Details here
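
Below is a minimal NumPy sketch (not part of the original notes; the random test matrix, component index, and penalty value are illustrative) of the fact above: ridge-regressing the {$i$}th principal component {$z_i$} on {$X$} and normalizing the coefficients recovers {$V_i$}, for any positive {$\lambda$}.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    X -= X.mean(axis=0)                 # center the columns, as PCA assumes

    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T
    i, lam = 0, 5.0                     # arbitrary component index and penalty
    z_i = U[:, i] * d[i]                # i-th principal component scores

    # Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T z_i
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ z_i)
    v_hat = w_ridge / np.linalg.norm(w_ridge)

    # v_hat matches the i-th right singular vector; lambda only rescales w_ridge
    print(np.allclose(v_hat, Vt[i]))    # True for any positive lam

One natural route from here to sparse PCA, the one taken in the sparse PCA literature (e.g., Zou, Hastie, and Tibshirani), is to add an {$L_1$} term to the penalty in this regression, so that the normalized weights become sparse approximations of the loadings.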