Glossary

B

Bayes risk: The optimal (smallest) risk achievable by any predictor on a given problem, that is, for a given data distribution and loss.

C

Classification error or loss (aka 0/1 error or loss): Used in classification to measure the quality of a predictor h(x): $\ell_{0/1}(h(x),y) = \textbf{1}(h(x) \ne y)$. We are often interested in the 0/1 risk: $L_{0/1}(h) = \textbf{E}_{(x,y)}[\textbf{1}(h(x) \ne y)]$.
Conditional independence: X is conditionally independent of Y given Z if $P(X=x \mid Y=y, Z=z)= P(X=x \mid Z=z), \; \forall x,y,z$, or equivalently, $P(X=x,Y=y\mid Z=z) = P(X=x\mid Z=z)P(Y=y\mid Z=z), \; \forall x,y,z$.
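
The product form of the definition can be checked directly on a small discrete joint distribution. The sketch below is only an illustration: the function name `is_conditionally_independent`, the dict-based `joint` table, and the two-coin example are assumptions made here, not part of the glossary.

```python
import math
from itertools import product


def is_conditionally_independent(joint, tol=1e-12):
    """Check P(x, y | z) == P(x | z) * P(y | z) for all x, y and every z with P(z) > 0.

    `joint` is a hypothetical dict mapping (x, y, z) to probabilities.
    """
    xs = {k[0] for k in joint}
    ys = {k[1] for k in joint}
    zs = {k[2] for k in joint}

    def p(x, y, z):
        return joint.get((x, y, z), 0.0)

    for z in zs:
        pz = sum(p(x, y, z) for x, y in product(xs, ys))
        if pz == 0:
            continue  # conditioning on a zero-probability event is undefined
        for x, y in product(xs, ys):
            p_xy = p(x, y, z) / pz
            p_x = sum(p(x, yy, z) for yy in ys) / pz
            p_y = sum(p(xx, y, z) for xx in xs) / pz
            if not math.isclose(p_xy, p_x * p_y, abs_tol=tol):
                return False
    return True


# Toy example: Z picks one of two coins (heads bias 0.9 or 0.1) with equal
# probability; X and Y are two separate flips of the chosen coin.
joint = {}
for z, bias in ((0, 0.9), (1, 0.1)):
    for x, y in product((0, 1), repeat=2):
        px = bias if x == 1 else 1 - bias
        py = bias if y == 1 else 1 - bias
        joint[(x, y, z)] = 0.5 * px * py

# Marginally, X and Y are NOT independent: P(X=1, Y=1) differs from P(X=1) P(Y=1).
p_x1_y1 = sum(joint[(1, 1, z)] for z in (0, 1))
p_x1 = sum(joint[(1, y, z)] for y in (0, 1) for z in (0, 1))
p_y1 = sum(joint[(x, 1, z)] for x in (0, 1) for z in (0, 1))
```

In this toy example `is_conditionally_independent(joint)` holds, while `p_x1_y1` and `p_x1 * p_y1` differ, so conditional independence given Z does not imply independence.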

D

Decision stump: A decision tree with one internal node.
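
As a rough sketch, a stump can be written as a single test followed by two leaf predictions. The glossary only fixes the "one internal node" part; the numeric threshold test, the binary labels, and the `DecisionStump` class name below are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DecisionStump:
    feature: int       # index of the feature tested at the single internal node
    threshold: float   # split point for that feature (assumed numeric)
    left_label: int    # prediction when x[feature] <= threshold
    right_label: int   # prediction when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> int:
        """Route x through the one internal node to one of the two leaves."""
        if x[self.feature] <= self.threshold:
            return self.left_label
        return self.right_label


# Hypothetical stump that looks only at feature 1.
stump = DecisionStump(feature=1, threshold=2.5, left_label=0, right_label=1)
```

Here `stump.predict([10.0, 1.0])` follows the left branch and `stump.predict([0.0, 3.0])` follows the right branch; all other features are ignored.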

I

Independence: X is independent of Y if $P(X=x \mid Y=y)= P(X=x), \; \forall x,y$ or equivalently, $P(X=x,Y=y) = P(X=x)P(Y=y), \; \forall x,y$.
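
For discrete variables with a known joint table, the product form can be verified entry by entry. This is a minimal sketch; the function name `is_independent` and the dict representation of the joint distribution are assumptions made for this example.

```python
import math
from itertools import product


def is_independent(joint, tol=1e-12):
    """Check P(X=x, Y=y) == P(X=x) * P(Y=y) for all x, y.

    `joint` is a hypothetical dict mapping (x, y) to probabilities that sum to 1.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginals obtained by summing the joint over the other variable.
    p_x = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}
    return all(
        math.isclose(joint.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=tol)
        for x, y in product(xs, ys)
    )


# Two fair coins flipped separately: independent.
fair = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# Second coin always copies the first: not independent.
copied = {(0, 0): 0.5, (1, 1): 0.5}
```

With these tables, `is_independent(fair)` holds and `is_independent(copied)` does not.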

L

Linear separability: A dataset $\{(\textbf{x}_i,y_i)\}_{i=1}^{n}$ is linearly separable if $$\exists\, w_0, \textbf{w} \;\; {\rm such} \; {\rm that} \; {\rm for} \; {\rm all} \; i: \quad \left\{ \begin{aligned} w_0 + \textbf{w}^\top \textbf{x}_i > 0 &\quad {\rm if}\; y_i=1 \\ w_0 + \textbf{w}^\top \textbf{x}_i < 0 &\quad {\rm if}\; y_i=0 \end{aligned} \right. $$
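
The two strict inequalities are easy to test for a candidate pair $(w_0, \textbf{w})$. The sketch below only verifies a given candidate; it does not search for one, so a failed check says nothing by itself about whether the dataset is separable. The function name `separates` and the toy data are assumptions made for illustration.

```python
def separates(w0, w, xs, ys):
    """Return True if (w0, w) satisfies the strict inequalities for every point."""
    for x, y in zip(xs, ys):
        score = w0 + sum(wj * xj for wj, xj in zip(w, x))
        if y == 1 and not score > 0:
            return False
        if y == 0 and not score < 0:
            return False
    return True


# Four points in the plane with two labelings.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]  # AND: linearly separable
xor_labels = [0, 1, 1, 0]  # XOR: the classic non-separable example

# Candidate separator: -1.5 + x1 + x2
w0, w = -1.5, (1.0, 1.0)
```

The candidate separates the AND labeling, which proves that dataset is linearly separable. It fails on the XOR labeling, and for XOR no other choice of $(w_0, \textbf{w})$ works either.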

Loss: A function $\ell(h(x),y)$ used to measure the error of a predictor h(x) on an example (x, y), for example, squared error or 0/1 error.

R

Risk: Expected loss of a predictor h(x): $L(h) = \textbf{E}_{(x,y)}[\ell(h(x),y)]$.
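
The expectation is over the (usually unknown) data distribution, so in practice it is commonly approximated by an average over a finite sample. The sketch below computes such an empirical average for the 0/1 and squared losses defined in this glossary; the function names, the toy predictors, and the toy data are all hypothetical.

```python
def zero_one_loss(prediction, y):
    """0/1 loss: 1 if the prediction is wrong, 0 otherwise."""
    return float(prediction != y)


def squared_loss(prediction, y):
    """Squared loss: (h(x) - y)^2."""
    return (prediction - y) ** 2


def empirical_risk(loss, h, xs, ys):
    """Sample average of loss(h(x), y), an estimate of the risk L(h)."""
    return sum(loss(h(x), y) for x, y in zip(xs, ys)) / len(xs)


def classifier(x):
    # Hypothetical threshold classifier.
    return 1 if x > 0.5 else 0


def regressor(x):
    # Hypothetical linear regressor.
    return 2.0 * x


cls_xs, cls_ys = [0.1, 0.4, 0.6, 0.9], [0, 1, 1, 1]
reg_xs, reg_ys = [0.0, 1.0, 2.0], [0.0, 2.0, 5.0]

risk_01 = empirical_risk(zero_one_loss, classifier, cls_xs, cls_ys)  # 1 error in 4
risk_sq = empirical_risk(squared_loss, regressor, reg_xs, reg_ys)    # (0 + 0 + 1) / 3
```

Passing the loss as an argument keeps the averaging step identical for classification and regression; only $\ell$ changes.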

S

Squared error or loss: Often used in regression to measure the quality of a predictor h(x): $\ell_{2}(h(x),y) = (h(x) - y)^2$. We are usually interested in the expected squared error, or risk: $L_2(h) = \textbf{E}_{(x,y)}[(h(x) - y)^2]$.
