# B

Bayes risk
The optimal (smallest) risk achievable by any predictor on a given problem.

# C

Classification error or loss (aka 0/1 error or loss)
Used in classification to measure the quality of a predictor h(x): $\ell_{0/1}(h,y) = \textbf{1}(h(x) \ne y)$. We are often interested in the 0-1 risk: $L_{0/1}(h) = \textbf{E}_{(x,y)}[\textbf{1}(h(x) \ne y)]$.
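The empirical version of the 0-1 risk is just the misclassification rate over a sample. A minimal sketch, using made-up labels and predictions for illustration:

```python
import numpy as np

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

# Per-example 0/1 loss: 1 where the prediction is wrong, 0 where it is right.
losses = (y_pred != y_true).astype(int)

# Empirical 0-1 risk: the mean 0/1 loss, i.e. the misclassification rate.
empirical_risk = losses.mean()
print(empirical_risk)  # 0.4 (2 of 5 predictions are wrong)
```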
Conditional Independence
X is conditionally independent of Y given Z if $P(X=x \mid Y=y, Z=z)= P(X=x \mid Z=z), \; \forall x,y,z$ or equivalently, $P(X=x,Y=y\mid Z=z) = P(X=x\mid Z=z)P(Y=y\mid Z=z), \; \forall x,y,z$.
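The factorization $P(X,Y\mid Z) = P(X\mid Z)P(Y\mid Z)$ can be checked numerically on a toy distribution. A sketch with an illustrative joint over binary variables, constructed so the property holds by design:

```python
import numpy as np

# Toy distribution (values are illustrative): build P(X, Y, Z) as
# P[x, y, z] = P(Z=z) * P(X=x | Z=z) * P(Y=y | Z=z),
# which makes X and Y conditionally independent given Z by construction.
pz = np.array([0.4, 0.6])
px_given_z = np.array([[0.9, 0.1],   # rows indexed by z, columns by x
                       [0.2, 0.8]])
py_given_z = np.array([[0.7, 0.3],   # rows indexed by z, columns by y
                       [0.5, 0.5]])
P = np.einsum('z,zx,zy->xyz', pz, px_given_z, py_given_z)

# Verify P(X=x, Y=y | Z=z) = P(X=x | Z=z) * P(Y=y | Z=z) for all x, y, z.
p_xy_given_z = P / P.sum(axis=(0, 1), keepdims=True)  # divide by P(Z=z)
factored = np.einsum('zx,zy->xyz', px_given_z, py_given_z)
print(np.allclose(p_xy_given_z, factored))  # True
```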

# D

Decision stump
A decision tree with one internal node.
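In code, a decision stump is a single threshold test on one feature; a minimal sketch (the feature index, threshold, and leaf labels below are illustrative assumptions):

```python
# A decision stump: one internal node that thresholds a single feature,
# with one leaf label on each side of the split.
def stump_predict(x, feature=0, threshold=2.5, left_label=0, right_label=1):
    # All parameter values here are hypothetical, chosen for illustration.
    return left_label if x[feature] <= threshold else right_label

print(stump_predict([1.0, 7.0]))  # 0 (feature 0 falls below the threshold)
print(stump_predict([4.0, 7.0]))  # 1 (feature 0 falls above the threshold)
```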

# I

Independence
X is independent of Y if $P(X=x \mid Y=y)= P(X=x), \; \forall x,y$ or equivalently, $P(X=x,Y=y) = P(X=x)P(Y=y), \; \forall x,y$.

# L

Linear separability

A dataset $\{\textbf{x}_i,y_i\}_{i=1}^{n}$ is linearly separable if $\exists\, w_0, \textbf{w}$ such that
$$\left\{ \begin{aligned} w_0 + \textbf{w}^\top \textbf{x}_i > 0 &\;\; {\rm if}\; y_i=1 \\ w_0 + \textbf{w}^\top \textbf{x}_i < 0 &\;\; {\rm if}\; y_i=0 \end{aligned} \right.$$
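Given a candidate $(w_0, \textbf{w})$, the two sign conditions can be checked directly. A sketch on a toy 1-D dataset (the data and the candidate separator are illustrative):

```python
import numpy as np

def separates(w0, w, X, y):
    """Check whether (w0, w) linearly separates the labeled points (X, y)."""
    margins = w0 + X @ w
    # Strictly positive margins on the y=1 side, strictly negative on the y=0 side.
    return bool(np.all(margins[y == 1] > 0) and np.all(margins[y == 0] < 0))

# Toy 1-D dataset: class 1 lies to the right of x = 2, class 0 to the left.
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
print(separates(-2.0, np.array([1.0]), X, y))  # True: margins are -2, -1, 1, 2
```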

Loss
A function $\ell(h,y)$ used to measure the error of a predictor h(x); examples include squared error and 0-1 error.

# R

Risk
Expected loss of a predictor h(x): $L(h) = \textbf{E}_{(x,y)}[\ell(h(x),y)]$.

# S

Squared error or loss
Often used in regression to measure the quality of a predictor h(x): $\ell_{2}(h,y) = (h(x) - y)^2$. We are usually interested in the expected squared error, or risk: $L_2(h) = \textbf{E}_{(x,y)}[(h(x) - y)^2]$.
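As with 0-1 loss, the risk is estimated in practice by averaging the squared loss over a sample (the mean squared error). A sketch with illustrative targets and predictions:

```python
import numpy as np

# Hypothetical regression targets and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

# Per-example squared loss and the empirical squared-error risk (MSE).
losses = (y_pred - y_true) ** 2
empirical_risk = losses.mean()
print(empirical_risk)  # mean of 0.25, 0.0, 1.0
```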