Other

What we didn't (much) cover

Hypothesis testing
- p-values: the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true
  - Bonferroni correction: when testing {$m$} hypotheses, compare each p-value to {$\alpha/m$} instead of {$\alpha$} (equivalently, multiply each p-value by {$m$}, capped at 1)
- Confidence intervals, standard error estimates
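A minimal sketch of the Bonferroni rule and of a standard error / confidence interval for a sample mean. The function names are hypothetical, and the 95% interval uses the normal approximation (z = 1.96); for small samples a t-quantile would be more appropriate.

```python
import math
import statistics

def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m simultaneous tests."""
    m = len(p_values)
    # Two equivalent views: compare each p to alpha / m,
    # or compare the adjusted p-value min(1, m * p) to alpha.
    adjusted = [min(1.0, m * p) for p in p_values]
    reject = [p <= alpha / m for p in p_values]
    return adjusted, reject

def mean_ci(data, z=1.96):
    """Sample mean, its standard error s / sqrt(n), and a normal-approximation CI."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean, se, (mean - z * se, mean + z * se)

adj, rej = bonferroni([0.01, 0.02, 0.2, 0.001], alpha=0.05)
mean, se, ci = mean_ci([2, 4, 4, 4, 5, 5, 7, 9])
```

With four tests the per-test threshold is 0.05 / 4 = 0.0125, so only the p-values 0.01 and 0.001 are rejected.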
Randomized search for MLE/MAP
- Gibbs sampling / MCMC (an alternative to EM)
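A minimal Gibbs sampler, assuming a toy target: a standard bivariate Gaussian with correlation rho, whose full conditionals are {$x \mid y \sim N(\rho y, 1-\rho^2)$} and vice versa. In the MLE/MAP setting one would instead sample latent variables and parameters from their conditionals; this sketch only shows the mechanics of alternating conditional draws.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=0):
    """Sample (x, y) ~ N(0, [[1, rho], [rho, 1]]) by alternating conditional draws."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # draw x | y
        y = rng.gauss(rho * x, sd)  # draw y | x (using the new x)
        if t >= burn_in:            # discard early, unconverged draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
```

Successive draws are correlated, so the effective sample size is smaller than `n_samples`.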
Metric learning
- learn a (Mahalanobis) distance {$(x-x')^\top A (x-x')$} with {$A$} positive semidefinite
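A sketch of the learned distance, assuming the common parametrization {$A = L^\top L$}, which keeps {$A$} positive semidefinite and makes the distance a squared Euclidean distance after the linear map {$L$}. How {$L$} is learned is not shown; the helper names are hypothetical.

```python
def quad_form_dist(x, z, A):
    """(x - z)^T A (x - z) for a square matrix A given as nested lists."""
    d = [xi - zi for xi, zi in zip(x, z)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

def gram(L):
    """A = L^T L, which is positive semidefinite for any real L."""
    k, n = len(L), len(L[0])
    return [[sum(L[r][i] * L[r][j] for r in range(k)) for j in range(n)] for i in range(n)]

def mapped_dist(x, z, L):
    """||L (x - z)||^2, equal to quad_form_dist(x, z, L^T L)."""
    d = [xi - zi for xi, zi in zip(x, z)]
    Ld = [sum(row[j] * d[j] for j in range(len(d))) for row in L]
    return sum(v * v for v in Ld)

L = [[2.0, 0.0], [0.0, 1.0]]  # stretches the first coordinate by 2
d1 = quad_form_dist([1.0, 1.0], [0.0, 0.0], gram(L))
d2 = mapped_dist([1.0, 1.0], [0.0, 0.0], L)
```

With {$A = I$} both functions reduce to the ordinary squared Euclidean distance.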
Domain adaptation
- adapt model from one distribution {$p(x,y)$} to another
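One standard approach, importance weighting under covariate shift, assumes only {$p(x)$} changes while {$p(y \mid x)$} stays fixed; each source example is then reweighted by {$w(x) = p_{target}(x)/p_{source}(x)$}. The sketch assumes both densities are known 1-D Gaussians (in practice {$w$} must be estimated) and uses self-normalized weights to estimate a target-domain expectation from source samples.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_weighted_mean(xs, f, p_src, p_tgt):
    """Estimate E_target[f(x)] from samples drawn from the source distribution."""
    w = [p_tgt(x) / p_src(x) for x in xs]  # importance weights
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)  # self-normalized

rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(50000)]  # source: N(0, 1)
est = importance_weighted_mean(
    source,
    f=lambda x: x,
    p_src=lambda x: normal_pdf(x, 0.0, 1.0),
    p_tgt=lambda x: normal_pdf(x, 1.0, 1.0),  # target: N(1, 1)
)
```

The same weights can multiply per-example losses when training on source data.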
Structured data
- {$x$} can be a graph: use graph kernels, graph Laplacians
- {$y$} can be a structure (e.g. a parse tree)
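A sketch of the unnormalized graph Laplacian {$L = D - A$} for an undirected graph ({$D$} the degree matrix, {$A$} the adjacency matrix), built from a hypothetical edge list. Its key property is {$f^\top L f = \sum_{(i,j) \in E} (f_i - f_j)^2$}, which measures how smoothly a function {$f$} varies over the graph. Graph kernels are not shown.

```python
def graph_laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0  # off-diagonal: minus adjacency
        L[j][i] -= 1.0
        L[i][i] += 1.0  # diagonal: degree
        L[j][j] += 1.0
    return L

def smoothness(L, f):
    """f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))

edges = [(0, 1), (1, 2)]  # path graph 0 - 1 - 2
L = graph_laplacian(3, edges)
s = smoothness(L, [1.0, 2.0, 4.0])
```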
Meta-learning (AutoML)
- Search over hyper-parameters, network architectures, ...
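The simplest form of hyper-parameter search, random search, fits in a few lines. The search space and `toy_objective` below are hypothetical stand-ins for a real model's validation score; learning rates are sampled log-uniformly because their useful range spans orders of magnitude.

```python
import math
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations independently and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space: log-uniform learning rate, integer depth.
space = {
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),
    "depth": lambda rng: rng.randint(1, 8),
}

def toy_objective(p):
    # Stand-in for validation accuracy; peaks at lr = 10^-2.5, depth = 4.
    return -(math.log10(p["lr"]) + 2.5) ** 2 - 0.1 * (p["depth"] - 4) ** 2

best, score = random_search(toy_objective, space, n_trials=200)
```

Architecture search and Bayesian optimization replace the independent sampling with smarter proposals but keep the same evaluate-and-keep-best loop.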

We only touched on

Multitask learning
- simultaneously predict multiple {$y$}s from the same features (e.g. CCA)
Reinforcement learning
- Choose sequence of actions (or policy) to maximize expected reward
- Markov Decision Processes (MDP, POMDP)
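For a fully observed MDP with known transitions and rewards, value iteration repeatedly applies the Bellman optimality update {$V(s) \leftarrow \max_a [R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s')]$} and then reads off a greedy policy. The sketch below uses in-place updates and a hypothetical two-state MDP; POMDPs need belief states and are not covered.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a]: list of (prob, next_state); R[s][a]: expected immediate reward."""
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in actions)  # Bellman optimality update
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy

# Hypothetical deterministic MDP: staying in B pays 1 per step; everything else pays 0.
states, actions = ["A", "B"], ["stay", "move"]
P = {"A": {"stay": [(1.0, "A")], "move": [(1.0, "B")]},
     "B": {"stay": [(1.0, "B")], "move": [(1.0, "A")]}}
R = {"A": {"stay": 0.0, "move": 0.0},
     "B": {"stay": 1.0, "move": 0.0}}
V, policy = value_iteration(states, actions, P, R)
```

Here the optimal policy moves from A to B and stays there, giving {$V(B) = 1/(1-\gamma) = 10$} and {$V(A) = \gamma V(B) = 9$}.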
Time series in deep learning
- GNNs and LSTMs generalize HMMs
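One way to see the connection to recurrent models is the HMM forward algorithm: it carries a hidden-state vector {$\alpha_t$} and updates it with a fixed rule {$\alpha_t(j) = p(x_t \mid z_t=j) \sum_i \alpha_{t-1}(i) T_{ij}$}. An LSTM loosely replaces this hand-specified linear-then-multiplicative update with a learned nonlinear one. The parameters below are a hypothetical two-state, two-symbol example.

```python
def hmm_forward(pi, T, E, obs):
    """Likelihood p(obs) of a discrete HMM via the forward recursion.

    pi[i]   = p(z_1 = i)
    T[i][j] = p(z_t = j | z_{t-1} = i)
    E[i][o] = p(x_t = o | z_t = i)
    """
    n = len(pi)
    alpha = [pi[i] * E[i][obs[0]] for i in range(n)]  # alpha_1
    for o in obs[1:]:
        # alpha_t(j) = E[j][o] * sum_i alpha_{t-1}(i) * T[i][j]
        alpha = [E[j][o] * sum(alpha[i] * T[i][j] for i in range(n)) for j in range(n)]
    return sum(alpha)

pi = [0.6, 0.4]
T = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
likelihood = hmm_forward(pi, T, E, [0, 1])
```

For long sequences the {$\alpha_t$} values underflow, so practical implementations rescale them at each step or work in log space.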