Chapter 11: Practical Methodology
High-level design process:
- Determine goals: error metric, target error value
- Build pipeline to estimate metrics
- Instrument systems, diagnose performance bottlenecks
- Incrementally improve algorithm
11.1 Performance Metrics
For the target error value, consider
- What errors have been achieved on previously published benchmark results?
- What is the maximum error the application can tolerate?
For the choice of target metric, consider
- Which mistakes are more costly than others (e.g. false positive vs false negative)?
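- If the event is rare, accuracy is misleading (always predicting "no event" already scores well). Use precision and recall instead, or the F-score to combine both into a single metric.
- If the model can estimate its confidence in a decision, it can decline to answer when unsure. Coverage, i.e. the fraction of inputs for which the model produces a response, may then be used alongside accuracy on the inputs it does answer.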
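To illustrate the rare-event point, here is a minimal sketch in plain Python. The labels and both sets of predictions are made up for illustration; the two predictors reach the same accuracy, but only precision, recall and F-score reveal that one of them never detects the event.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = rare event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data: the event occurs in 2 of 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_always_negative = [0] * 10                    # never predicts the event
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]        # one hit, one miss, one false alarm

for name, y_pred in [("always negative", y_always_negative), ("model", y_model)]:
    print(name, accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```

Both predictors have accuracy 0.8, yet the first has precision, recall and F1 of zero.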
11.2 Default Baseline Models
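- Choose the category of model based on the structure of the input (e.g. fully connected for fixed-size vectors, convolutional for images, recurrent for sequences), and its complexity based on the complexity of the problem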
- Start with Adam or a similar optimization algorithm
- Include some mild regularization from the start, e.g. dropout. Batch normalization can also reduce generalization error. Early stopping should be used almost universally.
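Early stopping can be sketched without any framework. The snippet below is an illustration only: the function name, the `patience` parameter and the list of validation losses are assumptions, not taken from the chapter. It scans per-epoch validation losses and reports the epoch whose parameters one would restore.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, best_loss) under a simple patience rule.

    Training is considered stopped once the validation loss has failed to
    improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; later epochs are never run
    return best_epoch, best_loss


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.40]
print(early_stopping_epoch(val_losses, patience=3))
```

In a real training loop the losses would be computed one epoch at a time and a copy of the parameters saved whenever the best loss improves. Note that a small `patience` can stop before a later improvement, as with the final value in this list.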
11.3 Determining Whether to Gather More Data
Go through a checklist:
- Training set performance is poor → improve the model (more capacity, better-tuned optimization)
- A high-capacity model and well-tuned optimization algorithm still fail → collect better-quality data
- Once training set performance is acceptable, measure test set performance. If it is poor → gather more data!
Note that, if gathering more data is expensive, it may be useful to add regularization, adjust hyperparameters, etc. before resorting to gathering data.
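The checklist above can be written as a small decision function. This is only a sketch of the notes' logic: the function name, its arguments and the returned strings are hypothetical, and "poor" is taken to mean "worse than the target error".

```python
def next_step(train_error, test_error, target_error, large_tuned_model=False):
    """Suggest the next action following the checklist in this section.

    large_tuned_model: True if a high-capacity model with a carefully tuned
    optimizer has already been tried (an assumption supplied by the caller).
    """
    if train_error > target_error:
        if large_tuned_model:
            return "collect better-quality data"
        return "improve the model"
    if test_error > target_error:
        # If data is expensive, try regularization and tuning first.
        return "gather more data"
    return "target met"


print(next_step(train_error=0.30, test_error=0.35, target_error=0.10))
print(next_step(train_error=0.05, test_error=0.25, target_error=0.10))
```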
11.4 Selecting Hyperparameters
Note that the default hyperparameter values built into libraries are generally a good starting point.
Manual Hyperparameter Tuning
The goal of manual hyperparameter tuning is to adjust effective capacity to match problem complexity. Effective capacity is constrained by 3 factors:
- Representational capacity of the model
- Capability of the learning algorithm to minimize the cost function
- Degree of regularization
The optimal hyperparameters usually lie in some middle ground between underfitting and overfitting.
The learning rate is typically the most important hyperparameter; if there is time to tune only one, tune the learning rate.
If training set error is higher than the target error rate, increase effective capacity. Otherwise, if test error is too high, shrink the gap between training and test error by decreasing effective capacity (e.g. with stronger regularization).
Automatic Hyperparameter Optimization Algorithms
Essentially, wrap the learning algorithm in an outer procedure that chooses its hyperparameters. That outer procedure has hyperparameters of its own (e.g. the range to explore for each value), but these are usually easier to choose.
Grid Search
When there are only a few hyperparameters, one may perform grid search, i.e. pick a small set of candidate values for each hyperparameter (typically spaced on a logarithmic scale) and then train and evaluate a model for every combination. The number of combinations grows exponentially with the number of hyperparameters, so this quickly becomes intractable.
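A minimal sketch of grid search using `itertools.product`. The `validation_error` function is a hypothetical stand-in for "train a model with these hyperparameters and measure validation error", and the hyperparameter names and candidate values are chosen only for illustration.

```python
import itertools


def validation_error(learning_rate, weight_decay):
    # Hypothetical stand-in for training a model and measuring validation
    # error; a real objective would run a full training job.
    return (learning_rate - 0.01) ** 2 + (weight_decay - 0.001) ** 2


# Candidate values for each hyperparameter, spaced on a log scale.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "weight_decay": [0.01, 0.001, 0.0001],
}


def grid_search(objective, grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_config, best_error = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = objective(**config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


best_config, best_error = grid_search(validation_error, grid)
print(best_config, best_error)
```

With 3 values for each of 2 hyperparameters this runs 9 evaluations; with 3 values for each of 10 hyperparameters it would run 59,049.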
Random Search
Instead of brute-forcing all combinations, we define a marginal distribution for each hyperparameter (e.g. log-uniform for the learning rate). Over several iterations, we sample a configuration at random from these distributions, evaluate it, and keep the configuration with the lowest validation error.
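A minimal sketch of random search with the standard `random` module. The objective function, the hyperparameter names and the sampling ranges are assumptions made for illustration. The learning rate is drawn log-uniformly by sampling its exponent uniformly.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from per-hyperparameter marginal distributions."""
    return {
        # Log-uniform over [1e-5, 1e-1]: sample the exponent uniformly.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        # Uniform over [0, 0.5].
        "dropout": rng.uniform(0.0, 0.5),
    }


def validation_error(learning_rate, dropout):
    # Hypothetical stand-in for training a model and measuring validation error.
    return (math.log10(learning_rate) + 3) ** 2 + (dropout - 0.2) ** 2


def random_search(objective, sample, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = sample(rng)
        error = objective(**config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


best_config, best_error = random_search(validation_error, sample_config, n_trials=50)
print(best_config, best_error)
```

Unlike a grid, every trial tries a new value of every hyperparameter, so no evaluations are wasted repeating values of a hyperparameter that turns out not to matter.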
Model-Based Hyperparameter Optimization
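A form of automatic hyperparameter optimization that treats the search itself as an optimization problem: fit a model (e.g. Bayesian regression) of validation error as a function of the hyperparameters, then use that model to propose the next configuration to try.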
11.5 Debugging Strategies
To debug model performance
- Directly observe model on random examples
- Directly observe model on worst mistakes
To debug software implementation bugs
- If training error is high, try to fit a tiny dataset (a correct implementation with enough capacity should always be able to overfit it)
- Compare back-propagated derivatives to numerical derivatives
- Create histograms of activations and gradients during training
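The derivative comparison can be sketched with centered finite differences. The toy `loss`, its hand-written gradient, the step size `eps` and the relative-error measure are all illustrative assumptions; in practice `analytic_gradient` would be the back-propagated gradient under test.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Centered finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def loss(w):
    # Toy loss used only for illustration.
    return w[0] ** 2 + 3.0 * w[0] * w[1]


def analytic_gradient(w):
    # Hand-derived gradient of the toy loss (stands in for backprop output).
    return [2.0 * w[0] + 3.0 * w[1], 3.0 * w[0]]


def max_relative_error(a, b):
    """Largest elementwise relative difference between two gradients."""
    return max(abs(x - y) / max(abs(x) + abs(y), 1e-12) for x, y in zip(a, b))


w = [0.5, -1.5]
error = max_relative_error(numerical_gradient(loss, w), analytic_gradient(w))
print("max relative error:", error)
```

A very small relative error suggests the two agree; an error close to 1 in some component usually points to a bug in the derivative for that parameter.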