Hyperparameter Optimization (HPO) aims to find a well-performing hyperparameter configuration for a given machine learning pipeline on the dataset at hand; the configuration can cover the machine learning model, its hyperparameters, and other data processing steps. HPO thus frees the human expert from a tedious and error-prone manual tuning process. Below, we list some broad approaches to HPO.
The loss landscape of an HPO problem is typically unknown (i.e., we need to optimize a black-box function) and expensive to evaluate. Bayesian Optimization (BO) is designed as a global optimization strategy for expensive black-box functions. BO first fits a surrogate model to estimate the shape of the target loss landscape and then uses an acquisition function to suggest the configuration to be evaluated in the next iteration. Because it trades off exploitation and exploration based on the surrogate model, BO is well known for its sample efficiency.
- SMAC is a versatile tool for optimizing algorithm hyperparameters; it implements different surrogate models, acquisition functions, and model transformations.
- BOHB implements a variant of TPE as a BO approach.
- JES (Joint Entropy Search) is a new information-theoretic acquisition function for BO.
- Extensions to Tree-structured Parzen Estimators (TPE):
  - c-TPE shows how constraints can be applied to the popular TPE model in BO for HPO.
  - MO-TPE meta-learns a multi-objective TPE that won the Multi-objective HPO for Transformers competition at AutoML-Conf 2022.
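The BO loop described above (fit a surrogate, maximize an acquisition function, evaluate, repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how SMAC or any other listed tool is implemented: it assumes a single hyperparameter in [0, 1], a zero-mean Gaussian process with an RBF kernel as the surrogate, expected improvement as the acquisition function, and a fixed candidate grid. All function names and the toy loss are hypothetical.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayesian_optimization(loss, n_init=3, n_iter=10, seed=0):
    """Minimize `loss` on [0, 1] with a GP surrogate and EI (toy sketch)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]   # random initial design
    ys = [loss(x) for x in xs]
    candidates = [i / 100 for i in range(101)]   # fixed grid of candidates
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)
        centered = [y - y_mean for y in ys]      # the GP assumes zero mean
        best = min(centered)

        def acquisition(x):
            mean, std = gp_posterior(xs, centered, x)
            return expected_improvement(mean, std, best)

        x_next = max(candidates, key=acquisition)
        xs.append(x_next)
        ys.append(loss(x_next))                  # the expensive evaluation
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


# Hypothetical loss with its minimum at x = 0.3.
best_x, best_loss = bayesian_optimization(lambda x: (x - 0.3) ** 2)
print(best_x, best_loss)
```

Real HPO tools replace the grid with a proper acquisition optimizer and support mixed, conditional, and high-dimensional spaces; the sketch only shows how the surrogate and the acquisition function interact.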
Combined Algorithm Selection and Hyperparameter Optimization (CASH)
An AutoML system needs to select not only the optimal hyperparameter configuration of a given model but also which model to use. This problem can be regarded as a single HPO problem with a hierarchical configuration space, where the top-level hyperparameter decides which algorithm to choose and all other hyperparameters depend on this one. To deal with such complex and structured configuration spaces, we use, for example, random forests as surrogate models in Bayesian Optimization.
- Auto-sklearn provides out-of-the-box supervised machine learning by modeling the search space as a CASH problem.
- Auto-PyTorch is a framework for automatically searching for a neural network architecture and its hyperparameters; it also makes use of a structured configuration space.
- SMAC implements a random forest as a surrogate model which can efficiently deal with structured search spaces.
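To make the hierarchical configuration space concrete, here is a minimal sketch of sampling from a CASH-style space in which the top-level choice of algorithm determines which hyperparameters are active. The algorithms, hyperparameter names, and ranges are hypothetical and not taken from Auto-sklearn, Auto-PyTorch, or SMAC.

```python
import math
import random

# Hypothetical search space: the top-level hyperparameter "algorithm"
# activates only the hyperparameters listed under the chosen algorithm.
SEARCH_SPACE = {
    "svm": {"C": (1e-3, 1e3), "gamma": (1e-4, 1e1)},
    "random_forest": {"n_estimators": (10, 500), "max_depth": (2, 20)},
}


def sample_configuration(rng):
    """Sample the algorithm first, then only its conditional hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    config = {"algorithm": algorithm}
    for name, (low, high) in SEARCH_SPACE[algorithm].items():
        if isinstance(low, int):
            # Integer hyperparameter: uniform over the inclusive range.
            config[name] = rng.randint(low, high)
        else:
            # Continuous hyperparameter: sampled log-uniformly.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return config


rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))
```

Each sampled configuration contains only the hyperparameters of the selected algorithm, which is the structure a surrogate model for CASH has to handle.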
The increasing data size and model complexity make it ever harder to find a reasonable configuration within a limited computational or time budget. Multi-fidelity techniques approximate the true value of an expensive black-box function with a cheap (and possibly noisy) proxy evaluation and can thus increase the efficiency of HPO approaches substantially. For example, we can use a small subset of the dataset or train a DNN for only a few epochs.
- Auto-sklearn increased its efficiency in version 2.0 by using multi-fidelity optimization.
- Auto-PyTorch was designed as a multi-fidelity approach from the start and demonstrates how important multi-fidelity optimization is for AutoDL.
- SMAC implements the BOHB approach, combining Hyperband as a multi-fidelity method with Bayesian Optimization.
- DEHB replaces the random search in Hyperband with Differential Evolution and shows strong performance for multi-fidelity HPO.
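The multi-fidelity idea can be illustrated with successive halving, the building block that Hyperband runs with several different trade-offs between the number of configurations and the budget per configuration. The following is a minimal sketch of one such bracket, not the implementation used in SMAC, BOHB, or DEHB. The toy loss, the learning-rate search space, and the budget values are assumptions; the toy loss is deterministic for reproducibility, whereas real low-fidelity evaluations are typically noisy.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Evaluate all configs cheaply, keep the best 1/eta, raise the budget."""
    budget = min_budget
    survivors = list(configs)
    while True:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0]
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget = min(budget * eta, max_budget)


def toy_loss(learning_rate, budget):
    """Hypothetical proxy: best at lr = 1e-2, more accurate with more budget."""
    return (math.log10(learning_rate) + 2.0) ** 2 + 1.0 / budget


rng = random.Random(0)
configs = [10 ** rng.uniform(-4, 0) for _ in range(27)]  # log-uniform rates
best = successive_halving(configs, toy_loss)
print(best)
```

With 27 configurations and `eta=3`, only one configuration is ever evaluated at the full budget of 27, while the other 26 are discarded after cheaper evaluations.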
Expert-prior aided HPO
HPO in practice can be made more efficient if the search is guided by an expert’s domain knowledge and intuition, which are often available. Expert priors can enter the HPO problem in multiple ways, for example through past evaluations of different hyperparameter settings or through the expert’s explicit belief about good hyperparameters, encoded as a prior distribution over the search space. We list some works and packages that present methods to do so.
- NePS allows HPO for expensive machine-learning pipelines with algorithms that are able to leverage expert prior input.
- Hyperparameter Transfer Across Developer Adjustments shows how previously found good hyperparameter settings can be used for improved HPO efficiency.
- πBO presents a simple method to use an expert prior belief with BO.
- PriorBand provides an expert-prior interface to Hyperband and other multi-fidelity algorithms, including BO extensions.
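One simple way to combine an explicit prior belief with BO, in the spirit of πBO, is to multiply the acquisition value of a candidate by the prior density raised to a power that decays with the iteration count, so that the prior dominates early and fades as observations accumulate. The sketch below is an illustration under assumptions, not the reference implementation: the Gaussian prior, the default value of `beta`, and the acquisition values are all hypothetical.

```python
import math


def gaussian_prior(x, mean, std):
    """Expert belief about a good hyperparameter value, as a density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def prior_weighted_acquisition(acq_value, prior_density, iteration, beta=10.0):
    """Weight an acquisition value by prior_density ** (beta / iteration)."""
    return acq_value * prior_density ** (beta / iteration)


# Two hypothetical candidates: x = 0.1 lies at the prior mode but has a
# lower acquisition value; x = 0.5 is unlikely under the prior but has a
# higher acquisition value.
candidates = {0.1: 0.5, 0.5: 1.0}  # x -> acquisition value


def pick(iteration):
    """Return the candidate with the highest prior-weighted acquisition."""
    return max(
        candidates,
        key=lambda x: prior_weighted_acquisition(
            candidates[x], gaussian_prior(x, mean=0.1, std=0.1), iteration
        ),
    )


print(pick(1), pick(1000))
```

In the first iteration the candidate at the prior mode is preferred; after many iterations the exponent approaches zero and the choice is driven by the acquisition value alone.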
Evaluation of AutoML and especially of HPO faces many challenges. For example, many repeated runs of HPO can be computationally expensive, the benchmarks can be fairly noisy, and it is often not clear which benchmarks are representative of typical HPO applications. Therefore, we develop HPO benchmark collections that improve reproducibility and decrease the computational burden on researchers.
- HPOBench (formerly HPOlib) is a collection of HPO benchmarks.
- ACLib is a benchmark collection for algorithm configuration.
The book “AutoML: Methods, Systems, Challenges” provides a concise overview of HPO.