AutoML in the Age of Pretrained Models

Models keep growing, and so do the datasets and compute needed to train them. Large pretrained models perform impressively, but the compute and data required to train them are not easily accessible, and hyperparameter tuning is prohibitively expensive both for pretraining and for fine-tuning on downstream tasks. Together, these trends call for a realignment of perspective on AutoML.

This suggests two broad directions for interfacing AutoML with large model training.

AutoML for pretrained models

Under this view, we leverage existing methods, or develop new ones, for efficient hyperparameter optimization (HPO):

  • ZAP-HPO: Meta-learns over a meta-dataset of pretrained models and their fine-tuning hyperparameters to predict hyperparameters zero-shot for an unseen test task.
  • Quick-Tune: Efficiently selects an appropriate pretrained model and its fine-tuning hyperparameters using a meta-learned multi-fidelity performance predictor.
  • PriorBand: Provides an interface for expert priors in HyperBand, enabling efficient, robust multi-fidelity HPO under short compute budgets.
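
To make the idea of combining an expert prior with multi-fidelity HPO concrete, here is a minimal sketch. It is not PriorBand's actual sampling scheme, and it runs only a single successive-halving bracket (the building block of HyperBand) rather than full HyperBand. The names `sample_config`, `successive_halving`, `toy_loss`, the fixed 50/50 mixing weight, and the search range are all assumptions made for illustration.

```python
import random

LOG_LOW, LOG_HIGH = -5.0, -1.0  # assumed search range for log10(learning rate)


def sample_config(rng, prior_center, prior_weight, prior_std=0.25):
    """Draw log10(lr) from the expert prior with probability prior_weight, else uniformly."""
    if rng.random() < prior_weight:
        value = rng.gauss(prior_center, prior_std)  # expert belief: Gaussian around a guess
    else:
        value = rng.uniform(LOG_LOW, LOG_HIGH)  # fallback in case the prior is misleading
    return min(max(value, LOG_LOW), LOG_HIGH)  # clamp to the search range


def successive_halving(evaluate, configs, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations at each rung, then multiply the budget by eta."""
    survivors, budget = list(configs), min_budget
    while len(survivors) > 1:
        survivors.sort(key=lambda c: evaluate(c, budget))  # lower loss is better
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(log_lr, budget):
    """Hypothetical validation loss: best at lr = 1e-3, improving with more budget."""
    return (log_lr + 3.0) ** 2 + 1.0 / budget


rng = random.Random(0)
configs = [sample_config(rng, prior_center=-3.0, prior_weight=0.5) for _ in range(9)]
best = successive_halving(toy_loss, configs)
```

Mixing prior samples with uniform samples is one simple way to benefit from a good prior while staying robust to a bad one; in this sketch the mixing weight is fixed, which is a simplification.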

Pretrained models for AutoML

We also look at leveraging the strong performance of pretrained models for various downstream applications in AutoML:

  • TabPFN: We use in-context learning, a specific form of meta-learning, to pretrain a foundation model designed specifically for tabular data. This transformer-based model takes the tabular training dataset as input and generates predictions for the test data in a single forward pass. It exposes the familiar scikit-learn interface, making it a convenient and effective replacement for traditional AutoML systems when working with tabular data.
  • PFNs4BO: We use a pretrained model as a surrogate model via in-context learning. This allows us to use novel priors for Bayesian optimization (BO) and build extensions that were not previously possible.
  • LC-PFN: Uses a transformer pretrained on synthetic data for Bayesian learning curve extrapolation, applied to early stopping in multi-fidelity HPO.
  • CAAFE: We use automated prompting with GPT-4 for feature engineering on tabular datasets. We tell GPT-4 about a tabular dataset and ask it what operations it would perform on the dataset before feeding it to a standard ML algorithm. We build a loop out of this, feeding back the change in cross-validation performance of the last operation performed. CAAFE comes up with very interesting approaches to improve performance, e.g. it splits string attributes into multiple categorical features with lower cardinality or bins ages into relevant subgroups.
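
The feedback loop described for CAAFE can be sketched as follows. This is a toy illustration under stated assumptions, not the CAAFE implementation: `scripted_propose` is a hypothetical stand-in for the language model and simply replays a fixed list of operations, and `cv_accuracy` uses a tiny lookup-table classifier in place of a standard ML algorithm. All function names here are invented for the example.

```python
from collections import Counter


def cv_accuracy(rows, labels, k=4):
    """k-fold cross-validation accuracy of a toy lookup-table classifier."""
    n, correct = len(rows), 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        table = {}
        for i in train:
            key = tuple(sorted(rows[i].items()))
            table.setdefault(key, Counter())[labels[i]] += 1
        for i in test:
            key = tuple(sorted(rows[i].items()))
            # Unseen feature combinations fall back to the majority training label.
            pred = table[key].most_common(1)[0][0] if key in table else majority
            correct += int(pred == labels[i])
    return correct / n


def feedback_loop(rows, labels, propose, max_steps=5):
    """Ask `propose` for one operation per step; keep it only if CV accuracy improves."""
    best_score, history = cv_accuracy(rows, labels), []
    for _ in range(max_steps):
        operation = propose(history)  # history carries the feedback on earlier operations
        if operation is None:
            break
        description, transform = operation
        candidate = [transform(dict(row)) for row in rows]  # transform copies, not originals
        score = cv_accuracy(candidate, labels)
        delta = score - best_score
        history.append((description, delta, delta > 0))
        if delta > 0:
            rows, best_score = candidate, score
    return rows, best_score, history


def bin_age(row):
    """Replace the raw age with a coarse age group."""
    row["age_group"] = "under_40" if row.pop("age") < 40 else "40_plus"
    return row


def add_constant(row):
    """An uninformative operation that should be rejected."""
    row["constant"] = 1
    return row


SCRIPTED = [("bin age into two groups", bin_age), ("add a constant column", add_constant)]


def scripted_propose(history):
    """Hypothetical stand-in for the language model: replays a fixed list of operations."""
    return SCRIPTED[len(history)] if len(history) < len(SCRIPTED) else None


ages = [21, 25, 33, 38, 42, 47, 55, 63]
rows = [{"age": a} for a in ages]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
new_rows, score, history = feedback_loop(rows, labels, scripted_propose)
```

In this toy run the binning operation is kept because it raises cross-validation accuracy, while the constant column is discarded because it changes nothing. A real system would generate each operation from the dataset description and the accumulated feedback rather than from a fixed script.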