A hyperparameter optimization library for reproducible research


Recent algorithmic advances and hardware innovations have made it possible to train deep neural networks with billions of parameters. The networks’ performance, however, depends in part on hyperparameters such as the learning rate and the number and width of network layers.

Syne Tune provides implementations of a broad range of synchronous and asynchronous HPO algorithms and three execution backends with a general interface.

Tuning hyperparameters is difficult and time-consuming, even for experts, and criteria like latency or cost often play a role in deciding the winning hyperparameter configuration. To make the latest deep-learning technology practical for nonexperts, it is essential to automate hyperparameter tuning.

At the first International Conference on Automated Machine Learning (AutoML), we presented Syne Tune, an open-source library for large-scale hyperparameter optimization (HPO) with an emphasis on enabling reproducible machine learning research. It simplifies, standardizes, and accelerates the evaluation of a wide variety of HPO algorithms.

These algorithms are implemented on top of common modules and aim to remove implementation bias to enable fair comparisons. By supporting different execution backends, the library also enables researchers and engineers to effortlessly move from simulation and small-scale experimentation to large-scale distributed tuning on the cloud.

In this post, we will give an overview of the execution backends supported in Syne Tune and benchmark state-of-the-art asynchronous HPO algorithms, including transfer learning baselines.

Supported execution backends

Syne Tune provides a general interface for backends and three implementations: one to evaluate trials on a local machine, one to evaluate trials in the cloud, and one to simulate tuning with tabulated benchmarks to reduce run time. Switching between different backends is a matter of simply passing a different trial_backend parameter to the tuner, as shown in the code examples below. The backend API has been kept lean on purpose, and adding new backends requires little effort.
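To illustrate why a lean backend interface makes switching cheap, here is a minimal toy sketch in plain Python. It is not Syne Tune's API: the only name taken from the text is the `trial_backend` parameter, while `TrialBackend`, `ToyLocalBackend`, `ToyTableBackend`, `start_trial`, `fetch_results`, and `tune` are hypothetical names invented for this illustration.

```python
from typing import Protocol


class TrialBackend(Protocol):
    """Hypothetical lean interface: all the tuning loop relies on."""

    def start_trial(self, trial_id: int, config: dict) -> None: ...

    def fetch_results(self) -> list: ...


class ToyLocalBackend:
    """Evaluates the objective in-process (stand-in for a local run)."""

    def __init__(self, objective):
        self.objective = objective
        self._pending = []

    def start_trial(self, trial_id, config):
        self._pending.append((trial_id, self.objective(config)))

    def fetch_results(self):
        results, self._pending = self._pending, []
        return results


class ToyTableBackend:
    """Looks metrics up in a precomputed table (stand-in for a tabulated benchmark)."""

    def __init__(self, table):
        self.table = table  # maps a learning rate to a metric value
        self._pending = []

    def start_trial(self, trial_id, config):
        self._pending.append((trial_id, self.table[config["lr"]]))

    def fetch_results(self):
        results, self._pending = self._pending, []
        return results


def tune(trial_backend: TrialBackend, configs: list):
    """Runs every config through the given backend; returns the best (lowest) one."""
    for trial_id, config in enumerate(configs):
        trial_backend.start_trial(trial_id, config)
    best_id, best_value = min(trial_backend.fetch_results(), key=lambda r: r[1])
    return configs[best_id], best_value


configs = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]
local = ToyLocalBackend(lambda c: abs(c["lr"] - 0.01))
table = ToyTableBackend({0.1: 0.09, 0.01: 0.0, 0.001: 0.009})
print(tune(trial_backend=local, configs=configs))
print(tune(trial_backend=table, configs=configs))
```

The tuning loop never inspects which backend it was given, so moving from one execution mode to another changes a single argument.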

Local backend

This backend evaluates trials concurrently on a single machine by using subprocesses. We support rotating multiple GPUs on the machine, assigning the next trial to the least busy GPU (i.e., the GPU with the fewest trials currently running). Trial checkpoints and logs are stored to local files.

[Code listing: How to tune a training script with Bayesian optimization in Syne Tune.]
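The least-busy GPU rule described for the local backend can be sketched in a few lines. This is an illustration of the rule only, not the library's code; the function name `pick_least_busy_gpu`, the `running` mapping, and the tie-breaking by lowest GPU index are assumptions.

```python
from collections import Counter


def pick_least_busy_gpu(num_gpus, running):
    """Returns the index of the GPU with the fewest running trials.

    running maps trial_id -> GPU index for all trials currently running.
    Ties are broken by the lowest GPU index (an assumption of this sketch).
    """
    load = Counter(running.values())
    return min(range(num_gpus), key=lambda gpu: (load[gpu], gpu))


# Assign five trials to two GPUs, one after the other.
running = {}
for trial_id in range(5):
    running[trial_id] = pick_least_busy_gpu(2, running)
print(running)  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}
```

In a real backend, entries would be removed from `running` as trials finish, so the assignment adapts to uneven trial durations rather than following a fixed round-robin.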

Cloud backend

Running on a single machine limits the number of trials that can run concurrently. Moreover, training a neural network may itself require a multi-GPU device or many GPUs distributed across several nodes. For those use cases, we provide an Amazon SageMaker backend that can run multiple trials in parallel.

[Code listing: Changing the trial backend to run on cloud machines.]

Simulation backend

A growing number of tabulated benchmarks are available for HPO and neural-architecture-search (NAS) research. The simulation backend allows the execution of realistic experiments with such benchmarks on a single CPU instance, paying real time for the decision-making only.

To this end, we use a timekeeper to manage simulated time and a priority queue of time-stamped events (e.g., reporting-metric values for running trials), which work together to ensure that interactions between trials and the scheduler happen in the right order, whatever the experimental setup may be. The simulator correctly handles any number of workers, and delay due to model-based decision-making is taken into account.

[Code listing: Changing the trial backend to run on parallel and asynchronous simulations based on tabulated benchmarks.]
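The interplay of a timekeeper and a priority queue of time-stamped events can be sketched as a small discrete-event simulation. This is a simplified illustration, not the library's simulator: the names `Timekeeper`, `simulate`, and `decision_delay` and the table layout are assumptions, and the sketch never stops trials early, which a real scheduler might do.

```python
import heapq
import itertools


class Timekeeper:
    """Holds the simulated clock; it moves only when the simulator advances it."""

    def __init__(self):
        self.now = 0.0

    def advance_to(self, timestamp):
        self.now = max(self.now, timestamp)

    def advance_by(self, seconds):
        self.now += seconds


def simulate(table, n_workers, decision_delay=0.0):
    """Replays tabulated trials on n_workers simulated workers.

    table: list of trials, each a list of (epoch_seconds, metric) rows.
    Returns events as (simulated_time, trial_id, epoch, metric), in the
    order in which a scheduler would observe them.
    """
    clock = Timekeeper()
    events = []  # min-heap ordered by time stamp
    tie_breaker = itertools.count()  # keeps heap order stable for equal times
    next_trial = 0
    observed = []

    def start_next_trial():
        nonlocal next_trial
        if next_trial >= len(table):
            return
        # Decision-making is charged to the simulated clock as well.
        clock.advance_by(decision_delay)
        finish = clock.now
        for epoch, (seconds, metric) in enumerate(table[next_trial], start=1):
            finish += seconds
            heapq.heappush(
                events, (finish, next(tie_breaker), next_trial, epoch, metric)
            )
        next_trial += 1

    for _ in range(n_workers):
        start_next_trial()

    while events:
        timestamp, _, trial_id, epoch, metric = heapq.heappop(events)
        clock.advance_to(timestamp)
        observed.append((clock.now, trial_id, epoch, metric))
        if epoch == len(table[trial_id]):  # trial finished, worker is free
            start_next_trial()
    return observed


table = [[(1.0, 0.9), (1.0, 0.8)], [(3.0, 0.5)], [(1.0, 0.7)]]
for event in simulate(table, n_workers=2):
    print(event)
```

The loop itself takes almost no real time: the six seconds of tabulated training are only accounted for on the simulated clock, which is what makes experiments with many workers affordable on a single CPU instance.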

Comparing asynchronous tuning algorithms

Syne Tune provides implementations of a broad range of synchronous and asynchronous HPO algorithms. In our experiments, we consider single-fidelity HPO algorithms, which require entire training runs to evaluate a candidate hyperparameter configuration. Random search (RS), regularized evolution for architecture search (REA), and Bayesian-optimization variants (e.g., Gaussian-process-based (GP) and density-ratio-based (BORE), of which TPE is a special case) fall in this category.

We also consider multi-fidelity HPO algorithms, which stop unpromising training runs early. The median stopping rule (MSR), asynchronous successive halving (ASHA), and asynchronous Bayesian-optimization variants (e.g., BOHB and MOB) are prominent examples.
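As an illustration of how early stopping works in this family, the following is a minimal sketch of one simple stopping-type variant of asynchronous successive halving. It is not Syne Tune's implementation: the class name `StoppingASHA`, the rung spacing, and the exact cutoff rule (keep a trial if it is among the best 1/eta of the metrics recorded at its rung so far) are assumptions made for this example, and lower metric values are taken to be better.

```python
def rung_levels(min_epochs, max_epochs, eta):
    """Epochs at which trials are compared: min_epochs * eta**k below max_epochs."""
    levels, level = [], min_epochs
    while level < max_epochs:
        levels.append(level)
        level *= eta
    return levels


class StoppingASHA:
    """Toy stopping-type asynchronous successive halving (lower is better)."""

    def __init__(self, min_epochs=1, max_epochs=27, eta=3):
        self.eta = eta
        levels = rung_levels(min_epochs, max_epochs, eta)
        self.rungs = {level: [] for level in levels}

    def on_report(self, epoch, metric):
        """Returns True if the trial may continue, False if it should stop."""
        if epoch not in self.rungs:
            return True  # decisions are only made at rung levels
        recorded = self.rungs[epoch]
        recorded.append(metric)
        # Keep the trial only if it is in the best 1/eta fraction seen so far.
        num_kept = max(1, len(recorded) // self.eta)
        cutoff = sorted(recorded)[num_kept - 1]
        return metric <= cutoff


asha = StoppingASHA(min_epochs=1, max_epochs=27, eta=3)
print(asha.on_report(epoch=1, metric=0.5))  # True: first trial at this rung
print(asha.on_report(epoch=1, metric=0.6))  # False: worse than the best so far
print(asha.on_report(epoch=1, metric=0.4))  # True: new best at this rung
```

Because each decision uses only the metrics recorded so far, no worker ever waits for a rung to fill up, which is what makes the rule asynchronous.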

The table below shows the normalized rank, averaged over wall-clock time, of these single- and multi-fidelity optimizers on three publicly available neural-architecture-search benchmarks: FCNet, from Klein and Hutter (2019); NAS201, from Dong and Yang (2020); and LCBench, from Zimmer et al. (2021).
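One plausible way to compute a normalized rank averaged over time is sketched below. The exact definition used for the reported numbers is not given here, so the details are assumptions: methods are ranked at each point of a shared time grid by their best value so far (lower is better), ties share the average rank, and ranks are rescaled to [0, 1]. The function name `average_normalized_rank` and the example curves are made up for illustration.

```python
def average_normalized_rank(curves):
    """Averages each method's normalized rank over a shared time grid.

    curves: dict mapping method name -> list of best-so-far values, one per
    time step (lower is better). Needs at least two methods.
    Returns a dict mapping method name -> mean normalized rank in [0, 1].
    """
    methods = list(curves)
    num_steps = len(curves[methods[0]])
    totals = {method: 0.0 for method in methods}
    for step in range(num_steps):
        values = {method: curves[method][step] for method in methods}
        for method in methods:
            better = sum(values[other] < values[method] for other in methods)
            tied = sum(values[other] == values[method] for other in methods)
            rank = better + (tied + 1) / 2  # 1-based, ties share the average
            totals[method] += (rank - 1) / (len(methods) - 1)
    return {method: totals[method] / num_steps for method in methods}


# Made-up best-so-far curves for three methods on four time steps.
curves = {
    "RS": [0.9, 0.8, 0.7, 0.7],
    "ASHA": [0.8, 0.6, 0.5, 0.5],
    "GP": [0.9, 0.7, 0.6, 0.4],
}
print(average_normalized_rank(curves))
# {'RS': 0.9375, 'ASHA': 0.125, 'GP': 0.4375}
```

Averaging over time rewards methods that find good configurations early, not only those that end up best when the budget runs out.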

Multi-fidelity algorithms are in general superior to single-fidelity algorithms, which is expected, as they make more efficient use of the computational resources available to them. These results are also consistent with previous results reported in the literature. It should be noted that among the multi-fidelity algorithms, MSR is the only one not using successive halving, and it performs worst.

The table also shows the average normalized rank of transfer learning approaches. Hyperparameter transfer learning uses evaluation data from past HPO tasks in order to warmstart the current HPO task, which can result in significant speed-ups in practice.

Syne Tune supports transfer-learning-based HPO via an abstraction that maps a scheduler and transfer learning data to a warmstarted instance of the former. We consider the bounding-box and quantile-based ASHA, respectively referred to as ASHA-BB and ASHA-CTS. We also consider a zero-shot approach (ZS), which greedily selects hyperparameter configurations that complement previously considered ones, based on historical performances; and RUSH, which warmstarts ASHA with the best configurations found for previous tasks. As expected, we find that transfer learning approaches accelerate HPO.
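The bounding-box idea can be sketched as follows: the search space for the new task is restricted to the smallest box that contains the best configurations of past tasks, and any scheduler can then be run on the reduced space. This is a minimal illustration for numerical hyperparameters, not the library's implementation; the function name `bounding_box`, the `num_best` parameter, and the example data are assumptions.

```python
def bounding_box(past_evaluations, num_best=1):
    """Computes per-hyperparameter bounds from the best configs of past tasks.

    past_evaluations: dict mapping task name -> list of (config, metric)
    pairs, where config is a dict of numerical hyperparameters and a lower
    metric is better.
    Returns a dict mapping hyperparameter name -> (low, high).
    """
    best_configs = []
    for evaluations in past_evaluations.values():
        ranked = sorted(evaluations, key=lambda pair: pair[1])
        best_configs.extend(config for config, _ in ranked[:num_best])
    names = best_configs[0].keys()
    return {
        name: (
            min(config[name] for config in best_configs),
            max(config[name] for config in best_configs),
        )
        for name in names
    }


# Made-up evaluation data from two earlier tuning tasks.
past = {
    "task_a": [({"lr": 0.1, "layers": 2}, 0.30), ({"lr": 0.01, "layers": 4}, 0.20)],
    "task_b": [({"lr": 0.003, "layers": 3}, 0.25), ({"lr": 0.5, "layers": 8}, 0.60)],
}
print(bounding_box(past))  # {'lr': (0.003, 0.01), 'layers': (3, 4)}
```

With only a few past tasks the box can be very tight, so a larger `num_best` trades some of the speed-up for a lower risk of excluding the optimum of the new task.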

Average normalized rank (lower is better) of algorithms across time and benchmarks. Best results per category are indicated in bold.

Our experiments show that Syne Tune makes research on automated machine learning more efficient, reliable, and trustworthy. By making simulation on tabulated benchmarks a first-class citizen, it makes hyperparameter optimization accessible to researchers without massive computation budgets. By supporting advanced use cases, such as hyperparameter transfer learning, it allows better problem solving in practice.

To learn more about the library and contribute to it, please check out the paper and our GitHub repo for documentation. We just released the 0.3 version, with new HPO algorithms, new benchmarks, TensorBoard visualization, and more.
