Amazon SageMaker Autopilot: Up to 8x Faster with Ensemble Training Powered by AutoGluon

Proud of the team for shipping this one! Amazon SageMaker Autopilot now supports a new ensemble training mode powered by AutoGluon, delivering up to 8x faster training with improved accuracy.

What’s New

The new ensemble training mode trains several base models and combines their predictions using model stacking. For datasets under 100 MB, it builds ML models up to 8 times faster than hyperparameter optimization (HPO) with 250 trials, and up to 5.8 times faster than HPO with 100 trials.

It supports a wide range of algorithms including LightGBM, CatBoost, XGBoost, Random Forest, Extra Trees, linear models, and neural networks based on PyTorch and FastAI.

How It Works

AutoGluon-Tabular (AGT) is an open-source AutoML framework that trains highly accurate models on tabular datasets. Unlike frameworks that primarily focus on model and hyperparameter selection, AGT succeeds by ensembling multiple models and stacking them in multiple layers:

1. Given a dataset, AGT trains various base models, ranging from boosted trees to customized neural networks.
2. Predictions from the base models are used as features to build a stacking model, which learns the appropriate weight of each base model.
3. The stacking model combines the base model predictions and returns the final set of predictions.
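
To make the stacking idea concrete, here is a toy sketch in plain Python. It is not AutoGluon's implementation: the two base models (a mean predictor and a one-feature linear fit), the single holdout split, and every function name are assumptions chosen to keep the example short. AGT itself uses far richer base models and multiple stacking layers.

```python
from statistics import mean

def fit_mean_model(xs, ys):
    """Hypothetical base model 1: always predicts the training mean."""
    mu = mean(ys)
    return lambda x: mu

def fit_linear_model(xs, ys):
    """Hypothetical base model 2: one-feature least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_stacker(p1, p2, ys):
    """Learn one weight per base model by solving the 2x2 normal equations
    for least squares on the base-model predictions (no intercept)."""
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    d = sum(v * v for v in p2)
    e = sum(u * y for u, y in zip(p1, ys))
    f = sum(v * y for v, y in zip(p2, ys))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det

# Toy data following y = 2x + 1, split into a training part and a holdout part.
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
hold_x, hold_y = [4, 5, 6], [9, 11, 13]

# Step 1: train the base models.
m1 = fit_mean_model(train_x, train_y)
m2 = fit_linear_model(train_x, train_y)

# Step 2: base-model predictions on held-out rows become the stacker's features.
p1 = [m1(x) for x in hold_x]
p2 = [m2(x) for x in hold_x]
w1, w2 = fit_stacker(p1, p2, hold_y)

# Step 3: the stacking model combines base predictions into the final prediction.
def stacked(x):
    return w1 * m1(x) + w2 * m2(x)

print(w1, w2, stacked(10))
```

On this data the stacker puts essentially all of its weight on the linear model, since that model already fits the holdout rows. Using held-out predictions as features, rather than predictions on the training rows, is what keeps the stacker from simply rewarding whichever base model overfits the most.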

Autopilot selects an optimal set of AGT configurations and runs multiple trials in parallel to find the best model in terms of objective metrics or inference latency.
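
For readers who want to try the new mode, the sketch below builds a request for the SageMaker `CreateAutoMLJob` API with the training mode set to `ENSEMBLING`. The post itself does not show API usage, so treat this as an assumption based on the public API shape. The helper name, job name, bucket paths, target column, and role ARN are all hypothetical placeholders.

```python
def build_ensembling_job_request(job_name, s3_input_uri, s3_output_uri,
                                 target_column, role_arn):
    """Hypothetical helper: assemble keyword arguments for
    boto3's sagemaker client create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_input_uri,
                    }
                },
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        # Selects ensemble training instead of hyperparameter tuning.
        "AutoMLJobConfig": {"Mode": "ENSEMBLING"},
        "RoleArn": role_arn,
    }

# All values below are placeholders, not real resources.
request = build_ensembling_job_request(
    job_name="example-ensembling-job",
    s3_input_uri="s3://example-bucket/train/",
    s3_output_uri="s3://example-bucket/output/",
    target_column="label",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)

# To submit the job (requires AWS credentials and the boto3 package):
# import boto3
# boto3.client("sagemaker").create_auto_ml_job(**request)
```

Building the request as a plain dictionary keeps the example runnable without AWS credentials, and the submit call is left commented out for that reason.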

Results

Benchmarking against OpenML datasets showed consistent improvements across classification and regression tasks, with runtime improvements of up to 8.1x for small datasets and accuracy gains across all problem types and dataset sizes.

Read the full post on AWS