Amazon SageMaker Autopilot: Up to 8x Faster with Ensemble Training Powered by AutoGluon
Proud of the team for shipping this one! Amazon SageMaker Autopilot now supports a new ensemble training mode powered by AutoGluon, delivering up to 8x faster training with improved accuracy.
What’s New
The new ensemble training mode trains several base models and combines their predictions using model stacking. For datasets under 100 MB, it builds ML models up to 8x faster than hyperparameter optimization (HPO) mode with 250 trials, and up to 5.8x faster than HPO mode with 100 trials.
It supports a wide range of algorithms including LightGBM, CatBoost, XGBoost, Random Forest, Extra Trees, linear models, and neural networks based on PyTorch and FastAI.
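For readers who want to try the mode from code, here is a minimal sketch of a request for the SageMaker CreateAutoMLJob API, where the training mode is set through AutoMLJobConfig. The job name, S3 paths, target column, and role ARN below are hypothetical placeholders, and the field names reflect my understanding of the boto3 `create_auto_ml_job` call; check them against the current API reference before relying on them.

```python
# Sketch of a CreateAutoMLJob request that asks for ensemble training.
# All names, paths, and the role ARN are hypothetical placeholders.
request = {
    "AutoMLJobName": "example-ensembling-job",
    "InputDataConfig": [
        {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://example-bucket/train/",
                }
            },
            "TargetAttributeName": "label",
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    # "ENSEMBLING" selects the AutoGluon-powered mode; the alternatives are
    # "HYPERPARAMETER_TUNING" and "AUTO" (let Autopilot choose).
    "AutoMLJobConfig": {"Mode": "ENSEMBLING"},
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
}

# Submitting the job requires AWS credentials and real resources:
# import boto3
# boto3.client("sagemaker").create_auto_ml_job(**request)
```

The actual call is left commented out so the snippet runs without an AWS account.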
How It Works
AutoGluon-Tabular (AGT) is an open-source AutoML framework that trains highly accurate models on tabular datasets. Unlike frameworks that primarily focus on model and hyperparameter selection, AGT succeeds by ensembling multiple models and stacking them in multiple layers:
- Given a dataset, AGT trains various base models ranging from boosted trees to customized neural networks
- Predictions from the base models are used as features to build a stacking model, which learns the appropriate weight of each base model
- The stacking model combines the base model predictions and returns the final set of predictions
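The three steps above can be illustrated with a toy, standard-library-only sketch. This is not AutoGluon code: it assumes just two hand-written base models, a single stacking layer, and a simple holdout split where AGT would use many models, multiple layers, and its own validation strategy. All function names are made up for the illustration.

```python
from statistics import mean

def fit_mean_model(xs, ys):
    """Base model 1: always predicts the training-set mean."""
    mu = mean(ys)
    return lambda x: mu

def fit_linear_model(xs, ys):
    """Base model 2: one-feature ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_stack_weights(p1, p2, ys):
    """Stacking model: least-squares weights for two base models,
    found by solving the 2x2 normal equations."""
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    d = sum(v * v for v in p2)
    r1 = sum(u * y for u, y in zip(p1, ys))
    r2 = sum(v * y for v, y in zip(p2, ys))
    det = a * d - b * b
    return (r1 * d - b * r2) / det, (a * r2 - b * r1) / det

# Toy data: y = 2x + 1
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train_x, train_y = xs[:6], ys[:6]
hold_x, hold_y = xs[6:], ys[6:]

# Step 1: train the base models.
base_models = [fit_mean_model(train_x, train_y),
               fit_linear_model(train_x, train_y)]

# Step 2: use base-model predictions on held-out rows as stacker features.
hold_preds = [[m(x) for x in hold_x] for m in base_models]
w1, w2 = fit_stack_weights(hold_preds[0], hold_preds[1], hold_y)

# Step 3: the stacker combines base predictions into the final prediction.
def stacked_predict(x):
    return w1 * base_models[0](x) + w2 * base_models[1](x)
```

On this data the linear model fits perfectly, so the stacker learns to put essentially all of the weight on it and none on the mean model. The stacker is fit on held-out predictions so that it does not simply reward base models that memorized the training rows.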
Autopilot selects an optimal set of AGT configurations and runs multiple trials in parallel to find the best model, judged either by the objective metric or by inference latency.
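As a rough picture of that selection step, the sketch below runs a few trials concurrently and then picks a winner two ways. It is not how Autopilot is implemented: the configurations, the `run_trial` stand-in, and every number it returns are made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical trial configurations; names and fields are illustrative only.
CONFIGS = [
    {"name": "config_a", "stack_levels": 0},
    {"name": "config_b", "stack_levels": 1},
    {"name": "config_c", "stack_levels": 2},
]

def run_trial(config):
    """Stand-in for training one ensemble; returns toy, made-up numbers."""
    levels = config["stack_levels"]
    return {
        "name": config["name"],
        "accuracy": 0.80 + 0.05 * levels,  # toy: deeper stacks score higher
        "latency_ms": 10 * (levels + 1),   # toy: deeper stacks respond slower
    }

# Run all trials in parallel; map() returns results in input order.
with ThreadPoolExecutor(max_workers=len(CONFIGS)) as pool:
    results = list(pool.map(run_trial, CONFIGS))

# Pick the winner by objective metric, or by inference latency.
best_by_metric = max(results, key=lambda r: r["accuracy"])
best_by_latency = min(results, key=lambda r: r["latency_ms"])
```

The two criteria can disagree, as they do in this toy setup, which is why it helps to be able to rank candidates by either one.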
Results
Benchmarking against OpenML datasets showed consistent improvements across classification and regression tasks, with runtime improvements of up to 8.1x for small datasets and accuracy gains across all problem types and dataset sizes.