Bort: A Version of BERT That's 20 Times as Fast

The Amazon Science team has just released Bort, a compressed version of BERT that is 16% of the size and about 20 times as fast, while improving performance on 20 of 23 natural language understanding tasks.
How It Works
The key innovation behind Bort is an approach called Optimal Subarchitecture Extraction (OSE). Rather than pruning weights, which removes individual connections from a network, OSE restructures the network by identifying the optimal architectural parameters, such as the number of layers and the number of nodes per layer. The idea is that instead of starting with a large model and trimming it down, you algorithmically determine the best smaller architecture from the start.
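To make "architectural parameters" concrete, here is a minimal sketch of how the size of a Transformer encoder depends on them. The `EncoderArch` class and `encoder_params` function are hypothetical names introduced for illustration, the count is a rough one that ignores embeddings, and the smaller configuration is an arbitrary example rather than Bort's actual architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderArch:
    """Hypothetical container for parameters an architecture search might vary."""
    layers: int        # number of Transformer encoder layers
    hidden: int        # hidden size
    intermediate: int  # feed-forward intermediate size
    heads: int         # attention heads (does not change the weight count below)


def encoder_params(arch: EncoderArch) -> int:
    """Rough weight count for the encoder stack only (embeddings excluded)."""
    h, i = arch.hidden, arch.intermediate
    attention = 4 * (h * h + h)               # Q, K, V, output projections with biases
    feed_forward = (h * i + i) + (i * h + h)  # two dense layers with biases
    layer_norms = 2 * 2 * h                   # two LayerNorms, each with scale and shift
    return arch.layers * (attention + feed_forward + layer_norms)


# BERT-large's published configuration.
large = EncoderArch(layers=24, hidden=1024, intermediate=4096, heads=16)
# Arbitrary smaller candidate, chosen only for illustration.
small = EncoderArch(layers=6, hidden=512, intermediate=2048, heads=8)

ratio = encoder_params(small) / encoder_params(large)
print(f"{encoder_params(large):,} vs {encoder_params(small):,} weights ({ratio:.1%})")
```

A search over architectures would evaluate many such candidates, trading this size against measured latency and accuracy.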
The team designed a fully polynomial-time approximation scheme (FPTAS) that guarantees a Pareto-optimal solution: no other candidate architecture is better on one of speed, size, or accuracy without being worse on another. Any further improvement in speed therefore involves a trade-off with size or accuracy, and vice versa. You are guaranteed to be on the efficient frontier.
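The notion of Pareto optimality can be illustrated with a small brute-force filter. This is not the FPTAS described above, only a sketch of what "on the efficient frontier" means. The `Candidate` type and all the numbers are made up for the example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    params_m: float    # size in millions of parameters (lower is better)
    latency_ms: float  # inference latency (lower is better)
    error: float       # task error rate (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up candidates for illustration only.
candidates = [
    Candidate("A", 340, 100, 0.10),
    Candidate("B", 60, 12, 0.11),
    Candidate("C", 60, 15, 0.12),  # dominated by B: same size, slower, less accurate
    Candidate("D", 30, 8, 0.20),
]
front = pareto_front(candidates)
print([c.name for c in front])  # ['A', 'B', 'D']
```

Every surviving candidate represents a different trade-off. Moving from one to another always gives something up, which is the property the paragraph above describes.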
Fine-Tuning with Agora
A second algorithm called Agora addresses the challenge of fine-tuning on small datasets. It works by sampling difficult examples from development sets, generating nearby points in the representation space, and incorporating them into the training data. This data augmentation strategy improves generalization, which is especially important when working with a smaller model on limited data.
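The three steps described above can be sketched roughly as follows. This is a simplified illustration, not the Agora algorithm itself. It assumes examples are already represented as vectors, that "nearby points" are produced by adding small Gaussian noise, and that each generated point inherits the label of its source example. The function name, the `is_hard` predicate, and the data are all hypothetical.

```python
import random


def augment_hard_examples(dev_set, is_hard, n_neighbors=3, noise_std=0.05, seed=0):
    """Return perturbed copies of the dev-set examples that is_hard() flags.

    Assumption: each neighbor keeps the label of the example it was generated from.
    """
    rng = random.Random(seed)
    hard = [(vec, label) for vec, label in dev_set if is_hard(vec, label)]
    augmented = []
    for vec, label in hard:
        for _ in range(n_neighbors):
            # Assumption: "nearby" means a small Gaussian perturbation per dimension.
            neighbor = [x + rng.gauss(0.0, noise_std) for x in vec]
            augmented.append((neighbor, label))
    return augmented


# Toy data: (representation vector, label) pairs.
dev_set = [([0.1, 0.9], 1), ([0.8, 0.2], 0), ([0.5, 0.5], 1)]
train_set = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]


def is_hard(vec, label):
    # Hypothetical difficulty test: a point is "hard" when its features are nearly balanced.
    return abs(vec[0] - vec[1]) < 0.1


new_points = augment_hard_examples(dev_set, is_hard)
train_set.extend(new_points)
print(len(train_set))  # 5: two original examples plus three generated neighbors
```

In practice the difficulty test would come from the model's own mistakes on the development set rather than a hand-written rule.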
Results
The results are impressive:
- Network size reduced by 84%, to 16% of the original (effective size, which excludes the embedding layer, is just 5.5% of the original BERT-large)
- Inference up to 20x faster on CPU
- Absolute performance improvements of up to 31% on some benchmarks
- Maintains BERT’s generalizability across a wide range of NLP tasks
This research suggests that principled, algorithmic approaches to model compression can significantly outperform heuristic methods. Using mathematical optimization to find the right structure, rather than guessing at architectures, leads to models that are not just smaller and faster but often better.