This two-part post looks at how to make state-of-the-art NLP more efficient by exploring modifications to the popular but computationally demanding Transformer-based language modelling approach.
The previous post:
- Explained why the Transformer’s self-attention mechanism incurs a high computational cost.
- Presented alternative attention mechanisms that run more efficiently without significantly compromising performance.
This post will:
- Explore methods that train small models to reproduce the outputs of large models (knowledge distillation).
- Explain how to fine-tune language models efficiently.
- Provide our recommendations on when to use each of these efficient Transformer approaches.