Business runs on communications. Customers reach out when they need something.
Colleagues connect to get work done. At Re:infer, our mission is to
fundamentally change the economics of service work in the enterprise—to unlock
the value in every interaction and make service efficient and scalable. We do
this by democratising access to state-of-the-art natural language processing (NLP)
and natural language understanding (NLU).
Specifically, Re:infer models use deep learning architectures called
Transformers.
Transformers facilitate huge improvements in NLU performance. However, they are
also highly compute-intensive, both when training the models to learn new concepts
and when using them to make predictions. This two-part series will look at multiple
techniques to increase the speed and reduce the compute cost of using these
large Transformer architectures.
This post will:
- Present a brief history of embedding models in NLP.
- Explain why the Transformer’s self-attention mechanism has a high
computational workload.
- Review modifications to the traditional Transformer architecture that make it more
computationally efficient to train and run without significantly compromising
performance.
The next post will look at additional computational and approximation techniques
that yield further efficiency gains. Specifically, it will:
- Explore distillation techniques, where smaller models are trained to
approximate the performance of larger models.
- Explain efficient fine-tuning techniques, where updates are restricted to a
subset of the model's parameters.
- Provide our recommendations for when to use each of these methods.