Re:infer's machine learning algorithms are based on pre-trained Transformer models, which learn semantically informative representations of sequences of text, known as embeddings. Over the past few years, Transformer models have achieved state-of-the-art results on the majority of common natural language processing (NLP) tasks.
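The key property of a semantically informative embedding is that texts with similar meanings map to nearby vectors. A minimal sketch of this idea, using hypothetical low-dimensional vectors (real models produce embeddings with hundreds of dimensions) and cosine similarity as the distance measure:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings for illustration only.
emb_refund     = np.array([0.9, 0.1, 0.0, 0.2])  # "I want a refund"
emb_money_back = np.array([0.8, 0.2, 0.1, 0.3])  # "Please give me my money back"
emb_weather    = np.array([0.0, 0.9, 0.8, 0.1])  # "It is sunny today"

# Semantically similar texts have high similarity; unrelated texts score low.
similar   = cosine_similarity(emb_refund, emb_money_back)
unrelated = cosine_similarity(emb_refund, emb_weather)
```

This geometric structure is what makes embeddings useful for downstream tasks such as classification, clustering, and semantic search.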
But how did we get here? What has led to the Transformer becoming the model of choice for training embeddings? Over the past decade, the biggest improvements in NLP have come from advances in unsupervised pre-training of text embeddings. In this post, we look at the history of embedding methods and how they have improved over time.
This post will:
- Explain what embeddings are and how they are used in common NLP applications.
- Present a history of popular methods for training embeddings, including traditional methods like word2vec and modern Transformer-based methods such as BERT.
- Discuss the weaknesses of embedding methods, and how they can be addressed.