LSTM is dead, long live Transformers

By Leo Dirac. From Sea-ADL’s first meetup on November 12, 2019, hosted by Moz.

Leo Dirac talks about how Transformer models like BERT and GPT-2 have taken the natural language processing (NLP) community by storm and have effectively replaced LSTM models for most practical applications. The talk covers:

  • Traditional NLP. Background on Natural Language Processing, why sequence modeling is difficult for standard supervised machine learning approaches, and how bag-of-words can solve a document classification problem (a minimal sketch follows this list).
  • Neural document processing: vanilla RNN, LSTM. How neural networks process sequences with simple recurrent neural networks, and how LSTM improves on them by mitigating vanishing and exploding gradients with what is effectively a residual-network-style additive state update (see the LSTM-step sketch below).
  • Transformers. How transformer networks work: what attention mechanisms look like visually and in pseudo-code, how positional encoding takes them beyond a bag of words (both sketched below), and how transformers benefit from modern ReLU activations.
  • Code. The most important advantage of transformers over LSTM is that transfer learning works, allowing you to fine-tune a large pre-trained model for your task. The talk shows how to do this in 12 lines of Python; a hedged sketch of the same recipe appears below.
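The bag-of-words approach from the first bullet fits in a few lines. Below is a minimal sketch assuming scikit-learn; the toy corpus and labels are hypothetical, purely for illustration.

```python
# Bag-of-words document classification: a minimal sketch with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
docs = ["the movie was great", "terrible plot and acting",
        "a fun, great film", "boring and terrible"]
labels = [1, 0, 1, 0]

# Each document becomes a fixed-length vector of word counts; word order
# is discarded, which is exactly what "bag of words" means.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Any standard supervised classifier can now be trained on those vectors.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["what a great film"])))
```

The limitation this sets up is the one the talk builds on: counting words ignores order, so a sequence model is needed whenever word order matters.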
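For the RNN/LSTM bullet, here is a sketch of a single LSTM step in NumPy (biases omitted for brevity; shapes and initialization are illustrative). The thing to notice is the additive cell-state update, which gives gradients a mostly linear path backward through time, much as a residual connection does.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM timestep. W packs the input, forget, output, and candidate
    weights into a single (4*d_h, d_x + d_h) matrix; biases are omitted."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    # The additive update is the LSTM's fix for vanishing and exploding
    # gradients: like a residual connection, the cell state is mostly
    # carried forward unchanged rather than repeatedly multiplied.
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Run a short random sequence through the cell.
d_x, d_h = 8, 16
W = 0.1 * np.random.randn(4 * d_h, d_x + d_h)
h = c = np.zeros(d_h)
for x in np.random.randn(5, d_x):
    h, c = lstm_step(x, h, c, W)
```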
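The attention mechanism and positional encoding from the Transformers bullet can also be written compactly. This sketch follows the standard scaled dot-product formulation from "Attention Is All You Need" rather than the talk's exact pseudo-code:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query produces a softmax-weighted
    average of the value vectors, weighted by query-key similarity."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signals added to token embeddings. Without them,
    attention is permutation-invariant, i.e. just a fancy bag of words."""
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))
```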
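Finally, the transfer-learning recipe from the Code bullet. The talk's exact 12 lines are not reproduced here; this is a hedged sketch of the same idea using the Hugging Face transformers library with PyTorch, and the two-example dataset is hypothetical:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download a large pre-trained model; a fresh classification head is added.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts, labels = ["a great film", "terrible acting"], [1, 0]  # your data here
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Fine-tune: a few gradient steps adapt the pre-trained weights to the task.
model.train()
for _ in range(3):
    loss = model(**batch, labels=torch.tensor(labels)).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The heavy lifting was already done during pre-training, which is why so little task-specific code (and labeled data) is needed.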

Published by seaadl

We're a new community for tech professionals applying AI to solve real-world problems.
