Meetup on January 29, 2020

Our next event will be January 29, 2020, hosted by Uber at their downtown Seattle office. Seats are limited, and you’ll need a ticket. We’ll have two talks:

Fardin Abdi from Uber will present “An introduction to Horovod” describing techniques to scale deep learning training jobs to lots of GPUs. This will be an excellent mix of software engineering techniques for distributed computing along with the math of deep learning.

Leo Dirac will present “Geometric Intuition of Configuration Space” about Bayesian Optimization as the basis for AutoML, its fundamental limitations and ways to work around them.

Brushing up on Linear Algebra

For techies diving into machine learning and deep learning, the math can be daunting. You can get a lot done without being fluent in vector spaces and linear transformations, but a conceptual understanding of them goes a long way toward making you effective, both in knowing what’s possible and in working with scientists.
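To make the idea concrete: a matrix is a linear transformation, and composing transformations is just matrix multiplication. A minimal NumPy sketch (our own toy example, not from the videos):

```python
import numpy as np

# A matrix *is* a linear transformation: it records where the basis
# vectors land, and every other vector follows by linearity.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 90 degrees counterclockwise

v = np.array([1.0, 0.0])
print(R @ v)      # ~ [0, 1]: the x-axis basis vector rotates onto the y-axis

# Composing transformations is matrix multiplication:
print(R @ R @ v)  # two quarter-turns: ~ [-1, 0]
```

This “matrices as transformations” picture is exactly the intuition the video series builds.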

We highly recommend the video series by the amazing educator 3blue1brown (a.k.a. Grant Sanderson). If you took a linear algebra class a while ago and don’t remember all of it, or haven’t taken one at all, this is a super efficient way to strengthen your grasp of the basics. Or maybe that linear algebra class left you confused; there’s a fair chance these videos will explain things more clearly than your professor did. (No offense, professor.)

We’ve collected the entire series into a playlist for you here…

Linear Algebra video lecture series by 3blue1brown

Or if you have more time on your hands and want to go deeper, try a full MOOC, like one of these:

Have some other suggestions for great material to (re-)learn linear algebra? Leave a comment!

LSTM is dead, long live Transformers

By Leo Dirac. From Sea-ADL’s first meetup on November 12, 2019, hosted by Moz.

Leo Dirac talks about how Transformer models like BERT and GPT-2 have taken the natural language processing (NLP) community by storm, and have effectively replaced LSTM models for most practical applications. The talk covers:

  • Traditional NLP. Background on Natural Language Processing, why sequence modeling is difficult for standard supervised machine learning approaches, and showing bag-of-words as a way to solve a document classification problem.
  • Neural document processing: Vanilla RNN, LSTM. How neural networks process sequences with simple recurrent neural networks, and LSTM as the standard improvement upon them, solving vanishing and exploding gradients with what is effectively a residual-network approach.
  • Transformers. How transformer networks work: what attention mechanisms look like visually and in pseudo-code, and how positional encoding takes it beyond a bag-of-words. How transformers benefit from modern ReLU activations.
  • Code. The most important advantage of transformers over LSTM is that transfer learning works, allowing you to fine-tune a large pre-trained model for your task. Shows how to do this in 12 lines of Python.
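This isn’t the talk’s code, but as a rough illustration of the attention mechanism and positional encoding discussed above, here is a minimal NumPy sketch of scaled dot-product self-attention with sinusoidal position encodings:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row is a distribution over tokens
    return weights @ V, weights

def positional_encoding(seq_len, d_model):
    # Sinusoidal positions, as in "Attention Is All You Need";
    # without this, attention treats the input as a bag of words.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Toy example: 4 token embeddings of dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)) + positional_encoding(4, 8)
out, w = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8): one mixed representation per token
```

Real transformers add learned Q/K/V projections, multiple heads, and feed-forward layers on top of this core operation.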

Geometric Intuition for Training Neural Networks

By Leo Dirac. From Sea-ADL’s first meetup on November 12, 2019, hosted by Moz.

Leo Dirac explains what it looks like to train a neural network through geometric intuition. The structure of the talk is:

  • Supervised learning. What does a decision boundary look like for a simple binary classification problem, and how do the data interact with it during training. What is a loss surface, and how does SGD find its way to the bottom of it.
  • Training Deep Neural Networks with Non-Convex Optimization. How neural networks make the decision boundary more complex, and what a non-convex loss surface looks like. Then some recent research into the shapes of these loss surfaces, starting with how the sharp minima theory implies we should seek a wide valley in the loss surface. Then research implying that all local minima are equivalent and connected, and a couple of algorithms including Entropy-SGD and SWA to take advantage of this structure.
  • Practical applications with Code. Code samples showing how to apply SWA using PyTorch or TensorFlow.
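The talk’s code samples use PyTorch or TensorFlow; as a framework-free illustration of the core idea behind SWA, here is a NumPy sketch on a toy least-squares problem: run noisy SGD, then average the weight iterates over the tail of the run so the result lands near the center of the valley rather than at any single bouncing iterate.

```python
import numpy as np

# Toy problem: fit w to noisy linear data with single-sample SGD,
# so the iterates keep bouncing around the minimum.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.05
swa_w = np.zeros(3)
n_avg = 0

for step in range(500):
    i = rng.integers(0, len(X))            # pick one sample (SGD)
    grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of squared error
    w -= lr * grad
    if step >= 250:                        # start averaging after a warm-up phase
        swa_w = (swa_w * n_avg + w) / (n_avg + 1)  # running mean of the iterates
        n_avg += 1

print(np.round(swa_w, 2))  # close to true_w
```

Real SWA (e.g. `torch.optim.swa_utils` in PyTorch) averages full network weights on a cyclic or constant learning-rate schedule, but the averaging step is the same running mean shown here.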