Leo Dirac builds geometric intuition for what it looks like to train a neural network. The structure of the talk is:
- Supervised learning. What a decision boundary looks like for a simple binary classification problem, and how the data interact with it during training. What a loss surface is, and how SGD finds its way to the bottom of it.
- Training deep neural networks with non-convex optimization. How neural networks make the decision boundary more complex, and what a non-convex loss surface looks like. Then a tour of recent research into the shapes of these loss surfaces: first the sharp-minima hypothesis, which implies we should seek a wide valley in the loss surface; then work suggesting that local minima are roughly equivalent and connected, along with algorithms such as Entropy-SGD and SWA (Stochastic Weight Averaging) that exploit this structure.
- Practical applications with code. Code samples showing how to apply SWA using PyTorch or TensorFlow.
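The core of SWA is simple enough to sketch without a framework: periodically snapshot the weights along the SGD trajectory and keep a running average of them. Below is a minimal, framework-free sketch of that running-average rule, using plain Python lists in place of model tensors; all names are illustrative, not from the talk's actual code.

```python
def swa_update(swa_weights, new_weights, n_averaged):
    """Fold a new weight snapshot into the SWA running average.

    swa_weights: current averaged weights (list of floats)
    new_weights: weights from the latest SGD snapshot
    n_averaged:  how many snapshots are already in the average
    """
    return [
        (w_swa * n_averaged + w) / (n_averaged + 1)
        for w_swa, w in zip(swa_weights, new_weights)
    ]

# Average three hypothetical weight snapshots taken along an SGD run.
snapshots = [[0.0, 2.0], [1.0, 4.0], [2.0, 6.0]]
swa = snapshots[0]
for n, snap in enumerate(snapshots[1:], start=1):
    swa = swa_update(swa, snap, n)
print(swa)  # → [1.0, 4.0], the mean of the three snapshots
```

In PyTorch this bookkeeping is handled for you by `torch.optim.swa_utils` (`AveragedModel` plus an `SWALR` schedule); the sketch above is just the arithmetic those utilities perform per parameter tensor.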