This article is also a Jupyter Notebook available to be run from the top down. There will be code snippets that you can then run in any environment.

Below are the versions of fastai, fastcore, and wwf currently running at the time of writing this:

  • fastai: 2.1.10
  • fastcore: 1.3.13
  • wwf: 0.0.5

First, grab the library we will need:

from fastai.basics import *

Below you will find the exact imports for everything we use today

import torch
from torch import nn

import numpy as np

import matplotlib.pyplot as plt

from fastai.torch_core import tensor

Stochastic Gradient Descent (SGD):

  • Optimization technique (optimizer)
  • Commonly used in neural networks
  • Example with linear regression

Linear Regression

  • Fit a line on 100 points
n = 100

Generate our data

x = torch.ones(n,2)
len(x), x[:5]
(100, tensor([[1., 1.],
         [1., 1.],
         [1., 1.],
         [1., 1.],
         [1., 1.]]))

Randomize the first column with a uniform distribution from -1 to 1 (the second column stays at 1):

x[:,0].uniform_(-1., 1)
x[:5], x.shape
(tensor([[-0.7631,  1.0000],
         [ 0.8743,  1.0000],
         [ 0.3916,  1.0000],
         [ 0.8608,  1.0000],
         [ 0.2030,  1.0000]]), torch.Size([100, 2]))
  • Any linear model is y=mx+b
  • m, x, and b are matrices
  • We have x
m = tensor(3.,2); m, m.shape
(tensor([3., 2.]), torch.Size([2]))
  • b is a random bias
b = torch.rand(n); b[:5], b.shape
(tensor([0.1767, 0.8454, 0.4767, 0.6628, 0.1358]), torch.Size([100]))

Now we can make our y

  • Matrix multiplication is denoted with @
y = x@m + b

We'll know right away if we get the sizes in the wrong order, because the matrix multiplication will fail:

m@x + b
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-ac53957f9814> in <module>()
----> 1 m@x + b

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2 and 100x2)

Plot our results

plt.scatter(x[:,0], y)
(Plot: a scatter of the generated points)

We want to find weights that minimize the distance between the points and our line.

  • mean squared error: take the difference between the prediction and y, square it, then average (a quick worked example follows the definition below)
def mse(y_hat, y): return ((y_hat-y)**2).mean()
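
As a quick sanity check, here is a tiny worked example with made-up numbers (not from the lesson): the squared differences are 0 and 4, so their mean is 2.

mse(tensor(1., 2.), tensor(1., 4.))  # ((1-1)**2 + (2-4)**2) / 2 = 2.0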

When we fit our model, we are trying to estimate m with a parameter we will call a

For example, say a = (0.5, 0.75).

  • Make a prediction
  • Calculate the error
a = tensor(.5, .75)

Make prediction

y_pred = x@a

Calculate error

mse(y_pred, y)
tensor(5.8721)

What does that mean? Let's plot it

plt.scatter(x[:,0],y)
plt.scatter(x[:,0],y_pred)
(Plot: the data points alongside our initial predictions)

The model doesn't seem to quite fit. What's next? Optimization

Walking down Gradient Descent

  • Goal: Minimize the loss function (mse)
  • Gradient Descent:
    • Starts with parameters
    • Moves towards new parameters to minimize the function
    • Takes steps in the negative direction of the gradient (a short generic sketch follows this list)
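
The core update rule is simply: new parameters = old parameters - learning rate * gradient. Here is a minimal generic sketch of one step (gradient_step is a hypothetical helper, not a function we use later):

def gradient_step(params, lr):
  # assumes loss.backward() has already filled in params.grad
  with torch.no_grad():
    params -= lr * params.grad  # move against the gradient
    params.grad.zero_()         # clear the gradient for the next step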

First let's make this parameter

a = nn.Parameter(a); a
Parameter containing:
tensor([0.5000, 0.7500], requires_grad=True)

Next let's create an update function that computes the loss for the current a, backpropagates, and then takes a small step towards a better a.

We'll print out every 10 iterations to see how we are doing

def update():
  y_hat = x@a
  loss = mse(y, y_hat)
  if i % 10 == 0: print(loss)
  loss.backward()
  with torch.no_grad():
    a.sub_(lr * a.grad)
    a.grad.zero_()
  • torch.no_grad: temporarily disables gradient tracking, so the parameter update itself isn't recorded
  • sub_: Subtracts some value (lr * our gradient)
  • grad.zero_: Zeros our gradients
lr = 1e-1
for i in range(100): update()
tensor(5.8721, grad_fn=<MeanBackward0>)
tensor(0.6027, grad_fn=<MeanBackward0>)
tensor(0.1875, grad_fn=<MeanBackward0>)
tensor(0.1074, grad_fn=<MeanBackward0>)
tensor(0.0905, grad_fn=<MeanBackward0>)
tensor(0.0870, grad_fn=<MeanBackward0>)
tensor(0.0862, grad_fn=<MeanBackward0>)
tensor(0.0860, grad_fn=<MeanBackward0>)
tensor(0.0860, grad_fn=<MeanBackward0>)
tensor(0.0860, grad_fn=<MeanBackward0>)

Now let's see how this new a compares.

  • detach() removes the tensor from the gradient graph so we can plot it
plt.scatter(x[:,0],y)
plt.scatter(x[:,0], (x@a).detach())
plt.scatter(x[:,0],y_pred)
(Plot: the data, our fitted predictions after training, and the original prediction)

We fit our line much better here

Animate the process

from matplotlib import animation, rc
rc('animation', html='jshtml')

Let's redo the process and animate our y closing in

a = nn.Parameter(tensor(0.5, 0.75)); a
Parameter containing:
tensor([0.5000, 0.7500], requires_grad=True)

On each frame we'll update the line's y data to the current x@a:

def animate(i):
  update()
  line.set_ydata((x@a).detach())
  return line,

Let's create a base figure

fig = plt.figure()
plt.scatter(x[:,0], y, c='orange')
line, = plt.plot(x[:,0], (x@a).detach())
plt.close()

And animate!

animation.FuncAnimation(fig, animate, np.arange(0,100), interval=20)

Ideally we'd split the data up into mini-batches, fit on each batch in turn, and work through all of the batches (otherwise we'd run out of memory on a real dataset!).
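
A minimal sketch of what that could look like on our toy problem (the batch size, shuffling, and variable names here are illustrative choices, not taken from the lesson):

bs = 25                                # illustrative batch size
a = nn.Parameter(tensor(0.5, 0.75))
lr = 1e-1
for epoch in range(10):
  idxs = torch.randperm(n)             # shuffle the indices each epoch
  for batch in idxs.chunk(n // bs):    # split into mini-batches
    xb, yb = x[batch], y[batch]
    loss = mse(xb@a, yb)               # loss on just this batch
    loss.backward()
    with torch.no_grad():
      a.sub_(lr * a.grad)              # same update step as before
      a.grad.zero_()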

If this were a classification problem, we would want to use cross entropy loss, which penalizes confident incorrect predictions as well as unconfident correct predictions. It's also known as negative log likelihood.
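
As a rough illustration with made-up logits and labels (not part of the lesson), PyTorch's nn.CrossEntropyLoss takes raw scores and integer class labels:

loss_func = nn.CrossEntropyLoss()
# two rows of raw scores (logits) for a 2-class problem, both confidently predicting class 0
preds  = tensor([[4., -2.],
                 [4., -2.]])
# the first label matches the confident prediction, the second does not
labels = tensor([0, 1])
loss_func(preds, labels)  # the confident mistake on the second row dominates the loss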