Curiousily

Training a Neural Network from Scratch with Gradient Descent in JavaScript

13.07.2020 — Deep Learning, Machine Learning, Neural Network, JavaScript — 4 min read

TL;DR In this part, you’ll implement a Neural Network and train it with an algorithm called Gradient Descent from scratch.

The problem you’re trying to solve is to predict the number of infected (with a novel virus) patients for the next day, based on historical data. You’ll train a tiny Neural Network to do it!

Run the complete source code on CodeSandbox

In this part, you’ll learn how to:

Measure the error of your model predictions
Implement a simple way to find good weights for your model
Implement Gradient Descent - an efficient way to find good weight values

First, we need to set a goal that we want to achieve. Let’s formulate that!

Learning is error reduction

How good are your model’s predictions? This question is vitally important. The answer gives you a starting point upon which to improve.

The problem of finding good weight parameters transforms into driving an error measurement down to 0. Here’s one way to measure the error between the predictions and true values:

const weight = 0.5
const data = 3

const prediction = data * weight

const trueInfectedCount = 4

const error = (prediction - trueInfectedCount) ** 2

console.log(error)

6.25

We subtract the true value from the prediction and square the result. This is known as squared error. An error of 0 indicates that the prediction is perfect (the same as the true value).

But why the squaring? This ensures that the error is always non-negative and increases the error when there are larger deviations from the true values. And this is a good thing - we want to change our model more aggressively when it makes huge errors.
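To see both effects of squaring, here is a small sketch that applies the same formula to a handful of predictions. The `predictions` array and the `squaredError` helper name are made up for illustration; only the formula comes from the example above.

```javascript
// Squared error for a single prediction (same formula as above).
const squaredError = (prediction, trueValue) => (prediction - trueValue) ** 2

const trueInfectedCount = 4

// Hypothetical predictions, chosen only for illustration.
const predictions = [4, 3, 5, 2, 0]

const errors = predictions.map(p => squaredError(p, trueInfectedCount))

predictions.forEach((p, i) => {
  console.log(`prediction: ${p} error: ${errors[i]}`)
})
```

Being off by 1 in either direction (3 or 5) gives an error of 1, while being off by 4 gives an error of 16: the sign disappears and large misses are penalized much more than small ones.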

Learning by going up and down

You have a way to evaluate how good your Neural Network predictions are. How can you use that information to find good weights?

Simple
Inefficient
Impossible to predict exact value

Remember the “Guess the number” game you played as a kid? You need to find a number based on “greater than” or “less than” feedback.

We’ll choose a step value and calculate the error for going up or down with that step. We’ll take the direction in which the error is smaller.

This sounds simple enough. Let’s try it out:

let weight = 0.5
const data = 3

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

const trueInfectedCount = 4

const STEP_CHANGE = 0.05

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )

  // error if we nudge the weight up by one step
  const upPrediction = neuralNet(data, weight + STEP_CHANGE)
  const upError = error(upPrediction, trueInfectedCount)

  // error if we nudge the weight down by one step
  const downPrediction = neuralNet(data, weight - STEP_CHANGE)
  const downError = error(downPrediction, trueInfectedCount)

  // keep the direction with the smaller error
  if (upError < downError) {
    weight += STEP_CHANGE
  } else {
    weight -= STEP_CHANGE
  }
}

iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 5.522499999999998 prediction: 1.6500000000000001
iteration 3 error: 4.839999999999999 prediction: 1.8000000000000003
iteration 4 error: 4.2025 prediction: 1.9500000000000004
iteration 5 error: 3.609999999999998 prediction: 2.1000000000000005
...
iteration 19 error: 0.009999999999999573 prediction: 3.900000000000002
iteration 20 error: 0.0025000000000002486 prediction: 4.0500000000000025

Slowly but surely, this method gets the job done! Unfortunately, for practical purposes, this is way too slow. Why?

The step you take is of fixed size. No matter how far away you are from the minimum of the function (where the error is 0), you still take the same step. This is slow when you are far away and might cause you to overshoot (miss the minimum error) when you are close.

We need a way to make the step size dynamic - larger when far away from the error minimum and smaller when close by. How can we do that?
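To make the overshooting problem concrete, here is a sketch that reuses the up/down search from above but runs it for 200 iterations instead of 20 (200 is an arbitrary number picked for illustration). The `history` array is a helper added here and is not part of the original code.

```javascript
const data = 3
const trueInfectedCount = 4
const STEP_CHANGE = 0.05

const neuralNet = (data, weight) => data * weight
const error = (prediction, trueValue) => (prediction - trueValue) ** 2

let weight = 0.5
const history = [] // prediction after each weight update

for (let i = 0; i < 200; i++) {
  const upError = error(neuralNet(data, weight + STEP_CHANGE), trueInfectedCount)
  const downError = error(neuralNet(data, weight - STEP_CHANGE), trueInfectedCount)

  // take the fixed-size step in the direction with the smaller error
  weight += upError < downError ? STEP_CHANGE : -STEP_CHANGE

  history.push(neuralNet(data, weight))
}

// look at the last few predictions, rounded for readability
const lastPredictions = history.slice(-4).map(p => p.toFixed(2))
console.log(lastPredictions)
```

Even with ten times more iterations, the prediction keeps bouncing between roughly 3.9 and 4.05. With a fixed step of 0.05 the weight can never land on the value that predicts exactly 4, so extra iterations do not help.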

Learning with Gradient Descent

You need to minimize the error and have control over only one thing - the weight value(s). In what direction and by how much should you change them?

Your goal is to find weight value(s) that move the error to (as close as possible) 0. We can only wiggle the weight and need to understand its relationship with the error. But we already know that:

error = (data * weight - trueInfectedCount) ** 2

How can we use this relationship to move the error in the right direction (close to 0)?

Fortunately, the derivative of a function allows you to know in which direction and by how much to change a variable when you change another. What is a derivative of a function? Aren’t those things scary? I mean Calculus scary?

The derivative is just the slope at some point in the function. Don’t worry! We’ll dive deeper to understand what this means. Let’s have a look at the first derivative of the error function:

errorPrime = 2 * data * (data * weight - trueInfectedCount)

You can get the first derivative by using standard derivative tables (or an online derivative calculator). All looks great, but let’s remove that constant $2$ in front of the expression. This is not mathematically precise, but it only rescales the derivative by a constant factor and keeps its sign, so the direction of the update stays the same:

errorPrime = data * (data * weight - trueInfectedCount)

It might help you better understand the error function and its first derivative by graphing them. Our error function is quadratic in the weight, so its graph is a U-shaped curve, and the derivative is the slope of that curve at each point.

The sign of the derivative lets you see which direction you should take to reduce the error further. Its magnitude also gives you an idea of how far away you are from the minimum (based on how steep the curve is).
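If you want to convince yourself that the formula really is the slope, you can compare it against a numerical estimate. The sketch below keeps the constant 2 so that the two numbers match; `EPSILON` and `numericalSlope` are names introduced here for illustration, and the sample weights are arbitrary.

```javascript
const data = 3
const trueInfectedCount = 4

// error as a function of the weight only
const error = weight => (data * weight - trueInfectedCount) ** 2

// analytical first derivative (with the constant 2 kept in)
const errorPrime = weight => 2 * data * (data * weight - trueInfectedCount)

// numerical slope: change in error over a tiny change in weight
const EPSILON = 1e-6 // hypothetical small step
const numericalSlope = weight =>
  (error(weight + EPSILON) - error(weight - EPSILON)) / (2 * EPSILON)

for (const weight of [0.5, 1, 4 / 3, 2]) {
  console.log(
    `weight: ${weight} slope: ${errorPrime(weight)} numerical: ${numericalSlope(weight)}`
  )
}
```

The slope is negative to the left of the minimum (for example -15 at a weight of 0.5), close to zero at a weight of 4/3 where the prediction is 4, and positive to the right (12 at a weight of 2). Moving the weight against the sign of the slope always heads toward the minimum.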

The good thing is that when using Neural Network libraries (like TensorFlow.js), you won’t need to deal with derivatives. Those get calculated for you.

All right, you can use all that knowledge to implement gradient descent:

let weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  // slope of the error with respect to the weight (constant 2 dropped)
  const errorPrime = (data * weight - trueInfectedCount) * data

  // move the weight against the slope
  weight -= errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}

iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 400 prediction: 24
iteration 3 error: 25600 prediction: -156
iteration 4 error: 1638400 prediction: 1284
iteration 5 error: 104857600 prediction: -10236
iteration 6 error: 6710886400 prediction: 81924
...
iteration 18 error: 3.1691265005705735e+31 prediction: 5629499534213124
iteration 19 error: 2.028240960365167e+33 prediction: -45035996273704960
iteration 20 error: 1.2980742146337069e+35 prediction: 360287970189639700

Gradient descent iteratively adjusts the Neural Network weight using the magnitude and direction provided by the derivative of our error function.

But what about that output? Those predictions seem to be wrong, by a lot!

Looks like the weight updates are far too aggressive (large). The algorithm simply overshoots the bottom of the U-shaped error function and goes for the stars. Let’s address this issue next!
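A short derivation shows where the explosion comes from. As shorthand introduced only for this note, write $d$ for `data`, $w$ for `weight` and $y$ for `trueInfectedCount`. One update of the loop above changes the gap between prediction and true value like this:

```latex
\begin{aligned}
w_{\text{new}} &= w - d\,(d\,w - y) \\
d\,w_{\text{new}} - y &= (d\,w - y) - d^{2}\,(d\,w - y) \\
                      &= (1 - d^{2})\,(d\,w - y)
\end{aligned}
```

With $d = 3$ the factor is $1 - 9 = -8$: every step flips the sign of the gap and makes it 8 times larger, so the squared error grows 64 times per iteration. That matches the output above (6.25, 400, 25600, ...) and the predictions jumping from one side of 4 to the other.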

Slowing down the learning process

Our Neural Network learns way too fast for its own good. We’ll introduce another parameter $\alpha$ that controls how much our model should learn on each step. That way, we’ll cut down the huge updates (overshooting).

let weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0
const ALPHA = 0.1

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  // slope of the error with respect to the weight (constant 2 dropped)
  const errorPrime = (data * weight - trueInfectedCount) * data

  // scale the update by ALPHA to avoid overshooting
  weight -= ALPHA * errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}

iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 0.0625 prediction: 3.75
iteration 3 error: 0.0006250000000000178 prediction: 3.9749999999999996
iteration 4 error: 0.0000062499999999997335 prediction: 3.9975
iteration 5 error: 6.249999999993073e-8 prediction: 3.99975
...
iteration 16 error: 4.930380657631324e-30 prediction: 3.999999999999998
iteration 17 error: 0 prediction: 4
iteration 18 error: 0 prediction: 4
iteration 19 error: 0 prediction: 4
iteration 20 error: 0 prediction: 4

This looks much better! Note that we no longer need to specify the direction or the amount of the update! Everything is taken care of thanks to the usage of derivatives!

But what about that $\alpha$ value? How can you come up with it? In Machine Learning lingo, this is a hyperparameter (usually called the learning rate). All this means is that you’ll have to find a good value on your own (mostly by trial and error). Sad, I know, but there are more sophisticated ways to handle this issue.
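Here is what trial and error might look like in practice. The sketch wraps the loop from above in a `train` helper (a name introduced here) and runs it with a few candidate values. The candidates are hypothetical, picked only to show the different behaviors on this toy problem.

```javascript
const data = 3.0
const trueInfectedCount = 4.0

const neuralNet = (data, weight) => data * weight
const error = (prediction, trueValue) => (prediction - trueValue) ** 2

// Runs the same gradient descent loop as above and returns the final error.
const train = (alpha, iterations = 20) => {
  let weight = 0.5
  for (let i = 0; i < iterations; i++) {
    const errorPrime = (data * weight - trueInfectedCount) * data
    weight -= alpha * errorPrime
  }
  return error(neuralNet(data, weight), trueInfectedCount)
}

// Hypothetical candidate values, from very small to too large.
const candidates = [0.001, 0.01, 0.1, 0.2, 0.25]
const results = candidates.map(alpha => ({ alpha, finalError: train(alpha) }))

results.forEach(({ alpha, finalError }) => {
  console.log(`alpha: ${alpha} final error: ${finalError}`)
})
```

On this problem the smallest values learn too slowly to get close in 20 iterations, 0.1 drives the error to (practically) zero, and 0.25 is already large enough to make the error grow instead of shrink. Good values depend on the data, so the numbers here will not transfer to other problems.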

Summary

Great job! You have a tiny Neural Network for which you found good weight values! At least, it looks this way, given it makes a correct prediction.

In this part, you learned how to:

Measure the error of your model predictions
Implement a simple way to find good weights for your model
Implement Gradient Descent - an efficient way to find good weight values

Yes, we looked at a toy example. In the real world, you (hopefully) have more data and need a more general way to find good weight values for your Neural Networks. We’ll have a look at Generalized Gradient Descent next!

