# Curiousily

## Training a Neural Network from Scratch with Gradient Descent in JavaScript

Deep Learning, Machine Learning, Neural Network, JavaScript | 4 min read


TL;DR In this part, you’ll implement a Neural Network and train it with an algorithm called Gradient Descent from scratch.

The problem you’re trying to solve is to predict the number of infected (with a novel virus) patients for the next day, based on historical data. You’ll train a tiny Neural Network to do it!

• Run the complete source code on CodeSandbox

In this part, you’ll learn how to:

• Measure the error of your model predictions
• Implement a simple way to find good weights for your model
• Implement Gradient Descent - an efficient way to find good weight values

First, we need to set a goal that we want to achieve. Let’s formulate that!

## Learning is error reduction

How good are your model’s predictions? This question is vitally important. The answer gives you a starting point from which to improve.

The problem of finding good weight parameters transforms into driving an error measurement down to 0. Here’s one way to measure the error between the predictions and true values:

```javascript
const weight = 0.5
const data = 3

const prediction = data * weight

const trueInfectedCount = 4

const error = (prediction - trueInfectedCount) ** 2

console.log(error)
```

```
6.25
```

We subtract the true value from the prediction and square the result. This is known as squared error. An error of 0 indicates that the prediction is perfect (the same as the true value).

But why the squaring? This ensures that the error is always non-negative and increases the error when there are larger deviations from the true values. And this is a good thing - we want to change our model more aggressively when it makes huge errors.
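To make the effect of squaring concrete, here is a minimal sketch that prints the squared error for a few predictions. The prediction values are hypothetical, picked only for illustration, and `squaredError` is a helper name introduced here.

```javascript
// Squared error as described above: (prediction - true value) squared.
const squaredError = (prediction, trueValue) => (prediction - trueValue) ** 2

const trueInfectedCount = 4

// Hypothetical predictions, chosen only to illustrate the behavior.
const predictions = [3, 5, 2, 8]

for (const prediction of predictions) {
  const deviation = prediction - trueInfectedCount
  const err = squaredError(prediction, trueInfectedCount)
  console.log(
    `prediction: ${prediction} deviation: ${deviation} squared error: ${err}`
  )
}
```

Predictions of 3 and 5 miss by the same amount in opposite directions and get the same error, while doubling the deviation quadruples the error.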

## Learning by going up and down

You have a way to evaluate how good your Neural Network predictions are. How can you use that information to find good weights?

• Simple
• Inefficient
• Impossible to predict exact value

Remember the “Guess the number” game you played as a kid? You have to find a number using only “greater than” or “less than” feedback.

We’ll choose a step value and calculate the error for going up or down with that step. We’ll take the direction in which the error is smaller.

This sounds simple enough. Let’s try it out:

```javascript
let weight = 0.5
const data = 3

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

const trueInfectedCount = 4

// Fixed amount by which the weight changes on every iteration
const STEP_CHANGE = 0.05

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )

  // Error if we move the weight up by one step
  const upPrediction = neuralNet(data, weight + STEP_CHANGE)
  const upError = error(upPrediction, trueInfectedCount)

  // Error if we move the weight down by one step
  const downPrediction = neuralNet(data, weight - STEP_CHANGE)
  const downError = error(downPrediction, trueInfectedCount)

  // Move in the direction with the smaller error
  if (upError < downError) {
    weight += STEP_CHANGE
  } else {
    weight -= STEP_CHANGE
  }
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 5.522499999999998 prediction: 1.6500000000000001
iteration 3 error: 4.839999999999999 prediction: 1.8000000000000003
iteration 4 error: 4.2025 prediction: 1.9500000000000004
iteration 5 error: 3.609999999999998 prediction: 2.1000000000000005
...
iteration 19 error: 0.009999999999999573 prediction: 3.900000000000002
iteration 20 error: 0.0025000000000002486 prediction: 4.0500000000000025
```

Slowly but surely, this method gets the job done! Unfortunately, for practical purposes, this is way too slow. Why?

The step you take is of fixed size. No matter how far you are from the minimum of the function (where the error is 0), you still take the same step. This is slow and might cause you to overshoot (miss the minimum error).

We need a way to make the step size dynamic - larger when far from the error minimum and smaller when close to it. How can we do that?

## Learning with Gradient Descent

You need to minimize the error and have control over only one thing - the weight value(s). In what direction and by how much should you change them?

Your goal is to find weight value(s) that bring the error as close as possible to 0.
We can only wiggle the weight and need to understand its relationship with the error. But we already know that:

```javascript
error = (data * weight - trueInfectedCount) ** 2
```

How can we use this relationship to move the error in the right direction (close to 0)?

Fortunately, the derivative of a function tells you in which direction, and by how much, its output changes when you change its input.

What is a derivative of a function? Aren’t those things scary? I mean Calculus scary? The derivative is just the slope at some point of the function. Don’t worry! We’ll dive deeper to understand what this means.

Let’s have a look at the first derivative of the error function:

```javascript
errorPrime = 2 * data * (data * weight - trueInfectedCount)
```

You can get the first derivative by using standard derivative tables (or an online derivative calculator).

Let’s drop the constant $2$ in front of the equation. The result is no longer the exact derivative, but it points in the same direction and is simply half as large:

```javascript
errorPrime = data * (data * weight - trueInfectedCount)
```

Graphing the error function and its first derivative can help you understand them better. Our error function is quadratic in the weight, so its graph is a U-shaped curve (a parabola).

The sign of the derivative tells you which direction to take to reduce the error further: the slope points uphill, so you move the weight the opposite way. You also get an idea of how far you are from the minimum (based on how steep the curve is).

The good thing is that when using Neural Network libraries (like TensorFlow.js), you won’t need to deal with derivatives. Those get calculated for you.
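If you want to convince yourself that the derivative formula really is the slope, one option is to compare it with a numerical estimate. The sketch below is only an illustration: `numericSlope` and `EPSILON` are names introduced here, and the central-difference estimate is a standard approximation rather than anything specific to this tutorial.

```javascript
const data = 3
const trueInfectedCount = 4

// Error as a function of the weight only
const error = (weight) => (data * weight - trueInfectedCount) ** 2

// Exact first derivative from the text (with the constant 2 kept)
const errorPrime = (weight) => 2 * data * (data * weight - trueInfectedCount)

// Central-difference estimate of the slope; EPSILON is an arbitrary small value
const EPSILON = 1e-4
const numericSlope = (weight) =>
  (error(weight + EPSILON) - error(weight - EPSILON)) / (2 * EPSILON)

for (const weight of [0.5, 1, 2]) {
  console.log(
    `weight: ${weight} analytic: ${errorPrime(weight)} numeric: ${numericSlope(weight)}`
  )
}
```

The two columns should agree up to floating-point noise. The slope is negative for weights below the minimum and positive above it, which is why subtracting it moves the weight toward the minimum.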
All right, you can use all that knowledge to implement gradient descent:

```javascript
let weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  // Slope of the error with respect to the weight (constant 2 dropped)
  const errorPrime = (data * weight - trueInfectedCount) * data

  // Move the weight against the slope
  weight -= errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 400 prediction: 24
iteration 3 error: 25600 prediction: -156
iteration 4 error: 1638400 prediction: 1284
iteration 5 error: 104857600 prediction: -10236
iteration 6 error: 6710886400 prediction: 81924
...
iteration 18 error: 3.1691265005705735e+31 prediction: 5629499534213124
iteration 19 error: 2.028240960365167e+33 prediction: -45035996273704960
iteration 20 error: 1.2980742146337069e+35 prediction: 360287970189639700
```

Gradient descent iteratively adjusts the Neural Network weight using the magnitude and direction provided by the derivative of our error function.

But what about that output? Those predictions seem to be wrong, by a lot!

Looks like the weight updates are far too aggressive (large). The algorithm simply overshoots the bottom of the U-shaped error function and goes for the stars. Let’s address this issue next!
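A short worked equation shows where the overshoot comes from. The symbols are shorthand introduced here, not part of the code: $x$ stands for `data`, $w$ for `weight`, and $y$ for `trueInfectedCount`.

$$
\begin{aligned}
w_{\text{new}} &= w - x\,(x w - y) \\
x\,w_{\text{new}} - y &= (x w - y) - x^2\,(x w - y) = (1 - x^2)\,(x w - y)
\end{aligned}
$$

With $x = 3$, the factor $1 - x^2$ equals $-8$. Each update flips the sign of the gap between prediction and true value and makes it 8 times larger, so the squared error grows by a factor of 64 per iteration. That matches the output: 6.25, 400, 25600, and so on.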

### Slowing down the learning process

Our Neural Network learns way too fast for its own good. We’ll introduce another parameter $\alpha$ (commonly called the learning rate) that controls how much our model should learn on each step. That way, we’ll cut down the huge updates that cause the overshooting.

```javascript
let weight = 0.5
const data = 3.0
const trueInfectedCount = 4.0
const ALPHA = 0.1

const neuralNet = (data, weight) => data * weight

const error = (prediction, trueValue) => (prediction - trueValue) ** 2

for (const i of Array(20).keys()) {
  const prediction = neuralNet(data, weight)

  const currentError = error(prediction, trueInfectedCount)

  const errorPrime = (data * weight - trueInfectedCount) * data

  // Scale the update by ALPHA to avoid overshooting
  weight -= ALPHA * errorPrime

  console.log(
    `iteration ${i + 1} error: ${currentError} prediction: ${prediction}`
  )
}
```

```
iteration 1 error: 6.25 prediction: 1.5
iteration 2 error: 0.0625 prediction: 3.75
iteration 3 error: 0.0006250000000000178 prediction: 3.9749999999999996
iteration 4 error: 0.0000062499999999997335 prediction: 3.9975
iteration 5 error: 6.249999999993073e-8 prediction: 3.99975
...
iteration 16 error: 4.930380657631324e-30 prediction: 3.999999999999998
iteration 17 error: 0 prediction: 4
iteration 18 error: 0 prediction: 4
iteration 19 error: 0 prediction: 4
iteration 20 error: 0 prediction: 4
```

This looks much better! Note that we no longer need to specify the direction or the amount of the update! Everything is taken care of thanks to the usage of derivatives!

But what about that $\alpha$ value? How can you come up with it? In Machine Learning lingo, this is a hyperparameter. All this means is that you’ll have to find a good value on your own (mostly by trial and error). Sad I know, but there are more sophisticated ways to handle this issue.
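Trial and error can itself be scripted. The sketch below reuses the update rule from this section and reports the final error for a few candidate values of $\alpha$. The `train` helper and the candidate list are assumptions made for illustration, not a recommended search procedure.

```javascript
const data = 3.0
const trueInfectedCount = 4.0

// Runs the update rule from this section for a given alpha
// and returns the squared error after the last update.
const train = (alpha, iterations = 20) => {
  let weight = 0.5
  for (let i = 0; i < iterations; i++) {
    const errorPrime = (data * weight - trueInfectedCount) * data
    weight -= alpha * errorPrime
  }
  return (data * weight - trueInfectedCount) ** 2
}

// Candidate values are arbitrary picks for illustration.
for (const alpha of [0.01, 0.1, 0.2, 0.3]) {
  console.log(`alpha: ${alpha} final error: ${train(alpha)}`)
}
```

For this toy problem, a very small value learns slowly, 0.1 converges quickly, and 0.3 is large enough to make the error grow again. The right range depends on the data, which is why the value has to be tuned.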

## Summary

Great job! You have a tiny Neural Network for which you found good weight values! At least, it looks this way, given it makes a correct prediction.

In this part, you learned how to:

• Measure the error of your model predictions
• Implement a simple way to find good weights for your model
• Implement Gradient Descent - an efficient way to find good weight values

Yes, we looked at a toy example. In the real world, you (hopefully) have more data and need a more general way to find good weight values for your Neural Networks. We’ll have a look at Generalized Gradient Descent next!

© 2021 Curiousily by Venelin Valkov