— Neural Networks, Deep Learning, TensorFlow, Machine Learning, JavaScript — 6 min read
TL;DR Learn about Deep Learning and create a Deep Neural Network model to predict customer churn using TensorFlow.js. Learn how to preprocess categorical string data.
First day! You’ve landed this Data Scientist intern job at a large telecom company. You can’t stop dreaming about the Lambos and designer clothes you’re going to get once you’re a Senior Data Scientist.
Even your mom is calling to remind you to put your Ph.D. in Statistics diploma on the wall. This is the life; who cares that you’re in your mid-30s and this is your first job ever.
Your team lead comes around, asking how you’re enjoying the job and saying that he might have a task for you! You start imagining implementing complex statistical models from scratch, doing research, and adding cutting-edge methods, but… Well, the reality is slightly different. He sends you a link to a CSV file and asks you to predict customer churn. He suggests that you might try applying Deep Learning to the problem.
Your dream is starting now. Time to do some work!
Run the complete source code for this tutorial right in your browser:
Our dataset Telco Customer Churn comes from Kaggle.
“Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.” [IBM Sample Data Sets]
The data set includes information about:
It has 7,044 examples and 21 variables:
We’ll use Papa Parse to load the data:
```js
const prepareData = async () => {
  const csv = await Papa.parsePromise(
    "https://raw.githubusercontent.com/curiousily/Customer-Churn-Detection-with-TensorFlow-js/master/src/data/customer-churn.csv"
  )

  const data = csv.data
  return data.slice(0, data.length - 1)
}
```
Note that we ignore the last row since it is empty.
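Papa Parse doesn’t ship a parsePromise() method out of the box, so the complete source presumably attaches a small promise wrapper to it. Here’s a sketch of such a wrapper; the option values are reasonable Papa Parse settings, but whether the original uses exactly these is an assumption:

```js
// Hypothetical wrapper: make Papa.parse awaitable.
// `download: true` fetches the CSV from the URL, `header: true` keys each row
// by column name, and `dynamicTyping: true` converts numeric strings to numbers.
Papa.parsePromise = file =>
  new Promise((resolve, reject) =>
    Papa.parse(file, {
      download: true,
      header: true,
      dynamicTyping: true,
      complete: results => resolve(results),
      error: err => reject(err),
    })
  )
```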
Let’s get a feel for our dataset. How many of the customers churned?
About 74% of the customers are still using the company’s services. We have a very imbalanced dataset.
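A quick way to check that balance is to count the values in the Churn column (assuming it holds the “Yes”/“No” strings from the Kaggle CSV):

```js
// Count how many customers churned vs. stayed.
const churnCounts = data.reduce((counts, row) => {
  counts[row.Churn] = (counts[row.Churn] || 0) + 1
  return counts
}, {})

console.log(churnCounts)
// roughly 74% "No" (retained) vs 26% "Yes" (churned)
```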
Does gender play a role in losing customers?
Seems like it doesn’t. We have about the same number of female and male customers. How about seniority?
About 20% of the customers are senior citizens, and they are much more likely to churn than non-senior customers.
For how long do customers stay with the company?
It seems that the longer a customer has stayed, the more likely they are to keep staying with the company.
How do monthly charges affect churn?
A customer with low monthly charges (< $30) is much more likely to be retained.
How about the total amount charged per customer?
The higher the total amount charged, the more likely the customer is to be retained.
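If you want to reproduce these kinds of charts in the browser, tfjs-vis (which we use later for the training curves) can render quick histograms of any numeric column, split by churn status. This is only a sketch; the original charts may have been produced with a different plotting library, and the element ids are assumptions:

```js
// Sketch: histogram of a numeric column for churned vs. retained customers.
const histogramByChurn = (data, column) => {
  const valuesFor = churnValue =>
    data
      .filter(row => row.Churn === churnValue)
      .map(row => Number(row[column]))

  // Assumed container elements, e.g. <div id="TotalCharges-churned-cont"></div>
  tfvis.render.histogram(
    document.getElementById(`${column}-churned-cont`),
    valuesFor("Yes")
  )
  tfvis.render.histogram(
    document.getElementById(`${column}-retained-cont`),
    valuesFor("No")
  )
}

histogramByChurn(data, "TotalCharges")
```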
Our dataset has a total of 21 features, and we didn’t look through all of those. However, we found some interesting stuff.
We’ve learned that SeniorCitizen, tenure, MonthlyCharges, and TotalCharges are somewhat correlated with the churn status. We’ll use them for our model!
Deep Learning is a subset of Machine Learning, using Deep Artificial Neural Networks as a primary model to solve a variety of tasks.
To obtain a Deep Neural Network, take a Neural Network with one hidden layer (a shallow Neural Network) and add more layers. That’s the definition of a Deep Neural Network: a Neural Network with more than one hidden layer!
In Deep Neural Networks, each layer of neurons is trained on the features/outputs of the previous layer. Thus, you can create a feature hierarchy of increasing abstraction and learn complex concepts.
These networks are very good at discovering patterns in raw data (images, text, video, and audio recordings), which makes up most of the data we have. For example, Deep Learning can take millions of images and categorize them into photos of your grandma, funny cats, and delicious cakes.
Deep Neural Nets hold state-of-the-art results on a variety of important problems. Examples are image recognition, image segmentation, sound recognition, recommender systems, natural language processing, etc.
So basically, Deep Learning is Large Neural Networks. Why now? Why wasn’t Deep Learning practical before?
Most real-world applications of Deep Learning require large amounts of labeled data: developing a driverless car might require thousands of hours of video.
Training models with large amounts of parameters (weights) requires substantial computing power: special purpose hardware in the form of GPUs and TPUs offers massively parallel computations, suitable for Deep Learning.
Big companies have been storing your data for a while now: they want to monetize it.
We learned (kinda) how to initialize the weights of the neurons in the Neural Network models: mostly using small random values
We have better regularization techniques (e.g. Dropout)
Last but not least, we have software that is performant and (sometimes) easy to use. Libraries like TensorFlow, PyTorch, MXNet, and Chainer allow practitioners to develop, analyze, test, and deploy models of varying complexity and reuse work done by other practitioners and researchers.
Let’s use the “all-powerful” Deep Learning machinery to predict which customers are going to churn. First, we need to do some data preprocessing since a lot of the features are categorical.
We’ll use all numerical (except customerID) and the following categorical features:
```js
const categoricalFeatures = new Set([
  "TechSupport",
  "Contract",
  "PaymentMethod",
  "gender",
  "Partner",
  "InternetService",
  "Dependents",
  "PhoneService",
  "StreamingTV",
  "PaperlessBilling",
])
```
Let’s create training and testing datasets from our data:
```js
const [xTrain, xTest, yTrain, yTest] = toTensors(data, categoricalFeatures, 0.1)
```
Here’s how we create our Tensors:
```js
const toTensors = (data, categoricalFeatures, testSize) => {
  const categoricalData = {}
  categoricalFeatures.forEach(f => {
    categoricalData[f] = toCategorical(data, f)
  })

  const features = [
    "SeniorCitizen",
    "tenure",
    "MonthlyCharges",
    "TotalCharges",
  ].concat(Array.from(categoricalFeatures))

  const X = data.map((r, i) =>
    features.flatMap(f => {
      if (categoricalFeatures.has(f)) {
        return categoricalData[f][i]
      }

      return r[f]
    })
  )

  const X_t = normalize(tf.tensor2d(X))

  const y = tf.tensor(toCategorical(data, "Churn"))

  const splitIdx = parseInt((1 - testSize) * data.length, 10)

  const [xTrain, xTest] = tf.split(X_t, [splitIdx, data.length - splitIdx])
  const [yTrain, yTest] = tf.split(y, [splitIdx, data.length - splitIdx])

  return [xTrain, xTest, yTrain, yTest]
}
```
First, we use the toCategorical() function to convert the categorical features into one-hot encoded vectors. We do that by converting the string values into numbers and using tf.oneHot() to create the vectors.
We create a 2-dimensional Tensor from our features (categorical and numerical) and normalize it. Another, one-hot encoded, Tensor is made from the Churn column.
Finally, we split the data into training and testing datasets and return the results. How do we encode categorical variables?
```js
const toCategorical = (data, column) => {
  const values = data.map(r => r[column])
  const uniqueValues = new Set(values)

  const mapping = {}

  Array.from(uniqueValues).forEach((value, index) => {
    mapping[value] = index
  })

  const encoded = values
    .map(v => {
      if (!v) {
        return 0
      }
      return mapping[v]
    })
    .map(v => oneHot(v, uniqueValues.size))

  return encoded
}
```
First, we extract a vector of all values for the feature. Next, we obtain the unique values and create a string to int mapping from it.
Note that we check for missing values and encode those as 0. Finally, we one-hot encode each value.
Here are the remaining utility functions:
```js
// normalized = (value − min_value) / (max_value − min_value)
const normalize = tensor =>
  tf.div(tf.sub(tensor, tf.min(tensor)), tf.sub(tf.max(tensor), tf.min(tensor)))

const oneHot = (val, categoryCount) =>
  Array.from(tf.oneHot(val, categoryCount).dataSync())
```
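To get a feel for what these helpers produce, here is a quick check with made-up values (the numbers are only illustrative):

```js
// oneHot turns a category index into a plain one-hot array:
oneHot(2, 4) // => [0, 0, 1, 0]

// normalize scales every entry into [0, 1] using the *global* min and max of
// the tensor (not per-column): the smallest value maps to 0, the largest to 1.
normalize(tf.tensor2d([[1, 10], [2, 20], [3, 30]])).print()
```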
We’ll wrap the building and training of our model into a function called trainModel():
```js
const trainModel = async (xTrain, yTrain) => {
  ...
}
```
Let’s create a Deep Neural Network using the sequential model API in TensorFlow:
```js
const model = tf.sequential()
model.add(
  tf.layers.dense({
    units: 32,
    activation: "relu",
    inputShape: [xTrain.shape[1]],
  })
)

model.add(
  tf.layers.dense({
    units: 64,
    activation: "relu",
  })
)

model.add(tf.layers.dense({ units: 2, activation: "softmax" }))
```
Our Deep Neural Network has two hidden layers with 32 and 64 neurons, respectively. Each layer has a ReLU activation function.
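If you want to double-check the architecture, model.summary() prints the output shape and parameter count of each layer (the input size of the first layer depends on how many one-hot columns the preprocessing produced):

```js
// Print a table of layers, output shapes, and trainable parameter counts.
model.summary()
```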
Time to compile our model:
```js
model.compile({
  optimizer: tf.train.adam(0.001),
  loss: "binaryCrossentropy",
  metrics: ["accuracy"],
})
```
We’ll train our model using the Adam optimizer and measure our error using Binary Crossentropy.
Finally, we’ll pass the training data to the fit method of our model and train for 100 epochs, shuffle the data, and use 10% of it for validation. We’ll visualize the training progress using tfjs-vis:
```js
const lossContainer = document.getElementById("loss-cont")

await model.fit(xTrain, yTrain, {
  batchSize: 32,
  epochs: 100,
  shuffle: true,
  validationSplit: 0.1,
  callbacks: tfvis.show.fitCallbacks(
    lossContainer,
    ["loss", "val_loss", "acc", "val_acc"],
    {
      callbacks: ["onEpochEnd"],
    }
  ),
})
```
Let’s train our model:
```js
const model = await trainModel(xTrain, yTrain)
```
It seems like our model is learning during the first ten epochs and plateaus after that.
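Since the validation loss flattens out after roughly ten epochs, one optional tweak (not part of the original training run) is to stop early once it stops improving, using tf.callbacks.earlyStopping():

```js
// Sketch: halt training when val_loss hasn't improved for 5 consecutive epochs.
await model.fit(xTrain, yTrain, {
  batchSize: 32,
  epochs: 100,
  shuffle: true,
  validationSplit: 0.1,
  callbacks: [tf.callbacks.earlyStopping({ monitor: "val_loss", patience: 5 })],
})
```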
Let’s evaluate our model on the test data:
```js
const result = model.evaluate(xTest, yTest, {
  batchSize: 32,
})

// loss
result[0].print()

// accuracy
result[1].print()
```
```
Tensor
    0.44808024168014526
Tensor
    0.7929078340530396
```
The model has an accuracy of 79.2% on the test data. Let’s have a look at what kind of mistakes it makes using the confusion matrix:
```js
const preds = model.predict(xTest).argMax(-1)
const labels = yTest.argMax(-1)
const confusionMatrix = await tfvis.metrics.confusionMatrix(labels, preds)
const container = document.getElementById("confusion-matrix")
tfvis.render.confusionMatrix(container, {
  values: confusionMatrix,
  tickLabels: ["Retained", "Churned"],
})
```
It seems like our model is overconfident in predicting retained customers. Depending on your needs, you might try to tune the model to predict churned customers better.
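To inspect what the model says about an individual customer, you can run a single (already preprocessed) test row through predict(). This is just a sketch; the slice picks the first test example:

```js
// Predicted class probabilities for the first test example.
// The two columns correspond to the classes in the order produced by toCategorical.
const firstExample = xTest.slice([0, 0], [1, xTest.shape[1]])
model.predict(firstExample).print()
```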
Great job! You just built a Deep Neural Network that predicts customer churn with ~80% accuracy. Here’s what you’ve learned:
But can it be that Deep Learning is even more powerful? So powerful that it can understand images?
Run the complete source code for this tutorial right in your browser: