Neural Network Starter Kit, Part 1

Jeff Spagnola
5 min read · Feb 1, 2021

Building a Beginner Neural Network with TensorFlow

When it comes to the subject of Machine Learning, there are few topics more talked about and more exciting than Neural Networks. Deep Learning with Neural Networks has been at the forefront of the tech community for several years now, and this week’s blog is for those of you looking to jump in and build your first Neural Network.

Building a Neural Network is an EXTREMELY iterative process, one that requires a whole bunch of experimentation in order to gain an intuition for how to best build them. For this walkthrough, we’re going to be using the Keras API from the TensorFlow library. If you want to look over the documentation before we begin, it’s available on the TensorFlow site (tensorflow.org). Also, for those of you math-folk who want a more in-depth dive under the hood of a neural network, I suggest checking out Andrew Ng’s course on Coursera.

Anyway, let’s dig in.

What is a Neural Network?

Before we dive into the code and architecture of building a Neural Network, let’s take a second to look at what these things actually are.

In the loosest terms, a Neural Network is a machine learning algorithm that mimics the patterns of a human brain. It’s a series of algorithms that recognizes underlying relationships in datasets that a human being often isn’t capable of spotting. Much like a human brain, a neural network is a series of interconnected neurons that can recognize patterns, find relationships, and classify data. The one aspect that sets them apart from other algorithms is that they have the ability to “learn” and can continue to improve.

Neural networks are being applied to many real-life problems today, including speech and image recognition, spam email filtering, finance, and medical diagnosis, to name a few.

Neural Network Building Blocks

[Diagram: a basic neural network, with an input layer (yellow), hidden layers (green), and an output layer (red)]

The diagram above is a highly simplified representation of how a neural network actually works. The yellow dots of the input layer represent the data points being fed into the model. The green dots represent the neurons of the hidden layers…and this is where the magic happens. Each neuron of the hidden layers processes the data it receives and has corresponding weights, a bias, and an activation function. The red dot represents the output layer, which is…well…the layer that produces our output. If none of this makes sense to you, don’t worry.
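
To make the weights-bias-activation idea a bit more concrete, here’s a minimal sketch of what a single neuron computes. This is plain NumPy rather than Keras, and the numbers are made up purely for illustration:

import numpy as np

def neuron(x, w, b):
    """One neuron: a weighted sum of its inputs plus a bias,
    passed through a ReLU activation."""
    z = np.dot(w, x) + b      # weighted sum of inputs + bias
    return max(z, 0.0)        # ReLU: pass z through if positive, else 0

x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.4, 0.1, -0.6])   # this neuron's learned weights
b = 0.2                          # this neuron's learned bias

print(neuron(x, w, b))           # weighted sum is -1.52, so ReLU outputs 0.0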

Let’s start by defining some terms:

Neuron (Node, Perceptron): Basic unit of a layer that “simply” receives an input and computes an output.

Input Layer: The first layer, which receives the initial data being fed into the neural network. Requires defining the input shape and an activation function.

Hidden Layers: The intermediate layers between the input and the output, where all the computational heavy lifting is done. Each requires an activation function.

Output Layer: The layer that produces the result for given inputs. Requires an activation function.

Activation Function: Function that defines the output of a node given an input or set of inputs. This is vague, but we’ll see more about this shortly.

There’s a bunch more terms that will pop up throughout this blog, but these are the main ones to understand for now. The rest, we’ll explain as we go.
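
To give a flavor of what activation functions actually do, here are two of the most common ones written out in plain NumPy. This is a sketch for intuition only; Keras provides these for you:

import numpy as np

def relu(z):
    # ReLU: passes positive values through, zeroes out negatives
    return np.maximum(z, 0)

def sigmoid(z):
    # Sigmoid: squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))      # [0. 0. 3.]
print(sigmoid(z))   # [0.119 0.5   0.953] (rounded)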

Model Architecture

Now on to the fun stuff. Let’s import some packages and get this thing going!

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping

First, we need to instantiate our model. For the purposes of this demo, we’ll be using a Sequential model. There are several model types to choose from; feel free to check out the TensorFlow documentation for more information.

# Create base model
model = Sequential()

Next, we need to set up our Input Layer. Pay attention to the syntax here as it will be similar for most of the layers of this model. We’ll primarily be using Dense layers for this model. The general structure of a layer is as follows:

model.add(Dense(number_of_neurons, activation = 'activation_function'))

In the Input Layer, we also need to establish the input shape of our data. Note that this shape should describe a single sample rather than the whole training set, so in most cases we pass X_train_df.shape[1:] (everything except the sample count) into this parameter.
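
For example, assuming the training data has been converted to a 2-D array of 1,000 samples by 20 features (made-up numbers, just for illustration), the slicing looks like this:

import numpy as np

X_train = np.random.rand(1000, 20)   # hypothetical: 1,000 samples, 20 features
print(X_train.shape)                 # (1000, 20) -- includes the sample count
print(X_train.shape[1:])             # (20,)      -- the per-sample shape we pass as input_shape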

We also must add an activation function to this layer. Feel free to experiment with any activation function as they can greatly influence the performance of the model.

# Input layer
model.add(Dense(64, input_shape = X_train_df.shape[1:], activation = 'relu'))

Now we can set up our hidden layers. As with everything else mentioned, each parameter of these layers is open for experimentation and can greatly affect the outcome of the model.

# Hidden layers
model.add(Dense(32, activation = 'relu'))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(8, activation = 'relu'))

Next, we need to add an output layer in order to get our results. In this exercise, we’re creating a binary classifier, so we only need one neuron in the output layer. As such, it is customary to use the Sigmoid activation function, as it returns a number between 0 and 1.

# Output layer
model.add(Dense(1, activation = 'sigmoid'))
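
Because the sigmoid returns a probability-like value between 0 and 1, a common convention is to threshold it at 0.5 to get a hard class label once the model is trained. Assuming a held-out X_test exists, that looks something like:

# Hypothetical usage after training: round sigmoid outputs to 0 or 1
probs = model.predict(X_test)            # values in (0, 1)
preds = (probs > 0.5).astype('int32')    # hard 0/1 class labels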

Before we’re able to fit the model, we first need to compile it. Within the compile step, we must set an optimizer, define our loss function, and decide what metric we want to use for evaluation. We’ll get more into tuning these hyperparameters in Part 2 of this blog.

# Compile the model
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Finally, we’re able to fit our model. In this step, we feed in the training data (X_train_df and y_train), the batch size, and the number of epochs we want to run. Again, we’ll get more into these hyperparameters in Part 2 of this blog.

# Fit the model
history = model.fit(X_train_df, y_train, batch_size = 16, epochs = 25)
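
The fit() call returns a History object whose .history dictionary stores the loss and metric values for each epoch, which is handy for a quick training curve. For example (assuming matplotlib is installed):

# Plot the per-epoch training loss and accuracy
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label = 'loss')
plt.plot(history.history['accuracy'], label = 'accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()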

Now that we have all that squared away, let’s put it all together and see what we have.

# Create base model
model = Sequential()
# Input layer
model.add(Dense(64, input_shape = X_train_df.shape[1:], activation = 'relu'))
# Hidden layers
model.add(Dense(32, activation = 'relu'))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(8, activation = 'relu'))
# Output layer
model.add(Dense(1, activation = 'sigmoid'))
# Compile
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fit the model
history = model.fit(X_train_df, y_train, batch_size = 16, epochs = 25)
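
As a sanity check, it’s worth printing the model’s structure and, assuming you held out a test set (X_test and y_test here are assumptions, not defined above), evaluating on unseen data:

# Inspect the architecture: layers, output shapes, parameter counts
model.summary()

# Evaluate on held-out data (X_test/y_test assumed to exist)
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test loss: {loss:.4f} -- Test accuracy: {accuracy:.4f}')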

Next Time…

I think that this is a good place to hit the pause button and let all this info sink in. In Part 2 of this blog, we’re going to take a deep dive into hyperparameter tuning, model evaluation, and how to deal with bias/variance in your model.

Until next time…
