Neural Network from scratch in Python

So you want to teach a computer to recognize handwritten digits? You want to code this out in Python? You understand a little about Machine Learning? You wanna build a neural network?

Let’s try and implement a simple 3-layer neural network (NN) from scratch. I won’t get into the math, because I suck at math, let alone trying to teach it. I can point you to moar math resources if you want to read up on the details.

I assume you’re familiar with basic Machine Learning concepts like classification and regularization. Oh, and how optimization techniques like gradient descent work.

So, why not teach you Tensorflow or some other deep learning framework? I find that I learn best when I see the code and work through the basics of the implementation; it builds intuition for choosing each part of the model. Of course, there are AutoML solutions that could get me to a baseline quicker, but then I still wouldn’t know anything. I’m trying to get out of just running the code like a script kiddie.

So let’s get started!

For the past few months (thanks Arvin), I have learned to appreciate both classic Machine Learning (pre-2012) and Deep Learning techniques for modeling Kaggle competition data.

The handwritten digits competition was my first attempt at deep learning, so I think it’s an appropriate first example for you too. I remember an important gotcha moment: seeing the relationship between the raw data and the pictures. It helped me imagine the deep learning concepts visually.

What does the data look like?

We’re going to use the classic visual recognition challenge data set, called the MNIST data set. Kaggle competitions are awesome because you can self-score your solutions, and they provide the data in simple, clean CSV files. If successful, we should have a deep learning solution that can assign the correct label to every image in the Kaggle test set. Let’s look at the CSV data.

Using a Jupyter notebook, let’s dump the data into a numpy matrix, and reshape it back into a picture. Each digit has been normalized to a 28 by 28 matrix.
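Here’s a minimal sketch of what that looks like in code, assuming the Kaggle train.csv layout (a label column followed by 784 pixel columns); the file name and column order are assumptions, so adjust them to your download:

import numpy as np
import matplotlib.pyplot as plt

# load the Kaggle CSV: first column is the label, the rest are pixel values
data = np.loadtxt('train.csv', delimiter=',', skiprows=1)  # skip the header row
labels = data[:, 0].astype(int)
pixels = data[:, 1:]

# reshape one flat row of 784 values back into a 28x28 picture and show it
digit = pixels[0].reshape(28, 28)
plt.imshow(digit, cmap='gray')
plt.title('label: {}'.format(labels[0]))
plt.show()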

The goal is to take the training data as an input (handwritten digit), pump it through the deep learning model, and predict if the data is a 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.
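To make that concrete, here’s a small illustration (my own, not part of the original gists) of how a digit label becomes a 10-dimensional target vector for training, and how a prediction is read back off the 10 output activations with np.argmax:

import numpy as np

def vectorized_label(j):
    """Turn a digit 0-9 into a (10, 1) one-hot column vector."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

print(vectorized_label(3).ravel())   # a one-hot vector with the 1 in position 3

# a made-up output activation vector; the most active neuron wins
output = np.array([[0.02], [0.01], [0.05], [0.80], [0.01],
                   [0.03], [0.02], [0.03], [0.02], [0.01]])
print(np.argmax(output))             # predicted digit: 3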

Architecture of a Simple Neural Network

1. Picking the shape of the neural network. I’m gonna choose a simple NN consisting of three layers:

  • First Layer: Input layer (784 neurons)
  • Second Layer: Hidden layer (n = 15 neurons)
  • Third Layer: Output layer (10 neurons, one per digit class)

Here’s a look at the 3-layer network proposed above:
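Before we build anything, here’s a quick sketch (my own) of what those layer sizes imply for the matrix shapes we’ll be creating below; the sizes list is just the three numbers from the bullets above:

sizes = [784, 15, 10]
# one weight matrix per pair of adjacent layers, one bias vector per non-input layer
weight_shapes = [(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
bias_shapes = [(y, 1) for y in sizes[1:]]
print(weight_shapes)   # [(15, 784), (10, 15)]
print(bias_shapes)     # [(15, 1), (10, 1)]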

Basic Structure of the code

# **************** basic structure **********************
import random
import numpy as np

def sigmoid(z): pass
def sigmoid_prime(z): pass

class Network(object):
    # attributes: num_layers, sizes, biases, weights
    def __init__(self, sizes): pass
    def feedforward(self, a): pass
    def SGD(self, training_data, epochs, mini_batch_size, eta, test_data=None): pass
    def update_mini_batch(self, mini_batch, eta): pass
    def backprop(self, x, y): pass
    def evaluate(self, test_data): pass
    def cost_derivative(self, output_activations, y): pass


Data structure to hold our data

2.  Picking the right matrix data structure. Nested python lists? CudaMAT? Python Dict? I’m choosing numpy because we’ll heavily use np.dot, np.reshape, np.random, np.zeros, np.argmax, and np.exp functions that I’m not really interested in implementing from scratch.
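As a quick illustration of why numpy pays off here (a sketch with made-up random values): one layer’s computation is just a matrix-vector product plus a bias vector, which is a single np.dot call:

import numpy as np

W = np.random.randn(15, 784)   # hidden layer weights: 15 neurons, 784 inputs each
b = np.random.randn(15, 1)     # hidden layer biases
x = np.random.randn(784, 1)    # stand-in for one flattened 28x28 image

z = np.dot(W, x) + b           # weighted input to the hidden layer
print(z.shape)                 # (15, 1)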

Simulating perceptrons using an Activation Function

3.  Picking the activation function for our hidden layer. The activation function transforms the inputs of the hidden layer into its outputs. Common choices for activation functions are tanh, the sigmoid function, or ReLUs. We’ll use the sigmoid function.

import numpy as np

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
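A quick sanity check of those two functions (my own, using the definitions above): the sigmoid squashes any real number into (0, 1) and works elementwise on numpy arrays, and its derivative peaks at z = 0:

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))         # roughly [0.0067, 0.5, 0.9933]
print(sigmoid_prime(z))   # roughly [0.0066, 0.25, 0.0066]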


Python Neural Network Object

In [1]: import numpy as np

In [2]: class Network(object):
   ...:     def __init__(self, sizes):
   ...:         self.num_layers = len(sizes)
   ...:         self.sizes = sizes
   ...:         self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
   ...:         self.weights = [np.random.randn(y, x)
   ...:                         for x, y in zip(sizes[:-1], sizes[1:])]
   ...:

In [3]: neural_network = Network([1, 20, 10])

In [4]: neural_network.biases
Out[4]:
[array([[ 0.28482536],
        [-0.89049548],
        [ 0.20617518],
        ...,
        [-0.71716707],
        [-0.46420026]]),   # shape (20, 1): one bias per hidden neuron
 array([[ 1.03793023],
        [ 1.17470112],
        ...,
        [-0.79782343],
        [ 1.04171305]])]   # shape (10, 1): one bias per output neuron

In [5]: neural_network.weights
Out[5]:
[array([[-1.25500579],
        [-0.11754026],
        ...,
        [-1.47798152],
        [ 0.34058097]]),   # shape (20, 1): input layer -> hidden layer
 array([[ -4.66379471e-01,   9.34679152e-01, ...,  -7.94524978e-01],
        [ -1.36887372e+00,  -4.07464906e-01, ...,   2.56793945e+00],
        ...,
        [  1.13163724e+00,   1.78187712e+00, ...,  -1.82517001e+00]])]
                            # shape (10, 20): hidden layer -> output layer


Feed Forward Function

a.k.a The Forward Pass

The purpose of the feed forward function is to pass an input through the network, layer by layer (multiply by the weights, add the biases, apply the sigmoid), and return the output activations.

def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
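Here’s a quick shape check (my own sketch; it assumes feedforward has been added as a method of the Network class, as in the full Network.py at the end of this post):

import numpy as np

net = Network([784, 15, 10])    # untrained network with random weights
x = np.random.randn(784, 1)     # stand-in for one flattened 28x28 digit
a = net.feedforward(x)
print(a.shape)                  # (10, 1): one activation per digit class
print(np.argmax(a))             # index of the most active output neuron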


Stochastic Gradient Descent function (SGD)

The SGD method drives training: for each epoch it shuffles the training data, splits it into mini-batches, and calls update_mini_batch on each one. If test data is provided, it evaluates the network after every epoch so you can watch the accuracy improve.

def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    """Train the neural network using mini-batch stochastic
    gradient descent.  The ``training_data`` is a list of tuples
    ``(x, y)`` representing the training inputs and the desired
    outputs.  The other non-optional parameters are
    self-explanatory.  If ``test_data`` is provided then the
    network will be evaluated against the test data after each
    epoch, and partial progress printed out.  This is useful for
    tracking progress, but slows things down substantially."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in xrange(epochs):
        random.shuffle(training_data)
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print "Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test)
        else:
            print "Epoch {0} complete".format(j)


Update Mini Batch Function

Mini-batch gradient descent can work a bit faster than stochastic gradient descent. In batch gradient descent we use all m training examples in each iteration, whereas in stochastic gradient descent we use a single example in each iteration. Mini-batch gradient descent sits somewhere in between: we use b examples in each iteration, where b is a parameter called the “mini batch size”.
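Here’s a tiny illustration (my own) of the split the code below performs, using plain integers as stand-ins for the (x, y) tuples:

mini_batch_size = 3
training_data = list(range(10))
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, len(training_data), mini_batch_size)]
print(mini_batches)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]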

def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]


Back Prop Function

a.k.a The Backwards Pass

Our goal with back propagation is to figure out, for each weight and bias in the network, how a small change would affect the cost, so we can update them in the direction that brings the actual output closer to the target output, thereby minimizing the error for each output neuron and the network as a whole. In other words, back prop is the machinery for efficiently computing the gradient of the cost function; it is what makes gradient descent practical on a multi-layer network.

def backprop(self, x, y):
    """Return a tuple ``(nabla_b, nabla_w)`` representing the
    gradient for the cost function C_x.  ``nabla_b`` and
    ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
    to ``self.biases`` and ``self.weights``."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    delta = self.cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note that the variable l in the loop below is used a little
    # differently to the notation in Chapter 2 of the book.  Here,
    # l = 1 means the last layer of neurons, l = 2 is the
    # second-last layer, and so on.  It's a renumbering of the
    # scheme in the book, used here to take advantage of the fact
    # that Python can use negative indices in lists.
    for l in xrange(2, self.num_layers):
        z = zs[-l]
        sp = sigmoid_prime(z)
        delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)
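One sanity check I like to run (my own addition, not from the original gists) is a numerical gradient check: nudge a single weight up and down, measure how the quadratic cost changes, and compare that against what backprop reports. It assumes the pieces above have been assembled into the Network class, as in the full Network.py below:

import numpy as np

net = Network([3, 4, 2])            # tiny network so the check runs instantly
x = np.random.randn(3, 1)
y = np.zeros((2, 1))
y[1] = 1.0

def quadratic_cost(net, x, y):
    # same cost whose derivative cost_derivative returns: 0.5 * ||output - y||^2
    return 0.5 * np.sum((net.feedforward(x) - y) ** 2)

nabla_b, nabla_w = net.backprop(x, y)

eps = 1e-5
i, j = 1, 2                         # probe one arbitrary weight in the first layer
w = net.weights[0]
orig = w[i, j]
w[i, j] = orig + eps
c_plus = quadratic_cost(net, x, y)
w[i, j] = orig - eps
c_minus = quadratic_cost(net, x, y)
w[i, j] = orig                      # restore the weight
numeric = (c_plus - c_minus) / (2 * eps)
print("numeric %.6f vs backprop %.6f" % (numeric, nabla_w[0][i, j]))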


Cost Derivative Function

So in gradient descent, you follow the negative of the gradient downhill to a point where the cost is a minimum. If someone is talking about gradient descent in a machine learning context, the cost function is implied (it is the function to which you are applying the gradient descent algorithm). Here the per-example cost is the quadratic cost C_x = ½‖output − y‖², and its partial derivative with respect to the output activations is simply (output − y), which is exactly what cost_derivative returns.

def cost_derivative(self, output_activations, y):
    """Return the vector of partial derivatives \partial C_x /
    \partial a for the output activations."""
    return (output_activations-y)


 Putting it all together – Network.py

"""
network.py
~~~~~~~~~~
A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""
#### Libraries
# Standard library
import random
# Third-party libraries
import numpy as np
class Network(object):
def __init__(self, sizes):
"""The list “sizes“ contains the number of neurons in the
respective layers of the network. For example, if the list
was [2, 3, 1] then it would be a three-layer network, with the
first layer containing 2 neurons, the second layer 3 neurons,
and the third layer 1 neuron. The biases and weights for the
network are initialized randomly, using a Gaussian
distribution with mean 0, and variance 1. Note that the first
layer is assumed to be an input layer, and by convention we
won't set any biases for those neurons, since biases are only
ever used in computing the outputs from later layers."""
self.num_layers = len(sizes)
self.sizes = sizes
self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
self.weights = [np.random.randn(y, x)
for x, y in zip(sizes[:1], sizes[1:])]
def feedforward(self, a):
"""Return the output of the network if “a“ is input."""
for b, w in zip(self.biases, self.weights):
a = sigmoid(np.dot(w, a)+b)
return a
def SGD(self, training_data, epochs, mini_batch_size, eta,
test_data=None):
"""Train the neural network using mini-batch stochastic
gradient descent. The “training_data“ is a list of tuples
“(x, y)“ representing the training inputs and the desired
outputs. The other non-optional parameters are
self-explanatory. If “test_data“ is provided then the
network will be evaluated against the test data after each
epoch, and partial progress printed out. This is useful for
tracking progress, but slows things down substantially."""
if test_data: n_test = len(test_data)
n = len(training_data)
for j in xrange(epochs):
random.shuffle(training_data)
mini_batches = [
training_data[k:k+mini_batch_size]
for k in xrange(0, n, mini_batch_size)]
for mini_batch in mini_batches:
self.update_mini_batch(mini_batch, eta)
if test_data:
print "Epoch {0}: {1} / {2}".format(
j, self.evaluate(test_data), n_test)
else:
print "Epoch {0} complete".format(j)
def update_mini_batch(self, mini_batch, eta):
"""Update the network's weights and biases by applying
gradient descent using backpropagation to a single mini batch.
The “mini_batch“ is a list of tuples “(x, y)“, and “eta“
is the learning rate."""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
for x, y in mini_batch:
delta_nabla_b, delta_nabla_w = self.backprop(x, y)
nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
self.weights = [w(eta/len(mini_batch))*nw
for w, nw in zip(self.weights, nabla_w)]
self.biases = [b(eta/len(mini_batch))*nb
for b, nb in zip(self.biases, nabla_b)]
def backprop(self, x, y):
"""Return a tuple “(nabla_b, nabla_w)“ representing the
gradient for the cost function C_x. “nabla_b“ and
“nabla_w“ are layer-by-layer lists of numpy arrays, similar
to “self.biases“ and “self.weights“."""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
# feedforward
activation = x
activations = [x] # list to store all the activations, layer by layer
zs = [] # list to store all the z vectors, layer by layer
for b, w in zip(self.biases, self.weights):
z = np.dot(w, activation)+b
zs.append(z)
activation = sigmoid(z)
activations.append(activation)
# backward pass
delta = self.cost_derivative(activations[1], y) * \
sigmoid_prime(zs[1])
nabla_b[1] = delta
nabla_w[1] = np.dot(delta, activations[2].transpose())
# Note that the variable l in the loop below is used a little
# differently to the notation in Chapter 2 of the book. Here,
# l = 1 means the last layer of neurons, l = 2 is the
# second-last layer, and so on. It's a renumbering of the
# scheme in the book, used here to take advantage of the fact
# that Python can use negative indices in lists.
for l in xrange(2, self.num_layers):
z = zs[l]
sp = sigmoid_prime(z)
delta = np.dot(self.weights[l+1].transpose(), delta) * sp
nabla_b[l] = delta
nabla_w[l] = np.dot(delta, activations[l1].transpose())
return (nabla_b, nabla_w)
def evaluate(self, test_data):
"""Return the number of test inputs for which the neural
network outputs the correct result. Note that the neural
network's output is assumed to be the index of whichever
neuron in the final layer has the highest activation."""
test_results = [(np.argmax(self.feedforward(x)), y)
for (x, y) in test_data]
return sum(int(x == y) for (x, y) in test_results)
def cost_derivative(self, output_activations, y):
"""Return the vector of partial derivatives \partial C_x /
\partial a for the output activations."""
return (output_activationsy)
#### Miscellaneous functions
def sigmoid(z):
"""The sigmoid function."""
return 1.0/(1.0+np.exp(z))
def sigmoid_prime(z):
"""Derivative of the sigmoid function."""
return sigmoid(z)*(1sigmoid(z))
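And here’s how I’d hook the Kaggle CSV from earlier up to this class end to end (a sketch with assumptions: train.csv has a label column followed by 784 pixel columns, pixels are scaled to [0, 1], everything after the first 40,000 rows is held out for scoring, and the hidden layer is bumped to 30 neurons):

import numpy as np

raw = np.loadtxt('train.csv', delimiter=',', skiprows=1)

def one_hot(j):
    e = np.zeros((10, 1))
    e[int(j)] = 1.0
    return e

# training targets are (10, 1) one-hot vectors; evaluate() wants plain integer labels
data = [(row[1:].reshape(784, 1) / 255.0, one_hot(row[0])) for row in raw]
training_data = data[:40000]
test_data = [(x, int(np.argmax(y))) for x, y in data[40000:]]

net = Network([784, 30, 10])
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)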


Links

Flask Digits Classifier

"""
sudo pip3 install –upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.8.0rc0-cp34-cp34m-linux_x86_64.whl
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import shutil
from sklearn import datasets, metrics, cross_validation
from tensorflow.contrib import skflow
import numpy as np
from flask import Flask, abort, jsonify, request
#import cPickle as pickle
# basic toy nn pickled loader
#pkl_file = open('batch_model_2016_04_23.pkl', 'rb')
#latest_neural_network = pickle.load(pkl_file)
#pkl_file.close()
from sklearn import datasets, metrics, cross_validation
from tensorflow.contrib import skflow
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = cross_validation.train_test_split(iris.data, iris.target,
test_size=0.2, random_state=42)
# trained batch offline saved skflow model (latest parameters and learned variables)
#classifier.save('/home/ubuntu/scratch/skflow_batch/batch_model_2016_04_23')
# restore skflow model from batch run 2016-04-23
new_classifier = skflow.TensorFlowEstimator.restore('/home/ubuntu/scratch/skflow_batch/batch_model_2016_04_23')
# check model load with test data
score = metrics.accuracy_score(y_test, new_classifier.predict(X_test))
print('Accuracy: {0:f}'.format(score))
app = Flask(__name__)
@app.route('/api', methods=['POST'])
def make_predict():
# incoming data converted from json
data = request.get_json(force=True)
# shove into array
predict_request = [data['sl'],data['sw'],data['pl'], data['pw']]
predict_request = np.array([predict_request])
# np array passed to toy neural network
# https://gist.github.com/bigsnarfdude/57ff7d6095f7ee83d4195d1fed26388b
y_hat = new_classifier.predict(predict_request)
output = str(y_hat[0])
# convert output to json
return jsonify(results=output)
if __name__ == '__main__':
app.run(port = 11111, debug = True)
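Once that app is running locally, a quick way to poke the endpoint (my own sketch, using the requests library) is to POST one set of iris measurements and read the prediction back out of the JSON:

import requests

payload = {'sl': 5.1, 'sw': 3.5, 'pl': 1.4, 'pw': 0.2}
response = requests.post('http://localhost:11111/api', json=payload)
print(response.json())   # e.g. {'results': '0'}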


https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#0

