
Neural Network from scratch in Python

In this post we will go through the mathematics of machine learning and code from scratch, in Python, a small library to build neural networks with a variety of layers (Fully Connected, Convolutional, etc.). Eventually, we will be able to create networks in a modular fashion:
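Something like the following sketch, a preview of the kind of code we will be able to write once everything below is implemented (none of these classes or functions exist yet at this point; the names — Network, FCLayer, ActivationLayer, tanh, mse — are the ones used in the sketches later in this post, and x_train, y_train, x_test are assumed to be defined):

net = Network()
net.add(FCLayer(2, 3))                      # 2 input neurons, 3 output neurons
net.add(ActivationLayer(tanh, tanh_prime))  # non-linearity
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh, tanh_prime))

net.use(mse, mse_prime)                     # pick a loss function
net.fit(x_train, y_train, epochs=1000, learning_rate=0.1)
out = net.predict(x_test)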

I’m assuming you already have some knowledge about neural networks. The purpose here is not to explain why we make these models, but to show how to make a proper implementation.

Layer by Layer

We need to keep in mind the big picture here:

  1. We feed input data into the neural network.
  2. The data flows from layer to layer until we have the output.
  3. Once we have the output, we can calculate the error which is a scalar.
  4. Finally we can adjust a given parameter (weight or bias) by subtracting the derivative of the error with respect to the parameter itself.
  5. We iterate through that process.

The most important step is the 4th. We want to be able to have as many layers as we want, and of any type. But if we modify/add/remove one layer from the network, the output of the network is going to change, which is going to change the error, which is going to change the derivative of the error with respect to the parameters. We need to be able to compute the derivatives regardless of the network architecture, regardless of the activation functions, regardless of the loss we use.


In order to achieve that, we must implement each layer separately.

What every layer should implement

Every layer that we might create (fully connected, convolutional, max-pooling, dropout, etc.) has at least two things in common: input and output data.

Forward propagation

We can already emphasize one important point which is: the output of one layer is the input of the next one.

This is called forward propagation. Essentially, we give the input data to the first layer, then the output of every layer becomes the input of the next layer until we reach the end of the network. By comparing the result of the network (Y) with the desired output (let’s say Y*), we can calculate an error E. The goal is to minimize that error by changing the parameters in the network. That is backward propagation (backpropagation).

Gradient Descent

This is a quick reminder, if you need to learn more about gradient descent there are tons of resources on the internet.

Basically, we want to change some parameter in the network (call it w) so that the total error E decreases. There is a clever way to do it (not randomly), which is the following:
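w ← w − α · ∂E/∂w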

Where α is a parameter in the range [0,1] that we set and that is called the learning rate. Anyway, the important thing here is ∂E/∂w (the derivative of E with respect to w). We need to be able to find the value of that expression for any parameter of the network regardless of its architecture.

Backward propagation

Suppose that we give a layer the derivative of the error with respect to its output (∂E/∂Y), then it must be able to provide the derivative of the error with respect to its input (∂E/∂X).

Remember that E is a scalar (a number) and X and Y are matrices.

Let’s forget about ∂E/∂X for now. The trick here is that if we have access to ∂E/∂Y, we can very easily calculate ∂E/∂W (if the layer has any trainable parameters) without knowing anything about the network architecture! We simply use the chain rule:
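∂E/∂w = ∑_j ∂E/∂y_j · ∂y_j/∂w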

The unknown is ∂y_j/∂w, which totally depends on how the layer computes its output. So if every layer has access to ∂E/∂Y, where Y is its own output, then we can update our parameters!

But why ∂E/∂X?

Don’t forget, the output of one layer is the input of the next layer. This means that ∂E/∂X for one layer is ∂E/∂Y for the previous layer! That’s it! It’s just a clever way to propagate the error! Again, we can use the chain rule:
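∂E/∂x_i = ∑_j ∂E/∂y_j · ∂y_j/∂x_i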

This is very important, it’s the key to understanding backpropagation! After that, we’ll be able to code a Deep Convolutional Neural Network from scratch in no time!

Diagram to understand backpropagation

This is what I described earlier. Layer 3 is going to update its parameters using ∂E/∂Y, and is then going to pass ∂E/∂H2 to the previous layer, which is its own “∂E/∂Y”. Layer 2 is then going to do the same, and so on and so forth.
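In text form, for a three-layer network whose intermediate outputs are H1 and H2, the two passes look like this:

forward :   X → [Layer 1] → H1 → [Layer 2] → H2 → [Layer 3] → Y → E
backward :  ∂E/∂X ← [Layer 1] ← ∂E/∂H1 ← [Layer 2] ← ∂E/∂H2 ← [Layer 3] ← ∂E/∂Y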

This may seem abstract here, but it will become very clear when we apply it to a specific type of layer. Speaking of abstract, now is a good time to write our first python class.

Abstract Base Class: Layer

The abstract class Layer, which all other layers will inherit from, handles simple properties: an input, an output, and both a forward and a backward method.
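A minimal sketch of that base class could look like this (the method names forward_propagation and backward_propagation are simply the conventions I use in the rest of the sketches below):

# Base class: every layer stores its input and output,
# and knows how to propagate data forward and errors backward.
class Layer:
    def __init__(self):
        self.input = None
        self.output = None

    # computes the output Y of a layer for a given input X
    def forward_propagation(self, input):
        raise NotImplementedError

    # computes dE/dX for a given dE/dY (and updates parameters, if any)
    def backward_propagation(self, output_error, learning_rate):
        raise NotImplementedError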

Fully Connected Layer

The first concrete layer we will implement is the fully connected (FC) layer, which connects every input neuron to every output neuron.

Forward Propagation

The value of each output neuron can be calculated as follows:
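y_j = b_j + ∑_i x_i · w_ij

where x_i are the input values, w_ij is the weight connecting input neuron i to output neuron j, and b_j is the bias of output neuron j.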

With matrices, we can compute this formula for every output neuron in one shot using a dot product:
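Y = X · W + B

with X of shape 1×i, W of shape i×j, and B and Y of shape 1×j.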

We’re done with the forward pass. Now let’s do the backward pass of the FC layer.

Note that I’m not using any activation function yet; that’s because we will implement it in a separate layer!

Backward Propagation

As we said, suppose we have a matrix containing the derivative of the error with respect to that layer’s output (∂E/∂Y). We need:

  1. The derivative of the error with respect to the parameters (∂E/∂W, ∂E/∂B)
  2. The derivative of the error with respect to the input (∂E/∂X)

Let’s calculate ∂E/∂W. This matrix should be the same size as W itself: i×j, where i is the number of input neurons and j the number of output neurons. We need one gradient for every weight:

Using the chain rule stated earlier, we can write:
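∂E/∂w_ij = ∂E/∂y_j · ∂y_j/∂w_ij = ∂E/∂y_j · x_i

which, in matrix form, is ∂E/∂W = Xᵗ · ∂E/∂Y (where Xᵗ is the transpose of X).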

That’s it, we have the first formula needed to update the weights! Now let’s calculate ∂E/∂B.

Again, ∂E/∂B needs to be the same size as B itself: one gradient per bias. We can use the chain rule again:
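∂E/∂b_j = ∂E/∂y_j · ∂y_j/∂b_j = ∂E/∂y_j, so ∂E/∂B = ∂E/∂Y.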

Now that we have ∂E/∂W and ∂E/∂B, we are left with ∂E/∂X which is very important as it will “act” as ∂E/∂Y for the layer before that one.

Again, using the chain rule,
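∂E/∂x_i = ∑_j ∂E/∂y_j · ∂y_j/∂x_i = ∑_j ∂E/∂y_j · w_ij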

Finally, we can write the whole matrix:
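∂E/∂X = ∂E/∂Y · Wᵗ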

That’s it! We have the three formulas we needed for the FC layer!

Coding the Fully Connected Layer

We can now write some python code to bring this math to life!
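Here is a sketch of such a layer, translating the three formulas above into numpy (the random initialization scheme and the module layout are my own choices):

import numpy as np

from layer import Layer  # the base class sketched earlier (module name is illustrative)

class FCLayer(Layer):
    # input_size = number of input neurons, output_size = number of output neurons
    def __init__(self, input_size, output_size):
        self.weights = np.random.rand(input_size, output_size) - 0.5
        self.bias = np.random.rand(1, output_size) - 0.5

    # returns the output for a given input: Y = XW + B
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = np.dot(self.input, self.weights) + self.bias
        return self.output

    # computes dE/dX, dE/dW, dE/dB for a given output_error = dE/dY,
    # updates the parameters and returns input_error = dE/dX
    def backward_propagation(self, output_error, learning_rate):
        input_error = np.dot(output_error, self.weights.T)   # dE/dX = dE/dY . Wᵗ
        weights_error = np.dot(self.input.T, output_error)   # dE/dW = Xᵗ . dE/dY
        # dE/dB = dE/dY, so output_error is used directly for the bias update
        self.weights -= learning_rate * weights_error
        self.bias -= learning_rate * output_error
        return input_error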

Activation Layer

Up to now the computation has been purely linear; adding non-linearity is the job of the activation layer.

Forward Propagation

As you will see, it is quite straightforward. For a given input X, the output is simply the activation function applied to every element of X, which means input and output have the same dimensions.

Backward Propagation

Given ∂E/∂Y, we want to calculate ∂E/∂X.
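∂E/∂X = ∂E/∂Y ⊙ f′(X), where f is the activation function.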

Be careful, here we are using an element-wise multiplication between the two matrices (whereas in the formulas above, it was a dot product).

Coding the Activation Layer

The code for the activation layer is just as straightforward.
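A sketch under the same conventions (activation and activation_prime are passed in as plain functions, e.g. tanh and its derivative):

from layer import Layer  # module name is illustrative

class ActivationLayer(Layer):
    def __init__(self, activation, activation_prime):
        self.activation = activation
        self.activation_prime = activation_prime

    # returns the activated input: Y = f(X), applied element-wise
    def forward_propagation(self, input_data):
        self.input = input_data
        self.output = self.activation(self.input)
        return self.output

    # returns dE/dX = dE/dY ⊙ f'(X); there are no parameters to update here
    def backward_propagation(self, output_error, learning_rate):
        return self.activation_prime(self.input) * output_error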

Loss Function

To start backpropagation we also need a loss function E(y*, y), where y* and y denote the desired output and the actual output respectively. You can think of the loss as a last layer which takes all the output neurons and squashes them into one single neuron. What we need now, as for every other layer, is to define ∂E/∂Y. Except now, we have finally reached E!

These are simply two python functions that you can put in a separate file. They will be used when creating the network.
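For example, assuming the mean squared error (MSE) as the loss — which is also what the example scripts below assume — those two functions could look like this:

import numpy as np

# loss: mean squared error between the desired output y_true and the network output y_pred
def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

# derivative of the loss with respect to the network output (this is ∂E/∂Y for the last layer)
def mse_prime(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size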

Network Class

Almost done! We are going to make a Network class to create neural networks very easily, as in the snippet at the very beginning of the post!

I commented almost every part of the code; it shouldn’t be too complicated to understand if you grasped the previous steps. Nevertheless, leave a comment if you have any questions, and I will gladly answer!
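A sketch of the Network class under the conventions used so far (the method names add, use, fit and predict are my own choices, matching the preview at the top of the post):

class Network:
    def __init__(self):
        self.layers = []
        self.loss = None
        self.loss_prime = None

    # add a layer to the network
    def add(self, layer):
        self.layers.append(layer)

    # set the loss to use
    def use(self, loss, loss_prime):
        self.loss = loss
        self.loss_prime = loss_prime

    # predict the output for a given input
    def predict(self, input_data):
        result = []
        for sample in input_data:
            # forward propagation through every layer
            output = sample
            for layer in self.layers:
                output = layer.forward_propagation(output)
            result.append(output)
        return result

    # train the network
    def fit(self, x_train, y_train, epochs, learning_rate):
        for i in range(epochs):
            err = 0
            for x, y in zip(x_train, y_train):
                # forward propagation
                output = x
                for layer in self.layers:
                    output = layer.forward_propagation(output)

                # accumulate the loss (for display purposes only)
                err += self.loss(y, output)

                # backward propagation: dE/dY flows back through the layers
                error = self.loss_prime(y, output)
                for layer in reversed(self.layers):
                    error = layer.backward_propagation(error, learning_rate)

            # average error over all samples
            err /= len(x_train)
            print('epoch %d/%d   error=%f' % (i + 1, epochs, err))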

Building Neural Networks

Finally! We can use our class to create a neural network with as many layers as we want! We are going to build two neural networks: a simple XOR solver and an MNIST solver.

Solve XOR

Starting with XOR is always important as it’s a simple way to tell if the network is learning anything at all.

I don’t think I need to emphasize many things. Just be careful with the training data: you should always have the sample dimension first. For example, here the input shape is (4,1,2): 4 samples, each one a 1×2 row vector.
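Here is what xor.py can look like under these conventions (tanh as the activation and MSE as the loss are my choices; the module names are illustrative):

import numpy as np

from network import Network
from fc_layer import FCLayer
from activation_layer import ActivationLayer

# activation function and its derivative
def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1 - np.tanh(x) ** 2

# loss function and its derivative
def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mse_prime(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

# training data: sample dimension first, shape (4, 1, 2)
x_train = np.array([[[0, 0]], [[0, 1]], [[1, 0]], [[1, 1]]])
y_train = np.array([[[0]], [[1]], [[1]], [[0]]])

# network: 2 -> 3 -> 1
net = Network()
net.add(FCLayer(2, 3))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(3, 1))
net.add(ActivationLayer(tanh, tanh_prime))

# train
net.use(mse, mse_prime)
net.fit(x_train, y_train, epochs=1000, learning_rate=0.1)

# test
print(net.predict(x_train))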

Result

$ python xor.py 
epoch 1/1000 error=0.322980
epoch 2/1000 error=0.311174
epoch 3/1000 error=0.307195
.
epoch 998/1000 error=0.000243
epoch 999/1000 error=0.000242
epoch 1000/1000 error=0.000242
[
array([[ 0.00077435]]),
array([[ 0.97760742]]),
array([[ 0.97847793]]),
array([[-0.00131305]])
]

Clearly this is working, great! We can now solve something more interesting: let’s solve MNIST!

Solve MNIST

We didn’t implement the Convolutional Layer, but this is not a problem. All we need to do is reshape our data so that it fits into a Fully Connected Layer.

The MNIST dataset consists of images of handwritten digits from 0 to 9, each of shape 28x28x1. The goal is to predict which digit is drawn in a given picture.
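Here is what example_mnist_fc.py can look like; it assumes Keras is available only to load the dataset, and reuses the same building blocks as the XOR script (the layer sizes and hyperparameters below are illustrative):

import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

from network import Network          # module names are illustrative
from fc_layer import FCLayer
from activation_layer import ActivationLayer

def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    return 1 - np.tanh(x) ** 2

def mse(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))

def mse_prime(y_true, y_pred):
    return 2 * (y_pred - y_true) / y_true.size

# load MNIST and flatten every 28x28 image into a 1x784 row vector
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 1, 28 * 28).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], 1, 28 * 28).astype('float32') / 255
y_train = to_categorical(y_train)    # one-hot encode the labels
y_test = to_categorical(y_test)

# network: 784 -> 100 -> 50 -> 10
net = Network()
net.add(FCLayer(28 * 28, 100))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(100, 50))
net.add(ActivationLayer(tanh, tanh_prime))
net.add(FCLayer(50, 10))
net.add(ActivationLayer(tanh, tanh_prime))

# train on a subset to keep things fast, then look at a few predictions
net.use(mse, mse_prime)
net.fit(x_train[:1000], y_train[:1000], epochs=30, learning_rate=0.1)

print('predicted values :')
print(net.predict(x_test[:3]))
print('true values :')
print(y_test[:3])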

Result

$ python example_mnist_fc.py
epoch 1/30 error=0.238658
epoch 2/30 error=0.093187
epoch 3/30 error=0.073039
.
epoch 28/30 error=0.011636
epoch 29/30 error=0.011306
epoch 30/30 error=0.010901
predicted values :
[
array([[ 0.119, 0.084 , -0.081, 0.084, -0.068, 0.011, 0.057, 0.976, -0.042, -0.0462]]),
array([[ 0.071, 0.211, 0.501 , 0.058, -0.020, 0.175, 0.057 , 0.037, 0.020, 0.107]]),
array([[ 1.197e-01, 8.794e-01, -4.410e-04, 4.407e-02, -4.213e-02, 5.300e-02, 5.581e-02, 8.255e-02, -1.182e-01, 9.888e-02]])
]
true values :
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]

This is working perfectly! Amazing 🙂

GitHub Repository & Google Colab

You can find the whole working code used for this post in the accompanying GitHub repository and Google Colab file. They also contain the code for other layers such as Convolutional and Flatten.


Deep Learning with Python: Neural Networks (complete tutorial)

Build, Plot & Explain Artificial Neural Networks with TensorFlow

Summary

In this article, I will show how to build Neural Networks with Python and how to explain Deep Learning to the Business using visualization and creating an explainer for model predictions.

Deep Learning is a type of machine learning that imitates the way humans gain certain types of knowledge, and it has become more popular over the years compared to standard models. While traditional algorithms are linear, Deep Learning models, generally Neural Networks, are stacked in a hierarchy of increasing complexity and abstraction (hence the “deep” in Deep Learning).

Neural Networks are based on a collection of connected units (neurons), which, just like the synapses in a brain, can transmit a signal to other neurons, so that, acting like interconnected brain cells, they can learn and make decisions in a more human-like manner.

Today, Deep Learning is so popular that many companies want to use it even though they don’t fully understand it. Often, data scientists first have to simplify these complex algorithms for the Business, and then explain and justify the results of the models, which is not always simple with Neural Networks. I think the best way to do it is through visualization.

I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate the examples.

In particular, I will go through:

  • Environment Setup, tensorflow vs pytorch.
  • Artificial Neural Networks breakdown, input, output, hidden layers, activation functions.
  • Deep Learning with deep neural networks.
  • Model design with tensorflow/keras.
  • Visualization of Neural Networks with python.
  • Model training & testing.
  • Explainability with shap.

Setup

There are two main libraries for building Neural Networks: TensorFlow (developed by Google) and PyTorch (developed by Facebook).

