## Introduction

In artificial intelligence, a neural network is a concept inspired by the human brain. You can think of a neuron as a small unit that holds an activation value, typically between 0 and 1. The human brain consists of billions of neurons, and similarly an ANN (artificial neural network) may have thousands or millions of neurons.

The simplest form of a neural network is the multilayer perceptron (MLP). It is similar to linear regression, but has multiple layers between the input and output layers. An MLP is also feed-forward: outputs of the i^{th} layer are passed on to the (i+1)^{th} layer, never vice versa. The linear regression formula is:

Y = WX + B

Here X is the input variable, W is the weight, and B is the bias. The equation can be visualized as below:

An MLP has multiple layers between the input and output layers; these are called hidden layers. Each layer can have any number of neurons.

Fig-2 presents the structure of such a neural network. In fig-2 the number of inputs is i (an array of i elements), there is 1 hidden layer with j neurons, and the number of outputs is k (an array of k elements). There are i×j weights between the input layer and the hidden layer, and j×k weights between the hidden layer and the output layer. There are j biases, one added in the calculation of each H_{j}, and k biases, one added in the calculation of each Y_{k}.

As per fig-2, H_{0} is calculated from all inputs [X_{0}-X_{i}] and the weights between [X_{0}-X_{i}] and H_{0}. H_{1}, H_{2}, ..., H_{j} are calculated similarly. The output layer is in turn calculated from the final hidden layer [H_{0}-H_{j}] and the weights between [H_{0}-H_{j}] and [Y_{0}-Y_{k}]. A bias is added in the calculation of each neuron (e.g. H_{0}): [b_{0}-b_{j}] is the set of biases between the input and hidden layers, and [B_{0}-B_{k}] is the set of biases between the hidden and output layers.

Suppose you are training the above model with image data of the digits 0-9. The inputs will be the pixels of the image; that is, X_{0}-X_{i} are the pixels extracted from the image. If the image size is 8×8 there will be 64 pixels, which become [X_{0}-X_{63}]. Similarly, the output layer represents the digits 0-9, i.e. Y_{0}-Y_{9}, and each output node value lies between 0 and 1. In a clean prediction one output node should be above 0.5 and the rest below 0.5; so if Y_{5} is 0.8 and all others are below 0.5, we can say the input image was the digit 5. If multiple output nodes produce results above 0.5, the model is not able to classify the image confidently.
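As an illustration, deciding the predicted digit amounts to taking the index of the largest output node. The activation values below are made up for the example, not produced by a trained model:

```python
import numpy as np

# Hypothetical output-layer activations [Y_0 - Y_9] for one input image.
outputs = np.array([0.02, 0.10, 0.05, 0.08, 0.03, 0.80, 0.04, 0.12, 0.07, 0.01])

# Only Y_5 exceeds 0.5, so the network classifies the image as the digit 5.
predicted_digit = int(np.argmax(outputs))
confident = int(np.sum(outputs > 0.5)) == 1

print(predicted_digit)  # 5
print(confident)        # True
```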

Let’s come back to the mathematical equation. As per fig-2 and the above explanation, the formulas for H_{0} and H_{1} are given by:

H_{0} = w_{00}X_{0} + w_{01}X_{1} + ... + w_{0i}X_{i} + b_{0}

H_{1} = w_{10}X_{0} + w_{11}X_{1} + ... + w_{1i}X_{i} + b_{1}

**There are nodes H_{0}-H_{j} in the hidden layer and X_{0}-X_{i} are the inputs; w_{ji} is the weight between the j^{th} hidden node and the i^{th} input, and b_{j} is the bias for H_{j}. So the formula derived for H_{j} is:**

H_{j} = w_{j0}X_{0} + w_{j1}X_{1} + ... + w_{ji}X_{i} + b_{j}

**Activation function:**

H_{j} can take any value since it is a linear equation, and we want to scale it to between 0 and 1 or some other fixed range. To do so we apply a function to the linear equation to normalize the results; common choices include the hyperbolic tangent (tanh), sigmoid, and ReLU. After applying tanh to the right-hand side of the above equation:

H_{j} = tanh(w_{j0}X_{0} + w_{j1}X_{1} + ... + w_{ji}X_{i} + b_{j})


*tanh* scales the output to between -1 and 1, which is easy to convert to 0-1 or keep as is. Sigmoid scales it to between 0 and 1, so which function to use is our choice. A node value below 0.5 indicates the node is inactive, and a value above 0.5 indicates the node is activated.
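A small sketch comparing the two activation functions, using NumPy's `tanh` and a hand-written sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 5)
print(np.tanh(z))   # values in (-1, 1)
print(sigmoid(z))   # values in (0, 1)

# A tanh output can be rescaled into (0, 1): (tanh(z) + 1) / 2 equals sigmoid(2z).
assert np.allclose((np.tanh(z) + 1) / 2, sigmoid(2 * z))
```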

Similarly, the weights between the hidden layer and the output layer are W_{00}-W_{kj}, with biases B_{0}-B_{k}. The formula for each output node is given by:

Y_{k} = tanh(W_{k0}H_{0} + W_{k1}H_{1} + ... + W_{kj}H_{j} + B_{k})
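Putting the formulas together, a minimal forward-pass sketch in NumPy might look like the following. The parameters are randomly initialized (untrained) just to show the shapes; in a real network they would come from training:

```python
import numpy as np

rng = np.random.default_rng(0)

i, j, k = 64, 10, 10          # input, hidden, and output sizes (toy values)

# Hypothetical parameters; a trained network learns these values.
w = rng.normal(size=(j, i))   # weights between input and hidden layer
b = rng.normal(size=j)        # biases b_0 - b_j for the hidden nodes
W = rng.normal(size=(k, j))   # weights between hidden and output layer
B = rng.normal(size=k)        # biases B_0 - B_k for the output nodes

x = rng.random(i)             # one input sample, e.g. 64 pixel values

H = np.tanh(w @ x + b)        # H_j = tanh(w_j0*X_0 + ... + w_ji*X_i + b_j)
Y = np.tanh(W @ H + B)        # Y_k = tanh(W_k0*H_0 + ... + W_kj*H_j + B_k)

print(H.shape, Y.shape)       # (10,) (10,)
```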

## Implementation

We are going to use scikit-learn's digits data set and *MLPClassifier* for this implementation. We will try to achieve high accuracy by modifying the number of hidden layers and the number of neurons in each layer.

```python
from sklearn import datasets
import matplotlib.pyplot as plt

digits = datasets.load_digits()
```

I have loaded the data set; let's see its structure:

```python
print(digits['data'].shape)
```

*(1797, 64)*

There are 1797 images available, and each row has 64 pixels (8×8 image size).

```python
print('{}'.format(digits['data'][0]))
```

```
[ 0.  0.  5. 13.  9.  1.  0.  0.  0.  0. 13. 15. 10. 15.  5.  0.  0.  3.
 15.  2.  0. 11.  8.  0.  0.  4. 12.  0.  0.  8.  8.  0.  0.  5.  8.  0.
  0.  9.  8.  0.  0.  4. 11.  0.  1. 12.  7.  0.  0.  2. 14.  5. 10. 12.
  0.  0.  0.  0.  6. 13. 10.  0.  0.  0.]
```

These are all 64 pixels of the first image.

```python
plt.figure(1, figsize=(4, 4))
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
```

```python
print('{}'.format(digits['target']))
```

*[0 1 2 ... 8 9 8]*

Each image is of a digit between 0 and 9. Now let's train on this data with *MLPClassifier*.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
import numpy as np

hidden_layers = (10,)
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
mlpClassifier = MLPClassifier(max_iter=1000, hidden_layer_sizes=hidden_layers, random_state=0)
mlpClassifier.fit(X_train, y_train)
predicted = mlpClassifier.predict(X_test)
print("accuracy: {:.2f}%".format(100 * np.mean(predicted == y_test)))
```

*accuracy: 94.00%*

The hidden layer size is (10,), meaning there is one hidden layer with 10 neurons in it. Let's increase the number of layers by changing hidden_layers in the above snippet:

hidden_layers = (10, 10)

*accuracy: 94.67%*

Not much improvement; let's increase the number of neurons instead:

hidden_layers = (60,)

*accuracy: 98.67%*

The MLP is a very basic form of neural network and is easy to understand. There are other kinds of neural networks, such as CNNs and RNNs, whose basic concept is the same, except that additional/advanced steps are taken to increase accuracy on different kinds of data, e.g. images, audio, and text.
