Neural Networks and MLP

Introduction to neural networks and MLP (multilayer perceptron)

Category: Deep Learning Tags: Python, Python 3



    The neural network in artificial intelligence is a concept borrowed from the human brain. You can think of a neuron as a unit of memory that holds a value between 0 and 1. The human brain consists of billions of neurons, and similarly an ANN (artificial neural network) may have thousands or millions of neurons.

One of the simplest forms of a neural network is the multilayer perceptron (MLP). It is similar to linear regression but has multiple layers between the input and output layers. An MLP is also feed-forward: values from the ith layer are passed on to the (i+1)th layer, but never the other way round. The linear regression formula is given as follows:

$$Y = WX + B$$

Here X is the input variable, W is the weight and B is the bias. The equation can be visualized as below:

Fig 1: Linear regression
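As a minimal sketch of the linear regression described above, the snippet below evaluates it for a few inputs. The function name `linear_regression` and the sample weight and bias values are chosen only for illustration.

```python
def linear_regression(x, w, b):
    """Return the prediction w * x + b for a single input value."""
    return w * x + b

# Illustrative values (assumptions, not taken from any data set)
weight = 2.0
bias = 1.0
inputs = [0.0, 1.0, 2.0, 3.0]

predictions = [linear_regression(x, weight, bias) for x in inputs]
print(predictions)  # [1.0, 3.0, 5.0, 7.0]
```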


An MLP has multiple layers between the input and output layers; we call these hidden layers. Each layer may have any number of neurons.

Fig 2: Neural Network


    Fig-2 presents the structure of a neural network. In fig-2 the input is an array of i elements, there is 1 hidden layer with j neurons, and the output is an array of k elements. There are i*j weights between the input layer and the hidden layer and j*k weights between the hidden layer and the output layer. One bias is added in the calculation of each hidden node, giving j biases, and one in the calculation of each output node, giving k biases.
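The counts above can be checked with a few lines of Python. This is only a sketch: the helper name `count_parameters` and the example sizes (64 inputs, 10 hidden neurons, 10 outputs) are assumptions made for illustration.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Count weights and biases of a network with one hidden layer."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    biases = n_hidden + n_outputs
    return weights, biases

# Example: 64 inputs, 10 hidden neurons, 10 outputs
weights, biases = count_parameters(64, 10, 10)
print(weights, biases)  # 740 20
```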

As per fig-2, H0 is calculated from all inputs [X0-Xi] and the weights between [X0-Xi] and H0. H1, H2...Hj are calculated similarly. The output layer is in turn calculated from the final hidden layer [H0-Hj] and the weights between [H0-Hj] and [Y0-Yk]. A bias is added in the calculation of every neuron (e.g. H0): [b0-bj] is the set of biases between the input and hidden layers, and [B0-Bk] is the set of biases between the hidden and output layers.

Suppose you are training the above model with image data of the digits 0-9. The input will be the pixels of the image, i.e. X0-Xi are the pixels extracted from the image; if the image size is 8*8 there will be 64 pixels, which become [X0-X63]. Similarly, the output layer has one node per digit, Y0-Y9, and each output node value lies between 0 and 1. In a prediction one output node should be above 0.5 and all the rest below 0.5, so if Y5 is 0.8 and the others are below 0.5 we can say the input image was of the digit 5. If multiple output nodes produce a result above 0.5, the model has not been able to classify the image correctly.
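The decision rule just described can be sketched as a small function. The name `classify` and the output values below are made up for illustration; they are not produced by a trained model.

```python
def classify(outputs, threshold=0.5):
    """Return the index of the single active output node, or None if ambiguous."""
    active = [index for index, value in enumerate(outputs) if value > threshold]
    if len(active) == 1:
        return active[0]
    return None  # zero or several nodes above the threshold

# Ten output values Y0..Y9 (made-up numbers for illustration)
outputs = [0.1, 0.0, 0.2, 0.1, 0.3, 0.8, 0.1, 0.0, 0.2, 0.1]
print(classify(outputs))  # 5

ambiguous = [0.1, 0.7, 0.2, 0.1, 0.3, 0.8, 0.1, 0.0, 0.2, 0.1]
print(classify(ambiguous))  # None
```

Many libraries avoid the ambiguous case by simply picking the output node with the largest value.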

Let’s come back to the mathematical equations. As per fig-2 and the above explanation, the formulas for H0 and H1 are given by:

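$$H_0 = w_{00}X_0 + w_{01}X_1 + \dots + w_{0i}X_i + b_0$$

$$H_1 = w_{10}X_0 + w_{11}X_1 + \dots + w_{1i}X_i + b_1$$

There are H0-Hj nodes in the hidden layer and X0-Xi are the inputs; wji is the weight between the jth hidden node and the ith input, and bj is the bias for Hj. So the formula derived for Hj is: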

$$H_j = w_{j0}X_0 + w_{j1}X_1 + \dots + w_{ji}X_i + b_j$$

Activation function:

Hj might produce any value since it is a linear equation, and we have to scale it to 0-1 or some other fixed range. We apply a function to the linear equation to normalize the result; common choices are the hyperbolic tangent (tanh), sigmoid, relu etc. After applying tanh to the right-hand side of the above equation:

$$H_j = \tanh(w_{j0}X_0 + w_{j1}X_1 + \dots + w_{ji}X_i + b_j)$$
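A hidden layer with a tanh activation can be sketched with NumPy as below. The function name `hidden_layer`, the layer sizes and all numeric values are assumptions chosen for illustration; `w[j, i]` plays the role of wji and `b[j]` the role of bj.

```python
import numpy as np

def hidden_layer(x, w, b):
    """Compute tanh(sum_i w[j, i] * x[i] + b[j]) for every hidden node j."""
    return np.tanh(w @ x + b)

# Illustrative sizes: 3 inputs, 2 hidden nodes (values are made up)
x = np.array([1.0, 0.0, -1.0])
w = np.array([[0.5, 0.2, 0.5],
              [1.0, -1.0, 0.0]])
b = np.array([0.0, 0.0])

h = hidden_layer(x, w, b)
print(h)  # first node is tanh(0) = 0, second node is tanh(1)
```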




Output of tanh:

Fig 3: Hyperbolic tangent function


tanh scales the output value to between -1 and 1, which is easy to convert to 0-1 or to keep as it is. If we use sigmoid, it scales the value to between 0 and 1, so it’s our choice which function we use. On the 0-1 scale, a node value less than 0.5 indicates the node is inactive and a value greater than 0.5 indicates the node is activated.
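To make the two ranges concrete, here is a small sketch using only the standard library. The helper names `sigmoid` and `tanh_to_unit_range` are illustrative; the second one shows one possible way to convert a tanh output to the 0-1 range.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh_to_unit_range(z):
    """Shift and scale tanh from (-1, 1) to (0, 1)."""
    return (math.tanh(z) + 1.0) / 2.0

# Print z, tanh(z), sigmoid(z) and the rescaled tanh side by side
for z in [-2.0, 0.0, 2.0]:
    print(z, round(math.tanh(z), 4), round(sigmoid(z), 4),
          round(tanh_to_unit_range(z), 4))
```

Both functions map 0 to exactly 0.5, so the 0.5 threshold between inactive and activated corresponds to a linear sum of zero.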

Similarly, the weights between the hidden layer and the output layer are W00-Wkj and the biases are B0-Bk. The formula for each output node is given by:

$$Y_k = \tanh(W_{k0}H_0 + W_{k1}H_1 + \dots + W_{kj}H_j + B_k)$$
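Putting the hidden and output layers together gives a complete feed-forward pass. The sketch below assumes tanh is applied at both layers and uses random, untrained weights; the function name `forward` and the layer sizes are illustrative only.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_output, b_output):
    """Feed-forward pass: input -> hidden layer -> output layer."""
    h = np.tanh(w_hidden @ x + b_hidden)   # hidden nodes H0..Hj
    y = np.tanh(w_output @ h + b_output)   # output nodes Y0..Yk
    return h, y

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 64, 10, 10   # illustrative sizes

x = rng.random(n_inputs)                          # stand-in for 64 pixel values
w_hidden = rng.normal(size=(n_hidden, n_inputs))  # random, untrained weights
b_hidden = np.zeros(n_hidden)
w_output = rng.normal(size=(n_outputs, n_hidden))
b_output = np.zeros(n_outputs)

h, y = forward(x, w_hidden, b_hidden, w_output, b_output)
print(h.shape, y.shape)  # (10,) (10,)
```

Values only ever flow from one layer to the next here, which is what makes the network feed-forward.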





    We are going to use scikit-learn’s digits data set and MLPClassifier for this implementation. We will try to achieve high accuracy by modifying the number of hidden layers and the number of neurons in each layer.

from sklearn import datasets
import matplotlib.pyplot as plt
digits = datasets.load_digits()

I have loaded the data set; let’s see the structure of the data:

print(digits.data.shape)

(1797, 64)

The data set contains 1797 images, and each row holds 64 pixel values (an 8*8 image).


print(digits.data[0])

[ 0.  0.  5. 13.  9.  1.  0.  0.  0.  0. 13. 15. 10. 15.  5.  0.  0.  3.
 15.  2.  0. 11.  8.  0.  0.  4. 12.  0.  0.  8.  8.  0.  0.  5.  8.  0.
  0.  9.  8.  0.  0.  4. 11.  0.  1. 12.  7.  0.  0.  2. 14.  5. 10. 12.
  0.  0.  0.  0.  6. 13. 10.  0.  0.  0.]

These are all 64 pixels of the first image.
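A flat row of 64 values maps back to an 8*8 image by reshaping it row by row. The sketch below uses a stand-in array rather than the real data so that the mapping is easy to follow; in the digits data set, `digits.images` holds the same pixels already arranged as 8*8 arrays.

```python
import numpy as np

# A flat row of 64 values, standing in for one row of the digits data
row = np.arange(64, dtype=float)

# Reshape into 8 rows of 8 pixels, row by row
image = row.reshape(8, 8)

print(image.shape)   # (8, 8)
print(image[1, 0])   # 8.0, the 9th value of the flat row
```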

plt.figure(1, figsize=(4,4))
# Draw the first image as an 8*8 grid of pixels
plt.imshow(digits.images[0], interpolation='nearest')
plt.show()
Digit zero

print(digits.target)

[0 1 2 ... 8 9 8]

Each image is labelled with a digit between 0 and 9. Now let’s train an MLPClassifier on this data.

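from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
import numpy as np

hidden_layers = (10,)

# Split into train and test sets (the default 25% test size and random_state=0 are assumptions)
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=0)
mlpClassifier = MLPClassifier(max_iter=1000, hidden_layer_sizes=hidden_layers, random_state=0)
# Train the network on the training set
mlpClassifier.fit(X_train, y_train)
predicted = mlpClassifier.predict(X_test)
# Accuracy is the share of test images whose predicted digit matches the label
print("accuracy: {:.2f}%".format(100*np.mean(predicted == y_test)))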

accuracy: 94.00%

The hidden layer size (10,) means there is one hidden layer with 10 neurons. Let’s increase the number of layers in line 5 of the above snippet:

hidden_layers = (10, 10)

accuracy: 94.67%

Not much of an improvement, so let’s increase the number of neurons instead:

hidden_layers = (60,)

accuracy: 98.67%
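Instead of editing the snippet by hand for every configuration, the three settings tried above can be compared in one loop. This is a sketch under assumptions: the split arguments (default test size, `random_state=0`) are not taken from the article, so the accuracies it prints may differ from the figures quoted above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
# The split settings are an assumption; the article's exact arguments are not shown
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

results = {}
for hidden_layers in [(10,), (10, 10), (60,)]:
    model = MLPClassifier(max_iter=1000, hidden_layer_sizes=hidden_layers,
                          random_state=0)
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given test data
    results[hidden_layers] = model.score(X_test, y_test)

for hidden_layers, accuracy in results.items():
    print(hidden_layers, "accuracy: {:.2f}%".format(100 * accuracy))
```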

MLP is a very basic form of neural network and is easy to understand. There are other kinds of neural networks, such as CNN and RNN, whose basic concept is the same, except that additional, more advanced steps are used to increase accuracy on different kinds of data, e.g. image, audio and text.

Last modified on 15 March 2020
Nikhil Joshi
