# A full explanation of logistic regression: mathematical background, classification, likelihood methods, and the Newton-Raphson method


## Introduction

We saw in a previous article how to make predictions using linear regression. In this article, we will dive into a generalization of the linear model used for classification instead of prediction. Logistic regression is called a linear classifier. It calculates the probability of one of two classes, a value between 0 and 1. We can then classify an item simply: if its probability score is less than 0.5 it is assigned to the first class, otherwise to the second.

To derive the logistic regression formula, and since the model works on probabilities, we link the linear predictor to probability through the logit (log odds).

The linear model is given by:

$$z = \beta_0 + \beta_1 x$$

And the logit:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

Solving for $p$, the logistic regression formula is given by:

$$p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
The graph formed by logistic regression is the familiar "s"-shaped (sigmoid) curve, and its values on the y-axis always lie between 0 and 1.
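As a quick sanity check on that shape, here is a minimal sketch (the coefficient choice β_0 = 0, β_1 = 1 is an illustrative assumption) that evaluates the logistic function at a few points:

```python
import numpy as np

def sigmoid(z):
    # logistic function e^z / (1 + e^z); output is always in (0, 1)
    return np.exp(z) / (1 + np.exp(z))

# values rise smoothly from near 0, through 0.5 at z = 0, toward 1: the "s" shape
zs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(sigmoid(zs))
```

Note that the midpoint is exactly 0.5 at $z = 0$, which is why 0.5 is the natural classification threshold.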

Now we have to find the values of β_0 and β_1, but unlike linear regression there is no closed-form formula for the coefficients. There are several methods to estimate them, e.g. the maximum likelihood method, the log-likelihood method, and the Newton-Raphson method. We will solve it using the Newton-Raphson method.

## Newton-Raphson Method

The Newton-Raphson method is an iterative process, based on Taylor expansion, for finding a root of a function, i.e. a point where its graph crosses the x-axis; applied to the derivative of our log-likelihood, each such root is one of our coefficients. A coefficient is updated by the Newton-Raphson rule:

$$\beta_i = \beta_{i-1} - \frac{f'(\beta_{i-1})}{f''(\beta_{i-1})}$$

Here $\beta_i$ is the coefficient's estimated value at the current iteration and $\beta_{i-1}$ is the value of the same coefficient from the previous iteration. We iterate until the value of $\beta_i$ stabilizes. Now we need to find the first and second derivatives of the likelihood function.
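Before applying the update rule to the log-likelihood, here is a minimal one-dimensional sketch of the same iteration on a simple function; the choice of $f(x) = x^2 - 2$ (whose positive root is $\sqrt{2}$) and the starting point are illustrative assumptions:

```python
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=50):
    # iterate x_(i+1) = x_i - f(x_i)/f'(x_i) until the value stabilizes
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

root = newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # converges to sqrt(2) ≈ 1.41421356 in a handful of iterations
```

The logistic regression fit below follows exactly this pattern, except that $x$ becomes the coefficient vector and division by $f''$ becomes multiplication by the inverse Hessian.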

The likelihood function for the Bernoulli distribution is:

$$L(\beta) = \prod_{i=1}^{N} p(x_i)^{y_i} \, (1 - p(x_i))^{1 - y_i}$$

Note: we could substitute the value of $p$ into the equation above, differentiate, and set the result to zero; finding the values of β_0 and β_1 at which the function is maximal is called the maximum likelihood method. But solving that directly is very difficult, so we simplify the equation by taking the log:

$$l(\beta) = \sum_{i=1}^{N} \left[ y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \right]$$

Substituting the value of $p(x_i)$ and simplifying:

$$l(\beta) = \sum_{i=1}^{N} \left[ y_i (\beta_0 + \beta_1 x_i) - \log\left(1 + e^{\beta_0 + \beta_1 x_i}\right) \right]$$

The first derivative is then:

$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{N} \left( y_i - p(x_i) \right) x_{ij}$$

Note: we could set the equation above to zero and try to solve for β_0 and β_1; that approach is called the log-likelihood method. But again there is no closed-form solution, so we take the second derivative and generalize to all pairs of coefficients:

$$\frac{\partial^2 l}{\partial \beta_j \, \partial \beta_k} = -\sum_{i=1}^{N} x_{ij} \, x_{ik} \, p(x_i) \left(1 - p(x_i)\right)$$
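To double-check the derived first derivative before coding it up, a small sketch (with made-up data; the sample size and coefficient values are arbitrary assumptions) can compare the analytic gradient against a finite-difference approximation of the log-likelihood:

```python
import numpy as np

def log_likelihood(beta, X, y):
    # l(β) = Σ [ y_i*(x_i·β) - log(1 + e^(x_i·β)) ]
    z = X @ beta
    return np.sum(y * z - np.log(1 + np.exp(z)))

def gradient(beta, X, y):
    # Σ (y_i - p(x_i)) * x_ij
    p = 1 / (1 + np.exp(-(X @ beta)))
    return (y - p) @ X

rng = np.random.default_rng(0)
X = np.c_[rng.normal(size=(20, 1)), np.ones(20)]  # one feature plus intercept column
y = (X[:, 0] > 0).astype(float)
beta = np.array([0.3, -0.1])

# central finite difference of dl/dβ_j along each coordinate direction
eps = 1e-6
num_grad = np.array([
    (log_likelihood(beta + eps * e, X, y) - log_likelihood(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(gradient(beta, X, y), num_grad, atol=1e-4))  # the two agree
```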

## Implementation

Now we are all set: we have derived the probability function, the first derivative, and the second derivative. Let's implement the same in Python. Create a class `LogisticRegression` and add methods like the ones below:

```python
import numpy as np
from numpy import linalg


class LogisticRegression:
    def __init__(self, X):
        self.beta_old_i = []
        # initialize coefficients: always one more coefficient than the number of
        # features, because e.g. β_0 + β_1*x has two coefficients (β_0, β_1)
        # while x has only one dimension
        self.beta_new_i = np.zeros(X.shape[1] + 1)

    # p(x) = e^(β_0 + β_1*x)/(1 + e^(β_0 + β_1*x))
    def probabilityFun(self, X):
        z = np.dot(self.beta_new_i, X.T)
        # numerically stable form of e^z/(1 + e^z)
        p = 1 / (1 + np.exp(-z))
        return p

    # f'(β_j) = dl/d(β_j) = Σ_(i=1..N) (y_i - p(x_i))*x_ij
    def firstDerivative(self, X, Y, P):
        firstDer = np.dot(Y - P, X)
        return firstDer

    # f''(β_j, β_k) = d²l/d(β_j)d(β_k) = -Σ_(i=1..N) x_ij*x_ik*p(x_i)*(1 - p(x_i))
    def secondDerivative(self, X, P):
        probMul = P * (1 - P)
        xMulp = X * probMul[:, np.newaxis]
        secondDer = -1 * np.dot(xMulp.T, X)
        return secondDer

    # β_(i+1) = β_i - f'(β_i)/f''(β_i)
    def newtonRaphson(self, firstDer, secondDer):
        self.beta_new_i = self.beta_old_i - np.dot(linalg.inv(secondDer), firstDer)
```

All the functions have been defined per the derived formulas. Now let's write the method that runs the iterative process until the coefficients stabilize.

```python
    # training the model (inside the LogisticRegression class)
    def fit(self, X, Y, maxIteration=50, diffThreshHold=10**-5):
        # append a column of ones so that β_0 acts as the intercept
        X = np.c_[X, np.ones(X.shape[0])]
        iteration = 0
        diffBetaList = []

        while list(self.beta_new_i) != list(self.beta_old_i):
            self.beta_old_i = self.beta_new_i
            P = self.probabilityFun(X)
            firstDer = self.firstDerivative(X, Y, P)
            secondDer = self.secondDerivative(X, P)
            self.newtonRaphson(firstDer, secondDer)
            # difference between the last calculated coefficients and the current ones
            diff = linalg.norm(self.beta_new_i - self.beta_old_i)
            diffBetaList.append(diff)
            iteration += 1
            if diff <= diffThreshHold or iteration > maxIteration:
                break

        return diffBetaList
```

Now create `predict` and `classify` methods to calculate probabilities and class labels.

```python
    # predict probabilities for new data points (inside the LogisticRegression class)
    def predict(self, X):
        X = np.c_[X, np.ones(X.shape[0])]
        probability = self.probabilityFun(X)
        return probability

    # classify based on the provided classes
    def classify(self, X, dataClass):
        Y = self.predict(X)
        # if the probability is less than 0.5, categorize as class one, else class two
        return [0 if item < 0.5 else 1 for item in Y]
```

Finally, train and test the code above on the iris data; I will use only two of the three iris classes:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# the iris data has 50 samples for each of three classes,
# so take only the first 100 rows to keep two classes
x_train, x_test, y_train, y_test = train_test_split(iris['data'][:100], iris['target'][:100])
reg = LogisticRegression(x_train)
reg.fit(x_train, y_train)
pred = reg.classify(x_test, iris["target_names"][:2])
print("Accuracy: {:.2f}%".format(100 * np.mean(pred == y_test)))
```

Output

Accuracy: 100.00%
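As an optional sanity check, a sketch using scikit-learn's built-in `LogisticRegression` (which by default uses an L2-regularized solver rather than plain Newton-Raphson; the `random_state` here is an arbitrary choice for reproducibility) should reach comparable accuracy on the same two-class subset:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# same two-class subset: the first 100 rows cover classes 0 and 1
x_train, x_test, y_train, y_test = train_test_split(
    iris['data'][:100], iris['target'][:100], random_state=0)

clf = SkLogisticRegression().fit(x_train, y_train)
print("Accuracy: {:.2f}%".format(100 * clf.score(x_test, y_test)))
```

The two iris classes used here are linearly separable, so both the hand-rolled and the library implementation should classify the test set essentially perfectly.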

Author: Nikhil Joshi, CEO & Founder at Dotnetlovers
