## Introduction

In a previous article we saw how to make predictions using linear regression. In this article, we will dive deep into a **generalization of the linear model used for classification instead of prediction**. Logistic regression is called a linear classifier. It estimates the probability of an item belonging to one of two classes as a value between 0 and 1: if the probability score is less than 0.5 the item is classified into class 1, otherwise into class 2.

To derive the formula for logistic regression, and since it works on a probability model, we link the linear equation to the probability through the logit (log odds).

The linear model is given by:

y = β_{0} + β_{1}x

And the logit (log odds):

logit(p) = log(p / (1 - p)) = β_{0} + β_{1}x

So, the logistic regression formula is given by:

p(x) = e^(β_{0} + β_{1}x) / (1 + e^(β_{0} + β_{1}x))

The graph formed by logistic regression is the familiar "s" shape (the sigmoid), and its values on the Y-axis always lie between 0 and 1:
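As a quick sanity check, the sigmoid can be evaluated at a few points to confirm its output stays strictly between 0 and 1 (a minimal sketch using NumPy; the function name and sample points are illustrative):

```python
import numpy as np

def sigmoid(z):
    # p = e^z / (1 + e^z), written equivalently as 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(z)
print(p)  # values rise from ~0.007 through 0.5 up to ~0.993
```

No matter how large or small z gets, the output never reaches exactly 0 or 1, which is what lets us interpret it as a probability.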

Now we have to find the values of β_{0} and β_{1}, but unlike linear regression there is no closed-form formula for the coefficients. There are several methods to estimate them, e.g. the maximum likelihood method, the log-likelihood method, the Newton-Raphson method, etc. We will solve it using the Newton-Raphson method.

## Newton Raphson Method

The Newton-Raphson method is an iterative procedure, derived from the Taylor expansion, for finding a root of a function, i.e. a point where its graph intersects the x-axis; here the root we are after is the value of one of our coefficients. A coefficient update is given by the Newton-Raphson method:

β_{i} = β_{i-1} - f'(β_{i-1}) / f''(β_{i-1})

Above, β_{i} is the current estimate of a coefficient and β_{i-1} is the estimate of the same coefficient from the previous iteration. We keep iterating until the value of β_{i} stabilizes. Now we need to find the first and second derivatives of the function.
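To see the iteration in its simplest one-dimensional form, here is a sketch that applies the same update rule to find a root of f(x) = x² - 2 (the function, starting point, and helper name are illustrative, not part of the regression):

```python
def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=50):
    # x_i = x_{i-1} - f(x_{i-1}) / f'(x_{i-1}), repeated until x stabilizes
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

root = newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root)  # converges to sqrt(2) ≈ 1.41421356
```

In the regression, the "function" whose root we seek is the first derivative of the log-likelihood, so the update divides the first derivative by the second.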

The likelihood function for the Bernoulli distribution is:

L(β) = Π_{i=1}^{N} p(x_{i})^{y_{i}} * (1 - p(x_{i}))^{1-y_{i}}

**Note: I could substitute the value of P into the equation above, set its derivative to zero, and estimate β_{0} and β_{1} by finding the values at which the function is maximum. That method is called the maximum likelihood method. But solving for those values directly is very difficult, and we might not be able to estimate them correctly.** Now, to simplify this equation, take the log:

l(β) = Σ_{i=1}^{N} [ y_{i} log(p(x_{i})) + (1 - y_{i}) log(1 - p(x_{i})) ]

Substituting the value of P and simplifying:

l(β) = Σ_{i=1}^{N} [ y_{i}(β_{0} + β_{1}x_{i}) - log(1 + e^(β_{0} + β_{1}x_{i})) ]

Now the first derivative will be:

∂l/∂β_{j} = Σ_{i=1}^{N} (y_{i} - p(x_{i})) * x_{ij}
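This gradient formula can be verified numerically with a finite-difference check against the log-likelihood. The sketch below uses a small made-up data set; all names and numbers are illustrative:

```python
import numpy as np

def log_likelihood(beta, X, y):
    z = X @ beta
    # l(β) = Σ [ y_i * z_i - log(1 + e^{z_i}) ]
    return np.sum(y * z - np.log1p(np.exp(z)))

def gradient(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    # dl/dβ_j = Σ (y_i - p(x_i)) * x_ij
    return X.T @ (y - p)

rng = np.random.default_rng(0)
X = np.c_[rng.normal(size=(8, 1)), np.ones(8)]   # one feature plus intercept column
y = np.array([0, 1, 0, 1, 1, 0, 1, 0], dtype=float)
beta = np.array([0.3, -0.1])

analytic = gradient(beta, X, y)
eps = 1e-6
numeric = np.array([
    (log_likelihood(beta + eps * e, X, y) - log_likelihood(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Agreement between the analytic and numeric gradients confirms the derivative was taken correctly before we rely on it inside the Newton-Raphson update.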

**Note: We could set the equation above to zero and try to find the values of β_{0} and β_{1}; that method is called the log-likelihood method.** But again, we will not be able to set it to zero and solve it analytically, so we take the second derivative and generalize:

∂²l/∂β_{j}∂β_{k} = - Σ_{i=1}^{N} x_{ij} * x_{ik} * p(x_{i}) * (1 - p(x_{i}))

## Implementation

Now we are all set: we have derived the probability function, the first derivative, and the second derivative. Let's get into Python to implement them. Create a class *LogisticRegression* and add methods like below:

```python
import math

import numpy as np
from numpy import linalg


class LogisticRegression:

    def __init__(self, X):
        self.beta_old_i = []
        # Initializing β_i: always one more coefficient than the number of
        # features, because β_0 + β_1*x has two coefficients (β_0, β_1)
        # while x has only one dimension.
        self.beta_new_i = np.zeros(X.shape[1] + 1)

    # p(x) = e^(β_0 + β_1*x) / (1 + e^(β_0 + β_1*x))
    def probabilityFun(self, X):
        z = np.dot(self.beta_new_i, X.T)
        p = math.e**z / (1 + math.e**z)
        return p

    # f'(β_j) = dl/dβ_j = Σ_{i=1}^{N} (y_i - p(x_i)) * x_ij
    def firstDerivative(self, X, Y, P):
        firstDer = np.dot((Y - P), X)
        return firstDer

    # f''(β_jk) = d²l/dβ_j dβ_k = - Σ_{i=1}^{N} x_ij * x_ik * p(x_i) * (1 - p(x_i))
    def secondDerivative(self, X, P):
        probMul = P * (1 - P)
        xMulp = np.array([x * y for (x, y) in zip(X, probMul)])
        secondDer = -1 * np.dot(xMulp.T, X)
        return secondDer

    # β_(i+1) = β_i - f'(β_i) / f''(β_i)
    def newtonRaphson(self, firstDer, secondDer):
        self.beta_new_i = self.beta_old_i - np.dot(linalg.inv(secondDer), firstDer)
```

All functions have been defined as per the derived formulas. Now let's write the method that runs the iterative process until the coefficients stabilize.

```python
    # Training the model.
    def fit(self, X, Y, maxIteration=50, diffThreshHold=10**-5):
        # Add one additional column of ones, since we have one additional
        # (intercept) coefficient.
        X = np.c_[X, np.array([1] * X.shape[0])]
        iteration = 0
        diffBetaList = []
        while list(self.beta_new_i) != list(self.beta_old_i):
            self.beta_old_i = self.beta_new_i
            P = self.probabilityFun(X)
            firstDer = self.firstDerivative(X, Y, P)
            secondDer = self.secondDerivative(X, P)
            self.newtonRaphson(firstDer, secondDer)
            # Difference between the last calculated coefficients and the current ones.
            diff = linalg.norm(self.beta_new_i - self.beta_old_i)
            diffBetaList.append(diff)
            iteration += 1
            if diff <= diffThreshHold or iteration > maxIteration:
                break
        return diffBetaList
```

Now create a predict and a classify method to calculate the probability and the classification.

```python
    # Predict the probability of any new data points.
    def predict(self, X):
        X = np.c_[X, np.array([1] * X.shape[0])]
        probability = self.probabilityFun(X)
        return probability

    # Classify based on the provided classes.
    def classify(self, X, dataClass):
        Y = self.predict(X)
        # If the probability is less than or equal to 0.5 the item is
        # categorized as class one, else as class two.
        return [0 if item <= 0.5 else 1 for item in Y]
```

Finally, train and test the code above using the iris data. I will use only two classes of the iris data:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# The iris data has 50 samples for each of three classes, so take only the
# first 100 rows to get two classes.
x_train, x_test, y_train, y_test = train_test_split(iris['data'][:100], iris['target'][:100])
reg = LogisticRegression(x_train)
reg.fit(x_train, y_train)
pred = reg.classify(x_test, iris["target_names"][:2])
print("Accuracy: {:.2f}%".format(100 * np.mean(pred == y_test)))
```

**Output**

Accuracy: 100.00%
