Logistic Regression Explained

A full explanation of logistic regression: mathematical background, classification, likelihood methods, and the Newton-Raphson method


Category: Machine Learning Tags: Python, Python 3, Engineering Mathematics

Logistic Regression Code Files

Introduction

    We saw in a previous article how we can make predictions using linear regression. In this article, we will dive deep into a generalized linear model used for classification instead of prediction. Logistic regression is a linear classifier: it estimates the probability, between 0 and 1, that an item belongs to one of two classes. We can then classify an item with a simple rule: if its probability score is less than 0.5 it is classified in class 1, otherwise in class 2.

To derive the formula for logistic regression, and since it works on a probability model, we equate the linear expression to the logit (log odds) of the probability.

The linear model is given by:

    z = \beta_0 + \beta_1 x

And the logit (log odds):

    \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x

So, the logistic regression formula is given by:

    p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
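
Solving the logit equation for p yields the formula above; for completeness, the short derivation is:

    \frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}
    \;\Rightarrow\;
    p = (1 - p)\, e^{\beta_0 + \beta_1 x}
    \;\Rightarrow\;
    p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}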

The graph formed by logistic regression, shown below, is usually "s" shaped, and the values on the Y-axis always lie between 0 and 1:

[Figure: the s-shaped (sigmoid) curve of logistic regression]

Now we have to find the values of β_0 and β_1, but unlike linear regression there is no closed-form formula for the coefficients. There are several methods to estimate them, e.g. the maximum likelihood method, the log likelihood method, and the Newton-Raphson method. We will solve it using the Newton-Raphson method.

Newton Raphson Method

    The Newton-Raphson method is an iterative process, based on Taylor's expansion, for finding a root of a function. A root is a point where the graph intersects the x-axis, and here each root is one of our coefficients. A coefficient estimate is updated by the Newton-Raphson rule:

    \beta_i = \beta_{i-1} - \frac{f'(\beta_{i-1})}{f''(\beta_{i-1})}

Above, β_i is the current estimate of a coefficient and β_{i-1} is the value of the same coefficient estimated in the last iteration. We repeat this process until the value of β_i stabilizes. Now we need to find the first and second derivatives of the function.
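
To make the update rule concrete, here is a minimal scalar Newton-Raphson sketch of my own (the function names are illustrative, not from this article), finding the root of f(x) = x^2 - 2:

#Newton-Raphson root finding: x_i = x_{i-1} - f(x_{i-1})/f'(x_{i-1})
def newton_raphson(f, f_prime, x0, tol=10**-10, maxIteration=50):
    x = x0
    for _ in range(maxIteration):
        step = f(x) / f_prime(x)
        x -= step
        #stop once the estimate has stabilized
        if abs(step) < tol:
            break
    return x

#the root of f(x) = x^2 - 2 is sqrt(2) ≈ 1.4142135
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0))

Note that in the logistic regression case, f' and f'' play the roles of f and f', since we are maximizing the log likelihood (finding a root of its first derivative) rather than finding a root of the function itself.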

Likelihood function for the Bernoulli distribution:

    L(\beta) = \prod_{i=1}^{N} p(x_i)^{y_i} \big(1 - p(x_i)\big)^{1 - y_i}

Note: We could substitute the value of P into the above equation, differentiate, and set the result to zero; if we did so, we could estimate the values of β_0 and β_1 by finding the values at which the function is maximal. That method is called the maximum likelihood method. But finding such values directly is very difficult, and we might not be able to estimate them correctly. So we simplify the equation by taking its log:

    \ell(\beta) = \sum_{i=1}^{N} \Big[ y_i \log p(x_i) + (1 - y_i) \log\big(1 - p(x_i)\big) \Big]

Substituting the value of P and simplifying:

    \ell(\beta) = \sum_{i=1}^{N} \Big[ y_i (\beta_0 + \beta_1 x_i) - \log\big(1 + e^{\beta_0 + \beta_1 x_i}\big) \Big]
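
For completeness, the intermediate step of the simplification uses \log\frac{p}{1-p} = \beta_0 + \beta_1 x and 1 - p(x) = \frac{1}{1 + e^{\beta_0 + \beta_1 x}}:

    \ell(\beta) = \sum_{i=1}^{N} \Big[ y_i \log\frac{p(x_i)}{1 - p(x_i)} + \log\big(1 - p(x_i)\big) \Big]
                = \sum_{i=1}^{N} \Big[ y_i (\beta_0 + \beta_1 x_i) - \log\big(1 + e^{\beta_0 + \beta_1 x_i}\big) \Big]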

Now the first derivative is:

    \frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{N} \big(y_i - p(x_i)\big)\, x_{ij}

Note: We could set the above equation to zero and try to find the values of β_0 and β_1; that method is called the log likelihood method. But again, we cannot set it to zero and solve it in closed form, so we take the second derivative and generalize:

    \frac{\partial^2 \ell}{\partial \beta_j \, \partial \beta_k} = -\sum_{i=1}^{N} x_{ij}\, x_{ik}\, p(x_i) \big(1 - p(x_i)\big)
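
In matrix form, which is how the secondDerivative method below computes it, this is the Hessian (my restatement of the same formula):

    H = -X^T W X, \qquad W = \mathrm{diag}\big(p(x_i)(1 - p(x_i))\big)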

Implementation

    Now we are all set: we have derived the probability function, the first derivative, and the second derivative. Let's move to Python and implement them. Create a class LogisticRegression and add methods like the ones below:

import math
import numpy as np
from numpy import linalg

class LogisticRegression:
    def __init__(self, X):
        self.beta_old_i = []
        #initialize coefficients: there is always one more coefficient than the
        #number of features, because e.g. β_0 + β_1*x has two coefficients
        #(β_0, β_1) while x has only one dimension
        self.beta_new_i = np.zeros(X.shape[1] + 1)
    
    #p(x) = e^(β_0 + β_1*x)/(1 + e^(β_0 + β_1*x))
    def probabilityFun(self, X):
        z = np.dot(self.beta_new_i, X.T)
        p = math.e**z/(1 + math.e**z)
        return p

    #f'(β_j) = dl/d(β_j) = (i=1 to N)_Σ (y_i - p(x_i))*x_ij
    def firstDerivative(self, X, Y, P):
        firstDer = np.dot((Y-P), X)
        return firstDer

    #f''(β_k) = dl/d(β_j)d(β_k) = - (i=1 to N)_Σ x_ij*x_ik*p(x_i)*(1 - p(x_i))
    def secondDerivative(self, X, P):
        probMul = P*(1-P)
        xMulp = np.array([x*y for (x,y) in zip(X, probMul)])
        secondDer = -1*np.dot(xMulp.T,X)
        return secondDer

    #β_(i+1) = β_i - (f'(β_i))/(f''(β_i))
    def newtonRaphson(self, firstDer, secondDer):
        self.beta_new_i = self.beta_old_i - np.dot(linalg.inv(secondDer), firstDer)
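
One caveat: probabilityFun computes math.e**z directly, which overflows for large |z|. A numerically safer variant, my own sketch rather than part of the original article, clips z before exponentiating:

#A safer drop-in alternative to probabilityFun (an improvement of mine, not
#from the original article): clipping z keeps np.exp within float64 range
def probabilityFunStable(self, X):
    z = np.clip(np.dot(self.beta_new_i, X.T), -500, 500)
    return 1.0/(1.0 + np.exp(-z))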

All the methods above follow the derived formulas. Now let's write the method that runs the iterative process until the coefficients stabilize (it belongs to the same LogisticRegression class):

#training the model
def fit(self, X, Y, maxIteration=50, diffThreshHold=10**-5):
    #adding one additional column since we will have additional coefficient
    X = np.c_[X, np.array([1]*X.shape[0])]
    iteration = 0
    diffBetaList = []

    while(list(self.beta_new_i) != list(self.beta_old_i)):
        self.beta_old_i = self.beta_new_i
        P = self.probabilityFun(X)
        firstDer = self.firstDerivative(X, Y, P)
        secondDer = self.secondDerivative(X, P)
        self.newtonRaphson(firstDer, secondDer)
        #difference between the last calculated coefficients and the current coefficients
        diff = linalg.norm(self.beta_new_i - self.beta_old_i)
        diffBetaList.append(diff)
        iteration += 1
        if(diff <= diffThreshHold or iteration > maxIteration):
            break
    
    return diffBetaList
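
Since fit returns the list of per-iteration coefficient changes, a quick convergence check looks like this (a usage sketch with illustrative names; x_train and y_train are the training arrays prepared at the end of this article):

#the norm of the coefficient update should shrink rapidly toward zero
#as Newton-Raphson converges
model = LogisticRegression(x_train)
diffs = model.fit(x_train, y_train)
for i, d in enumerate(diffs, start=1):
    print("iteration {}: coefficient change = {:.2e}".format(i, d))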

Now create predict and classify methods (again on the same class) to compute the probability and the classification.

#predict the probability for new data points
def predict(self, X):
    X = np.c_[X, np.array([1]*X.shape[0])]
    probability = self.probabilityFun(X)
    return probability

#classify based on provided classes
def classify(self, X, dataClass):
    Y = self.predict(X)
    #if the probability is less than or equal to 0.5 the item is categorized as the first class, else the second
    return [0 if item <= 0.5 else 1 for item in Y]

Finally, let's train and test the above code using the iris data; I will use only two of the three iris classes:

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
#iris has 50 samples of each of its three classes, so take only the first 100 rows to get two classes
x_train, x_test, y_train, y_test = train_test_split(iris['data'][:100], iris['target'][:100])
reg = LogisticRegression(x_train)
reg.fit(x_train, y_train)
pred = reg.classify(x_test, iris["target_names"][:2])
print("Accuracy: {:.2f}%".format(100*np.mean(pred == y_test)))

Output

Accuracy: 100.00%

The two classes used here, setosa and versicolor, are linearly separable, so perfect accuracy on this split is expected.


Last modified on 22 October 2018
Nikhil Joshi, CEO & Founder at Dotnetlovers