Provide Information Regarding Statistics & Econometrics : Logistic Regression

Logistic Regression

Introduction

In a linear regression model the response variable is continuous and the regressors are treated as fixed. The main objective of a regression model is to estimate the mean value of the response variable for given values of the regressors. When the response variable (Y) is instead categorical (such as sunny, cloudy or rainy) or binary (such as Yes / No or 1 / 0), and the regressors are continuous, discrete, binary or a combination of these, the regression model used to represent the relationship is called logistic regression. In logistic regression the response variable is qualitative, and we are interested in finding the probability that an event happens. The logit model is represented graphically by a sigmoid (S-shaped) curve.

Let us explain logistic regression with the help of the following hypothetical example. Suppose the performance of a student, in terms of pass or fail, depends on the number of study hours.

Study (hrs.)	0 – 1	1 – 2	2 – 3	3 – 4	4 – 5	5 – 6	6 – 7	7 – 8
Result	Fail	Fail	Fail	Fail	Pass	Pass	Pass	Pass
Y (coded)	0	0	0	0	1	1	1	1

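Fitting a simple linear regression line to the scatter plot of a binary dependent variable is not meaningful: the observed values lie only at 0 (the bottom of the plot) and at 1 (the top of the plot), so a straight line cannot follow them and can give predicted values below 0 or above 1.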
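As a rough illustration of this point, the sketch below fits an ordinary least-squares line to the table above. It assumes that each study-hour interval can be represented by its midpoint (0.5, 1.5, ..., 7.5), which the text does not state; the names `hours`, `result` and `fit_line` are introduced here for illustration only.

```python
# Midpoints of the study-hour intervals (an assumption made for illustration)
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
# Pass / fail coded as 1 / 0, as in the table
result = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(hours, result)
fitted = [intercept + slope * h for h in hours]

# Fitted values at the shortest and longest study times
print("fitted value at 0.5 hours:", fitted[0])
print("fitted value at 7.5 hours:", fitted[-1])
```

With these midpoints the fitted value at the low end is negative and the fitted value at the high end exceeds 1, so neither can be read as a probability.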

A better way is to represent the relationship by a sigmoid curve.
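A minimal sketch of the sigmoid curve is given below. The function name `sigmoid` and the sample inputs are chosen here for illustration; the point is only that every input is mapped to a value between 0 and 1.

```python
import math


def sigmoid(z):
    """Map a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Evaluate the curve at a few illustrative points
for z in [-6, -3, 0, 3, 6]:
    print(z, round(sigmoid(z), 4))
```

The values rise smoothly from near 0 to near 1 and pass through 0.5 at z = 0, which gives the S shape.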

Binomial Logistic Regression Model

If the probability of success of the dependent variable is P(y) and the probability of failure is {1 - P(y)}, then the odds are defined as:

Odds = P(y) / {1 - P(y)}

The logit function is defined as:

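logit P(y) = log Odds = log [ P(y) / {1 - P(y)} ]

In the logistic regression model the logit is a linear function of the regressor:

logit P(y) = β0 + β1 X

log Odds = β0 + β1 X

Odds = exp(β0 + β1 X)

Solving for P(y):

P(y) = {1 - P(y)} Odds

P(y) = {1 - P(y)} exp(β0 + β1 X)

P(y) = exp(β0 + β1 X) - P(y) exp(β0 + β1 X)

P(y) + P(y) exp(β0 + β1 X) = exp(β0 + β1 X)

P(y) {1 + exp(β0 + β1 X)} = exp(β0 + β1 X)

P(y) = exp(β0 + β1 X) / {1 + exp(β0 + β1 X)}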
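The relationship between probability, odds and logit in this derivation can be checked numerically. The sketch below is only an illustration: the coefficient values β0 = -4 and β1 = 1, the value X = 5 and the function names are hypothetical and do not come from the text.

```python
import math


def odds(p):
    """Odds of success for a probability p."""
    return p / (1.0 - p)


def logit(p):
    """Log of the odds."""
    return math.log(odds(p))


def probability(b0, b1, x):
    """P(y) obtained by solving Odds = exp(b0 + b1 * x) for P(y)."""
    e = math.exp(b0 + b1 * x)
    return e / (1.0 + e)


# Hypothetical coefficients and regressor value, for illustration only
b0, b1 = -4.0, 1.0
x = 5.0

p = probability(b0, b1, x)
print("P(y)       :", p)
print("Odds       :", odds(p))
print("logit P(y) :", logit(p))
print("b0 + b1 * x:", b0 + b1 * x)
```

Up to floating-point rounding, the logit of the computed probability equals β0 + β1 X, and the odds equal exp(β0 + β1 X), as the derivation requires.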

Assumptions:

i. The response variable should be dichotomous, e.g. Yes – No, Present – Absent, etc.

ii. The regressors may be continuous, discrete, categorical or a combination of these.

iii. The categories of the response variable and of any categorical regressor should be mutually exclusive and exhaustive.

iv. There should be a linear relationship between the regressors and the logit transformation of the response variable.

Estimation of Binomial Logistic Regression Parameters

The simple form of the logistic regression model is given by:

P(y) = exp(β0 + β1 X) / {1 + exp(β0 + β1 X)} = 1 / {1 + exp[-(β0 + β1 X)]}

Here P(y) is called the predicted probability.

The parameters of the model are estimated by maximising the log-likelihood function.
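The text does not write the log-likelihood out. For a response coded 0 / 1 the standard form is the sum over observations of y log P(y) + (1 - y) log {1 - P(y)}, and the sketch below evaluates it for the pass / fail table. Interval midpoints are assumed for the study hours, and the two coefficient pairs compared are hypothetical trial values, not estimates.

```python
import math

# Midpoints of the study-hour intervals (an assumption) and coded results
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
result = [0, 0, 0, 0, 1, 1, 1, 1]


def predicted_probability(b0, b1, x):
    """P(y) from the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(b0, b1, xs, ys):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predicted_probability(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Two hypothetical trial values of (b0, b1)
ll_flat = log_likelihood(0.0, 0.0, hours, result)    # every P(y) = 0.5
ll_slope = log_likelihood(-4.0, 1.0, hours, result)  # P(y) rises with hours

print("log-likelihood, b0=0,  b1=0:", ll_flat)
print("log-likelihood, b0=-4, b1=1:", ll_slope)
```

The second pair gives the larger log-likelihood, so it is preferred over the first; estimation amounts to searching for the pair that makes this quantity as large as possible.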

The following iterative procedure is used to estimate the parameters of the model:

Update P(y) by substituting the new values of β0 and β1.

Continue this iteration over all values of the independent variable, or until P(y) is close to y.

Loss Function

The loss function is defined as the sum of squared deviations between the observed values and the predicted values of the dependent variable:

Loss = Σ {y - P(y)}²

Repeat the iteration process, updating the values of the model parameters at each step, until the loss function is minimised (ideally zero).
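The text does not give the update formulas, so the sketch below swaps in plain gradient descent on the sum-of-squares loss as one possible way to carry out the iteration. The learning rate, the number of steps, the zero starting values and the use of interval midpoints for the study hours are all assumptions made for illustration.

```python
import math

# Midpoints of the study-hour intervals (an assumption) and coded results
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
result = [0, 0, 0, 0, 1, 1, 1, 1]


def predicted_probability(b0, b1, x):
    """P(y) from the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def loss(b0, b1, xs, ys):
    """Sum of squared deviations between observed y and predicted P(y)."""
    return sum((y - predicted_probability(b0, b1, x)) ** 2
               for x, y in zip(xs, ys))


def gradient_step(b0, b1, xs, ys, learning_rate):
    """One gradient-descent update of (b0, b1) on the squared-error loss."""
    grad_b0 = 0.0
    grad_b1 = 0.0
    for x, y in zip(xs, ys):
        p = predicted_probability(b0, b1, x)
        # Derivative of (y - p)^2 with respect to the linear predictor
        common = -2.0 * (y - p) * p * (1.0 - p)
        grad_b0 += common
        grad_b1 += common * x
    return b0 - learning_rate * grad_b0, b1 - learning_rate * grad_b1


b0, b1 = 0.0, 0.0                      # assumed starting values
initial_loss = loss(b0, b1, hours, result)

for _ in range(10000):                 # assumed fixed number of iterations
    b0, b1 = gradient_step(b0, b1, hours, result, learning_rate=0.01)

final_loss = loss(b0, b1, hours, result)
print("b0:", b0, "b1:", b1)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Because passes and fails are perfectly separated in this small table, the loss keeps shrinking toward zero as the slope grows, so the sketch stops after a fixed number of steps rather than waiting for exact convergence.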

Practice Question

Estimate the logistic regression parameters for the following data.