Logistic Regression
Introduction
In a linear regression model, the response variable is continuous and the
regressor variables are treated as fixed. The main objective of such a model is
to estimate the mean value of the response variable. If instead the response
variable (Y) is categorical (such as sunny, cloudy, or rainy) or binary (such as
Yes / No or 1 / 0), and the regressors are continuous, discrete, binary, or a
combination of these, the regression model used to represent the relationship
is called logistic regression. In logistic regression the response variable is
qualitative, and we are interested in the probability that an event occurs. The
logit model is represented graphically by a sigmoid (S-shaped) curve.
Let us explain logistic regression with the help of the following hypothetical example. Suppose the performance of a student, recorded as pass or fail, depends on the number of study hours.
| Study (hrs.) | 0 – 1 | 1 – 2 | 2 – 3 | 3 – 4 | 4 – 5 | 5 – 6 | 6 – 7 | 7 – 8 |
| Result | Fail | Fail | Fail | Fail | Pass | Pass | Pass | Pass |
| Y | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
If we fit a simple linear regression line to the scatter plot of a binary dependent variable, the result is not meaningful: the observations take only the values 0 and 1, so some points lie well below the line and others well above it.
A better way to represent the relationship is with a sigmoid curve.
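As a rough illustration of this point, the sketch below fits an ordinary least-squares line to the study-hours example. It assumes each interval is represented by its midpoint (0.5, 1.5, ..., 7.5), which is a choice made here and is not part of the original table. The function name `fit_line` is hypothetical.

```python
# Midpoints of the study-hour intervals (an assumption made for illustration)
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
# Coded result: 0 = Fail, 1 = Pass
passed = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(hours, passed)
fitted = [intercept + slope * x for x in hours]

# A straight line is not confined to the interval [0, 1]
print(f"fitted value at 0.5 hrs: {fitted[0]:.3f}")
print(f"fitted value at 7.5 hrs: {fitted[-1]:.3f}")
```

With these midpoints the fitted value falls below 0 for the first interval and above 1 for the last, so the line cannot be read as a probability. A sigmoid curve stays between 0 and 1 for every value of the regressor.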
Binomial Logistic Regression Model
If the probability of success is P(y), then the probability of failure is
1 - P(y).
The logit function is defined as:
$$\operatorname{logit} P(y) = \log(\text{Odds})$$
Where:
$$\text{Odds} = \frac{P(y)}{1 - P(y)}$$
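To make the relationship between probability, odds, and logit concrete, here is a minimal sketch. It assumes the natural logarithm is intended by "log", and the helper names `odds` and `logit` are chosen here for illustration.

```python
import math


def odds(p):
    """Odds of success for a probability p strictly between 0 and 1."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds (natural log is an assumption)."""
    return math.log(odds(p))


for p in (0.2, 0.5, 0.8):
    print(f"P(y) = {p:.1f}  odds = {odds(p):.2f}  logit = {logit(p):+.3f}")
```

A probability of 0.5 gives odds of 1 and a logit of 0. Probabilities above 0.5 give a positive logit and probabilities below 0.5 give a negative one, symmetric around zero.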
Assumptions:
i. The response variable should be dichotomous, e.g., Yes – No, Present – Absent, etc.
ii. The regressors may be continuous, discrete, categorical, or a combination of these.
iii. The categories of the response and of any categorical regressor should be mutually exclusive and exhaustive.
iv. There should be a linear relationship between the regressors and the logit transformation of the response variable.
Estimation of Binomial Logistic Regression Parameters
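The simple form of the logistic regression model is given by:
$$P(y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
where $x$ is the regressor and $\beta_0$ and $\beta_1$ are the model parameters. P(y) is called
the predicted probability.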
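The step from the logit to the predicted probability can be written out as follows. This is a sketch assuming a single regressor $x$, the natural logarithm, and parameters written as $\beta_0$ and $\beta_1$ (notation adopted here for illustration).

```latex
\begin{align*}
\operatorname{logit} P(y) = \log\frac{P(y)}{1 - P(y)} &= \beta_0 + \beta_1 x \\
\frac{P(y)}{1 - P(y)} &= e^{\beta_0 + \beta_1 x} \\
P(y) &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
      = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{align*}
```

The last line is the sigmoid function, which is why a model that is linear in the logit produces an S-shaped curve in the probability.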
The parameters of the model are estimated by maximizing the log-likelihood
function.
The following iterative procedure is used to estimate the
parameters of the model:
Update P(y) using the current parameter values.
Repeat this step for every value of the independent variable
and its P(y).
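One common way to carry out such an iteration is gradient ascent on the log-likelihood, sketched below for the study-hours example. This is an assumption about the update rule, not necessarily the one used in the original procedure. The interval midpoints, the starting values of zero, the learning rate, and the step count are all hypothetical choices.

```python
import math

# Study-hours example: interval midpoints (assumed) and coded results
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]


def predict(x, b0, b1):
    """Predicted probability P(y) under the sigmoid model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(xs, ys, b0, b1):
    """Bernoulli log-likelihood of the data for given parameters."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(x, b0, b1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def gradient_ascent(xs, ys, rate=0.01, steps=1000):
    """Repeat: update P(y) for every x, then move b0 and b1 uphill."""
    b0, b1 = 0.0, 0.0  # hypothetical starting values
    for _ in range(steps):
        probs = [predict(x, b0, b1) for x in xs]  # update P(y)
        grad_b0 = sum(y - p for y, p in zip(ys, probs))
        grad_b1 = sum((y - p) * x for x, y, p in zip(xs, ys, probs))
        b0 += rate * grad_b0
        b1 += rate * grad_b1
    return b0, b1


start_ll = log_likelihood(hours, passed, 0.0, 0.0)
b0, b1 = gradient_ascent(hours, passed)
final_ll = log_likelihood(hours, passed, b0, b1)
print(f"log-likelihood: {start_ll:.3f} -> {final_ll:.3f}")
```

Because this toy data set is perfectly separated (every fail lies below 4 hours and every pass above), the log-likelihood keeps increasing as the parameters grow. That is why the sketch stops after a fixed number of steps instead of waiting for convergence.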
Loss Function
The loss function is defined as the sum of squared deviations between the
observed and predicted values of the dependent variable:
$$L = \sum_{i} \left( y_i - P(y_i) \right)^2$$
The iteration is repeated, updating the model parameters at each step, until
the loss function is minimised (ideally zero).
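The sum-of-squares loss described above can be evaluated directly for any candidate parameters. The sketch below uses the study-hours example with interval midpoints (an assumption) and two hypothetical parameter settings, chosen only to compare their losses.

```python
import math

# Study-hours example: interval midpoints (assumed) and coded results
hours = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]


def predict(x, b0, b1):
    """Predicted probability P(y) under the sigmoid model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def squared_loss(xs, ys, b0, b1):
    """Sum of squared deviations between observed y and predicted P(y)."""
    return sum((y - predict(x, b0, b1)) ** 2 for x, y in zip(xs, ys))


# Hypothetical setting 1: all parameters zero, so every P(y) is 0.5
loss_flat = squared_loss(hours, passed, 0.0, 0.0)
# Hypothetical setting 2: a sigmoid centred at 4 hours
loss_curve = squared_loss(hours, passed, -4.0, 1.0)

print(f"loss with b0=0,  b1=0: {loss_flat:.3f}")
print(f"loss with b0=-4, b1=1: {loss_curve:.3f}")
```

The second setting gives a smaller loss than the first, which is the kind of comparison the iteration relies on when deciding whether an update has improved the fit.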
Practice Question
Estimate the logistic regression parameters for the following data.
| Y | 0 | 0 | 1 | 1 | 1 |
| X | 29 | 15 | 33 | 88 | 39 |
Solution: The logistic regression model
is given by:
$$P(y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
Using iteration
procedure by taking
Step – I:
Taking
Step – II:
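The loss function is minimised, so the iteration procedure stops. The
estimated logistic regression model is obtained by substituting the final
parameter estimates into the model:
$$\hat{P}(y) = \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 x)}}$$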
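For readers who want to experiment with the practice data, the sketch below minimises the sum-of-squares loss by plain gradient descent. It assumes the 0/1 row of the table is the response and the other row is the regressor. The update rule, the zero starting values, the learning rate, and the step count are hypothetical choices, so the numbers it prints are not the solution values of the worked example.

```python
import math

# Practice data (assumed roles: xs = regressor, ys = binary response)
xs = [29, 15, 33, 88, 39]
ys = [0, 0, 1, 1, 1]


def predict(x, b0, b1):
    """Predicted probability P(y) under the sigmoid model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def squared_loss(b0, b1):
    """Sum of squared deviations between observed y and predicted P(y)."""
    return sum((y - predict(x, b0, b1)) ** 2 for x, y in zip(xs, ys))


def descend(rate=1e-4, steps=2000):
    """Gradient descent on the squared loss (rate and steps are assumed)."""
    b0, b1 = 0.0, 0.0  # hypothetical starting values
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = predict(x, b0, b1)
            # Derivative of (y - p)^2 with respect to b0 + b1 * x
            common = -2.0 * (y - p) * p * (1.0 - p)
            g0 += common
            g1 += common * x
        b0 -= rate * g0
        b1 -= rate * g1
    return b0, b1


initial_loss = squared_loss(0.0, 0.0)
b0, b1 = descend()
final_loss = squared_loss(b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

The learning rate is kept small because the regressor values are large and unscaled. A larger rate can make the updates overshoot instead of reducing the loss.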