INTRODUCTION TO REGRESSION Lecture 02

 

INTRODUCTION TO REGRESSION

Regression investigates the dependence of one variable on one or more other variables and provides a probabilistic equation for estimating or forecasting the average value of the dependent variable. Technically, the dependent variable is referred to as the response variable, and the independent variable is referred to as the regressor or predictor.

For example, consider the opinions of various people on an image shared on social media; regression can be used to study how an individual's response depends on other measurable factors.

Thus, regression is a statistical technique that uses a probabilistic model to quantify the relationship between a response variable and predictor(s).


Example 1:

If we take the revenue of a firm as the response variable and advertising spending as the predictor, the regression model takes the following form:

Revenue = α + β (Advertisement) + Error term

Y = α + β X + ϵ

α represents the expected revenue when advertising spending is zero.


The coefficient β represents the average change in total revenue when advertisement spending is increased by one unit (e.g., one dollar).
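To make Example 1 concrete, here is a minimal Python sketch (not part of the original lecture) using hypothetical advertising and revenue figures; the fitted intercept and slope are sample estimates of α and β.

import numpy as np

# Hypothetical data: advertising spending and revenue (both in $1000s)
advertising = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
revenue = np.array([4.1, 6.2, 7.9, 10.1, 12.0])

# Ordinary least squares fit of Revenue = a + b * Advertising
b, a = np.polyfit(advertising, revenue, 1)   # polyfit returns slope first, then intercept
print(f"Revenue = {a:.2f} + {b:.2f} * Advertising")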

Example 2:

Researchers might administer various dosages of a certain drug to patients and observe how their blood pressure responds. Here blood pressure is taken as a response variable, and dosage is a predictor variable. The regression model would take the following form:

Blood Pressure Level = α + β (Dosage) + Error Term

Y = α + β X + ϵ

α represents the expected blood pressure when the dosage is zero, and β represents the average change in blood pressure when the dosage is increased by one unit.

Example 3:

Suppose the revenue (Y) of a firm depends on price (X1), advertising spending (X2), and many other factors.

Y = α + β1 X1 + β2 X2 + ...

The variables that are relevant to the phenomenon or problem but not included in the model are absorbed into the disturbance term, so the above model can be expressed as:

                                                            Y = α + β1 X1 + ϵ
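As an illustration of Example 3, the sketch below (hypothetical figures, not from the lecture) estimates α, β1, and β2 by ordinary least squares using a design matrix:

import numpy as np

# Hypothetical data for Y (revenue), X1 (price), and X2 (advertising spending)
price = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 8.0])
advertising = np.array([2.0, 3.0, 1.5, 2.5, 3.5, 1.0])
revenue = np.array([50.0, 48.0, 52.0, 49.0, 47.0, 54.0])

# Design matrix with a column of ones for the intercept α
X = np.column_stack([np.ones_like(price), price, advertising])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
alpha, beta1, beta2 = coef
print(f"Y-hat = {alpha:.2f} + {beta1:.2f} X1 + {beta2:.2f} X2")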

Where:

Y: Response variable (Dependent Variable)

X: Regressor/explanatory variable (independent variable). The independent variables are treated as fixed, not random, and may be scale (numeric) or categorical variables.

α: Y-intercept (the intercept, often labelled the constant, is the point where the regression line crosses the y-axis).

β1: Slope (it measures the change in Y for a one-unit change in X).

 ϵ: Disturbance term (error term). The disturbance term captures all those regressors that are relevant to the phenomenon but not included in the regression model.

Thus, regression is a statistical technique that quantifies the relationship between a response variable and predictor(s) using a probabilistic model.

Objectives

The regression model is used to establish the relationship between the response and the predictor(s), guided by theoretical or logical arguments, and to represent this relationship by a probabilistic equation. The probabilistic model decomposes each observation into a systematic pattern and a deviation from that pattern:

Actual = Systematic Pattern + Deviation from the Systematic Pattern

The main objectives are to:

1. Establish whether there is a relationship between the response and the predictor(s); e.g., spending increases as income increases.

2. Build a statistical model for the phenomenon or activity and use it to forecast new observations.


Simple Linear Regression Model

Simple linear regression investigates the dependence of a response variable on a single predictor variable.

Statistically, a simple linear regression model can be expressed as:

Y = α + β X + ϵ

Y: Response variable

X: Predictor variable

α: Intercept

β: Slope (the rate of change in Y as X changes)

ϵ: Disturbance (noise) term, which includes all those variables not considered in the analysis.

 The estimated model can be expressed as:

Ŷ = a + bX
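The estimates a and b are obtained by the method of least squares. The following Python sketch (an illustration, not part of the lecture) computes them directly from the usual definitions:

import numpy as np

def fit_simple_ols(x, y):
    """Return (a, b) for the fitted line Y-hat = a + b*X."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope: covariance of X and Y divided by the variance of X
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # Intercept: forces the line through the point of means
    a = y.mean() - b * x.mean()
    return a, b

# Example usage with made-up numbers
a, b = fit_simple_ols([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(a, b)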


Assumptions of Simple Linear Regression Model

i. The relationship between the dependent variable and the independent variable is linear.

ii. The mean of the disturbance term is zero:

                                        E(ϵi) = 0        for all i

iii. The variance of the disturbance term is constant. This assumption is technically called homoscedasticity:

                                        Var(ϵi) = σ²

iv. The disturbance terms are independent of each other; that is, there is no serial correlation between ϵi and ϵj:

                                        E(ϵi ϵj) = 0        for all i ≠ j

v. The explanatory variable (X) is independent of the error term. This assumption is technically called a nonstochastic regressor:

                                        E(Xϵ) = 0

vi. The disturbance term is normally distributed with mean zero and constant variance:

                                        ϵi ~ N(0, σ²)
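The residuals of a fitted model can be used as informal checks of these assumptions. The sketch below (illustrative only; the data are simulated, not taken from the lecture) inspects the residual mean, variance, and lag-1 correlation:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.size)  # data generated to satisfy the assumptions

b, a = np.polyfit(x, y, 1)          # fitted slope and intercept
residuals = y - (a + b * x)

print("Mean residual (assumption ii, should be near 0):", residuals.mean())
print("Residual variance (assumption iii):", residuals.var())
# Lag-1 correlation of residuals as a rough check for serial correlation (assumption iv)
print("Lag-1 autocorrelation:", np.corrcoef(residuals[:-1], residuals[1:])[0, 1])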

Properties of Regression Line

i. The regression line always passes through the point of means (X̄, Ȳ).

ii. The sum of the differences between the observed and estimated values (the residuals) is always equal to zero.

iii. The sum of squares of the differences between the observed and estimated values is a minimum (the least-squares property).
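These properties can be verified numerically. The short check below (not part of the lecture) uses the data of Practice Question 1.1 and confirms properties i and ii:

import numpy as np

x = np.array([5, 6, 8, 10, 12, 13, 15, 16, 17], dtype=float)
y = np.array([16, 19, 23, 28, 36, 41, 44, 45, 50], dtype=float)
b, a = np.polyfit(x, y, 1)

print(np.isclose(a + b * x.mean(), y.mean()))    # i: line passes through (X-bar, Y-bar)
print(np.isclose((y - (a + b * x)).sum(), 0.0))  # ii: residuals sum to zero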


Practice Question 1.1

A small sample of bivariate data (n = 9) was collected, where the dependent variable is Y and the independent variable is X.

X:    5     6     8    10    12    13    15    16    17

Y:   16    19    23    28    36    41    44    45    50

Find the regression line of Y on X and interpret the slope and intercept.

Solution: The ordinary least squares (OLS) method can be used to fit the regression line to the above data:

Y = α + β X + ϵ

No      X       Y       XY      X²
1       5       16      80      25
2       6       19      114     36
3       8       23      184     64
4       10      28      280     100
5       12      36      432     144
6       13      41      533     169
7       15      44      660     225
8       16      45      720     256
9       17      50      850     289
Sum     102     302     3853    1308


Using the OLS estimates:

b = (nΣXY − ΣX ΣY) / (nΣX² − (ΣX)²) = (9(3853) − (102)(302)) / (9(1308) − (102)²) = 3873 / 1368 = 2.831

a = Ȳ − bX̄ = 302/9 − 2.831(102/9) = 33.56 − 32.09 = 1.47

The fitted line is Ŷ = 1.47 + 2.831X. Here a = 1.47 is the estimated value of Y when X = 0, and b = 2.831 indicates that Y increases by about 2.831 units for a one-unit increase in X.
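The hand calculation can be verified with a short Python check (not part of the original solution) that reproduces a and b from the column totals of the table:

n = 9
sum_x, sum_y, sum_xy, sum_x2 = 102, 302, 3853, 1308

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # ≈ 2.831
a = (sum_y - b * sum_x) / n                                   # ≈ 1.47
print(f"Y-hat = {a:.2f} + {b:.3f} X")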


Using SPSS:


Analysis & Output
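If SPSS is not available, an equivalent analysis can be run in Python with the statsmodels package; the sketch below (not part of the original lecture) fits the same model and prints a summary table comparable to the SPSS output:

import numpy as np
import statsmodels.api as sm

x = np.array([5, 6, 8, 10, 12, 13, 15, 16, 17], dtype=float)
y = np.array([16, 19, 23, 28, 36, 41, 44, 45, 50], dtype=float)

X = sm.add_constant(x)         # adds the intercept column
model = sm.OLS(y, X).fit()     # ordinary least squares fit
print(model.summary())         # coefficients, standard errors, R-squared, etc.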
