MULTICOLLINEARITY

Multicollinearity is the statistical phenomenon in which a linear relationship exists among some or all of the predictor variables included in a multiple linear regression model.

In a multiple linear regression model the explanatory variables are assumed to be independent of one another; that is, any two predictors Xi and Xj are uncorrelated (Cov(Xi, Xj) = 0 for i ≠ j).




Let the salary (Y) of an employee be regressed on years of education (X1) and skill level relevant to the work (X2). The model can be written as:

Y = β0 + β1X1 + β2X2 + u,   where u is the random error term.

In this model, years of schooling (X1) and skill level appropriate to the job (X2) have no direct correlation. There is no collinearity, and the OLS method can be used to estimate the parameters.
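As a rough sketch (simulated data; the coefficient values and variable names below are invented purely for illustration), a model of this kind with independently generated predictors can be estimated directly from the OLS normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated, independently drawn predictors: years of education and a skill score
X1 = rng.normal(12, 2, n)            # years of education
X2 = rng.normal(50, 10, n)           # skill level score, generated independently of X1

# Illustrative "true" salary equation plus noise (all numbers hypothetical)
Y = 5000 + 1200 * X1 + 300 * X2 + rng.normal(0, 2000, n)

# OLS via the normal equations: beta_hat = (X'X)^(-1) X'Y
X = np.column_stack([np.ones(n), X1, X2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print("correlation(X1, X2):", np.corrcoef(X1, X2)[0, 1])   # close to zero
print("estimated coefficients:", beta_hat)                  # close to 5000, 1200, 300
```

Because the predictors carry separate information, X'X is well conditioned and the estimates are stable.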

If this assumption is violated and there is a linear relationship among some or all of the predictor variables in a multiple linear regression model, the problem of multicollinearity is said to exist.

In multicollinearity this assumption fails: some or all of the predictors are linearly related, so Cov(Xi, Xj) ≠ 0 for at least one pair of predictors.

Let the salary (Y) of an employee be regressed on years of education (X1), skill level relevant to the work (X2), age (X3), and years of experience (X4). The model can be written as:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + u

In this model we expect an employee's age and years of experience to be positively correlated. Thus X3 = a + bX4, where a and b are constants. Because of this exact relationship, the OLS method cannot be employed to estimate the parameters of the model above.
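A quick numerical sketch (simulated, illustrative data) of why OLS fails here: when X3 is constructed exactly as a + bX4, the design matrix loses full column rank, X'X becomes singular, and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

X1 = rng.normal(12, 2, n)    # years of education
X2 = rng.normal(50, 10, n)   # skill level
X4 = rng.normal(10, 4, n)    # years of experience
X3 = 18 + 1.0 * X4           # age built exactly as a + b*X4 (perfect collinearity)

X = np.column_stack([np.ones(n), X1, X2, X3, X4])
XtX = X.T @ X

print("rank of X:", np.linalg.matrix_rank(X), "out of", X.shape[1])  # 4 < 5: rank deficient
print("condition number of X'X:", np.linalg.cond(XtX))               # effectively infinite
# Because X'X is singular, beta_hat = (X'X)^(-1) X'Y does not have a unique value.
```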

Consider the general linear model:

Y = β0 + β1X1 + β2X2 + ... + βkXk + u

The regression model with two regressors is given by:

Y = β0 + β1X1 + β2X2 + u

Nature (Types) of Multicollinearity

We can distinguish between two types of multicollinearity:

1.      Perfect multicollinearity.

2.      Imperfect multicollinearity. 


Perfect Multicollinearity 

Perfect multicollinearity exists when two or more predictor variables in an econometric model have an exact (precise) linear relationship. Suppose we have the following regression model:

Y = β0 + β1X1 + β2X2 + u

with the predictors related exactly by X2 = λX1, where λ is a nonzero constant.
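To see why estimation breaks down, here is a sketch (using lowercase letters for deviations from the sample means, a notational convention assumed here rather than taken from the notes above) of what happens to the usual two-regressor OLS formula when x2i = λx1i exactly:

\[
\hat{\beta}_1=\frac{\left(\sum y_i x_{1i}\right)\left(\sum x_{2i}^{2}\right)-\left(\sum y_i x_{2i}\right)\left(\sum x_{1i}x_{2i}\right)}{\left(\sum x_{1i}^{2}\right)\left(\sum x_{2i}^{2}\right)-\left(\sum x_{1i}x_{2i}\right)^{2}}
=\frac{\left(\sum y_i x_{1i}\right)\lambda^{2}\sum x_{1i}^{2}-\lambda\left(\sum y_i x_{1i}\right)\lambda\sum x_{1i}^{2}}{\lambda^{2}\left(\sum x_{1i}^{2}\right)^{2}-\lambda^{2}\left(\sum x_{1i}^{2}\right)^{2}}=\frac{0}{0},
\]

which is indeterminate; the same happens for the estimator of β2. Under perfect multicollinearity OLS cannot separate the individual effects of X1 and X2; only the combination β1 + λβ2 can be estimated.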

Imperfect Multicollinearity  

Imperfect (near) multicollinearity exists when there is an approximate, but not exact, linear relationship between two or more of the predictor variables included in an econometric model. The correlation between two such predictor variables (r12) is high in absolute value but lies strictly between -1 and +1.

Consider a multiple linear regression model with two predictors:

Y = β0 + β1X1 + β2X2 + u

The relation between the predictors included in the above regression model is

X2 = a + bX1 + v

where v is a random error term, so the relationship is approximate rather than exact.
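The practical consequence of such an approximate relation can be seen through a variance inflation factor (VIF). The sketch below uses simulated data and computes the VIF for X1 by hand as 1/(1 − R²) from an auxiliary regression of X1 on X2; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 + 0.9 * x1 + rng.normal(scale=0.2, size=n)   # approximate (not exact) linear relation
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

r12 = np.corrcoef(x1, x2)[0, 1]
print("r12:", r12)                      # high, but strictly between -1 and +1

# VIF for x1: regress x1 on x2, take R^2 of that auxiliary regression, VIF = 1/(1 - R^2)
Z = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef
r2 = 1 - resid.var() / x1.var()
print("VIF(x1):", 1 / (1 - r2))         # well above 1: OLS variances are inflated
```

OLS can still be computed here, but the coefficient estimates have large standard errors and are very sensitive to small changes in the data.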

[Figure: Diagrammatic representation of weak and strong multicollinearity.]

Reasons behind the Existence of Multicollinearity

1.      An over-defined econometric model

An over-defined model is one that contains more explanatory variables than observations.

When a large number of explanatory variables is included in the model to make it more realistic, the number of observations "n" can become smaller than the number of explanatory variables "k". Such a situation can arise in medical research, where the number of patients is relatively small even though information on a large number of factors is collected. A short numerical illustration is given below.
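The sketch below (random data, purely for demonstration) shows why n < k causes trouble: the design matrix cannot have full column rank, so X'X is singular and OLS has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 25                        # fewer observations than explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# rank(X) can be at most n, which is smaller than the k+1 coefficients to estimate
print("rank of X:", np.linalg.matrix_rank(X))                 # 10
# X'X is (k+1) x (k+1) but its rank is at most n, so it is singular
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "<", X.shape[1])
```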

2.      The technique of data collection

When the researcher samples only a subspace of the region covered by the predictors, this data collection strategy can lead to multicollinearity.

3.      Population and model restrictions

Certain constraints may apply to the model or to the population from which the sample is drawn, and such restrictions can force the observed values of the predictors to move together even though the model assumes the predictors are uncorrelated and influence only the response variable.

4.      Inclusion of predictor variables that can be computed from other predictor variables

The inclusion of predictors that can be calculated from other predictor variables in a regression model can lead to the problem of multicollinearity; for example, using investment income and savings income as predictor variables when one can be computed from the other.

5.      Using the same variable twice

When two measures of the same concept are incorporated in an econometric model, the problem of multicollinearity arises; for example, including both weight in pounds and weight in kilograms as predictor variables in the same regression model.

6.      Dummy variable Trap

When categorical variables, such as gender (male/female) or season (summer/winter/fall/spring), are included as independent variables in a regression model, they are coded as dummy variables taking the values 0 and 1, signifying the absence or presence of the category.

Consider a model with dummy variables for gender:

Y = β0 + β1D1 + β2D2 + u

When the model contains an intercept, the number of dummy variables should be one less than the number of categories. If we use a number of dummy variables equal to the number of categories, we get perfect multicollinearity.


If we use one dummy variable "D1" for male and another "D2" for female alongside the intercept, then D1 + D2 = 1 for every observation. This is known as the dummy variable trap, and it introduces perfect multicollinearity into the model.
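A small numerical check of the trap (the gender coding below is made up for illustration): with an intercept plus both dummies the columns of the design matrix are exactly linearly dependent, and dropping one dummy, the usual remedy, restores full column rank.

```python
import numpy as np

male = np.array([1, 0, 1, 1, 0, 0, 1, 0])
n = male.size
D1 = male                    # D1 = 1 for male, 0 otherwise
D2 = 1 - male                # D2 = 1 for female, 0 otherwise

# With an intercept AND both dummies, the columns are exactly dependent: D1 + D2 = intercept
X_trap = np.column_stack([np.ones(n), D1, D2])
print("rank:", np.linalg.matrix_rank(X_trap), "columns:", X_trap.shape[1])   # rank 2 < 3

# Dropping one dummy removes the exact dependence
X_ok = np.column_stack([np.ones(n), D1])
print("rank:", np.linalg.matrix_rank(X_ok), "columns:", X_ok.shape[1])       # rank 2 == 2
```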



 


