MULTICOLLINEARITY

Multicollinearity is the statistical phenomenon in which a linear relationship exists among some or all of the predictor variables included in a multiple linear regression model.

In a multiple linear regression model the explanatory variables are assumed to be independent of one another; that is, any two predictors Xi and Xj are uncorrelated (Cov(Xi, Xj) = 0 for i ≠ j).




Let the salary (Y) of an employee be regressed on years of education (X1) and skill level relevant to the work (X2). The model can be written as:

Y = β0 + β1X1 + β2X2 + u,   where u is the random error term.

In this model, years of schooling (X1) and skill level appropriate to the job (X2) have no direct correlation. There is no collinearity, and the OLS method can be used to estimate the parameters.
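As a rough sketch (simulated data; the coefficient values and variable names below are invented purely for illustration), a model of this kind with independently generated predictors can be estimated directly from the OLS normal equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated, independently drawn predictors: years of education and a skill score
X1 = rng.normal(12, 2, n)            # years of education
X2 = rng.normal(50, 10, n)           # skill level score, generated independently of X1

# Illustrative "true" salary equation plus noise (all numbers hypothetical)
Y = 5000 + 1200 * X1 + 300 * X2 + rng.normal(0, 2000, n)

# OLS via the normal equations: beta_hat = (X'X)^(-1) X'Y
X = np.column_stack([np.ones(n), X1, X2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print("correlation(X1, X2):", np.corrcoef(X1, X2)[0, 1])   # close to zero
print("estimated coefficients:", beta_hat)                  # close to 5000, 1200, 300
```

Because the predictors carry separate information, X'X is well conditioned and the estimates are stable.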

If this assumption is violated and there is a linear relationship among some or all of the predictor variables in a multiple linear regression model, the problem of multicollinearity is said to exist.

In multicollinearity this assumption fails: some or all of the predictors are linearly related, so Cov(Xi, Xj) ≠ 0 for at least one pair of predictors.

Let the salary (Y) of an employee be regressed on years of education (X1), skill level relevant to the work (X2), age (X3), and years of experience (X4). The model can be written as:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + u

In this model we expect an employee's age and years of experience to be positively correlated. Thus X3 = a + bX4, where a and b are constants. Because of this exact relationship, the OLS method cannot be employed to estimate the parameters of the model above.
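A quick numerical sketch (simulated, illustrative data) of why OLS fails here: when X3 is constructed exactly as a + bX4, the design matrix loses full column rank, X'X becomes singular, and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

X1 = rng.normal(12, 2, n)    # years of education
X2 = rng.normal(50, 10, n)   # skill level
X4 = rng.normal(10, 4, n)    # years of experience
X3 = 18 + 1.0 * X4           # age built exactly as a + b*X4 (perfect collinearity)

X = np.column_stack([np.ones(n), X1, X2, X3, X4])
XtX = X.T @ X

print("rank of X:", np.linalg.matrix_rank(X), "out of", X.shape[1])  # 4 < 5: rank deficient
print("condition number of X'X:", np.linalg.cond(XtX))               # effectively infinite
# Because X'X is singular, beta_hat = (X'X)^(-1) X'Y does not have a unique value.
```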

Consider the general linear model:

Y = β0 + β1X1 + β2X2 + ... + βkXk + u

The regression model with two regressors is given by:

Y = β0 + β1X1 + β2X2 + u

Nature (Types) of Multicollinearity

We can distinguish between two types of multicollinearity:

1.      Perfect multicollinearity.

2.      Imperfect multicollinearity. 


Perfect Multicollinearity 

Perfect multicollinearity exists when two or more predictor variables in an econometric model have an exact (precise) linear relationship. Suppose we have the following regression model:

Y = β0 + β1X1 + β2X2 + u

with the predictors related exactly by X2 = λX1, where λ is a nonzero constant.
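To see why estimation breaks down, here is a sketch (using lowercase letters for deviations from the sample means, a notational convention assumed here rather than taken from the notes above) of what happens to the usual two-regressor OLS formula when x2i = λx1i exactly:

\[
\hat{\beta}_1=\frac{\left(\sum y_i x_{1i}\right)\left(\sum x_{2i}^{2}\right)-\left(\sum y_i x_{2i}\right)\left(\sum x_{1i}x_{2i}\right)}{\left(\sum x_{1i}^{2}\right)\left(\sum x_{2i}^{2}\right)-\left(\sum x_{1i}x_{2i}\right)^{2}}
=\frac{\left(\sum y_i x_{1i}\right)\lambda^{2}\sum x_{1i}^{2}-\lambda\left(\sum y_i x_{1i}\right)\lambda\sum x_{1i}^{2}}{\lambda^{2}\left(\sum x_{1i}^{2}\right)^{2}-\lambda^{2}\left(\sum x_{1i}^{2}\right)^{2}}=\frac{0}{0},
\]

which is indeterminate; the same happens for the estimator of β2. Under perfect multicollinearity OLS cannot separate the individual effects of X1 and X2; only the combination β1 + λβ2 can be estimated.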

Imperfect Multicollinearity  

Imperfect (near) multicollinearity exists when there is an approximate, but not exact, linear relationship between two or more of the predictor variables included in an econometric model. The correlation between two such predictor variables (r12) is high in absolute value but lies strictly between -1 and +1.

Consider a multiple linear regression model with two predictors:

Y = β0 + β1X1 + β2X2 + u

The relation between the predictors included in the above regression model is

X2 = a + bX1 + v

where v is a random error term, so the relationship is approximate rather than exact.
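The practical consequence of such an approximate relation can be seen through a variance inflation factor (VIF). The sketch below uses simulated data and computes the VIF for X1 by hand as 1/(1 − R²) from an auxiliary regression of X1 on X2; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 + 0.9 * x1 + rng.normal(scale=0.2, size=n)   # approximate (not exact) linear relation
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

r12 = np.corrcoef(x1, x2)[0, 1]
print("r12:", r12)                      # high, but strictly between -1 and +1

# VIF for x1: regress x1 on x2, take R^2 of that auxiliary regression, VIF = 1/(1 - R^2)
Z = np.column_stack([np.ones(n), x2])
coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ coef
r2 = 1 - resid.var() / x1.var()
print("VIF(x1):", 1 / (1 - r2))         # well above 1: OLS variances are inflated
```

OLS can still be computed here, but the coefficient estimates have large standard errors and are very sensitive to small changes in the data.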

[Figure: Diagrammatic representation of weak and strong multicollinearity.]

Reasons behind the Existence of Multicollinearity

1.      An over-defined econometric model

An over-defined model is one that contains more explanatory variables than observations.

When a large number of explanatory variables is included in the model to make it more realistic, the number of observations "n" can become smaller than the number of explanatory variables "k". Such a situation can arise in medical research, where the number of patients is relatively small even though information on a large number of factors is collected. A short numerical illustration is given below.
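The sketch below (random data, purely for demonstration) shows why n < k causes trouble: the design matrix cannot have full column rank, so X'X is singular and OLS has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 25                        # fewer observations than explanatory variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# rank(X) can be at most n, which is smaller than the k+1 coefficients to estimate
print("rank of X:", np.linalg.matrix_rank(X))                 # 10
# X'X is (k+1) x (k+1) but its rank is at most n, so it is singular
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "<", X.shape[1])
```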

2.      The technique of data collection

When the researcher samples only a subspace of the region covered by the predictors, this data collection strategy can lead to multicollinearity.

3.      Population and model restrictions

Certain constraints may apply to the model or to the population from which the sample is drawn, and such restrictions can force the observed values of the predictors to move together even though the model assumes the predictors are uncorrelated and influence only the response variable.

4.      Inclusion of predictor variables that can be computed from other predictor variables

The inclusion of predictors that can be calculated from other predictor variables in a regression model can lead to the problem of multicollinearity; for example, using investment income and savings income as predictor variables when one can be computed from the other.

5.      Using the same variable twice

When two measures of the same concept are incorporated in an econometric model, the problem of multicollinearity arises; for example, including both weight in pounds and weight in kilograms as predictor variables in the same regression model.

6.      Dummy variable Trap

When categorical variables, such as gender (male/female) or season (summer/winter/fall/spring), are included as independent variables in a regression model, they are coded as dummy variables taking the values 0 and 1, signifying the absence or presence of the category.

Consider a model with dummy variables for gender:

Y = β0 + β1D1 + β2D2 + u

When the model contains an intercept, the number of dummy variables should be one less than the number of categories. If we use a number of dummy variables equal to the number of categories, we get perfect multicollinearity.


If we use one dummy variable "D1" for male and another "D2" for female alongside the intercept, then D1 + D2 = 1 for every observation. This is known as the dummy variable trap, and it introduces perfect multicollinearity into the model.
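A small numerical check of the trap (the gender coding below is made up for illustration): with an intercept plus both dummies the columns of the design matrix are exactly linearly dependent, and dropping one dummy, the usual remedy, restores full column rank.

```python
import numpy as np

male = np.array([1, 0, 1, 1, 0, 0, 1, 0])
n = male.size
D1 = male                    # D1 = 1 for male, 0 otherwise
D2 = 1 - male                # D2 = 1 for female, 0 otherwise

# With an intercept AND both dummies, the columns are exactly dependent: D1 + D2 = intercept
X_trap = np.column_stack([np.ones(n), D1, D2])
print("rank:", np.linalg.matrix_rank(X_trap), "columns:", X_trap.shape[1])   # rank 2 < 3

# Dropping one dummy removes the exact dependence
X_ok = np.column_stack([np.ones(n), D1])
print("rank:", np.linalg.matrix_rank(X_ok), "columns:", X_ok.shape[1])       # rank 2 == 2
```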



 


