Detection & Remedy of Multicollinearity

 

Detection of Multicollinearity

Two Approaches to Detecting Multicollinearity

1. Symptoms.

2. Diagnostic techniques.

Symptoms of Multicollinearity

Any one of the following symptoms can indicate the presence of multicollinearity in a model.

1.1: Large Standard Errors

The presence of standard errors that are orders of magnitude larger than their coefficients is an indication of harmful multicollinearity.

1.2: The estimated coefficients have the wrong sign

Coefficients whose signs are the opposite of what theory predicts are another possible symptom of severe multicollinearity.

1.3: Estimated coefficients are sensitive to specification changes

Another sign of serious multicollinearity is that the coefficient estimates change dramatically when an independent variable or an observation (or both) is added or removed.

1.4: Insignificant individual t ratios combined with a high coefficient of determination (and a significant overall F ratio) are another possible symptom.

Consider the model

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i. \]

If \(R^2\) is high and the overall F test is significant, yet few (or none) of the individual t ratios are significant, multicollinearity among the regressors is a likely cause.
Diagnostic Procedures

The diagnostic procedure involves three aspects of the detection of multicollinearity.

i. Determining its presence.

ii. Determining its severity.

iii. Determining its form or location.

Determinant of the Correlation Matrix

The simplest diagnostic approach is to compute the sample correlation coefficients between all pairs of independent variables in the sample. High correlation coefficients between pairs of explanatory variables indicate multicollinearity; it is suspected when a pairwise correlation coefficient exceeds a certain threshold, e.g., 0.9.


A related diagnostic is the determinant of the correlation matrix of the explanatory variables, \(|R|\). If \(|R| = 0\), there is an exact linear dependence among the explanatory variables, so a value close to 0 indicates a high degree of multicollinearity, while \(|R| = 1\) corresponds to orthogonal (uncorrelated) regressors. Any value of \(|R|\) between 0 and 1 gives an idea of the degree of multicollinearity.

For three predictors \(X_1, X_2, X_3\),

\[ |R| = \begin{vmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{vmatrix}, \]

where \(r_{ij}\) denotes the sample correlation between \(X_i\) and \(X_j\).

Practice Question: Consider the data given below:


Check multicollinearity using the correlation matrix for the data of Practice Question 1:

Step – 1: The correlation coefficient between \(X_1\) and \(X_2\) is given by:

\[ r_{12} = \frac{\sum_i (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum_i (X_{1i} - \bar{X}_1)^2}\,\sqrt{\sum_i (X_{2i} - \bar{X}_2)^2}} \]

Step – 2: The correlation matrix:



There is moderate multicollinearity.
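As an illustration, the following is a minimal Python sketch of this diagnostic using hypothetical data (the regressor names X1, X2, X3 and their values are made up for illustration only): it computes the pairwise correlations and the determinant \(|R|\).

```python
# Correlation-matrix diagnostic on hypothetical data:
# pairwise correlations among the regressors and the determinant |R|.
# A |R| close to 0 signals a high degree of multicollinearity.
import numpy as np
import pandas as pd

# Hypothetical regressors; X2 is almost a linear function of X1.
X = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [4.1, 8.2, 11.9, 16.3, 19.8, 24.2],
    "X3": [7, 3, 9, 2, 8, 5],
})

R = X.corr()                      # pairwise correlation matrix
det_R = np.linalg.det(R.values)   # determinant of the correlation matrix

print(R.round(3))
print("det(R) =", round(det_R, 4))  # near 0 -> strong multicollinearity
```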

Variance inflation factor (VIF)




The variance inflation factor is another way to express the same information contained in the coefficient of multiple determination \(R_j^2\) obtained by regressing \(X_j\) on the remaining regressors. A variance inflation factor is computed for each independent variable using the following formula:

\[ VIF_j = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \dots, k. \]

Proceed in the same way for the other regressors included in the regression model. As a rule of thumb, a VIF greater than 10 indicates serious multicollinearity.
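A minimal Python sketch of the VIF computation, reusing the hypothetical X1, X2, X3 data from the correlation example above and the statsmodels library:

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j
# on the remaining regressors (plus an intercept).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [4.1, 8.2, 11.9, 16.3, 19.8, 24.2],
    "X3": [7, 3, 9, 2, 8, 5],
})
X_const = sm.add_constant(X)  # include an intercept in each auxiliary regression

for j, name in enumerate(X_const.columns):
    if name == "const":
        continue
    print(name, "VIF =", round(variance_inflation_factor(X_const.values, j), 2))
```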

Practice Question: Consider the data given below:




Check for multicollinearity by the VIF method.

Solution:

The three equations are;



Solving simultaneously, we have


The estimated regression model:








SPSS:



Significant auxiliary regressions

Auxiliary regressions are used to determine which explanatory variables are linearly related to the other explanatory variables.

Consider the multiple linear regression model

\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i. \]

To check whether \(X_j\) is involved in a (near) linear dependence, run the auxiliary regression of \(X_j\) on the remaining regressors,

\[ X_{ji} = \alpha_0 + \sum_{m \neq j} \alpha_m X_{mi} + v_i, \]

and test its overall significance with the F statistic

\[ F_j = \frac{R_j^2/(k-1)}{(1 - R_j^2)/(n - k)}, \]

which, under the null hypothesis of no linear relationship, follows an F distribution with \(k-1\) and \(n-k\) degrees of freedom. A significant \(F_j\) indicates that \(X_j\) is collinear with the other explanatory variables.
Checking multicollinearity by auxiliary regression (F test) for the previous question:
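A minimal Python sketch of the auxiliary-regression F test, again on the hypothetical X1, X2, X3 data (the original data are not reproduced here):

```python
# Regress one explanatory variable on the others and use the overall F test
# of that auxiliary regression to judge whether it is collinear with them.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [4.1, 8.2, 11.9, 16.3, 19.8, 24.2],
    "X3": [7, 3, 9, 2, 8, 5],
})

# Auxiliary regression of X1 on X2 and X3
aux = sm.OLS(data["X1"], sm.add_constant(data[["X2", "X3"]])).fit()
print("R_j^2   =", round(aux.rsquared, 4))
print("F stat  =", round(aux.fvalue, 2))
print("p-value =", round(aux.f_pvalue, 4))  # small p-value -> X1 is collinear with X2, X3
```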

The Condition Index

Examining the eigenvalues of the correlation matrix of the regressors is another approach to assessing the degree of multicollinearity. A wide spread among the eigenvalues indicates a high degree of multicollinearity.

 Two features of these eigenvalues are of interest:

• Eigenvalues of zero indicate exact collinearities. Therefore, very small eigenvalues indicate near linear dependencies or high degrees of multicollinearity.

• The square root of the ratio of the largest to the smallest eigenvalue,

\[ \kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}, \]

is called the condition number. A large condition number (say, 10 or more) indicates that relatively small changes in the data tend to produce large changes in the least-squares estimates. When the model is limited to two regressors with sample correlation \(r\), the eigenvalues of the \(2 \times 2\) correlation matrix are \(1 + |r|\) and \(1 - |r|\), so the condition number can be obtained as

\[ \kappa = \sqrt{\frac{1 + |r|}{1 - |r|}}. \]
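A minimal Python sketch of the eigenvalue computation and the condition number, using the same hypothetical X1, X2, X3 data:

```python
# Eigenvalues of the regressor correlation matrix and the condition number
# k = sqrt(lambda_max / lambda_min).
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [4.1, 8.2, 11.9, 16.3, 19.8, 24.2],
    "X3": [7, 3, 9, 2, 8, 5],
})

eigvals = np.linalg.eigvalsh(X.corr().values)  # eigenvalues of the correlation matrix
condition_number = np.sqrt(eigvals.max() / eigvals.min())

print("eigenvalues     :", np.round(eigvals, 4))
print("condition number:", round(condition_number, 2))  # 10 or more suggests multicollinearity
```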

Practice Question: Consider the data of the previous question.


THE FARRAR AND GLAUBER TEST

Donald E. Farrar and Robert R. Glauber (1967) developed a three-part test for detecting multicollinearity. The first test establishes whether collinearity exists, the second identifies which regressors are collinear, and the third determines the pattern (location) of the multicollinearity. Assuming that the regressors are multivariate normal, the authors propose the following tests:

1. The chi-square test is used to determine the presence of multicollinearity in a model with multiple predictor variables.

In the chi-square test, the basic hypotheses are:

H0: the X's are orthogonal (independent) vs. H1: the X's are not orthogonal.

The test statistic to be used is

\[ \chi^2 = -\left[n - 1 - \tfrac{1}{6}(2k + 5)\right]\ln|R|, \]

which under H0 follows a chi-square distribution with \(k(k-1)/2\) degrees of freedom, where \(n\) is the sample size, \(k\) is the number of predictor variables, and \(|R|\) is the determinant of their correlation matrix. Rejection of the null hypothesis indicates that multicollinearity exists in the model.

2. The F test is used to locate the collinear regressors.

The hypotheses to be tested here are:

H0: \(R_j^2 = 0\) (\(X_j\) is not linearly related to the other predictors) vs. H1: \(R_j^2 > 0\),

where \(R_j^2\) is the coefficient of determination from the auxiliary regression of \(X_j\) on the remaining predictors.

The test statistic to be used is

\[ F_j = \frac{R_j^2/(k-1)}{(1 - R_j^2)/(n - k)}, \]

which under H0 follows an F distribution with \(k-1\) and \(n-k\) degrees of freedom. Rejection of the null hypothesis indicates that \(X_j\) is linearly related to the other predictors.

3. t-test: This test is used to determine which pairs of variables are responsible for the multicollinearity.

The hypotheses to be tested here are:

H0: the partial correlation \(r_{X_iX_j \cdot \text{others}} = 0\) vs. H1: \(r_{X_iX_j \cdot \text{others}} \neq 0\).

The test statistic to be used is

\[ t = \frac{r_{X_iX_j \cdot \text{others}}\sqrt{n-k}}{\sqrt{1 - r_{X_iX_j \cdot \text{others}}^2}}, \]

which under H0 follows a t distribution with \(n-k\) degrees of freedom; a significant partial correlation points to the pair \((X_i, X_j)\) as a source of the multicollinearity.
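A minimal Python sketch of the Farrar-Glauber chi-square test on the hypothetical X1, X2, X3 data (the F and t parts follow the same pattern, using auxiliary regressions and partial correlations):

```python
# Farrar-Glauber chi-square: chi2 = -[n - 1 - (2k + 5)/6] * ln|R|,
# with k(k-1)/2 degrees of freedom.
import numpy as np
import pandas as pd
from scipy import stats

X = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [4.1, 8.2, 11.9, 16.3, 19.8, 24.2],
    "X3": [7, 3, 9, 2, 8, 5],
})
n, k = X.shape

det_R = np.linalg.det(X.corr().values)
chi2 = -(n - 1 - (2 * k + 5) / 6) * np.log(det_R)
df = k * (k - 1) / 2
p_value = stats.chi2.sf(chi2, df)

print("chi-square =", round(chi2, 2), " df =", df, " p-value =", round(p_value, 4))
# A small p-value rejects H0 (orthogonal regressors), i.e. multicollinearity is present.
```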



REMEDIAL MEASURES

Several approaches for dealing with multicollinearity have been proposed. Some of them are as follows:

1. Additional data collection

Multicollinearity is a sample feature: it is possible that multicollinearity is severe in one sample but mild in another sample involving the same predictors. Increasing the sample size (collecting more data) can sometimes break up the multicollinearity in the data.

However, due to economic constraints, collecting additional data is not always possible.

The variance of the OLS estimate of \(\beta_j\), for example, is given by

\[ \operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{(1 - R_j^2)\sum_i (X_{ji} - \bar{X}_j)^2}, \]

so even if \(R_j^2\) stays the same, a larger sample increases \(\sum_i (X_{ji} - \bar{X}_j)^2\) and therefore reduces the variance.

2. Variable Omission and Specification Bias

When two strongly related variables are used in a regression model, dropping one of them may be beneficial in dealing with the multicollinearity problem. The elimination can be based on some ordering of the explanatory variables; for example, variables with lower t ratios or higher VIFs can be dropped first.

For example, suppose that we regress Y (intake of fruits and vegetables) on X1 (income) and X2 (total family income), two variables that are strongly correlated.

Dropping one of them can resolve the multicollinearity problem. However, by dropping a variable from the model we may commit a specification bias, which reduces the predictive ability of the model.

3. Transformation of variables

When a regression model is based on time series data, the predictor variables often share a common time trend, which produces multicollinearity. In this case, the first-difference form of the regression model can remove the multicollinearity completely or partially.

Assume we have a time series of data on consumption, income, and wealth. One cause of multicollinearity in such data is that income and wealth both tend to move in the same direction over time. One approach to reducing this dependence is the first-difference model

\[ \Delta Y_t = \beta_1 \Delta X_{1t} + \beta_2 \Delta X_{2t} + v_t, \qquad \Delta Z_t = Z_t - Z_{t-1}, \]

i.e., the regression is run on the differences of successive values of the variables. The first-difference regression model generally reduces the severity of multicollinearity.
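A minimal Python sketch, with hypothetical trending series, showing how first differencing can weaken the common time trend:

```python
# First-differencing with pandas: compare the correlation of two trending
# regressors in levels with their correlation in first differences.
import pandas as pd

df = pd.DataFrame({
    "consumption": [100, 104, 109, 115, 120, 127],
    "income":      [120, 126, 133, 139, 147, 154],
    "wealth":      [500, 520, 545, 565, 590, 615],
})

diffed = df.diff().dropna()  # first differences: X_t - X_{t-1}

print("correlation in levels     :", round(df[["income", "wealth"]].corr().iloc[0, 1], 3))
print("correlation in differences:", round(diffed[["income", "wealth"]].corr().iloc[0, 1], 3))
```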

4. RIDGE REGRESSION

(Remedial method of multicollinearity)

In a multiple linear regression model, multicollinearity arises, for example, when the number of independent variables exceeds the number of observations or when the regressors are strongly interrelated. Under perfect multicollinearity the OLS estimates are indeterminate and their variances are undefined; under imperfect (but severe) multicollinearity the OLS estimates are determinate but highly sensitive and have inflated variances. To improve on the OLS procedure in this situation, Hoerl and Kennard suggested a technique called ridge regression. The ridge regression technique is used to fit multiple linear regression models that suffer from this kind of multicollinearity. Consider the general linear regression model

\[ y = X\beta + u. \]

The Gauss-Markov theorem guarantees that the OLS estimates are unbiased and have minimum variance among all linear unbiased estimators, but there is no guarantee that this variance will be small. One way to deal with this situation is to drop the requirement of unbiasedness and consider a biased estimator with a smaller variance.


The ridge regression procedure attempts to overcome the problem of multicollinearity in the data by adding a small positive quantity k to the diagonal terms of the matrix \(X'X\):

\[ \hat{\beta}_{\text{ridge}} = (X'X + kI)^{-1}X'y, \]

where k is called the ridge (or biasing) constant.


k is a positive quantity less than one (usually less than 0.3)

If k = 0, the ridge estimator reduces to the ordinary least squares estimator:

\[ \hat{\beta}_{\text{ridge}} = (X'X)^{-1}X'y = \hat{\beta}_{\text{OLS}}. \]
 




Using ridge regression, the value of k should be chosen so that the reduction in the variance term is larger than the increase in the squared bias; in such a case, the mean squared error of the ridge estimator will be smaller than that of the OLS estimator.

Several methods for choosing the ridge parameter k have been proposed. The use of the ridge trace is a popular, subjective strategy.

The ridge trace is generated by plotting the ridge estimates \(\hat{\beta}_{\text{ridge}}(k)\) against k for increasing values of k in (0, 1). The value of k is chosen at the point where the ridge estimates have settled down; the goal is to find a fairly small value of k at which the ridge estimator is stable.
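A minimal Python sketch of the ridge estimator and a simple ridge trace, using simulated (hypothetical) data with two nearly collinear regressors:

```python
# Ridge estimator beta_ridge(k) = (X'X + k I)^(-1) X'y on unit-length-scaled
# regressors (so X'X is their correlation matrix), traced over a grid of k.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(size=n)

# Centre and scale to unit length: X'X becomes the correlation matrix of the regressors
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))
yc = y - y.mean()

for k in [0.0, 0.05, 0.1, 0.2, 0.3]:
    beta_k = np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ yc)
    print(f"k = {k:<4}  standardized ridge coefficients = {np.round(beta_k, 3)}")
# k = 0 reproduces OLS; choose the smallest k at which the coefficients have stabilized.
```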

