Detection of Multicollinearity
Two Approaches to Detecting Multicollinearity
1. Symptoms.
2. Diagnostic techniques.
Symptoms of Multicollinearity
Any of the following symptoms can indicate the presence of multicollinearity in a model.
1.1: Large Standard Error
The presence of standard errors that are orders of magnitude larger than their coefficients is an indication of harmful multicollinearity.
1.2: The estimated coefficients have the wrong sign
Coefficients whose signs are the opposite of what theory predicts are another possible symptom of severe multicollinearity.
1.3: Estimated coefficients are sensitive to specification changes
Another sign of severe multicollinearity is that the coefficient estimates change dramatically when an independent variable or an observation (or both) is added or removed.
1.4: Another possible symptom is a high coefficient of determination accompanied by individually insignificant t ratios.
Consider the multiple linear regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$; the symptoms above refer to the estimates of its coefficients when the regressors are highly correlated.
Diagnostic Procedures
The diagnostic procedure involves three aspects of the detection of multicollinearity:
i. Determining its presence.
ii. Determining its severity.
iii. Determining its form or location.
Determinant of Correlation Coefficient
The simplest diagnostic approach is to compute the sample correlation coefficients between all pairs of independent variables in the sample. High correlation coefficients between pairs of explanatory variables indicate multicollinearity; as a rule of thumb, multicollinearity is suspected when a pairwise correlation coefficient exceeds a certain threshold, e.g., 0.9.
Practice Question 1: Check multicollinearity using the correlation matrix of the data.
Step 1: Compute the sample correlation coefficient between each pair of regressors (e.g., between X1 and X2).
Step 2: Arrange these coefficients in the correlation matrix.
There is moderate multicollinearity.
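Since the practice data are not reproduced here, the following is a minimal Python sketch of this check on hypothetical values (the columns X1, X2, X3 and all numbers are placeholders, not the original data):

```python
import pandas as pd

# Hypothetical practice data; replace with the actual values from the question.
data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [3, 5, 8, 9, 12, 14],
    "X3": [1, 3, 2, 5, 4, 6],
})

# Pairwise sample correlation matrix of the regressors.
corr_matrix = data.corr()
print(corr_matrix.round(3))

# Flag pairs whose absolute correlation exceeds the 0.9 rule of thumb.
high = (corr_matrix.abs() > 0.9) & (corr_matrix.abs() < 1.0)
print(high)
```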
Variance inflation factor (VIF)
The variance inflation factor re-expresses the information contained in the coefficient of multiple determination of the auxiliary regression. A variance inflation factor is computed for each independent variable using the following formula:
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$
where $R_j^2$ is the coefficient of determination obtained by regressing $X_j$ on the remaining regressors. Proceed in the same way for the other regressors included in the regression model.
Practice Question: Consider the data given below:
[SPSS output: VIF values for each regressor, not reproduced here.]
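Because the SPSS output is not reproduced, here is a hedged Python sketch of the same VIF computation using statsmodels (the data values and column names are again hypothetical placeholders):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressor data; substitute the values from the practice question.
data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [3, 5, 8, 9, 12, 14],
    "X3": [1, 3, 2, 5, 4, 6],
})

X = sm.add_constant(data)  # include an intercept, as in the auxiliary regressions

# VIF_j = 1 / (1 - R_j^2), computed column by column (skipping the constant).
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)  # values above about 10 are commonly taken as a warning sign
```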
Significant auxiliary regressions
Auxiliary regressions are used to determine which explanatory variables are linearly related to the other explanatory variables.
Consider a multiple linear regression model $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + u$. For each regressor $X_j$, the auxiliary regression regresses $X_j$ on the remaining $k - 1$ regressors; a significant fit (large $R_j^2$) indicates that $X_j$ is involved in the multicollinearity.
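As a minimal sketch of this check (assuming hypothetical data and using statsmodels OLS), each regressor is regressed on the remaining regressors and the fit is inspected for significance:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical regressors; replace with the actual data.
data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [3, 5, 8, 9, 12, 14],
    "X3": [1, 3, 2, 5, 4, 6],
})

# Auxiliary regression of each X_j on the remaining regressors.
for target in data.columns:
    others = data.drop(columns=[target])
    aux = sm.OLS(data[target], sm.add_constant(others)).fit()
    # A large R_j^2 (significant overall F) flags X_j as involved in the collinearity.
    print(f"{target}: R^2 = {aux.rsquared:.3f}, overall F p-value = {aux.f_pvalue:.3f}")
```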
The Condition Index
Examining the magnitudes of the eigenvalues of the correlation matrix of the regressors is another approach to assessing the degree of multicollinearity. A wide spread among the eigenvalues indicates a high degree of multicollinearity.
Two features of these eigenvalues are of interest:
• Eigenvalues of zero indicate exact collinearities; therefore, very small eigenvalues indicate near linear dependencies, i.e., a high degree of multicollinearity.
• The square root of the ratio of the largest to the smallest eigenvalue,
$$\kappa = \sqrt{\lambda_{\max}/\lambda_{\min}},$$
is called the condition number. A large condition number (say, 10 or more) indicates that relatively small changes in the data tend to produce large changes in the least-squares estimates. When the model is limited to two regressors with correlation $r$, the eigenvalues of the correlation matrix are $1 + |r|$ and $1 - |r|$, so the condition number can be obtained as
$$\kappa = \sqrt{\frac{1 + |r|}{1 - |r|}}.$$
Practice Question: Consider the data of previous question.
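A minimal Python sketch of the eigenvalue and condition-number computation, again on hypothetical placeholder data:

```python
import numpy as np
import pandas as pd

# Hypothetical regressors; substitute the data of the previous question.
data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [3, 5, 8, 9, 12, 14],
    "X3": [1, 3, 2, 5, 4, 6],
})

R = data.corr().values               # correlation matrix of the regressors
eigenvalues = np.linalg.eigvalsh(R)  # symmetric matrix -> real eigenvalues
condition_number = np.sqrt(eigenvalues.max() / eigenvalues.min())

print("eigenvalues:", np.round(eigenvalues, 4))
print("condition number:", round(condition_number, 2))  # >= 10 suggests troublesome collinearity
```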
THE FARRAR AND GLAUBER TEST
Donald E. Farrar and Robert R. Glauber (1967) developed a three-test approach for finding multicollinearity.
The first test establishes whether collinearity exists, the second identifies which regressors are collinear, and the third determines the pattern of the multicollinearity.
On the assumption that X is multivariate normal, the authors propose the following three tests:
1. The chi-square test is used to determine the presence of multicollinearity in a model with multiple predictor variables.
The hypotheses tested are $H_0$: the X variables are orthogonal (no multicollinearity) versus $H_1$: the X variables are not orthogonal. The test statistic is
$$\chi^2 = -\left[n - 1 - \frac{2k + 5}{6}\right]\ln|R|,$$
which follows a chi-square distribution with $k(k-1)/2$ degrees of freedom, where $n$ is the sample size, $k$ is the number of predictor variables, and $|R|$ is the determinant of the correlation matrix of the regressors. Rejection of the null hypothesis indicates that multicollinearity exists in the model (a computational sketch of this statistic is given after the three tests).
2. The F-test is used to determine which regressors are collinear.
For each regressor $X_j$ the hypotheses are $H_0$: $R_j^2 = 0$ versus $H_1$: $R_j^2 \neq 0$, where $R_j^2$ is the coefficient of determination of the auxiliary regression of $X_j$ on the remaining regressors. The test statistic is
$$F_j = \frac{R_j^2/(k-1)}{(1 - R_j^2)/(n - k)},$$
which follows an F distribution with $(k-1,\ n-k)$ degrees of freedom. Rejection of the null hypothesis indicates that $X_j$ is collinear with the other regressors.
3. The t-test is used to identify which pairs of variables are responsible for the multicollinearity.
For each pair $(X_i, X_j)$ the hypotheses are $H_0$: $r_{ij\cdot} = 0$ versus $H_1$: $r_{ij\cdot} \neq 0$, where $r_{ij\cdot}$ is the partial correlation between $X_i$ and $X_j$ given the remaining regressors. The test statistic is
$$t = \frac{r_{ij\cdot}\sqrt{n - k}}{\sqrt{1 - r_{ij\cdot}^2}},$$
which follows a t distribution with $n - k$ degrees of freedom; rejection points to the pair $(X_i, X_j)$ as a source of the multicollinearity.
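As mentioned above, here is a small computational sketch of the Farrar-Glauber chi-square statistic on hypothetical placeholder data (the F and t parts would be built analogously from the auxiliary regressions and partial correlations):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Hypothetical regressor data; replace with the actual sample.
data = pd.DataFrame({
    "X1": [2, 4, 6, 8, 10, 12],
    "X2": [3, 5, 8, 9, 12, 14],
    "X3": [1, 3, 2, 5, 4, 6],
})

n, k = data.shape                 # sample size and number of regressors
R = data.corr().values
det_R = np.linalg.det(R)

# Farrar-Glauber chi-square statistic and its degrees of freedom.
chi_sq = -(n - 1 - (2 * k + 5) / 6) * np.log(det_R)
df = k * (k - 1) / 2
p_value = chi2.sf(chi_sq, df)

print(f"chi-square = {chi_sq:.3f}, df = {df:.0f}, p-value = {p_value:.4f}")
# A small p-value leads to rejecting orthogonality, i.e. multicollinearity is present.
```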
REMEDIAL MEASURES
Several approaches for dealing with multicollinearity have been proposed.
Some of them are as follows:
1. Additional data collection
Multicollinearity is a sample feature; it is possible that multicollinearity is severe in one sample but not in another sample involving the same predictors. Increasing the sample size (collecting more data) can sometimes break up the multicollinearity in the data.
However, due to economic constraints, collecting fresh data is not always possible.
The variance of the OLS estimate of $\beta_j$, for example, is given by
$$\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{\sum_i (X_{ji} - \bar X_j)^2\,(1 - R_j^2)},$$
so as the sample size grows, $\sum_i (X_{ji} - \bar X_j)^2$ generally increases and the variance of $\hat\beta_j$ falls, even if $R_j^2$ remains high.
2. Variable Omission and Specification Bias
When two strongly related variables are used in a regression model, dropping one predictor may be beneficial in dealing with the multicollinearity problem. The elimination of variables can be carried out based on some ordering of the explanatory variables; for example, variables with lower t-ratios or greater VIFs can be dropped first.
For example, suppose that we regress Y (intake of fruits and vegetables) on X1 and on a second, highly correlated predictor. Dropping one of the two predictors can resolve the multicollinearity, but by dropping a variable from the model we may commit a specification bias, which reduces the predictive ability of the model.
3. Transformation of variables
When a regression model is estimated from time-series data, the predictor variables often share a strong common time trend, which induces multicollinearity. In this case, the first-difference form of the regression model can eliminate the multicollinearity completely or partially.
Assume we have time-series data on consumption, income, and wealth. One cause of multicollinearity in such data is that income and wealth tend to move in the same direction over time. One approach to reducing this dependency is to run the regression on the differences of successive values of the variables:
$$\Delta Y_t = \beta_1 \Delta X_{2t} + \beta_2 \Delta X_{3t} + v_t,$$
where $\Delta Y_t = Y_t - Y_{t-1}$ and the regressors are differenced in the same way. The first-difference regression model generally reduces the severity of the multicollinearity.
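A minimal pandas sketch of the first-difference transformation for the consumption-income-wealth example (the series values are hypothetical placeholders):

```python
import pandas as pd

# Hypothetical time series for the consumption-income-wealth example.
ts = pd.DataFrame({
    "consumption": [100, 104, 109, 115, 122, 130],
    "income":      [120, 126, 133, 141, 150, 160],
    "wealth":      [800, 840, 885, 935, 990, 1050],
})

# First differences: Delta Z_t = Z_t - Z_(t-1); the first row becomes NaN and is dropped.
diffed = ts.diff().dropna()
print(diffed)

# The regression would then be run on these differenced series
# (e.g. diffed["consumption"] on diffed["income"] and diffed["wealth"]),
# which usually weakens the common time trend driving the collinearity.
```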
4. RIDGE REGRESSION (Remedial method of multicollinearity)
In a multiple linear regression model, when the number of independent variables exceeds the number of observations (or the regressors are otherwise linearly related), the problem of multicollinearity arises: under perfect multicollinearity the OLS estimates are indeterminate and their variances are undefined, while under imperfect multicollinearity the OLS estimates and their variances become highly sensitive. To improve on the OLS procedure, Hoerl and Kennard suggested a technique called ridge regression. Ridge regression is used to analyze multiple linear regression models that suffer from this kind of multicollinearity. Consider the general linear regression model $Y = X\beta + \varepsilon$.
The ridge regression procedure attempts to overcome the multicollinearity in the data by adding a small positive quantity $k$ (called the bias) to the diagonal terms of the matrix $X^{\top}X$, giving the ridge estimator
$$\hat\beta_{\text{ridge}} = (X^{\top}X + kI)^{-1} X^{\top}Y,$$
where $k$ is called the ridge or biasing constant.
$k$ is a positive quantity less than one (usually less than 0.3). If $k = 0$, the ridge estimator reduces to the ordinary least-squares estimator.
Several methods for estimating the ridge parameter "k" have been proposed.
The use of the ridge trace is a popular subjective strategy. The ridge trace is generated by graphing the ridge estimates of the coefficients against a range of values of $k$; the smallest value of $k$ at which the estimates stabilize is then selected.
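A minimal numpy sketch of the ridge estimator $(X^{\top}X + kI)^{-1}X^{\top}Y$ and of the values one would plot in a ridge trace (the simulated data and the grid of $k$ values are illustrative assumptions, not part of the original example):

```python
import numpy as np

# Hypothetical standardized design matrix X (n x p) and response Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=30)   # make two columns nearly collinear
Y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=30)

def ridge_estimate(X, Y, k):
    """Ridge estimator (X'X + kI)^(-1) X'Y for a given biasing constant k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

# Ridge trace: coefficient estimates over a grid of k values.
for k in [0.0, 0.05, 0.1, 0.2, 0.3]:
    print(k, np.round(ridge_estimate(X, Y, k), 3))
# k = 0 reproduces the OLS estimates; one chooses the smallest k
# at which the coefficients appear to stabilize.
```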
- Read More: Heteroscedasticity