Confidence Interval Estimates of Regression Coefficients

Confidence Interval Estimation of Regression Parameters

A confidence interval estimate for a simple linear regression line is based on sample statistics and their accompanying sampling distributions, together with a statement of how confident we are, in terms of probability, that the interval contains the population linear regression line. The probability associated with a confidence interval is (1 - α). Graphically, the confidence interval is the band between two curves around the fitted line, and there is a (1 - α) chance that the population linear regression line lies within this band.



Confidence Level

The estimates are based on sample data, so they vary from sample to sample drawn from the same population, and different samples produce slightly different intervals. The confidence coefficient, or confidence level, is the percentage (probability) of such intervals that will contain the population linear regression line or the parameters of the model.

Confidence Interval for Intercept Parameter

Let α^ be the estimate of α computed from the values of a small random sample of size n selected from a bivariate normal population, having mean μα^ and standard deviation σα^. The population mean and standard deviation are unknown, so we replace them with their estimates.

The sampling distribution of α^ approaches the t-distribution with mean μα^ and standard error sα^:

α^ ~ t(α, sα^)

Where:

sα^ = sY.X √(1/n + X̄² / ∑(X - X̄)²)

and sY.X is the standard error of estimate. Thus, a (1 - α)% confidence interval estimate for α is given by:

α^ ± tα/2(n-2) sα^

Confidence Interval for Slope Parameter

Let β^ be the estimate of β computed from the values of a small random sample of size n selected from a bivariate normal population, having mean μβ^ and standard deviation σβ^. The population mean and standard deviation are unknown, so we replace them with their estimates.

The sampling distribution of β^ approaches the t-distribution with mean μβ^ and standard error sβ^:

β^ ~ t(β, sβ^)

Where:

sβ^ = sY.X / √∑(X - X̄)²

Thus, a (1 - α)% confidence interval estimate for β is given by:

β^ ± tα/2(n-2) sβ^
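As a minimal sketch of these two intervals in Python, the function below computes them from raw data; the name regression_cis, the t_crit argument, and the example data are illustrative, not part of the text.

import math

def regression_cis(x, y, t_crit):
    """Confidence intervals for the intercept and slope of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # Standard error of estimate s_{Y.X}: residual sum of squares over (n - 2)
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(rss / (n - 2))
    s_alpha = s_yx * math.sqrt(1 / n + x_bar ** 2 / sxx)   # s_{alpha^}
    s_beta = s_yx / math.sqrt(sxx)                         # s_{beta^}
    ci_alpha = (alpha_hat - t_crit * s_alpha, alpha_hat + t_crit * s_alpha)
    ci_beta = (beta_hat - t_crit * s_beta, beta_hat + t_crit * s_beta)
    return ci_alpha, ci_beta

# Example usage with hypothetical data (n = 5, so t0.025(3) = 3.182):
# regression_cis([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1], t_crit=3.182)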

Confidence Interval for the Mean Value of the Response Variable

Let Y^0 = α^ + β^ X0 be the estimate of the mean value μY.X0 = α + βX0 at X = X0, computed from the values of a small random sample of size n. The sampling distribution of Y^0 approaches the t-distribution with mean μY.X0 = α + βX0 and standard deviation σY^0.

The population standard deviation is unknown, so we replace it with its estimate given below:

sY^0 = sY.X √(1/n + (X0 - X̄)² / ∑(X - X̄)²)

The test statistic:

t = (Y^0 - μY.X0) / sY^0

A 100(1-α)% confidence interval is given by:

Y^0 ± tα/2(n-2) sY^0
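The same interval can be sketched in Python from summary quantities; mean_response_ci and its argument names are hypothetical, used only for this illustration.

import math

def mean_response_ci(alpha_hat, beta_hat, s_yx, n, x_bar, sxx, x0, t_crit):
    """CI for the mean response mu_{Y.X0} = alpha + beta*X0 at a given X0."""
    y0_hat = alpha_hat + beta_hat * x0                      # fitted mean at X0
    se = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)  # s_{Y^0}
    return y0_hat - t_crit * se, y0_hat + t_crit * se

Note that the standard error grows as X0 moves away from X̄, so the confidence band is narrowest at the mean of X.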

Practice Question 1.5
The ages and systolic blood pressures of 100 people gave the following information:

∑X = 4421, ∑Y = 12130, ∑XY = 542735, ∑X² = 208349, ∑Y² = 1498976

i. Compute the regression line, which is used to estimate the true value μY.X.
ii. Assume normality and construct a 95% confidence interval for α, β, and the true value of blood pressure for the age of 50 years.
iii. Predict blood pressure for the age of 50 years and compute the 95% confidence interval for this estimate.
Solution: The OLS method is used to estimate the parameters.

i. Estimation of the Regression Line

The estimated regression line is given by:

Y^ = α^ + β^ X

β^ = (n∑XY - ∑X∑Y) / (n∑X² - (∑X)²) = (100(542735) - (4421)(12130)) / (100(208349) - (4421)²) = 646770 / 1289659 = 0.5015

α^ = Ȳ - β^ X̄ = 121.30 - 0.5015(44.21) = 99.13

Y^ = 99.13 + 0.5015X

ii. 95% confidence intervals for α and β

1 - α = 0.95
α = 0.05
tα/2(n-2) = t0.025(98) ≈ 1.96 (large-sample value)


95% confidence interval for α:

α^ ± tα/2(n-2) sα^

99.13 ± 1.96 (6.337)

99.13 ± 12.42

86.71 < α < 111.55

95% confidence interval for β:

β^ ± tα/2(n-2) sβ^

0.5015 ± 1.96 (0.138)

0.5015 ± 0.271

0.2305 < β < 0.7725

iii. Prediction of blood pressure at the age of 50 years, i.e., X0 = 50

Y^0 = α^ + β^ X0

Y^0 = 99.13 + 0.5015 (50)

Y^0 = 124.21

95% prediction interval for the true value:

Y^0 ± tα/2(n-2) sY^0

124.21 ± 1.96 (15.96)

124.21 ± 31.28

92.93 ≤ Y0 ≤ 155.49
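The worked example can be checked in Python directly from the summary sums given in the question; this is only a sketch, and small differences from the rounded hand computations are expected.

import math

n = 100
sum_x, sum_y = 4421, 12130
sum_xy, sum_x2, sum_y2 = 542735, 208349, 1498976

x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum_x2 - sum_x ** 2 / n          # corrected sum of squares for X
sxy = sum_xy - sum_x * sum_y / n
syy = sum_y2 - sum_y ** 2 / n

beta_hat = sxy / sxx                   # ≈ 0.5015
alpha_hat = y_bar - beta_hat * x_bar   # ≈ 99.13

rss = syy - beta_hat * sxy             # residual sum of squares
s_yx = math.sqrt(rss / (n - 2))        # standard error of estimate
s_alpha = s_yx * math.sqrt(1 / n + x_bar ** 2 / sxx)
s_beta = s_yx / math.sqrt(sxx)

t_crit = 1.96                          # large-sample value used in the text
print(alpha_hat - t_crit * s_alpha, alpha_hat + t_crit * s_alpha)  # CI for alpha
print(beta_hat - t_crit * s_beta, beta_hat + t_crit * s_beta)      # CI for beta

x0 = 50
y0_hat = alpha_hat + beta_hat * x0
s_pred = s_yx * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
print(y0_hat - t_crit * s_pred, y0_hat + t_crit * s_pred)          # prediction interval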

Sampling Distribution of OLS Estimators

 

Sampling Distributions

The Distribution of the Response Variable “Y”

The classical linear regression model assumes that the disturbances are independently and identically normally distributed with mean zero and constant variance:

ϵi ~ N(0, σ²)
The distribution of the response variable is dependent on the distribution of the disturbance term.

Consider the model:

Y = α + βX + ϵ

The mean of the response variable is given by:

E(Y) = α + βX + E(ϵ)
E(Y) = α + βX

The variance of the response variable is given by:

V(Y) = E[Y - E(Y)]²
V(Y) = E[α + βX + ϵ - α - βX]²
V(Y) = E[ϵ²]
V(Y) = σ²

The shape of the distribution depends on the disturbance term ϵ. Since the disturbance term follows a normal distribution, the response variable Y follows a normal distribution with mean (α + βX) and variance σ².
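A quick simulation of this result in Python, using arbitrary illustrative parameter values:

import random
import statistics

random.seed(0)
alpha, beta, sigma = 2.0, 0.5, 1.5   # illustrative true parameters
x = 10.0                             # hold X fixed
# Draw many Y = alpha + beta*x + eps with eps ~ N(0, sigma^2)
ys = [alpha + beta * x + random.gauss(0, sigma) for _ in range(100000)]
print(statistics.mean(ys))       # ≈ alpha + beta*x = 7.0
print(statistics.variance(ys))   # ≈ sigma^2 = 2.25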

The Sampling Distribution of the Regression Parameters

Consider a linear regression model:

Y = α + βX + ϵ

Where:

ϵi ~ N(0, σ²)

The response variable follows a normal distribution with mean (α + βX) and variance σ². The estimate of the slope of the simple linear regression model is given by:

β^ = ∑(X - X̄)Y / ∑(X - X̄)²

Let

k = (X - X̄) / ∑(X - X̄)²

where k satisfies the following properties:

a. ∑k = 0
b. ∑kX = 1
c. ∑k² = 1/∑(X - X̄)²

The OLS estimate is then expressed as:

β^ = ∑kY

Thus, the OLS estimate β^ is a linear function of the response variable.

The estimate of the intercept of the regression model:

α^ = Ȳ - β^ X̄

α^ = ∑Y/n - β^ X̄

α^ = ∑Y/n - X̄ ∑(X - X̄)Y / ∑(X - X̄)²

α^ = ∑{1/n - X̄(X - X̄) / ∑(X - X̄)²} Y

α^ = ∑wY, where w = 1/n - X̄k

Thus, the OLS estimate α^ is a linear function of the response variable.
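The weights k and w can be checked numerically; the small data set below is hypothetical, used only to verify the algebra.

# Numerical check that beta^ = sum(k*Y) and alpha^ = sum(w*Y) reproduce OLS
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)

k = [(x - x_bar) / sxx for x in xs]    # slope weights
w = [1 / n - x_bar * ki for ki in k]   # intercept weights

beta_hat = sum(ki * yi for ki, yi in zip(k, ys))
alpha_hat = sum(wi * yi for wi, yi in zip(w, ys))

# Compare with the usual formulas
beta_check = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
alpha_check = y_bar - beta_check * x_bar
print(beta_hat, beta_check)      # identical
print(alpha_hat, alpha_check)    # identical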

The Sampling Distribution of the Mean Value of the Response Variable “Y” for a Given Value of X

Consider the simple linear regression model:

Y = α + βX + ϵ

The predicted value of Y at X = X0 is given by:

Y^0 = α^ + β^ X0

The mean of the response variable for a specified value X = X0 is:

E(Y^0) = E(α^ + β^ X0)

The mean of Y^0 is denoted by μY.X0:

μY.X0 = α + βX0

The variance of Y^0 is denoted by σ²Y.X0:

Var(Y^0) = Var(α^ + β^ X0)
Var(Y^0) = Var(α^) + X0² Var(β^) + 2X0 Cov(α^, β^) ....(1)

We know that:

Var(α^) = σ² (1/n + X̄² / ∑(X - X̄)²)
Var(β^) = σ² / ∑(X - X̄)²
Cov(α^, β^) = -σ² X̄ / ∑(X - X̄)²

Equation (1) becomes:

Var(Y^0) = σ² (1/n + (X0 - X̄)² / ∑(X - X̄)²)

The sampling distribution of Y^0 approaches a normal distribution with mean α + βX0 and this variance.
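A Monte Carlo sketch of this variance formula, with illustrative parameter values:

import random
import statistics

random.seed(1)
alpha, beta, sigma = 2.0, 0.5, 1.0   # illustrative true parameters
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
x0 = 4.5

y0_hats = []
for _ in range(20000):
    ys = [alpha + beta * x + random.gauss(0, sigma) for x in xs]
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    y0_hats.append(a + b * x0)

print(statistics.variance(y0_hats))                    # simulated Var(Y^0)
print(sigma ** 2 * (1 / n + (x0 - x_bar) ** 2 / sxx))  # theoretical value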

Sampling Distribution of an Individual Value of the Response Variable

The primary objective of a regression model is to forecast an individual value of the response variable Y0 for a specified value X = X0. The following estimated equation is used to predict it:

Y^0 = α^ + β^ X0

The true value Y0 of the response variable is given by:

Y0 = α + βX0 + ϵ0

The above equation satisfies the classical assumptions of OLS.

The mean of the true value Y0 is:

E(Y0) = α + βX0 + E(ϵ0)
E(Y0) = α + βX0

The variance of the true value Y0 is:

Var(Y0) = Var(α + βX0 + ϵ0) = Var(ϵ0) = σ²

Since the new disturbance ϵ0 is independent of the sample used to compute Y^0, the forecast error Y0 - Y^0 has mean zero and variance:

Var(Y0 - Y^0) = Var(Y0) + Var(Y^0) = σ² (1 + 1/n + (X0 - X̄)² / ∑(X - X̄)²)

This is the variance used to construct a prediction interval for an individual value.
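The sketch below contrasts the interval for the mean response with the prediction interval for an individual value; intervals_at is a hypothetical helper, and the summary values passed to it are the ones recomputed from Practice Question 1.5.

import math

def intervals_at(x0, alpha_hat, beta_hat, s_yx, n, x_bar, sxx, t_crit):
    """Mean-response CI and individual prediction interval at X = x0."""
    y0_hat = alpha_hat + beta_hat * x0
    se_mean = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)      # for mu_{Y.X0}
    se_pred = s_yx * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # for Y0
    return ((y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean),
            (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred))

# Summary values recomputed from the blood-pressure example:
ci, pi = intervals_at(50, 99.13, 0.5015, 15.77, 100, 44.21, 12896.59, 1.96)
print(ci)  # narrower: interval for the mean response
print(pi)  # wider: interval for an individual observation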
Features of OLS Estimators & the Gauss-Markov Theorem

The Gauss-Markov theorem states that, under the classical assumptions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE) (Gauss, 1821).

Consider the simple linear regression model:

Y = α + βX + ϵ

The estimated model is given by:

Y^ = α^ + β^ X

Where:

ϵi ~ N(0, σ²)
i. Linearity

The OLS estimates are linear functions of the dependent variable; as shown earlier, β^ = ∑kY and α^ = ∑wY.
ii. Unbiasedness

The OLS estimates are unbiased estimators of the regression parameters. For the slope:

E(β^) = E(∑kY) = ∑k(α + βX) = α∑k + β∑kX = β

since ∑k = 0 and ∑kX = 1. Similarly, E(α^) = E(Ȳ - β^ X̄) = (α + βX̄) - βX̄ = α.

Hence, the OLS estimates are unbiased estimators of their respective parameters.

iii. Variance

The variance of the slope is given by:

Var(β^) = Var(∑kY) = σ² ∑k² = σ² / ∑(X - X̄)²

The variance of the intercept is given by:

Var(α^) = Var(∑wY) = σ² ∑w² = σ² (1/n + X̄² / ∑(X - X̄)²)

iv. Minimum Variance

Among all linear unbiased estimators, the OLS estimates have minimum variance. Consider any other linear unbiased estimator of the slope, β~ = ∑cY, with c = k + d. Unbiasedness requires ∑c = 0 and ∑cX = 1, which forces ∑d = 0 and ∑dX = 0, and hence ∑kd = 0. Then:

Var(β~) = σ² ∑c² = σ² (∑k² + ∑d²) ≥ σ² ∑k² = Var(β^)

with equality only when every d = 0, that is, when β~ is the OLS estimate itself. Hence, the variance of the OLS estimate is minimum.
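A Monte Carlo sketch of the theorem, comparing the OLS slope with another linear unbiased estimator (the slope through the two endpoint observations); all parameter values are illustrative.

import random
import statistics

random.seed(7)
alpha, beta, sigma = 1.0, 2.0, 1.0   # illustrative true parameters
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)

ols, endpoint = [], []
for _ in range(20000):
    ys = [alpha + beta * x + random.gauss(0, sigma) for x in xs]
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    ols.append(b)
    # A competing linear unbiased estimator: the slope through the two endpoints
    endpoint.append((ys[-1] - ys[0]) / (xs[-1] - xs[0]))

print(statistics.mean(ols), statistics.mean(endpoint))          # both ≈ beta (unbiased)
print(statistics.variance(ols), statistics.variance(endpoint))  # OLS variance is smaller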

To see the distribution of β^ explicitly, consider the model:

Y = α + βX + ϵ ....(2)

We know that:

β^ = ∑kY ....(1)
∑k = 0, ∑kX = 1 ....(3)

Substituting eq. (2) and eq. (3) into eq. (1):

β^ = ∑k(α + βX + ϵ) = α∑k + β∑kX + ∑kϵ
β^ = β + ∑kϵ

Thus β^ equals the true slope plus a linear combination of the normal disturbances, so the sampling distribution of β^ is normal with mean β and variance σ² ∑k² = σ² / ∑(X - X̄)².
  Moving Average Models  (MA Models)  Lecture 17 The autoregressive model in which the current value 'yt' of the dependent variable ...