Dummy Variables

 


A dummy variable is a categorical variable that represents the presence or absence of an attribute, such as gender, race, or political affiliation, and takes the value 0 or 1. The presence of the attribute is denoted by 1 and its absence by 0.


The category for which the dummy variable equals 0 is called the benchmark (also called the base or reference category), and the category for which it equals 1 is called the control category. Because qualitative attributes have no natural numeric scale, they are quantified by constructing these artificial 0/1 variables.

Dummy Variables in Regression Model

When an independent variable is categorical, a regression model employs dummy variables. If there are k categories, then (k – 1) dummy variables are used; including all k dummies together with an intercept would make the regressors perfectly collinear. Dummy variables make it possible to include multiple groups in a single regression and to estimate a separate intercept for each group.
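As a minimal sketch of the (k – 1) rule, the following Python snippet builds one 0/1 column per category except the benchmark. The function name `make_dummies`, the quarter labels and the choice of "Q1" as benchmark are assumptions made for illustration only.

```python
def make_dummies(values, reference):
    """Build (k - 1) dummy columns: one per category, skipping the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category does not occur in the data")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }


# Hypothetical observations with k = 4 categories.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q3"]
dummies = make_dummies(quarters, reference="Q1")

for name, column in dummies.items():
    print(name, column)
```

Rows that belong to the benchmark have 0 in every column, so the benchmark is identified without a dummy of its own.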

Suppose the wage Y of a worker depends on skill X. The regression can be stated as follows:

Y = α + β X + ϵ

In this model, skill is a qualitative variable with two categories, skilled and unskilled. A single dummy variable D is introduced to represent the two categories:

Y = α + β D + ϵ

Where:

  D = 1 if the worker is skilled and D = 0 if the worker is unskilled (the benchmark category);

  ϵ follows the classical assumptions of linear regression.

The mean wage of unskilled workers:

  E(Y | D = 0) = α

The mean wage of skilled workers:

  E(Y | D = 1) = α + β

The slope β tells us by how much the average wage of skilled workers differs from the average wage of unskilled workers (which is α).
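The interpretation of the intercept and slope can be checked numerically. The sketch below fits a simple least-squares line to a 0/1 regressor; the wage figures and the helper `simple_ols` are hypothetical and serve only to show that the intercept equals the benchmark mean and the slope equals the difference in group means.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical wages; D = 0 for unskilled (benchmark), D = 1 for skilled.
wages = [300.0, 320.0, 340.0, 500.0, 540.0]
skilled = [0, 0, 0, 1, 1]

alpha_hat, beta_hat = simple_ols(skilled, wages)

mean_unskilled = sum(w for w, d in zip(wages, skilled) if d == 0) / skilled.count(0)
mean_skilled = sum(w for w, d in zip(wages, skilled) if d == 1) / skilled.count(1)

print(alpha_hat, mean_unskilled)               # intercept vs. benchmark mean
print(beta_hat, mean_skilled - mean_unskilled)  # slope vs. difference in means
```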

Features of Dummy Variables in Regression Model

The following are the features of dummy variables in a regression model:

1.      One dummy variable is sufficient to distinguish two categories.

2.      For k categories, (k – 1) dummy variables are used, and 0 is assigned to the benchmark category.

Suppose the demand Qd of a product depends on the quarterly price Pt and on the quarter of the year, and quarter 1 is used as the reference quarter.

Assign 0 to quarter 1 and the model can be written as:

Qd = β0 + β1 Pt + β2 D2 + β3 D3 + β4 D4 + ϵ

where Dj = 1 if the observation falls in quarter j and Dj = 0 otherwise (j = 2, 3, 4).

3.      The value 0 is assigned to the base or reference category and 1 to the other categories.

Application of Dummy Variables

1.      A dummy variable is used when the intercept of a regression model changes between periods while the coefficients of the independent variables remain unchanged. A change in the intercept alone, with the other coefficients unchanged, is called a shift of the function.

Suppose we wish to study the aggregate consumption function of Pakistan from 2000 to 2010, with aggregate consumption Ct regressed on aggregate income Yt. In 2005 Pakistan started a war against militants and the economy suffered. In this situation a dummy variable is necessary, because aggregate consumption during wartime differs from aggregate consumption in normal times.

So, the aggregate consumption model can be written as:

Ct = β0 + β1 Yt + β2 Dt + ϵt

where Dt = 1 for the war years (2005 onward) and Dt = 0 for the normal years.


2.      Dummy variables are used to measure changes in the parameters (slopes) associated with the regressors of a regression model.

Suppose we regress household consumption Ct on income Yt, but the consumption pattern depends on whether there are children in the family. With children, the share of income spent on consumption differs from the share spent when there are no children. Letting Dt = 1 for a household with children and Dt = 0 otherwise, the model can be written as:

Ct = β0 + β1 Yt + β2 (Dt Yt) + ϵt

so that the slope on income is β1 without children and (β1 + β2) with children.
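The two uses described so far, a shift in the intercept and a change in the slope, can be compared in a short Python sketch. All coefficient values, the function `expected_consumption` and the parameter names `intercept_shift` and `slope_shift` are hypothetical; only the 2000 to 2010 sample and the 2005 break come from the example above.

```python
def expected_consumption(income, d, b0, b1, intercept_shift=0.0, slope_shift=0.0):
    """E(C) = b0 + b1*Y + intercept_shift*D + slope_shift*(D*Y)."""
    return b0 + b1 * income + intercept_shift * d + slope_shift * d * income


# Intercept dummy for the war example: D = 1 from 2005 onward.
years = list(range(2000, 2011))
war_dummy = [1 if year >= 2005 else 0 for year in years]

# Hypothetical coefficients, chosen only to make the arithmetic visible.
b0, b1 = 50.0, 0.8


def gap(income, **shifts):
    """Difference in expected consumption between D = 1 and D = 0."""
    return (expected_consumption(income, 1, b0, b1, **shifts)
            - expected_consumption(income, 0, b0, b1, **shifts))


# Intercept dummy: the gap is the same at every income level.
shift_gaps = [gap(y, intercept_shift=-10.0) for y in (100.0, 200.0)]

# Slope dummy: the gap grows in proportion to income.
slope_gaps = [gap(y, slope_shift=0.1) for y in (100.0, 200.0)]

print(war_dummy)
print(shift_gaps, slope_gaps)
```

An intercept dummy moves the whole line up or down, whereas a slope dummy (the product of D and the regressor) changes how strongly consumption responds to income.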


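3.      Dummy variables help to isolate the seasonal component “S” from an observed time series “TCSI” (trend, cyclical, seasonal and irregular components). This process is called de-seasonalization.

Consider the four quarters in the following regression equation:

Yt = β0 + β1 D2t + β2 D3t + β3 D4t + ϵt

where Djt = 1 if observation t falls in quarter j and 0 otherwise, with quarter 1 as the benchmark.

A dummy variable can thus be introduced to shift the function from one quarter to another.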
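De-seasonalization with quarter dummies can be sketched in a few lines of Python. When quarter dummies are the only regressors, the fitted value for each quarter is simply that quarter's sample mean, so the sketch below works with quarter means directly instead of running a regression. The series `sales` and the function `deseasonalize` are hypothetical.

```python
def deseasonalize(values, quarters):
    """Remove quarter effects: subtract (quarter mean - overall mean) from each value.

    This equals the residual from a regression on quarter dummies plus the
    overall mean of the series.
    """
    overall_mean = sum(values) / len(values)
    by_quarter = {}
    for value, quarter in zip(values, quarters):
        by_quarter.setdefault(quarter, []).append(value)
    seasonal_effect = {
        quarter: sum(group) / len(group) - overall_mean
        for quarter, group in by_quarter.items()
    }
    return [value - seasonal_effect[quarter] for value, quarter in zip(values, quarters)]


# Hypothetical quarterly series covering two years.
sales = [10.0, 14.0, 20.0, 12.0, 12.0, 16.0, 22.0, 14.0]
quarters = [1, 2, 3, 4, 1, 2, 3, 4]

adjusted = deseasonalize(sales, quarters)
print(adjusted)
```

After the adjustment every quarter has the same mean, so the variation that remains reflects the other components of the series and not the season.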


ANOVA Model

A regression model in which all independent variables (regressors) are qualitative is called an ANOVA model.

Statistical form of the ANOVA model:

Yt = β0 + β1 D1 + β2 D2 + ... + ϵ

The ANOVA model with one categorical variable can be written as:

Yt = β0 + β1 D1 + ϵ

In this model, β0 is the average value of Yt for the benchmark category and β1 is the difference between the averages of the two categories.

Let the wage Yt depend on the gender of the person only. Gender is a dummy variable and the quality we are looking for is male. So, assign 1 for male and 0 for female, because female is the reference category. The model can then be written as:

Yt = β0 + β1 D1 + ϵ

The average wage of male workers:

  E(Yt | D1 = 1) = β0 + β1

The average wage of female workers:

  E(Yt | D1 = 0) = β0

β1 represents the difference in average wage between male and female workers.

The ANOVA model with two categorical variables can be written as:

Y = β0 + β1 D1 + β2 D2 + ϵ

In this model, Y represents the dependent variable (e.g., wage), D1 represents one quality (e.g., gender: D1 = 1 for male and 0 for female) and D2 represents a second quality (e.g., marital status: D2 = 1 for married and 0 for unmarried).

First identify the benchmark category, that is, the category for which every dummy equals 0. In this model the benchmark is unmarried female, because both dummies are assigned the value 0.

β0 is the intercept and represents the mean of Y (average wage) for the benchmark category (unmarried female).

β1 represents the difference between male and female, irrespective of marital status:

  The average wage of male: E(Y | D1 = 1, D2) = β0 + β1 + β2 D2

  The average wage of female: E(Y | D1 = 0, D2) = β0 + β2 D2

β2 represents the difference between married and unmarried, irrespective of gender:

  The average wage of married: E(Y | D1, D2 = 1) = β0 + β1 D1 + β2

  The average wage of unmarried: E(Y | D1, D2 = 0) = β0 + β1 D1

Now consider both qualities (i.e., gender and marital status) together:

  The average wage of unmarried female: E(Y | D1 = 0, D2 = 0) = β0

  The average wage of married female: E(Y | D1 = 0, D2 = 1) = β0 + β2

  The average wage of unmarried male: E(Y | D1 = 1, D2 = 0) = β0 + β1

  The average wage of married male: E(Y | D1 = 1, D2 = 1) = β0 + β1 + β2
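The four group means implied by the two-dummy model can be tabulated with a short Python sketch. The coefficient values and the function `mean_wage` are hypothetical; the coding (D1 = 1 for male, D2 = 1 for married, unmarried female as benchmark) follows the example above.

```python
def mean_wage(d1, d2, b0, b1, b2):
    """E(Y) = b0 + b1*D1 + b2*D2, with D1 = 1 for male and D2 = 1 for married."""
    return b0 + b1 * d1 + b2 * d2


# Hypothetical coefficients: benchmark mean, gender effect, marital-status effect.
b0, b1, b2 = 20000, 3000, 1500

cells = {
    (gender, status): mean_wage(d1, d2, b0, b1, b2)
    for d1, gender in enumerate(("female", "male"))
    for d2, status in enumerate(("unmarried", "married"))
}

for group, wage in cells.items():
    print(group, wage)
```

Because the model has no interaction term, the gap between male and female is b1 in both marital-status groups, and the gap between married and unmarried is b2 for both genders.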

ANCOVA Model

A regression model in which some regressors are quantitative and some are qualitative is called an ANCOVA model.

Y = β0 + β1 X + β2 D + ϵ

Let the wage Y depend on years of experience X and marital status D, with unmarried used as the benchmark. Then the regression (ANCOVA) model in this case is given by:

Y = β0 + β1 X + β2 D + ϵ

The mean value of Y for the control category (married), D = 1:

E(Y | D = 1) = β0 + β1 X + β2 (1)

E(Y | D = 1) = (β0 + β2) + β1 X

The mean value of Y for the benchmark (unmarried), D = 0:

E(Y | D = 0) = β0 + β1 X

The two lines have the same slope β1 and differ only in their intercepts, by β2.
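A brief Python sketch makes the parallel-lines property of the ANCOVA model visible. The coefficient values and the function `expected_wage` are hypothetical and only illustrate the algebra above.

```python
def expected_wage(experience, married, b0, b1, b2):
    """E(Y) = b0 + b1*X + b2*D, with D = 1 for married and D = 0 for unmarried."""
    return b0 + b1 * experience + b2 * married


# Hypothetical coefficients: benchmark intercept, return to experience, marital shift.
b0, b1, b2 = 15000, 800, 2500

gaps = []
for years_of_experience in (0, 5, 10):
    unmarried = expected_wage(years_of_experience, 0, b0, b1, b2)
    married = expected_wage(years_of_experience, 1, b0, b1, b2)
    gaps.append(married - unmarried)
    print(years_of_experience, unmarried, married)

print(gaps)  # the same gap at every experience level
```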

Practice Question

The wages of 12 workers in a stitching centre are given below:

| No. | Worker | Wage  |
|-----|--------|-------|
| 1   | Male   | 11557 |
| 2   | Male   | 29387 |
| 3   | Male   | 31463 |
| 4   | Male   | 29554 |
| 5   | Male   | 25137 |
| 6   | Male   | 14952 |
| 7   | Female | 11589 |
| 8   | Female | 33328 |
| 9   | Female | 36151 |
| 10  | Female | 35448 |
| 11  | Female | 32988 |
| 12  | Female | 20437 |

The policy of the stitching centre is to pay female workers more in order to empower and encourage them. Find the predicted average wage for male and female workers. Test the significance of the difference between their wages at the 5% level.

Solution: A dummy variable D is introduced, assigning 0 to female workers and 1 to male workers.


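| No.   | Wage (Y) | D | YD     |
|-------|----------|---|--------|
| 1     | 11557    | 1 | 11557  |
| 2     | 29387    | 1 | 29387  |
| 3     | 31463    | 1 | 31463  |
| 4     | 29554    | 1 | 29554  |
| 5     | 25137    | 1 | 25137  |
| 6     | 14952    | 1 | 14952  |
| 7     | 11589    | 0 | 0      |
| 8     | 33328    | 0 | 0      |
| 9     | 36151    | 0 | 0      |
| 10    | 35448    | 0 | 0      |
| 11    | 32988    | 0 | 0      |
| 12    | 20437    | 0 | 0      |
| Total | 311991   | 6 | 142050 |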




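Because D is the only regressor, the least-squares estimates follow from the group means:

Predicted average wage of female workers (D = 0): β̂0 = (311991 − 142050) / 6 = 28323.5

Predicted average wage of male workers (D = 1): β̂0 + β̂1 = 142050 / 6 = 23675

β̂1 = 23675 − 28323.5 = −4648.5, so the fitted model is Ŷ = 28323.5 − 4648.5 D.

To test H0: β1 = 0 against H1: β1 ≠ 0 at the 5% level, reject the null hypothesis if |t| > t0.025 with n − 2 = 10 df, which is 2.228.

With the standard error of β̂1 equal to 5335.378 (see the SPSS output below), t = −4648.5 / 5335.378 = −0.871. Since |t| = 0.871 < 2.228, the null hypothesis is not rejected: there is no significant difference between the wages of male and female workers.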

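SPSS output

Coefficients (a)

| Model        | Unstandardized B | Std. Error | Standardized Beta | t     | Sig. |
|--------------|------------------|------------|-------------------|-------|------|
| 1 (Constant) | 28323.500        | 3772.682   |                   | 7.508 | .000 |
| gender       | -4648.500        | 5335.378   | -.266             | -.871 | .404 |

a. Dependent Variable: wage

ANOVA (a)

| Model        | Sum of Squares | df | Mean Square  | F    | Sig.     |
|--------------|----------------|----|--------------|------|----------|
| 1 Regression | 64825656.750   | 1  | 64825656.750 | .759 | .404 (b) |
| Residual     | 853987835.500  | 10 | 85398783.550 |      |          |
| Total        | 918813492.250  | 11 |              |      |          |

a. Dependent Variable: wage

b. Predictors: (Constant), gender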
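The figures in the SPSS output can be reproduced with a short Python sketch that uses only the standard library. It relies on the fact that, with a single dummy regressor, the intercept is the mean of the D = 0 group and the slope is the difference in group means. The variable names are illustrative, and the critical value 2.228 is taken from a t-table for 10 degrees of freedom and not computed.

```python
import math

# Wages from the solution table: D = 1 for male, D = 0 for female.
male = [11557, 29387, 31463, 29554, 25137, 14952]
female = [11589, 33328, 36151, 35448, 32988, 20437]


def mean(values):
    return sum(values) / len(values)


b0 = mean(female)        # intercept: mean wage of the benchmark (female)
b1 = mean(male) - b0     # slope: male mean minus female mean

# Residual sum of squares: deviations of each wage from its own group mean.
sse = (sum((y - mean(male)) ** 2 for y in male)
       + sum((y - mean(female)) ** 2 for y in female))
df = len(male) + len(female) - 2
mse = sse / df

se_b0 = math.sqrt(mse / len(female))
se_b1 = math.sqrt(mse * (1 / len(male) + 1 / len(female)))
t_stat = b1 / se_b1

T_CRITICAL = 2.228  # two-sided 5% critical value for 10 df, from a t-table
reject_h0 = abs(t_stat) > T_CRITICAL

print(round(b0, 3), round(b1, 3))
print(round(se_b0, 3), round(se_b1, 3), round(t_stat, 3), reject_h0)
```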







