Chi-Square Test for Independence: Lecture 38


Contingency Table

A contingency table classifies two variables, each with several levels, into “r” rows and “c” columns, and is a powerful tool for analysing the relationship between them. It can be used to analyse categorical or nominal data. The data presented in a contingency table are used to test the null hypothesis that the two variables are independent.

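Consider two variables: A with r levels A1, A2, …, Ai, …, Ar, and B with c levels B1, B2, …, Bj, …, Bc. The observed frequencies can then be arranged as:

A / B    B1     B2     …    Bj     …    Bc     Total
A1       O11    O12    …    O1j    …    O1c    (A1)
A2       O21    O22    …    O2j    …    O2c    (A2)
…
Ai       Oi1    Oi2    …    Oij    …    Oic    (Ai)
…
Ar       Or1    Or2    …    Orj    …    Orc    (Ar)
Total    (B1)   (B2)   …    (Bj)   …    (Bc)   n

This tabular arrangement is called an r×c contingency table. Oij is the observed frequency in row i and column j, (Ai) is the total of row i, (Bj) is the total of column j, and n is the grand total. The expected frequency of each cell is obtained as:
eij = (Ai)(Bj) / n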
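
As a minimal sketch of the expected-frequency formula, the Python snippet below computes eij = (row total)(column total) / n for every cell. The function name `expected_frequencies` and the 2×3 counts are hypothetical, chosen only for illustration.

```python
def expected_frequencies(observed):
    """Return the table of expected counts e_ij = (row total i)(column total j) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed counts (illustrative numbers only).
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row of expected counts keeps the same row total as the observed data, which is a quick way to check the calculation by hand.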

A 2×2 table is the smallest type of contingency table. In the 2×2 table, the observed frequencies are shown as:

A / B    B1     B2     Total
A1       a      b      a+b
A2       c      d      c+d
Total    a+c    b+d    n


Chi-Square Test of Independence

The chi-square test of independence is a nonparametric test of the association between two categorical or nominal variables. The categorical data presented in a contingency table are used to test the hypothesis that the two variables of classification, say A and B, are independent. This test is also known as the chi-square test of association.

Let the two variables, say A and B, be classified in an r×c contingency table with observed frequencies Oij, row totals (Ai), column totals (Bj), and grand total n.

The expected frequencies can be obtained as:

eij = (Ai)(Bj) / n

i. The null & alternative hypotheses may be stated as:

H0: The two variables of classification, A and B, are independent vs. H1: The two variables of classification, A and B, are not independent.

ii. The significance level: α

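iii. The test statistic:

χ² = Σi Σj (Oij - eij)² / eij

which, under H0, approximately follows a chi-square distribution with (r-1)(c-1) degrees of freedom.

In the case of a 2×2 table with the frequencies

A / B    B1       B2       Total
A1       a        b        (a+b)
A2       c        d        (c+d)
Total    (a+c)    (b+d)    n

the test statistic reduces to:

χ² = n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]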



iv. Critical Region:

Reject the null hypothesis if χ² ≥ χ²α[(r-1)(c-1)], the upper-α critical value of the chi-square distribution with (r-1)(c-1) degrees of freedom.

v. Computation

vi. Remarks
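
The six steps above can be sketched in Python as follows. This is an illustrative outline, not a reference implementation: the function names and the 2×2 counts are hypothetical, the 2×2 shortcut n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)] is assumed to be the uncorrected form (no continuity correction), and the critical value is taken from a chi-square table rather than computed.

```python
def chi_square_statistic(observed):
    """General r x c statistic: sum over cells of (O_ij - e_ij)^2 / e_ij."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            statistic += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2 x 2 table (assumed form, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def decide(statistic, critical_value):
    """Step iv: reject H0 when the statistic reaches the tabled critical value."""
    return "reject H0" if statistic >= critical_value else "do not reject H0"


# Hypothetical 2 x 2 counts, for illustration only.
a, b, c, d = 30, 20, 10, 40
general, dof = chi_square_statistic([[a, b], [c, d]])
shortcut = chi_square_2x2(a, b, c, d)

print(f"general = {general:.4f}, shortcut = {shortcut:.4f}, df = {dof}")
# 3.841 is the tabled 5% critical value for 1 degree of freedom.
print(decide(general, 3.841))
```

For a 2×2 table the general sum and the shortcut give the same value, so the shortcut is only a computational convenience.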

Example 9.17: The officer in charge of operations at a tire manufacturing company wants to know whether the quality of work differs across the three daily shifts. The officer examines 500 tires chosen at random. Every tire is labelled as excellent, satisfactory, or defective, and the shift that produced it is also noted. The shift and the state of the tire are two categorical variables. The following table provides a summary of the data. Test at the 10% significance level whether the shift and the quality of the tire are associated.

Shift    Excellent    Satisfactory    Defective
1        106          124             2
2        68           86              1
3        38           72              3


Solution:

i. The null & alternative hypotheses may be stated as:

H0: The shift and tire quality classifications are not associated vs. H1: The shift and tire quality classifications are associated.

ii. The significance level: α = 0.10

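iii. The test statistic:

χ² = Σi Σj (Oij - eij)² / eij

iv. The table has r = 3 rows and c = 3 columns, so the degrees of freedom are (3-1)(3-1) = 4. Reject the null hypothesis if χ² ≥ χ² 0.10(4) = 7.779

v. Computation:

Expected frequencies, eij = (row total)(column total) / n:

Shift    Excellent    Satisfactory    Defective    Total
1        98.368       130.848         2.784        232
2        65.720       87.420          1.860        155
3        47.912       63.732          1.356        113
Total    212          282             6            500

χ² = (106 - 98.368)²/98.368 + (124 - 130.848)²/130.848 + … + (3 - 1.356)²/1.356 ≈ 6.79

vi. Remarks: The computed chi-square value (6.79) is smaller than the critical value (7.779), so it does not fall in the rejection region; the sample data do not provide sufficient evidence to reject the null hypothesis at the 10% significance level. Thus, an association between the shifts and the quality of tires cannot be concluded.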
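
The arithmetic of Example 9.17 can be checked with a short Python sketch like the one below. It prints each cell's expected frequency and its contribution to χ². The critical value 7.779 is the tabled 10% point for (3-1)(3-1) = 4 degrees of freedom, entered by hand as an assumption rather than computed.

```python
shifts = [1, 2, 3]
grades = ["Excellent", "Satisfactory", "Defective"]
observed = [[106, 124, 2],
            [68, 86, 1],
            [38, 72, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, shift in enumerate(shifts):
    for j, grade in enumerate(grades):
        expected = row_totals[i] * col_totals[j] / n
        contribution = (observed[i][j] - expected) ** 2 / expected
        chi_square += contribution
        print(f"shift {shift}, {grade:<12} O = {observed[i][j]:>3}  "
              f"e = {expected:8.3f}  term = {contribution:.4f}")

dof = (len(shifts) - 1) * (len(grades) - 1)
critical_value = 7.779  # tabled chi-square value for alpha = 0.10 and 4 df

print(f"n = {n}, chi-square = {chi_square:.3f}, df = {dof}")
print("reject H0" if chi_square >= critical_value else "do not reject H0")
```

The largest contributions come from shift 3, whose excellent and defective counts differ most from their expected values.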
Example 9.18: The following outcomes came from an experiment on how growth regulators affected muskmelon fruit setting. At the 5% significance level, determine if muskmelon fruit setting and growth regulator treatment are associated.

G / S      Fruit set    Fruit not set
Treated    20           14
Control    8            25


Solution:

i. The null & alternative hypotheses may be stated as:

H0: The growth regulator treatment and muskmelon fruit setting are not associated vs. H1: The growth regulator treatment and muskmelon fruit setting are associated.

ii. The significance level: α = 0.05

iii. The test statistic (for a 2×2 table with cell frequencies a, b, c, d and total n):

χ² = n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]

iv. Reject the null hypothesis if χ² ≥ χ²₀.₀₅(1) = 3.841

v. Computation:

G / S      Fruit set    Fruit not set    Total
Treated    20           14               34
Control    8            25               33
Total      28           39               67

χ² = 67(20×25 - 14×8)² / (34 × 33 × 28 × 39) = 67(388)² / 1,225,224 ≈ 8.23

vi. Remarks: The computed chi-square value (8.23) falls in the rejection region; the sample data provide sufficient evidence to reject the null hypothesis at the 5% significance level. Thus, it is concluded that the growth regulator treatment and muskmelon fruit setting are associated.
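
A quick check of Example 9.18 in Python, assuming the uncorrected 2×2 shortcut χ² = n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)] is the intended statistic (the variable names are illustrative):

```python
# Observed counts: rows are Treated / Control, columns are fruit set / not set.
a, b = 20, 14
c, d = 8, 25
n = a + b + c + d

# 2 x 2 shortcut for the chi-square statistic (no continuity correction).
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

critical_value = 3.841  # tabled chi-square value for alpha = 0.05 and 1 df
reject = chi_square >= critical_value

print(f"n = {n}, chi-square = {chi_square:.3f}")
print("reject H0" if reject else "do not reject H0")
```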

Coefficient of Contingency Table

The chi-square statistic only indicates whether two variables of classification are independent or associated. It provides no information about the degree of association, which is sometimes required. Karl Pearson defined a coefficient "C," also referred to as the coefficient of contingency, which measures the strength of association between two variables. This coefficient is given by:

C = √(χ² / (χ² + n))

where n is the sample size. The largest value C can attain is √((k - 1)/k), where k is the smaller of r and c; e.g., in a 2×3 table, k = 2.

The coefficient measures the strength of the association between two variables in classification.

When C = 0, there is complete independence.

When C > 0, there is association.
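
As a hedged illustration, the sketch below assumes the coefficient is Pearson's usual form C = √(χ² / (χ² + n)), whose upper bound √((k - 1)/k) depends on k, the smaller of r and c. The function names and the input values are hypothetical.

```python
import math


def contingency_coefficient(chi_square, n):
    """Pearson's C, assuming the standard form sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi_square / (chi_square + n))


def max_contingency_coefficient(n_rows, n_cols):
    """Largest value C can reach for a table with k = min(rows, cols)."""
    k = min(n_rows, n_cols)
    return math.sqrt((k - 1) / k)


# Hypothetical inputs: chi-square of 25 from a 2 x 3 table with n = 100.
chi_square, n = 25.0, 100
c_value = contingency_coefficient(chi_square, n)
c_max = max_contingency_coefficient(2, 3)

print(f"C = {c_value:.4f} (upper bound for k = 2: {c_max:.4f})")
```

Because C can never reach 1, comparing it with its upper bound gives a fairer picture of the strength of association than the raw value alone.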

Cramer’s Coefficient of Contingency Table

Cramer proposed another measure of the strength of association in a contingency table, given by:

Q = √(χ² / (n(k - 1)))

where n is the sample size and k is the smaller of r and c.

The value of Q lies between 0 and 1.

When Q = 0, there is no association.

When Q < 0.2, there is a weak association.

When Q = 0.5, there is a moderate association.

When Q > 0.7, there is a strong association.
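
A small sketch of how Q and its rule-of-thumb reading might be computed, assuming the coefficient has the usual Cramer form Q = √(χ² / (n(k - 1))). The function names and input values are hypothetical, and the label returned for values between 0.2 and 0.7 is an assumption, since the text only names Q = 0.5 as moderate.

```python
import math


def cramers_q(chi_square, n, n_rows, n_cols):
    """Cramer's coefficient, assuming Q = sqrt(chi2 / (n * (k - 1)))."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


def describe(q):
    """Map Q to the rule-of-thumb labels quoted in the text."""
    if q == 0:
        return "no association"
    if q < 0.2:
        return "weak association"
    if q > 0.7:
        return "strong association"
    # Assumption: the text gives no label for the range 0.2 to 0.7 except Q = 0.5.
    return "intermediate (the text singles out Q = 0.5 as moderate)"


# Hypothetical inputs: chi-square of 25 from a 2 x 3 table with n = 100.
q = cramers_q(25.0, 100, 2, 3)
print(f"Q = {q:.2f}: {describe(q)}")
```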

Example 9.19: Using the data of Example 9.17, find the strength of the association.

Shifts    Excellent    Satisfactory    Defective
1         106          124             2
2         68           86              1
3         38           72              3

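Solution:

From the table, n = 500 and χ² = Σi Σj (Oij - eij)² / eij ≈ 6.79.

Coefficient of Contingency Table

As the number of rows and the number of columns are both 3, k = 3.

C = √(χ² / (χ² + n)) = √(6.79 / 506.79) ≈ 0.116

This is small compared with the largest attainable value √((k - 1)/k) = √(2/3) ≈ 0.816, so there is a weak association between shifts and quality of tires.

Cramer’s Coefficient of Contingency Table

Q = √(χ² / (n(k - 1))) = √(6.79 / (500 × 2)) ≈ 0.082

As Q < 0.2, there is a weak association between shifts and quality of tires.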
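
The two coefficients for the tire data can be reproduced with the sketch below. It assumes the standard forms C = √(χ² / (χ² + n)) and Q = √(χ² / (n(k - 1))) and recomputes χ² directly from the observed table; the variable names are illustrative.

```python
import math

observed = [[106, 124, 2],
            [68, 86, 1],
            [38, 72, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum over cells of (O - e)^2 / e.
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

k = min(len(row_totals), len(col_totals))  # smaller of rows and columns

c_coef = math.sqrt(chi_square / (chi_square + n))  # assumed Pearson form
q_coef = math.sqrt(chi_square / (n * (k - 1)))     # assumed Cramer form

print(f"chi-square = {chi_square:.3f}, k = {k}")
print(f"C = {c_coef:.3f}, Q = {q_coef:.3f}")
```

Both values are close to 0, which agrees with the reading of a weak association.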
