Testing Hypothesis about Homogeneity of Two Samples Lecture 40

 

(Brandt-Snedecor Test)

The goodness-of-fit test checks whether sample data came from a specified theoretical distribution, and the test of independence checks whether two classification variables are independent. Neither test answers a third question: whether two samples came from the same population, or equivalently whether two populations have the same distribution over a categorical variable. The test for homogeneity addresses this question. To calculate its test statistic, the data are displayed in a 2 x k contingency table, with one row per sample and one column per category.

The Brandt-Snedecor test is used to test the null hypothesis that two samples come from the same population with respect to a single categorical variable, that is, that the samples are homogeneous.

Suppose we select two independent samples, of sizes A and B, and classify each observation into one of k categories. We wish to test the null hypothesis that the two samples are homogeneous. The frequencies of both samples are presented below:

            1     2     ....  i     ....  k     Total
Sample I    a1    a2    ....  ai    ....  ak    A
Sample II   b1    b2    ....  bi    ....  bk    B
Total       c1    c2    ....  ci    ....  ck    N

Here ci = ai + bi is the total for category i, and N = A + B.

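To test H0: the two samples are homogeneous, the Brandt-Snedecor test statistic is given by:

χ² = [N² / (A × B)] × [Σ (ai² / ci) − A² / N]

where the sum runs over all categories i, ci = ai + bi is the column total, and A, B, and N are the totals of Sample I, of Sample II, and of both samples combined.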

Testing Procedure:
i. State the null & alternative hypothesis
H0: The two samples came from the same population.
Vs.
H1: The two samples did not come from the same population.
ii. The significance level: α
iii. The test statistic: Brandt-Snedecor test
iv. Critical Region:
Reject H0 when χ² ≥ χ²α(k-1), where k is the number of categories.
v. Computation:
vi. Remarks.
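
The computation step can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name brandt_snedecor and the frequencies passed to it are hypothetical, and the sketch assumes the usual form of the statistic, χ² = [N² / (A × B)] × [Σ (ai² / ci) − A² / N], with every column total greater than zero.

```python
def brandt_snedecor(sample_1, sample_2):
    """Return (chi_square, degrees_of_freedom) for a 2 x k frequency table.

    sample_1 and sample_2 hold the category frequencies a_i and b_i.
    Assumes every column total a_i + b_i is positive.
    """
    if len(sample_1) != len(sample_2):
        raise ValueError("both samples need the same number of categories")
    A = sum(sample_1)
    B = sum(sample_2)
    N = A + B
    # Column totals c_i = a_i + b_i
    column_totals = [a + b for a, b in zip(sample_1, sample_2)]
    # Sum of a_i^2 / c_i over all categories
    weighted_sum = sum(a * a / c for a, c in zip(sample_1, column_totals))
    chi_square = (N * N) / (A * B) * (weighted_sum - A * A / N)
    return chi_square, len(sample_1) - 1


# Hypothetical frequencies, used only to exercise the function.
chi_square, df = brandt_snedecor([20, 30, 50], [30, 30, 40])
print(chi_square, df)
```

The returned statistic is then compared with the tabulated value χ²α(k-1) for the chosen significance level.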

Example 9.22: In a particular neighbourhood, a random sample of 50 men and another random sample of 50 women were asked about their educational backgrounds. The responses were classified into three categories: intermediate, associate diploma, and BS honours. The results are arranged in the table below:

          Intermediate   Associate Diploma   BS Honours   Total
Male      13             25                  12           50
Female    23             20                  7            50
Total     36             45                  19           100


Test whether males and females are homogeneous with respect to educational level at the 0.05 significance level.
Solution:
i. State the null & alternative hypothesis
H0: Males and females are homogeneous with respect to educational level.
Vs.
H1: Males and females are not homogeneous with respect to educational level.
ii. The significance level: α = 0.05
iii. The test statistic: Brandt-Snedecor test
iv. Critical Region:
Reject H0 when χ² ≥ χ²0.05(2) = 5.99
v. Computation:
χ² = [100² / (50 × 50)] × [13²/36 + 25²/45 + 12²/19 − 50²/100]
χ² = 4 × (26.1623 − 25)
χ² ≈ 4.65

vi. Remarks: The computed chi-square value (4.65) is less than the critical value (5.99), so it falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that males and females are homogeneous with respect to educational level.
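
The critical value 5.99 need not be taken on trust from a table. With exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(−x/2), so both the critical value and a p-value can be computed with the standard library. The sketch below applies this to the education-level table of Example 9.22; the helper names are hypothetical, and the closed form holds only for 2 degrees of freedom.

```python
import math

# Frequencies from the education-level table of Example 9.22.
male = [13, 25, 12]
female = [23, 20, 7]

A, B = sum(male), sum(female)
N = A + B
column_totals = [m + f for m, f in zip(male, female)]

# Brandt-Snedecor statistic for the 2 x 3 table.
chi_square = (N * N) / (A * B) * (
    sum(m * m / c for m, c in zip(male, column_totals)) - A * A / N
)


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def chi2_critical_df2(alpha):
    """Value whose upper-tail probability is alpha, for 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


critical_value = chi2_critical_df2(0.05)
p_value = chi2_upper_tail_df2(chi_square)
reject_h0 = chi_square >= critical_value
print(round(chi_square, 2), round(critical_value, 2), reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with α always lead to the same decision.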

Example 9.23: Voters were surveyed before and after a recent earthquake to find out which of three candidates they intended to vote for in the upcoming municipal council election. Has the situation changed since the earthquake? The survey’s results are displayed in the following table.

         A     B     C
Before   167   128   135
After    214   197   225

Solution:
i. State the null & alternative hypothesis
H0: The voters' intentions did not change after the earthquake.
Vs.
H1: The voters' intentions changed after the earthquake.
ii. The significance level: α = 0.05
iii. The test statistic: Brandt-Snedecor test
iv. Critical Region:
Reject H0 when χ² ≥ χ²0.05(2) = 5.99
v. Computation:

         A     B     C     Total
Before   167   128   135   430
After    214   197   225   636
Total    381   325   360   1066

χ² = [1066² / (430 × 636)] × [167²/381 + 128²/325 + 135²/360 − 430²/1066]
χ² = 4.1552 × (174.2368 − 173.4522)
χ² ≈ 3.26

vi. Remarks: The computed chi-square value (3.26) is less than the critical value (5.99), so it falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the voters' intentions before and after the earthquake are homogeneous.
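
The Brandt-Snedecor statistic is an algebraic shortcut for the ordinary Pearson chi-square statistic Σ (O − E)² / E computed on the same 2 x k table, so the two must agree. The sketch below checks this numerically on the voter data of Example 9.23. The function names are hypothetical, and the shortcut is assumed to take its usual form, χ² = [N² / (A × B)] × [Σ (ai² / ci) − A² / N].

```python
before = [167, 128, 135]
after = [214, 197, 225]


def pearson_chi_square(row_1, row_2):
    """Pearson statistic: sum of (observed - expected)^2 / expected over all cells."""
    rows = [row_1, row_2]
    row_totals = [sum(row) for row in rows]
    column_totals = [x + y for x, y in zip(row_1, row_2)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected frequency under homogeneity: row total * column total / N
            expected = row_totals[r] * column_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def brandt_snedecor(row_1, row_2):
    """Shortcut form that needs only the first row and the column totals."""
    A, B = sum(row_1), sum(row_2)
    N = A + B
    column_totals = [x + y for x, y in zip(row_1, row_2)]
    weighted_sum = sum(a * a / c for a, c in zip(row_1, column_totals))
    return (N * N) / (A * B) * (weighted_sum - A * A / N)


pearson_value = pearson_chi_square(before, after)
shortcut_value = brandt_snedecor(before, after)
print(round(pearson_value, 4), round(shortcut_value, 4))
```

The shortcut avoids computing the expected frequencies, which is why it is convenient for hand calculation.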


Yates Continuity Correction & Fisher Exact Test Lecture 39

 

Yates Correction for Continuity


Yates' correction is a statistical technique for improving the accuracy of the chi-square test of independence for two classification variables presented in a contingency table. In the chi-square approximation, cells with small frequencies (less than 5) are usually combined with larger ones, which reduces the degrees of freedom. In a 2 x 2 contingency table this is not possible: the table has only one degree of freedom, and combining cells would leave zero degrees of freedom, for which no chi-square table value is available.

To handle this situation, Frank Yates proposed a continuity correction for the 2 x 2 table that markedly improves the chi-square approximation: 0.5 is subtracted from the absolute difference between each observed frequency O and its expected frequency E before squaring,

χ² = Σ (|O − E| − 0.5)² / E

This modification of the chi-square approximation is known as Yates' continuity correction and is applicable when there is a single degree of freedom.

For a 2 x 2 contingency table with the layout

         B1     B2     Total
A1       a      b      a+b
A2       c      d      c+d
Total    a+c    b+d    n

the Yates-corrected statistic is given by:

χ² = n (|ad − bc| − n/2)² / [(a+b)(c+d)(a+c)(b+d)]
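
For the 2 x 2 layout above, the Yates-corrected statistic can be written in two equivalent ways: the general form Σ (|O − E| − 0.5)² / E, and the shortcut n (|ad − bc| − n/2)² / [(a+b)(c+d)(a+c)(b+d)]. The sketch below computes both and confirms that they agree. The function names and the cell frequencies are hypothetical and serve only as an illustration.

```python
def yates_shortcut(a, b, c, d):
    """Yates-corrected chi-square from the four cell frequencies of a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def yates_general(a, b, c, d):
    """Same statistic from observed and expected frequencies, cell by cell."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    column_totals = [a + c, b + d]
    statistic = 0.0
    for r in range(2):
        for col in range(2):
            expected = row_totals[r] * column_totals[col] / n
            # Continuity correction: shrink |O - E| by 0.5 before squaring
            statistic += (abs(observed[r][col] - expected) - 0.5) ** 2 / expected
    return statistic


# Hypothetical 2 x 2 table, used only to compare the two forms.
shortcut_value = yates_shortcut(12, 3, 5, 10)
general_value = yates_general(12, 3, 5, 10)
print(round(shortcut_value, 4), round(general_value, 4))
```

The two forms coincide because, in a 2 x 2 table with fixed margins, |O − E| equals |ad − bc| / n in every cell.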


Example 9.20: A study examined the relationship between blood group and disease severity. The results are displayed in the 2 x 2 contingency table that follows:

               Severity of Disease
Blood Group    Normal    Severe
A (+)          50        4
A (-)          36        10

Is there an association between blood group and the severity of the disease? Is it advisable to apply Yates' continuity correction?

Solution: One cell frequency (4) is less than 5, so applying Yates' continuity correction is suggested.

i. The null and alternative hypotheses may be stated as:

H0: The blood group and the severity of the disease are not associated.

Vs.

H1: The blood group and the severity of the disease are associated.

ii. The significance level: α = 0.05

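iii. The test statistic: chi-square with Yates' continuity correction,

χ² = n (|ad − bc| − n/2)² / [(a+b)(c+d)(a+c)(b+d)]

iv. Critical Region:

Reject H₀ when χ² ≥ χ²₀.₀₅(1) = 3.841

v. Computation:

χ² = 100 × (|50 × 10 − 4 × 36| − 50)² / (54 × 46 × 86 × 14)
χ² = 100 × 306² / 2990736
χ² ≈ 3.13

vi. Remarks: The computed chi-square value (3.13) is less than the critical value (3.841), so it falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the blood group and the severity of the disease are not associated.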
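
The blood-group data also show why the correction matters: on this table the uncorrected and the corrected statistic fall on opposite sides of the tabulated 5% critical value for one degree of freedom, 3.841. The sketch below computes both, together with p-values. It relies on the fact that, for 1 degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)); the helper names are hypothetical.

```python
import math

# Cell frequencies from Example 9.20: rows A(+), A(-); columns Normal, Severe.
a, b, c, d = 50, 4, 36, 10
n = a + b + c + d
margin_product = (a + b) * (c + d) * (a + c) * (b + d)

# Chi-square without and with Yates' continuity correction.
uncorrected = n * (a * d - b * c) ** 2 / margin_product
corrected = n * (abs(a * d - b * c) - n / 2) ** 2 / margin_product


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


critical_value = 3.841  # tabulated value for alpha = 0.05 and 1 degree of freedom
p_uncorrected = chi2_upper_tail_df1(uncorrected)
p_corrected = chi2_upper_tail_df1(corrected)

print("uncorrected:", round(uncorrected, 3), "reject H0:", uncorrected >= critical_value)
print("corrected:  ", round(corrected, 3), "reject H0:", corrected >= critical_value)
```

Without the correction the test would reject the null hypothesis at the 5% level; with it, the test does not, which matches the conclusion of the example.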

Fisher's Exact Test

When the cell frequencies in a 2 x 2 contingency table are small, the accuracy of the chi-square approximation is questionable. For these circumstances, R.A. Fisher, J.O. Irwin, and Frank Yates developed the Fisher exact test, a method for testing the hypothesis of independence in a contingency table with fairly small cell frequencies.

Procedure: First, identify the smallest cell frequency, and then alter the cell frequencies subject to the restriction that the marginal frequencies stay fixed.

Suppose it is desired to test the null hypothesis that there is no association between the two classification variables in the following table:

A / B    B1       B2       Total
A1       a        b        (a+b)
A2       c        d        (c+d)
Total    (a+c)    (b+d)    n

When the marginal frequencies are fixed, the probability of the observed table is given by

P = [(a+b)! (c+d)! (a+c)! (b+d)!] / [n! a! b! c! d!]

This is a hypergeometric probability, where:
n is the population size,
(a+b) is the number of successes in the population,
(a+c) is the sample size,
a is the number of successes in the sample.

Assuming that d is the smallest frequency, the other possible tables are obtained by reducing d by one, adjusting the other cell frequencies so that the margins are unchanged, and repeating the procedure until d becomes zero. Then compute the probability of the observed table and of each of the other possible tables.

The total probability is P = Pd + Pd-1 + Pd-2 + ⋯ + P0.

For a two-tailed test, the p-value is 2P: reject H0 if 2P ≤ α.
For a one-tailed test, the p-value is P: reject H0 if P ≤ α.
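
The procedure of lowering the smallest cell one unit at a time can be sketched in Python with math.comb (available from Python 3.8). The function names and the example table are hypothetical. The sketch follows the doubling rule described in this lecture for the two-tailed p-value, which is one of several conventions in use, so statistical packages may report a somewhat different two-sided value.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_tail_probability(a, b, c, d):
    """Add the probability of the observed table to those of the tables found
    by lowering the smallest cell one unit at a time until it reaches zero."""
    total = 0.0
    if min(a, d) <= min(b, c):
        # Smallest cell on the main diagonal: lower a and d, raise b and c.
        while a >= 0 and d >= 0:
            total += table_probability(a, b, c, d)
            a, b, c, d = a - 1, b + 1, c + 1, d - 1
    else:
        # Smallest cell off the diagonal: lower b and c, raise a and d.
        while b >= 0 and c >= 0:
            total += table_probability(a, b, c, d)
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


# Hypothetical table [[3, 1], [1, 3]], used only to exercise the functions.
P = fisher_tail_probability(3, 1, 1, 3)
two_tailed_p_value = min(1.0, 2 * P)
print(round(P, 4), round(two_tailed_p_value, 4))
```

Every step changes all four cells at once, which is what keeps the row and column totals fixed.
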
Example 9.21: A researcher wants to investigate whether political party choice is associated with gender. Eighteen voters are selected at random and asked which political party they favour. The survey’s results are displayed in the following table:

            Political Party
Gender      A     B     Total
Male        2     9     11
Female      4     3     7
Total       6     12    18

Solution:
i. State the null and alternative hypothesis
H0: Gender and political party affiliation are not associated.
Vs.
H1: Gender and political party affiliation are associated.
ii. The significance level: α = 0.05
iii. The test statistic: Fisher's Exact test
iv. Critical Region:
Reject H₀ when the two-tailed p-value 2P ≤ 0.05
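v. Computation:
The smallest cell frequency is 2 (males favouring party A). Reducing it to 1 and then to 0, with the marginal totals held fixed, gives the following tables.

Observed table (probability P2):

            Political Party
Gender      A     B     Total
Male        2     9     11
Female      4     3     7
Total       6     12    18

Smallest cell reduced to 1 (probability P1):

            Political Party
Gender      A     B     Total
Male        1     10    11
Female      5     2     7
Total       6     12    18

Smallest cell reduced to 0 (probability P0):

            Political Party
Gender      A     B     Total
Male        0     11    11
Female      6     1     7
Total       6     12    18

P = P2 + P1 + P0
P = 0.1036 + 0.0124 + 0.00037
P = 0.11637
p-value = 2P
p-value = 2 x 0.11637
p-value = 0.23274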

vi. Remarks: The two-tailed p-value, 2P = 0.23274, is greater than α = 0.05, so the result does not fall in the rejection region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that gender and political party affiliation are not associated.
