Statistics & Econometrics: Goodness of Fit Test, Lecture 37


A goodness of fit test is a hypothesis test that determines whether sample data follow a particular theorised distribution, typically a basic distribution such as the binomial, Poisson, or normal distribution. The chi square goodness of fit test is used to test the null hypothesis that the observed data follow a hypothesised theoretical distribution.

The observed frequencies, based on the sample data, are denoted by "O", and the expected frequencies, based on the theorised distribution, are denoted by "E". Several tests check the agreement between observed and expected data, including the chi square test, the Kolmogorov-Smirnov test, and the Shapiro-Wilk test. If the variable is continuous, the Kolmogorov-Smirnov test is used to test the agreement between observed and expected data. The Shapiro-Wilk test is used to test whether the sample data follow a normal distribution. The chi square test is used when the variable is discrete, nominal, or categorical.
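To make "agreement between observed and expected" concrete for a continuous variable, the sketch below computes the one-sample Kolmogorov-Smirnov statistic D: the largest vertical gap between the empirical distribution function of the sample and a hypothesised CDF. It assumes a fully specified normal distribution and uses only the Python standard library; the helper names `normal_cdf` and `ks_statistic` are hypothetical, and the sketch returns only D, not a p-value or a critical value.

```python
import math
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a normal distribution, expressed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(round(ks_statistic([-1.0, 1.0], normal_cdf), 4))  # 0.3413
```

A large D means the sample's distribution departs noticeably from the hypothesised CDF; judging whether it is "large" still requires the Kolmogorov-Smirnov table.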

The following requirements must be met to apply the chi square goodness of fit test.

1. The variable of interest should be discrete, nominal, or categorical.

2. The sample data should be selected from the population by simple random sampling.

3. The expected frequency of each class should be at least 5.

4. Classes with expected frequencies less than 5 should be combined with an adjacent class.
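Requirement 4 can be applied mechanically. The sketch below walks through the classes in order, pooling neighbouring classes until the pooled expected frequency reaches the threshold, and folds any small leftover tail into the last pooled class. This is one reasonable pooling rule assumed for illustration, not the only one; the function name `merge_small_classes` is hypothetical, and the threshold is checked on expected frequencies only.

```python
from typing import List, Sequence, Tuple


def merge_small_classes(
    observed: Sequence[float],
    expected: Sequence[float],
    min_expected: float = 5.0,
) -> Tuple[List[float], List[float]]:
    """Pool adjacent classes so that every pooled expected frequency is >= min_expected."""
    merged_o: List[float] = []
    merged_e: List[float] = []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:  # pooled class is large enough: close it
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_o or acc_e:  # a small leftover tail remains
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


print(merge_small_classes([1, 9, 20, 6, 2], [2.0, 8.0, 20.0, 5.0, 1.0]))
# ([10.0, 20.0, 8.0], [10.0, 20.0, 6.0])
```

Pooling keeps the totals of O and E unchanged; it only reduces the number of classes k, which in turn reduces the degrees of freedom.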

The Pearson Chi Square Goodness of Fit Test

Let O be the observed frequency of a random sample and E be the expected frequency based on a hypothesised theoretical distribution.

If it is desired to test H0: The data conform to a particular hypothesised theoretical distribution, against H1: The data do not conform to that distribution,

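The following test statistic is used to test the above null hypothesis:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

which, under H0, approximately follows a chi square distribution with $k - 1 - m$ degrees of freedom,

where k: number of classes (after any small classes are combined), $O_i$ and $E_i$: observed and expected frequencies of class i, with $E_i = N p_i$ (N the total frequency and $p_i$ the probability of class i under H0), and m: number of parameters to be estimated.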
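Pearson's statistic, the sum of (O − E)²/E over the k classes with k − 1 − m degrees of freedom, translates directly into code. The sketch below assumes the expected frequencies have already been computed and small classes combined; the function name `chi_square_gof` is hypothetical. The critical value still comes from a chi square table, as in the examples that follow.

```python
from typing import Sequence, Tuple


def chi_square_gof(
    observed: Sequence[float],
    expected: Sequence[float],
    m: int,
) -> Tuple[float, int]:
    """Return Pearson's chi square statistic and its degrees of freedom k - 1 - m.

    m is the number of parameters estimated from the sample.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - m
    return stat, df


print(chi_square_gof([10, 20, 30], [20, 20, 20], m=0))  # (10.0, 2)
```

H0 is then rejected when the returned statistic is at least the tabulated value for the chosen α and the returned degrees of freedom.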

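Testing procedure:

i. State the null and alternative hypotheses.

ii. Choose the significance level α.

iii. State the test statistic.

iv. Critical region: reject H0 when $\chi^2 \ge \chi^2_{\alpha}(k - 1 - m)$.

v. Computation: estimate any unknown parameters, compute the expected frequencies, combine classes with expected frequencies below 5, and evaluate χ².

vi. Remarks: state the decision and the conclusion.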

Example 9.14: Four identical six-sided dice are rolled 200 times, and the number of dice showing an even score on the top face is recorded at each roll. Test whether the following outcomes follow a binomial distribution.

No. of even scores	0	1	2	3	4
Frequency	10	41	70	57	22

Solution:

i. State the null and alternative hypotheses

H0: The data follow a binomial distribution vs. H1: The data do not follow a binomial distribution

ii. The significance level: α = 0.01

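iii. The test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = N\binom{4}{x_i}\hat{p}^{\,x_i}(1-\hat{p})^{4-x_i}, \qquad \hat{p} = \frac{\bar{x}}{4},$$

with $k - 1 - m$ degrees of freedom, where N = 200 and m = 1 because p is estimated from the sample.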

iv. Critical Region:

Reject H0 when $\chi^2 \ge \chi^2_{0.01}(5 - 1 - 1) = \chi^2_{0.01}(3) = 11.345$

v. Computation:
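The computation for this example can be reproduced with a short script. It assumes, consistent with the m = 1 in the degrees of freedom, that the binomial parameter p is estimated from the sample as p̂ = x̄/4; the critical value 11.345 is the tabulated χ²₀.₀₁(3) used in the critical region above.

```python
import math

# Example 9.14: number of even scores (x) on four dice, over 200 rolls.
xs = [0, 1, 2, 3, 4]
observed = [10, 41, 70, 57, 22]
n_dice = 4

N = sum(observed)                                        # 200 rolls
mean = sum(x * f for x, f in zip(xs, observed)) / N      # x-bar = 2.2
p_hat = mean / n_dice                                    # estimated p = 0.55 (m = 1)

# Expected frequencies E = N * C(4, x) * p^x * (1 - p)^(4 - x)
expected = [N * math.comb(n_dice, x) * p_hat**x * (1 - p_hat) ** (n_dice - x) for x in xs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(xs) - 1 - 1                                     # k - 1 - m = 3
critical = 11.345                                        # tabulated chi-square(0.01, 3)

print([round(e, 2) for e in expected])
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= critical}")
# chi2 = 1.47, df = 3, reject H0: False
```

Every expected frequency here is above 5 (the smallest is about 8.2), so no classes need to be combined.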

vi. Remarks: The calculated value, χ² ≈ 1.47, falls in the acceptance region, so the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the binomial distribution provides an adequate fit to the data.

Example 9.15: The following table shows the number of recoveries from scabies.

No. of Patients	0	1	2	3	4	5
No. of Recovery	180	173	69	20	6	2

Test the hypothesis that the data come from a Poisson distribution.

Solution:

i. State the null and alternative hypotheses

H0: The data follow a Poisson distribution vs. H1: The data do not follow a Poisson distribution

ii. The significance level: α = 0.01

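iii. The test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = N\,\frac{e^{-\hat{\lambda}}\hat{\lambda}^{x_i}}{x_i!}, \qquad \hat{\lambda} = \bar{x},$$

with $k - 1 - m$ degrees of freedom, where m = 1 because λ is estimated from the sample.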

iv. Critical Region:

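Reject H0 when $\chi^2 \ge \chi^2_{0.01}(5 - 1 - 1) = \chi^2_{0.01}(3) = 11.345$, since k = 5 classes remain after the last two cells are combined.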

v. Computation:

The last cell (x = 5) has a frequency smaller than 5, so it is combined with the adjacent cell (x = 4); this leaves k = 5 classes.
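The Poisson computation can be sketched the same way. It assumes λ is estimated by the sample mean (m = 1) and, as an assumption of this sketch, treats the combined last class as "x ≥ 4" so that the expected frequencies add up to N; the critical value 11.345 is the tabulated χ²₀.₀₁(3) for the five classes that remain after combining.

```python
import math

# Example 9.15: values x = 0..5 with their observed frequencies.
xs = [0, 1, 2, 3, 4, 5]
observed = [180, 173, 69, 20, 6, 2]

N = sum(observed)                                      # 450
lam = sum(x * f for x, f in zip(xs, observed)) / N     # lambda-hat = x-bar = 0.9

# Poisson expected frequencies for x = 0..3; the last class is taken as "x >= 4"
# (an assumption of this sketch) so that the expected frequencies sum to N.
expected = [N * math.exp(-lam) * lam**x / math.factorial(x) for x in range(4)]
expected.append(N - sum(expected))

# Combine the observed frequencies for x = 4 and x = 5 into that last class.
obs = observed[:4] + [observed[4] + observed[5]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
df = len(obs) - 1 - 1                                  # k - 1 - m = 5 - 1 - 1 = 3
critical = 11.345                                      # tabulated chi-square(0.01, 3)

print([round(e, 2) for e in expected])
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= critical}")
# chi2 = 1.67, df = 3, reject H0: False
```

If the last class is instead formed from P(x = 4) + P(x = 5) only, its expected frequency is about 5.9 and χ² comes out near 1.79; either way the value is far below the critical value.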

vi. Remarks: With the last two cells combined, the calculated value is χ² ≈ 1.7 on 3 degrees of freedom, which falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the Poisson distribution provides an adequate fit to the data.