Goodness of Fit Test Lecture 37

 


A goodness of fit test is a hypothesis test that determines whether sample data follow a particular theorised distribution. The theorised distributions are standard distributions such as the binomial, the Poisson, and the normal distribution. The chi-square goodness of fit test is used to test the null hypothesis that the observed data follow a hypothesised theoretical distribution.

The observed values, denoted by "O", are based on the sample data, and the expected values, denoted by "E", are based on the theorised distribution. Several tests are available to check the agreement between observed and expected data, such as the chi-square test, the Kolmogorov-Smirnov test, and the Shapiro-Wilk test. If the variable is continuous, the Kolmogorov-Smirnov test is used to test the agreement between observed and expected data. The Shapiro-Wilk test is used to test whether the sample data follow a normal distribution. The chi-square test is used when the variable is discrete, nominal, or categorical.

The following requirements must be met to apply the chi-square goodness of fit test.

1. The variable of interest is nominal or categorical.

2. The sample data are selected by simple random sampling from the entire population.

3. The observed and expected frequencies in each class are at least 5.

4. A class whose observed or expected frequency is less than 5 is combined with an adjacent class.
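The pooling rule in requirement 4 can be sketched in a few lines of Python. This is only an illustration: the function name `merge_small_cells`, the choice to pool from the right-hand tail, and the sample numbers are assumptions made here, not part of the lecture.

```python
def merge_small_cells(observed, expected, minimum=5):
    """Pool trailing cells leftwards until the last cell reaches `minimum`."""
    obs, exp = list(observed), list(expected)
    # Assumption: the small frequencies sit in the right-hand tail, as in a
    # frequency table ordered by the value of the variable.
    while len(obs) > 1 and (obs[-1] < minimum or exp[-1] < minimum):
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs  # add the removed cell to its left neighbour
        exp[-1] += last_exp
    return obs, exp


# Made-up frequencies, used only to show the pooling at work.
pooled_obs, pooled_exp = merge_small_cells([10, 8, 3, 1], [9.0, 8.0, 4.0, 1.0])
print(pooled_obs, pooled_exp)
```

Each pooled pair of cells reduces the number of classes by one, which in turn reduces the degrees of freedom of the test.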

The Pearson Chi-Square Goodness of Fit Test

Let O be the observed frequency of a random sample and E be the expected frequency based on a hypothesised theoretical distribution.

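Suppose we wish to test H0: the data conform to a particular hypothesised theoretical distribution.

The following test statistic is used to test the above null hypothesis:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Under H0 this statistic approximately follows a chi-square distribution with k − 1 − m degrees of freedom, where k is the number of classes and m is the number of parameters to be estimated.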
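As a rough sketch, the Pearson statistic sums (O − E)²/E over the classes, and the degrees of freedom are the number of classes minus one minus the number of estimated parameters m. The function names below are illustrative choices, not taken from the lecture.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, m):
    """k classes, m parameters estimated from the sample."""
    return k - 1 - m


# Two classes with made-up counts: (10-15)^2/15 + (20-15)^2/15
print(chi_square_statistic([10, 20], [15, 15]))
print(degrees_of_freedom(5, 1))
```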

Testing procedure:

Example 9.14: Four identical six-sided dice are rolled 200 times. The number of dice showing an even score on the top face is recorded at each roll. The outcomes are as follows.

No. of even scores     0    1    2    3    4
Frequency             10   41   70   57   22

Solution:
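i. State the null and alternative hypotheses
H0: The data follow a binomial distribution vs. H1: The data do not follow a binomial distribution
ii. The significance level: α = 0.01
iii. The test statistic: $\chi^2 = \sum (O - E)^2 / E$
iv. Critical Region:
Reject H0 when $\chi^2 \ge \chi^2_{0.01}(5-1-1) = 11.345$
v. Computation:
The sample mean number of even scores is 440/200 = 2.2, so p is estimated as 2.2/4 = 0.55. The expected frequencies are approximately 8.2, 40.1, 73.5, 59.9 and 18.3, giving $\chi^2 \approx 1.47$.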
vi. Remarks: The calculated chi-square value falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the sample data came from a binomial population.
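The computation for Example 9.14 can be reproduced with a short Python sketch. It assumes, as the degrees of freedom 5 − 1 − 1 suggest, that the binomial parameter p is estimated from the sample mean rather than fixed in advance; the variable names are illustrative.

```python
from math import comb

observed = [10, 41, 70, 57, 22]   # frequencies for 0..4 even scores
n_dice = 4
n_rolls = sum(observed)           # 200 rolls in total

# Estimate p from the sample mean: mean = n_dice * p
mean_even = sum(x * f for x, f in enumerate(observed)) / n_rolls
p_hat = mean_even / n_dice

# Expected frequencies under Binomial(n_dice, p_hat)
expected = [
    n_rolls * comb(n_dice, x) * p_hat**x * (1 - p_hat) ** (n_dice - x)
    for x in range(n_dice + 1)
]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 11.345           # value quoted in the critical region above

print(round(p_hat, 2), [round(e, 1) for e in expected])
print(round(chi_square, 2), chi_square >= critical_value)
```

The last line prints the statistic and whether it reaches the critical value, so the decision can be read off directly.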

Example 9.15: The following table shows the number of recoveries from scabies.

No. of Patients      0    1    2    3    4    5
No. of Recovery    180  173   69   20    6    2

Test the hypothesis that the data came from a Poisson distribution.

Solution:
i. State the null and alternative hypotheses
H0: The data follow a Poisson distribution vs. H1: The data do not follow a Poisson distribution
ii. The significance level: α
iii. The test statistic: $\chi^2 = \sum (O - E)^2 / E$
iv. Critical Region:
Reject H0 when $\chi^2 \ge \chi^2_{0.025}(6-1-1) = 11.143$
v. Computation:

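The sample mean is 405/450 = 0.9, which is used as the estimate of the Poisson mean. The expected frequencies are approximately 183.0, 164.7, 74.1, 22.2, 5.0 and 0.9. The frequency of the last cell is smaller than 5, so it is combined with the adjacent cell, giving an observed frequency of 8 against an expected frequency of 5.9. The calculated value is $\chi^2 \approx 1.79$.

vi. Remarks: The calculated chi-square value falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the sample data came from a Poisson population.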
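The statistic for Example 9.15 can be recomputed with the sketch below, so that it can be compared against the critical value. It rests on a few assumptions: the Poisson mean is estimated from the sample, the expected count for each value 0 to 5 is taken as n·P(X = x), and the last cell is pooled with its neighbour as described in the computation step. The variable names are illustrative.

```python
from math import exp, factorial

observed = [180, 173, 69, 20, 6, 2]   # frequencies for the values 0..5
n = sum(observed)                      # 450 observations

# Estimate the Poisson mean from the sample
mean_hat = sum(x * f for x, f in enumerate(observed)) / n

# Expected frequencies n * P(X = x) for x = 0..5
expected = [
    n * exp(-mean_hat) * mean_hat**x / factorial(x)
    for x in range(len(observed))
]

# Pool the last cell (small frequency) with the adjacent cell
observed_pooled = observed[:-2] + [observed[-2] + observed[-1]]
expected_pooled = expected[:-2] + [expected[-2] + expected[-1]]

chi_square = sum(
    (o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled)
)
critical_value = 11.143                # value quoted in the critical region above

print(round(mean_hat, 2), [round(e, 1) for e in expected_pooled])
print(round(chi_square, 2), chi_square >= critical_value)
```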

Example 9.16: The table below shows the results of weighing multicoloured birds in town.

Test, at the 5% level of significance, the null hypothesis that the sample was drawn from a normal distribution with a mean of 520 and a standard deviation of 30.

Solution:
i. State the null and alternative hypotheses
H0: The data follow a normal distribution with mean 520 and standard deviation 30 vs. H1: The data do not follow this normal distribution
ii. The significance level: α = 0.05
iii. The test statistic: $\chi^2 = \sum (O - E)^2 / E$
iv. Critical Region:
Reject H0 when $\chi^2 \ge \chi^2_{0.05}(4-1) = 7.815$
v. Computation:

vi. Remarks: The calculated chi-square value falls in the rejection region; the sample data provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the sample data do not come from a normal population.
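For a fully specified normal distribution, the expected frequency of each class is the sample size times the probability of that class under N(520, 30). The sketch below shows one way to obtain those class probabilities. The class boundaries in it are hypothetical placeholders, chosen only to give four classes; they are not the example's data, and the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def class_probabilities(boundaries, mean, sd):
    """Probabilities of the classes (-inf, b1], (b1, b2], ..., (bk, +inf)."""
    cdf = [0.0] + [normal_cdf(b, mean, sd) for b in boundaries] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


mean, sd = 520.0, 30.0              # parameters stated in the null hypothesis
boundaries = [490.0, 520.0, 550.0]  # HYPOTHETICAL class limits, for illustration only

probs = class_probabilities(boundaries, mean, sd)
print([round(p, 4) for p in probs])
```

Multiplying each probability by the sample size gives the expected frequencies E. Because both parameters are specified by the hypothesis, no parameter is estimated and the degrees of freedom are k − 1.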



