The Kolmogorov – Smirnov Test (K – S Test) Lecture 57


The Kolmogorov-Smirnov (K-S) test is a non-parametric goodness-of-fit test and an alternative to the chi-squared goodness-of-fit test. It is used to test whether a sample comes from a hypothesised distribution (one-sample test) or whether two samples come from the same distribution (two-sample test). The one-sample test was developed by the Russian mathematician Andrey Nikolaevich Kolmogorov, and the two-sample test by Nikolai Vasilyevich Smirnov. The two tests were later grouped under one name because of their similarities.

The Kolmogorov–Smirnov one-sample test

It is a non-parametric alternative to the chi-square goodness-of-fit test. It compares the cumulative distribution function of a sample with a specified theoretical distribution from which the random sample is assumed to have been drawn.

Let Sn(X) denote the cumulative distribution based on a sample of n observations. That is,

Sn(X) = k / n

where k is the number of sample observations less than or equal to X, and let F0(X) be the hypothesised population cumulative distribution function. The test is based on the maximum absolute difference:

D = Max | Sn(X) - F0(X) |

OR

Another convenient test statistic is given below:

Reject the null hypothesis if Dn exceeds the table value.

The advantage of this test over the chi-square test is that it can be applied to small samples, because it does not require a minimum expected frequency in each class.
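
As a minimal sketch of how D = Max | Sn(X) - F0(X) | can be computed from raw observations, the Python function below evaluates the gap on both sides of every jump of Sn(X). The function names, the uniform CDF on [0, 1], and the three data values are hypothetical and are not taken from the lecture.

```python
from typing import Callable, Sequence


def ks_one_sample(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Return D = max |Sn(x) - F0(x)| for a raw sample and a hypothesised CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # Sn jumps from (i - 1)/n to i/n at x, so both sides of the jump are checked.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (assumed F0 for this sketch)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical data, used only to illustrate the calculation.
sample = [0.1, 0.4, 0.7]
d_stat = ks_one_sample(sample, uniform_cdf)
print(f"D = {d_stat:.3f}")  # prints D = 0.300
```

The resulting D is then compared with the table value for the chosen sample size and significance level.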

The Kolmogorov–Smirnov two-sample test

This test is used to test the hypothesis that two independent samples come from the same population distribution. Let Sn1(X) and Sn2(X) denote the cumulative relative frequency distributions of two independent samples of size n1 and n2. Then the Kolmogorov-Smirnov two-sample test is based on the maximum absolute difference D, defined by

D = Max | Sn1(X) - Sn2(X) |

If n1 and n2 are large (more than 40) for one-tailed test, the test statistic to use is


Example 13.16: A sample of 60 students, with an equal number from each college stream, took part in a study in which their intention to join the college's drama club was recorded.

| Stream | B.Sc | B.A | B.Com | M.A | M.Com |
| --- | --- | --- | --- | --- | --- |
| No. of students | 5 | 9 | 11 | 16 | 19 |

Twelve students from each stream were expected to join the drama club. Use the K-S test to determine whether the streams differ in their intention to join the drama club.

Solution: An equal number of students is expected to join from each stream, so the theoretical distribution under the null hypothesis is uniform.

i. State the null and alternative hypotheses:

H0: The population distribution is uniform (i.e., Fn(X) = F0(X))

vs.

 H1: The population distribution is not uniform (i.e., Fn(X) ≠ F0(X)).

ii. The significance level; α = 0.04

iii. The test statistic: D = Max | Fn(X) - F0(X) |

iv. Reject H0 when D exceeds the tabulated critical value.

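v. Computation:

| Class | Observed frequency | Cumulative observed frequency | Theoretical frequency | Cumulative theoretical frequency |
| --- | --- | --- | --- | --- |
| B.Sc | 5 | 5 | 12 | 12 |
| B.A | 9 | 14 | 12 | 24 |
| B.Com | 11 | 25 | 12 | 36 |
| M.A | 16 | 41 | 12 | 48 |
| M.Com | 19 | 60 | 12 | 60 |
| Total | 60 | | 60 | |

D = Max | Fn(X) - F0(X) | = | 25/60 - 36/60 | = 11/60 ≈ 0.183
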

vi. Remarks: The computed K-S value falls in the rejection region, so the sample data provide sufficient evidence to reject the null hypothesis. Thus, it is concluded that the intention to join the drama club is not uniformly distributed across the streams.
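
The cumulative calculation for Example 13.16 can be checked with a short Python sketch. The helper name `ks_from_counts` is hypothetical; it only accumulates the observed and theoretical frequencies, converts them to cumulative proportions, and returns the largest absolute gap as an exact fraction.

```python
from fractions import Fraction
from itertools import accumulate


def ks_from_counts(observed, expected):
    """Largest absolute gap between two cumulative relative frequency series."""
    n_obs, n_exp = sum(observed), sum(expected)
    cum_obs = [Fraction(c, n_obs) for c in accumulate(observed)]
    cum_exp = [Fraction(c, n_exp) for c in accumulate(expected)]
    return max(abs(o - e) for o, e in zip(cum_obs, cum_exp))


# Frequencies from Example 13.16, in the order B.Sc, B.A, B.Com, M.A, M.Com.
observed = [5, 9, 11, 16, 19]
expected = [12] * 5  # uniform: twelve students expected per stream

d_stat = ks_from_counts(observed, expected)
print(d_stat)  # prints 11/60
```

The maximum occurs at the B.Com class, where the cumulative proportions are 25/60 and 36/60.
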
Example 13.17: The following scores were obtained by rolling a die 10 times: 3, 4, 4, 2, 6, 6, 3, 4, 2, 5.

Use the K-S test to test at the 5% level of significance that the sample is drawn from a uniform distribution of integer values from 1 to 6.

Solution:

i. State the null and alternative hypotheses:

H0: The population distribution is uniform. (i.e., Fn(X) = F0(X))

vs.

 H1: The population distribution is not uniform (i.e., Fn(X) ≠ F0(X)).

ii. The significance level; α = 0.05

iii. The test statistic: D = Max | Fn(X) - F0(X) |

iv. Reject H0 when D > 0.410
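v. Computation:

| X | Observed frequency | Cumulative observed frequency | Theoretical frequency | Cumulative theoretical frequency |
| --- | --- | --- | --- | --- |
| 2 | 2 | 2 | 2 | 2 |
| 3 | 2 | 4 | 2 | 4 |
| 4 | 3 | 7 | 2 | 6 |
| 5 | 1 | 8 | 2 | 8 |
| 6 | 2 | 10 | 2 | 10 |
| Total | 10 | | 10 | |

D = Max | Fn(X) - F0(X) | = | 7/10 - 6/10 | = 1/10 = 0.1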

vi. Remarks: The calculated K-S value falls in the acceptance region; the sample does not provide sufficient evidence to reject the null hypothesis. Thus, the data are consistent with a sample drawn from a uniform distribution.
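
The tally for Example 13.17 can be reproduced with the Python sketch below. The computation table lists only the observed faces 2 to 6, while the problem statement refers to integer values from 1 to 6, so the sketch evaluates D under both readings. The function name `max_gap` is hypothetical, and which reading is intended is left open here.

```python
from collections import Counter
from fractions import Fraction

rolls = [3, 4, 4, 2, 6, 6, 3, 4, 2, 5]
n = len(rolls)
counts = Counter(rolls)  # faces that never occur count as 0


def max_gap(faces):
    """D over the given faces, with F0 uniform across exactly those faces."""
    faces = list(faces)
    d, cum = Fraction(0), 0
    for rank, face in enumerate(faces, start=1):
        cum += counts[face]
        d = max(d, abs(Fraction(cum, n) - Fraction(rank, len(faces))))
    return d


d_all_faces = max_gap(range(1, 7))     # uniform on 1..6, as in the problem statement
d_as_tabulated = max_gap(range(2, 7))  # five classes 2..6, as in the table
print(d_all_faces, d_as_tabulated)     # prints 1/6 1/10
```

Either value is then compared with the critical value stated in the rejection rule.
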
Example 13.18: The data of two independent random samples selected from two populations are given below:

| Measurement | Frequency 1 | Frequency 2 |
| --- | --- | --- |
| A | 4 | 5 |
| B | 11 | 3 |
| C | 5 | 9 |
| D | 7 | 6 |
| E | 2 | 2 |

Use the K-S test to test at the 5% level of significance that the two samples are drawn from identical populations.
Solution:

i. State the null and alternative hypotheses:

H0: The two samples are drawn from identical distributions (i.e., F1(X) = F2(X))

vs.

H1: The two samples are not drawn from identical distributions (i.e., F1(X) ≠ F2(X)).

ii. The significance level; α = 0.05

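iii. The test statistic: D = Max | Sn1(X) - Sn2(X) |

iv. Reject H0 when D > 0.371, the large-sample critical value 1.36 √((n1 + n2) / (n1 n2)) at the 5% level for n1 = 29 and n2 = 25

v. Computation:

| Measurement | Frequency 1 | Cumulative frequency 1 | Frequency 2 | Cumulative frequency 2 |
| --- | --- | --- | --- | --- |
| A | 4 | 4 | 5 | 5 |
| B | 11 | 15 | 3 | 8 |
| C | 5 | 20 | 9 | 17 |
| D | 7 | 27 | 6 | 23 |
| E | 2 | 29 | 2 | 25 |
| Total | 29 | | 25 | |

D = Max | Sn1(X) - Sn2(X) | = | 15/29 - 8/25 | = | 0.517 - 0.320 | ≈ 0.197

vi. Remarks: The computed K-S value (0.197) is below the critical value (0.371) and falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, there is no evidence that the two samples were drawn from differently distributed populations.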
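
Because the two samples in Example 13.18 have different sizes, the cumulative frequencies must be converted to proportions before they are compared. The Python sketch below tabulates those proportions for each measurement class and picks out the largest gap; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

categories = ["A", "B", "C", "D", "E"]
freq1 = [4, 11, 5, 7, 2]
freq2 = [5, 3, 9, 6, 2]
n1, n2 = sum(freq1), sum(freq2)

rows = []
for cat, c1, c2 in zip(categories, accumulate(freq1), accumulate(freq2)):
    s1, s2 = Fraction(c1, n1), Fraction(c2, n2)  # cumulative relative frequencies
    rows.append((cat, s1, s2, abs(s1 - s2)))

for cat, s1, s2, gap in rows:
    print(f"{cat}: S1 = {float(s1):.3f}  S2 = {float(s2):.3f}  gap = {float(gap):.3f}")

# D is the largest gap; d_cat records the class where it occurs.
d_stat, d_cat = max((gap, cat) for cat, _, _, gap in rows)
print(f"D = {float(d_stat):.3f} at class {d_cat}")
```

With these data the sample sizes are 29 and 25, and the largest gap, 143/725 (about 0.197), occurs at class B.
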
Example 13.19: The data of two independent random samples drawn from two populations is given below:

X: 1.2, 1.4, 1.9, 3.7, 4.4, 4.8, 9.7, 17.3, 21.2, 28.4

Y: 5.6, 6.5, 6.6, 6.9, 9.2, 10.4, 10.6, 19.3

Use the K-S test to test the hypothesis that the two sampled populations have identical distribution at a 5% significance level.

Solution:

i. State the null and alternative hypotheses:

H0: The two samples are drawn from identical distributions (i.e., F1(X) = F2(X))

vs.

H1: The two samples are not drawn from identical distributions (i.e., F1(X) ≠ F2(X)).

ii. The significance level; α = 0.05

iii. The test statistic: the K-S two-sample statistic, D = Max | Sn1(X) - Sn2(X) |

iv. Reject H0 when D > 0.645
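v. Computation: With n1 = 10 and n2 = 8, the largest gap between the two cumulative relative frequency distributions occurs at X = 4.8, where Sn1(X) = 6/10 and Sn2(X) = 0/8, so D = 0.6.

vi. Remarks: The calculated K-S value falls in the acceptance region; the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, there is no evidence that the two samples were drawn from differently distributed populations.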
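
For raw, ungrouped data such as Example 13.19, D can be found by evaluating both empirical distribution functions at every pooled observation. The Python sketch below does this with the standard library. The helper names are hypothetical, and the coefficient 1.36 is the usual large-sample approximation for a two-sided test at the 5% level, stated here as an assumption about how the critical value in the rejection rule was obtained.

```python
from bisect import bisect_right
from math import sqrt

x = [1.2, 1.4, 1.9, 3.7, 4.4, 4.8, 9.7, 17.3, 21.2, 28.4]
y = [5.6, 6.5, 6.6, 6.9, 9.2, 10.4, 10.6, 19.3]


def ecdf(sorted_sample, t):
    """Proportion of observations less than or equal to t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def ks_two_sample(a, b):
    """D = max |Sn1(t) - Sn2(t)|; the gap only changes at observed values."""
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)


d_stat = ks_two_sample(x, y)
n1, n2 = len(x), len(y)
# Assumed large-sample 5% approximation: 1.36 * sqrt((n1 + n2) / (n1 * n2)).
critical = 1.36 * sqrt((n1 + n2) / (n1 * n2))
print(f"D = {d_stat:.3f}, critical value = {critical:.3f}")
# prints D = 0.600, critical value = 0.645
```

For samples this small, an exact table of two-sample critical values may differ from the approximation.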
