The Wilcoxon Rank Sum Test Lecture 52


The Wilcoxon signed-rank test applies when the observations of two samples are dependent, that is, matched or paired. The Wilcoxon rank sum test, by contrast, is the nonparametric counterpart of Student's two-sample t-test for independent (non-paired) samples, and it is equivalent to the Mann-Whitney U test. Let's review the assumptions of Student's two-sample t-test for comparing two population means:

i. The observations of both samples are independent.

ii. The sampled populations have identical variances.

iii. The sampled populations follow a normal distribution.

The Wilcoxon rank sum test is used when the assumptions above, in particular normality, are not met, although the two samples must still be independent. It tests the null hypothesis that the medians of the two populations are identical, or that the two populations have the same distribution.

Procedure

Step 1: Combine and arrange the observations of both samples in ascending order of magnitude.

Step 2: Assign ranks to the arranged observations from Step 1, giving tied observations the average of the ranks they occupy.

Step 3: Calculate R, the sum of the ranks assigned to the observations of the smaller sample.

The test statistic is R. Reject H₀ if R ≤ the lower table value OR R ≥ the upper table value. The upper pair of table values is used for a two-tailed test, and the lower pair of table values is used for a one-tailed test.

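Step 4: If the sample sizes are large, R is approximately normally distributed with the mean and standard deviation given below, where n₁ is the size of the sample whose ranks are summed and n₂ is the size of the other sample:

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

The test statistic is given by:

$$z = \frac{R - \mu_R}{\sigma_R}$$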
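Steps 1 to 3 of the procedure can be sketched in Python as follows. This is a minimal illustration, not a library routine: the names `midranks` and `rank_sum` are hypothetical, and tied observations are assumed to receive the average of the ranks they occupy. The data are the tyre lives used in Example 13.11.

```python
def midranks(values):
    """Return the rank of each value, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + 1 + j + 1) / 2  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum(smaller, larger):
    """Sum of the combined-sample ranks that belong to `smaller`."""
    ranks = midranks(list(smaller) + list(larger))
    return sum(ranks[: len(smaller)])


# Tyre data from Example 13.11 (the first sample is the smaller one).
tyres_a = [29, 27, 23, 30]
tyres_b = [24, 37, 35, 19, 40, 32]
print(rank_sum(tyres_a, tyres_b))  # 17.0
```

The returned value is compared with the table values for small samples, or standardized with the mean and standard deviation of R for large samples.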



Example 13.11: Two companies, A and B, manufacture tubeless tyres. Two independent random samples of tyre life, measured in units of 1000 kilometres, are taken from the two companies. The observed lives are given below:

Manufacturer A: 29, 27, 23, 30

Manufacturer B: 24, 37, 35, 19, 40, 32

Use the Wilcoxon rank sum test to test whether there is any difference in the life of the two types of tubeless tyres.

Solution:

i. State null and alternative hypothesis

H0: M1 = M2 vs. H1: M1 ≠ M2 

ii. The significance level; α = 0.05

iii. The test statistic: Because the sample sizes are small, R is used as the test statistic.

iv. Reject H0, if R ≤ 12 OR R ≥ 32

v. Computation:

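Arrange the observations of both samples combined in ascending order and assign ranks. Sample 1 (Manufacturer A) is the smaller sample, so add the ranks of sample 1.

Ordered values: 19 (B), 23 (A), 24 (B), 27 (A), 29 (A), 30 (A), 32 (B), 35 (B), 37 (B), 40 (B)

Ranks: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

R = 2 + 4 + 5 + 6 = 17

vi. Remarks: R (the sum of the ranks assigned to the smaller sample) falls in the acceptance region (12 < 17 < 32); the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, there is no evidence of a difference between the median lives of the two types of tyres.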
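The table values 12 and 32 used above can be recovered by enumerating every way the ranks 1 to 10 can be split between the two samples, since under H0 each split is equally likely. The sketch below is only an illustration under the assumption of no ties; `critical_values` is a hypothetical helper name, and `tail_prob` is the probability allowed in each tail (0.025 for a two-tailed test at α = 0.05, 0.05 for a one-tailed test).

```python
from itertools import combinations


def critical_values(n_small, n_large, tail_prob):
    """Lower and upper cutoffs for R whose tail probability under H0
    does not exceed `tail_prob` (assumes there are no tied values)."""
    total = n_small + n_large
    # Rank sum of the smaller sample for every possible split of the ranks.
    sums = [sum(c) for c in combinations(range(1, total + 1), n_small)]
    count = len(sums)
    lower = None
    for cutoff in range(min(sums), max(sums) + 1):
        if sum(s <= cutoff for s in sums) / count <= tail_prob:
            lower = cutoff
        else:
            break
    if lower is None:
        return None  # no cutoff achieves this tail probability
    # Without ties the null distribution of R is symmetric about
    # n_small * (total + 1) / 2, so the upper cutoff mirrors the lower one.
    upper = n_small * (total + 1) - lower
    return lower, upper


print(critical_values(4, 6, 0.025))  # (12, 32): two-tailed, alpha = 0.05
print(critical_values(4, 6, 0.05))   # (13, 31): one-tailed, alpha = 0.05
```

Enumeration is practical only for small samples, because the number of splits grows very quickly with the sample sizes.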
Example 13.12: An agricultural researcher claims that the potato variety grown by local farmers has a lower yield than a newly developed hybrid variety. To check the claim, two independent random samples of yields, of sizes 13 and 16, are selected. The data for the two samples are given below:

Local: 26, 25, 38, 33, 42, 40, 44, 26, 25, 43, 35, 48, 37

Hybrid: 44, 30, 34, 47, 35, 46, 35, 47, 48, 34, 32, 42, 43, 49, 46, 47

Test the null hypothesis that the population medians are equal against the alternative that M1 < M2.
Solution:

i. State null and alternative hypothesis

H0: M1 = M2 vs. H1: M1 < M2 

ii. The significance level; α = 0.05

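iii. The test statistic: Because the sample sizes are large, the normal approximation to the rank sum R1 of the local sample is used:

$$z = \frac{R_1 - \mu_R}{\sigma_R}, \qquad \mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}, \qquad \sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

iv. Reject H0 when z < -1.645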

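v. Computation: Arrange both samples combined in ascending order of magnitude and assign ranks, giving tied observations the average of the ranks they occupy. The ranks of the local sample are:

R1 = 1.5 + 1.5 + 3.5 + 3.5 + 7 + 11 + 13 + 14 + 15 + 16.5 + 18.5 + 20.5 + 27.5 = 153

With n1 = 13 and n2 = 16:

$$\mu_R = \frac{13 (13 + 16 + 1)}{2} = 195, \qquad \sigma_R = \sqrt{\frac{13 \times 16 \times 30}{12}} = \sqrt{520} \approx 22.80$$

$$z = \frac{153 - 195}{22.80} \approx -1.84$$

vi. Remarks: The calculated z value falls in the rejection region (-1.84 < -1.645); the sample data provide sufficient evidence to reject the null hypothesis of equal medians. Thus, it is concluded that the yield of the local variety is lower than that of the hybrid.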
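The large-sample calculation for the potato data can be checked with a short Python sketch. The names `midrank` and `rank_sum_z` are illustrative and do not come from any library. The sketch assumes that every tied value receives the average of the ranks it occupies, and that the rank sum has mean n1(n1 + n2 + 1)/2 and standard deviation sqrt(n1 n2 (n1 + n2 + 1)/12).

```python
import math


def midrank(value, pooled):
    """Average rank of `value` within the pooled sample."""
    less = sum(x < value for x in pooled)
    equal = sum(x == value for x in pooled)
    return less + (equal + 1) / 2


def rank_sum_z(sample, other):
    """Rank sum of `sample` and its normal-approximation z value."""
    n1, n2 = len(sample), len(other)
    pooled = list(sample) + list(other)
    r = sum(midrank(v, pooled) for v in sample)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r, (r - mean) / sd


local = [26, 25, 38, 33, 42, 40, 44, 26, 25, 43, 35, 48, 37]
hybrid = [44, 30, 34, 47, 35, 46, 35, 47, 48, 34, 32, 42, 43, 49, 46, 47]

r1, z = rank_sum_z(local, hybrid)
print(r1, round(z, 2))  # 153.0 -1.84
```

Since z is below -1.645, the one-tailed test at α = 0.05 rejects H0 in favour of M1 < M2.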
Example 13.13: It is claimed that working women spend more time in the shopping mall on Saturday than on Sunday. To check the claim, random samples of the time spent in the shopping mall, to the nearest minute, are given below:

Saturday: 35, 48, 63, 49

Sunday: 29, 49, 105, 35, 60, 69

Test the null hypothesis that the median times are equal against the alternative that the median time spent in the shopping mall on Saturday is greater than the median time spent on Sunday.
Solution: Let M1 and M2 be the medians of time spent in the shopping mall on Saturday and Sunday, respectively.

i. State null and alternative hypothesis

H0: M1 = M2 vs. H1: M1 > M2 

ii. The significance level; α = 0.05

iii. The test statistic: As the sample sizes are small, the test statistic is R.

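iv. Reject H0 if R ≥ 31. The one-tailed table values for n1 = 4 and n2 = 6 are 13 and 31; for the right-tailed alternative M1 > M2 only the upper value applies.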

v. Computation:

R = 2.5 + 4 + 5.5 + 8 = 20
vi. Remarks: The computed R value (20) falls in the acceptance region; the sample does not provide sufficient evidence to reject H0. Thus, there is no evidence that the median time spent in the shopping mall on Saturday is greater than that on Sunday.



The Mann–Whitney U test Lecture 51

 


The Mann–Whitney U test is the nonparametric counterpart of the two-sample independent t-test. It is used when the two samples are independent and the observations in each sample are randomly selected. It tests for a difference between two independent groups, or between the medians of two populations, when the data are ordinal, or continuous with distributions of identical shape that are not normal.

Procedure to Perform Test:

To carry out the test, arrange the observations of both samples combined in ascending order of magnitude and assign ranks to them. In the case of tied observations, assign each the average of the ranks they occupy. Compute the sums of the ranks assigned to sample 1 and sample 2, denoted by R1 and R2, respectively.

The test statistic for small samples, i.e., n1, n2 < 8, is

μ = Minimum (μ1, μ2)

(this statistic is commonly written U), where μ1 and μ2 are calculated as:

$$\mu_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad \mu_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

As a check, μ1 + μ2 = n1 n2.

In the case of a two-tailed test, use the upper pair of table values:
Reject H0 when μ ≤ the lower value of the Mann-Whitney table
OR
Reject H0 when μ ≥ the upper value of the Mann-Whitney table.

In the case of a one-tailed test, use the lower pair of table values:
Reject H0 when μ ≤ the lower value of the Mann-Whitney table (left-tailed test).
Reject H0 when μ ≥ the upper value of the Mann-Whitney table (right-tailed test).
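The small-sample statistic can be sketched in Python as follows. The sketch assumes the usual definitions μ1 = n1 n2 + n1(n1 + 1)/2 - R1 and μ2 = n1 n2 + n2(n2 + 1)/2 - R2, with average ranks for ties. The function name `mann_whitney_mu` and the data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def mann_whitney_mu(sample1, sample2):
    """Return (mu1, mu2, mu) computed from the rank sums R1 and R2."""
    pooled = list(sample1) + list(sample2)

    def midrank(value):
        # Average rank of `value` in the pooled sample.
        less = sum(x < value for x in pooled)
        equal = sum(x == value for x in pooled)
        return less + (equal + 1) / 2

    n1, n2 = len(sample1), len(sample2)
    r1 = sum(midrank(v) for v in sample1)
    r2 = sum(midrank(v) for v in sample2)
    mu1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return mu1, mu2, min(mu1, mu2)


# Hypothetical data: the pooled ranks are 1..7, so R1 = 9 and R2 = 19.
print(mann_whitney_mu([3, 5, 8], [4, 6, 9, 10]))  # (9.0, 3.0, 3.0)
```

The two values always add up to n1 n2 (here 9 + 3 = 12), which is a quick check on the ranking.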
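Normal Approximation
If one or both samples are large (n1, n2 ≥ 8), the normal approximation is used:

$$z = \frac{\mu - E(\mu)}{\sigma_\mu}$$

Where:

$$E(\mu) = \frac{n_1 n_2}{2}, \qquad \sigma_\mu = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$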
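A minimal Python sketch of the normal approximation is given below. It assumes the standard mean n1 n2 / 2 and standard deviation sqrt(n1 n2 (n1 + n2 + 1)/12) of the statistic under H0, without any correction for ties; the function name `mann_whitney_z` is illustrative. The numbers in the call are those of the grouped wage data in Example 13.10.

```python
import math


def mann_whitney_z(mu, n1, n2):
    """Standardize the Mann-Whitney statistic mu under H0."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (mu - mean) / sd


# mu = 1376 with n1 = 52 and n2 = 54, as in Example 13.10.
print(round(mann_whitney_z(1376, 52, 54), 3))  # -0.177
```

Because μ1 + μ2 = n1 n2, standardizing the larger of the two values gives the same magnitude with the opposite sign.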

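Mann-Whitney U test in the case of grouped data:

In the case of grouped data, add the frequencies of both groups in each class j and denote the total by Tj; then find the cumulative frequencies of the Tj, denoted by cj (with c0 = 0).

Next, find the average rank of class j, denoted by rj, as:

$$r_j = c_{j-1} + \frac{T_j + 1}{2}$$

Compute the total of ranks of group 1, denoted by R1, by multiplying the average rank of each class by the frequency of group 1 in that class and summing over the classes (R2 is obtained in the same way from the frequencies of group 2):

$$R_1 = \sum_j r_j f_{1j}$$

The μ1 and μ2 can then be computed as:

$$\mu_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1, \qquad \mu_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

For small samples, the test statistic is μ = Minimum (μ₁, μ₂). Use the normal approximation for large samples.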
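The grouped-data steps can be sketched in Python as follows. The sketch assumes that every observation in a class shares the average of the ranks the class occupies, r_j = c_{j-1} + (T_j + 1)/2, and that μ1 and μ2 are obtained from the rank sums as μ_i = n1 n2 + n_i(n_i + 1)/2 - R_i. The function name `grouped_mann_whitney` is hypothetical; the frequencies are those of Example 13.10.

```python
def grouped_mann_whitney(freq1, freq2):
    """Return (R1, R2, mu1, mu2) for two frequency distributions
    defined over the same ordered classes."""
    n1, n2 = sum(freq1), sum(freq2)
    r1 = r2 = 0.0
    below = 0  # cumulative frequency of all lower classes, c_{j-1}
    for f1, f2 in zip(freq1, freq2):
        total = f1 + f2                     # T_j
        avg_rank = below + (total + 1) / 2  # r_j
        r1 += avg_rank * f1
        r2 += avg_rank * f2
        below += total
    mu1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, mu1, mu2


factory_a = [10, 15, 12, 9, 6]
factory_b = [11, 13, 18, 7, 5]
print(grouped_mann_whitney(factory_a, factory_b))
# (2810.0, 2861.0, 1376.0, 1432.0)
```

The rank sums add up to N(N + 1)/2 = 5671 for N = 106 observations, which confirms that the average ranks were assigned consistently.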



Example 13.8: Doctors are interested in the recovery time from seasonal flu. A team of doctors selected patients of approximately the same age and health condition, divided them into a treated group and an untreated group, and recorded the recovery time (in hours) of each patient:

Treated: 14, 15, 15, 17, 18, 23

Untreated: 17, 24, 23, 18, 19, 28

Use the Mann-Whitney U test to test the hypothesis that the median recovery times of the treated and untreated patients are identical at the 5% significance level.

Solution: 

i. State the null and alternative hypotheses as:

H0: Median 1 = Median 2 vs. H1: Median 1 ≠ Median 2

ii. The significance level; α = 0.05

iii. The test statistic: μ = Minimum (μ1, μ2)

iv. Reject H0 when μ lies outside the interval (5, 31), i.e., when μ ≤ 5 or μ ≥ 31

v. Computation:

Arrange the observations of both samples combined in ascending order of magnitude and assign ranks, giving tied observations the average of the ranks they occupy. The values 17, 18 and 23 each occur once in both samples, so they receive the ranks 4.5, 6.5 and 9.5, respectively.

R1 = 1 + 2.5 + 2.5 + 4.5 + 6.5 + 9.5 = 26.5
R2 = 4.5 + 6.5 + 8 + 9.5 + 11 + 12 = 51.5

Check: R1 + R2 = 78 = 12(13)/2.

μ1 = (6)(6) + 6(7)/2 - 26.5 = 30.5
μ2 = (6)(6) + 6(7)/2 - 51.5 = 5.5

The test statistic for small samples

μ = Minimum (μ1, μ2)
μ = Minimum (30.5, 5.5)
μ = 5.5
vi. Remarks: The calculated μ value does not fall in the rejection region (5.5 > 5); the sample data do not provide sufficient evidence to reject the null hypothesis that the median recovery times of the treated and untreated groups are the same, although the result is close to the critical value.
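The values of μ1 and μ2 can be cross-checked without any ranking, because the rank-sum formulas are equivalent to counting pairs: μ1 is the number of (treated, untreated) pairs in which the treated value is smaller, μ2 is the number in which it is larger, and a tied pair contributes one half to each. The sketch below illustrates that identity; the function name `pair_counts` is hypothetical.

```python
def pair_counts(sample1, sample2):
    """Count pairs with the sample1 value below / above the sample2
    value; a tied pair adds one half to each count."""
    below = above = 0.0
    for x in sample1:
        for y in sample2:
            if x < y:
                below += 1
            elif x > y:
                above += 1
            else:
                below += 0.5
                above += 0.5
    return below, above


treated = [14, 15, 15, 17, 18, 23]
untreated = [17, 24, 23, 18, 19, 28]
print(pair_counts(treated, untreated))  # (30.5, 5.5)
```

The two counts add up to n1 n2 = 36, and the smaller one, 5.5, is the statistic compared with the table value.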
Example 13.9: A student investigated whether there were more trichomes (stings) on nettles that were grazed than on nettles that were ungrazed. He collected two independent random samples of sizes 9 and 8, respectively. The number of trichomes per cm² on a sample of nettle leaves from each area is given below:

Grazed plants: 12, 14, 15, 17, 19, 22, 23, 26, 21

Ungrazed plants: 10, 13, 14, 14, 16, 20, 21, 23

It is claimed that the number of trichomes on the grazed leaves is significantly higher than on the ungrazed leaves. The sampled populations have identical shapes but are non-normal.

Solution:

i. State the null and alternative hypotheses as:

H0: Median 1 = Median 2 vs. H1: Median 1 > Median 2

ii. The significance level; α = 0.05

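iii. The test statistic: Because the sample sizes are large (n1, n2 ≥ 8), the normal approximation is used:

$$z = \frac{\mu - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}$$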

iv. Reject H0, when z > 1.645 

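v. Computation: Arrange the observations of both samples combined in ascending order of magnitude and assign ranks, giving tied observations the average of the ranks they occupy. The value 14 occurs three times and takes rank 5, 21 occurs twice and takes rank 12.5, and 23 occurs twice and takes rank 15.5.

R1 = 2 + 5 + 7 + 9 + 10 + 12.5 + 14 + 15.5 + 17 = 92
R2 = 1 + 3 + 5 + 5 + 8 + 11 + 12.5 + 15.5 = 61
μ1 = (9)(8) + 9(10)/2 - 92 = 25
μ2 = (9)(8) + 8(9)/2 - 61 = 47
μ = Minimum (μ1, μ2)
μ = Minimum (25, 47)
μ = 25

$$z = \frac{25 - 36}{\sqrt{(9)(8)(18)/12}} = \frac{-11}{\sqrt{108}} \approx -1.06$$

vi. Remarks: The calculated z value falls in the acceptance region: because μ is the smaller of μ1 and μ2, z cannot be positive, and even its magnitude |z| = 1.06 is below the critical value 1.645. The sample data do not provide sufficient evidence to support the alternative hypothesis that the number of trichomes on the grazed leaves is higher than on the ungrazed leaves.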
Example 13.10: The wages of the workers in two factories are given below:

Wage: 1000 - 1200, 1200 - 1400, 1400 - 1600, 1600 - 1800, 1800 - 2000

No. of workers in Factory A: 10, 15, 12, 9, 6

No. of workers in Factory B: 11, 13, 18, 7, 5

Apply the Mann-Whitney U test to check whether the median wages of the two factories are identical.

Solution:

i. State the null and alternative hypotheses as:

H0: Median 1 = Median 2 vs. H1: Median 1 ≠ Median 2

ii. The significance level; α = 0.05

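iii. The test statistic: Because the sample sizes are large, the normal approximation is used:

$$z = \frac{\mu - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}}$$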

iv. Reject H0, when |z| > 1.96

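v. Computation: Here n1 = 52 and n2 = 54. The class totals are Tj = 21, 28, 30, 16, 11, the cumulative frequencies are cj = 21, 49, 79, 95, 106, and the average ranks are rj = 11, 35.5, 64.5, 87.5, 101.

R1 = 10(11) + 15(35.5) + 12(64.5) + 9(87.5) + 6(101) = 2810
R2 = 11(11) + 13(35.5) + 18(64.5) + 7(87.5) + 5(101) = 2861
μ1 = (52)(54) + 52(53)/2 - 2810 = 1376
μ2 = (52)(54) + 54(55)/2 - 2861 = 1432

μ = Minimum (μ1, μ2)
μ = Minimum (1376, 1432)
μ = 1376

$$z = \frac{1376 - 1404}{\sqrt{(52)(54)(107)/12}} = \frac{-28}{158.23} \approx -0.18$$

vi. Remarks: The calculated z value falls in the acceptance region (|z| = 0.18 < 1.96); the sample data do not provide sufficient evidence to reject the null hypothesis. Thus, there is no evidence that the median wages of the two factories differ.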

