Biserial Correlation Lecture 58

 

The biserial correlation measures the strength of association between an artificial dichotomous variable and a continuous variable. An artificial dichotomous variable is one whose two categories are defined by the researcher, expert or investigator, such as pass/fail or weak/strong.

For example, an examiner may decide that a score below 40% is a fail, so that any other score is a pass.

Coefficient of Biserial Correlation

The coefficient of biserial correlation is a quantitative measure of the strength of association between an artificial dichotomous variable and a continuous variable. It is denoted by ρb for the population and rb for the sample. The sample coefficient is computed as

$$ r_b = \frac{\bar{X}_p - \bar{X}_q}{s} \cdot \frac{P_p P_q}{y} $$

Where:

X̄p is the mean of the interval variable’s values associated with the dichotomous variable’s first category.

X̄q is the mean of the interval variable’s values associated with the dichotomous variable’s second category.

s is the standard deviation of the variable on the interval scale.

Pp is the proportion of the interval variable values associated with the dichotomous variable’s first category.

Pq is the proportion of the interval variable values associated with the dichotomous variable’s second category.

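The mean and proportion for the first (“p”) category of the dichotomous variable are

$$ \bar{X}_p = \frac{\sum X_p}{n_p}, \qquad P_p = \frac{n_p}{n} $$

The mean and proportion for the second (“q”) category are

$$ \bar{X}_q = \frac{\sum X_q}{n_q}, \qquad P_q = \frac{n_q}{n} $$

where np and nq are the numbers of observations in the two categories and n = np + nq.

The standard deviation s is computed from all n values of the continuous variable, with both categories pooled.

y (written Z in some tables) is the height of the ordinate of the standard normal curve at the point that separates the proportions Pp and Pq. It is read from a table of normal ordinates.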
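The quantities defined above can be combined in a few lines of code. The following is a minimal sketch, not the lecture's own procedure: the function names `normal_ordinate` and `biserial` are hypothetical, the ordinate is computed with Python's standard `statistics.NormalDist` instead of being read from a table, and the `sample_sd` switch is an assumption, because the divisor used for s (n − 1 or n) is not stated here.

```python
from statistics import NormalDist, mean, pstdev, stdev


def normal_ordinate(p: float) -> float:
    """Height of the standard normal curve at the point cutting off area p."""
    z = NormalDist().inv_cdf(p)   # z-score with area p to its left
    return NormalDist().pdf(z)    # ordinate (density) at that z


def biserial(x_p, x_q, sample_sd: bool = True) -> float:
    """Biserial correlation for a continuous variable split into two categories.

    x_p, x_q  : values of the continuous variable in the first / second category
    sample_sd : assumption; True divides by n - 1 when computing s, False by n
    """
    x_all = list(x_p) + list(x_q)
    n = len(x_all)
    p_p, p_q = len(x_p) / n, len(x_q) / n          # category proportions
    s = stdev(x_all) if sample_sd else pstdev(x_all)
    y = normal_ordinate(p_p)                       # ordinate separating p and q
    return (mean(x_p) - mean(x_q)) / s * (p_p * p_q) / y
```

Calling `biserial(x_p, x_q)` with the two groups of values returns rb. Swapping the two groups only flips the sign, since the normal curve is symmetric and the ordinate is the same for Pp and Pq.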

Hypothesis Testing about the Association of an Artificial Dichotomous and Continuous Variable

Let rb be the estimate of ρb computed from a sample drawn from a bivariate population consisting of an artificial dichotomous variable and a continuous variable. Suppose we wish to test that there is no association between the two variables.

That is,

H0: ρb = 0

versus suitable alternatives.

If the sample size is small, a t statistic is used to test the above null hypothesis.

If the sample size is large, a z statistic is used to test the above null hypothesis.
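One commonly quoted large-sample statistic divides rb by its approximate standard error under H0. It is stated here as an assumption, not as the lecture's own formula, with Pp, Pq, y and n as defined earlier:

```latex
z = \frac{r_b}{\sqrt{P_p P_q} \,/\, \left( y \sqrt{n} \right)}
  = \frac{r_b \, y \sqrt{n}}{\sqrt{P_p P_q}}
```

Under H0 this z is compared with the critical values of the standard normal distribution, for example ±1.96 for a two-sided test at the 5% level.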




Example 13.20: The sociology department of a college wants to know whether the grade point averages (GPAs) of its students can predict their performance in the comprehensive exam that is required for graduation. The comprehensive exam is scored as either pass or fail. Last year, sixteen students took the comprehensive exam, and five of them did not pass. The GPAs and exam results of the students are given below:

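| Participant | Exam Result | GPA |
|---|---|---|
| 1 | F | 3.5 |
| 2 | F | 3.4 |
| 3 | F | 3.3 |
| 4 | F | 3.2 |
| 5 | F | 3.6 |
| 6 | P | 4.0 |
| 7 | P | 3.6 |
| 8 | P | 4.0 |
| 9 | P | 4.0 |
| 10 | P | 3.8 |
| 11 | P | 3.9 |
| 12 | P | 3.9 |
| 13 | P | 4.0 |
| 14 | P | 3.8 |
| 15 | P | 3.5 |
| 16 | P | 3.6 |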

Test the hypothesis that there is no association between exam results and GPAs at a 5% significance level.

Solution: First calculate the biserial correlation and next test the hypothesis.

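| Participant | Exam Result | X (GPA) | X² |
|---|---|---|---|
| 1 | F | 3.5 | 12.25 |
| 2 | F | 3.4 | 11.56 |
| 3 | F | 3.3 | 10.89 |
| 4 | F | 3.2 | 10.24 |
| 5 | F | 3.6 | 12.96 |
| 6 | P | 4.0 | 16.00 |
| 7 | P | 3.6 | 12.96 |
| 8 | P | 4.0 | 16.00 |
| 9 | P | 4.0 | 16.00 |
| 10 | P | 3.8 | 14.44 |
| 11 | P | 3.9 | 15.21 |
| 12 | P | 3.9 | 15.21 |
| 13 | P | 4.0 | 16.00 |
| 14 | P | 3.8 | 14.44 |
| 15 | P | 3.5 | 12.25 |
| 16 | P | 3.6 | 12.96 |
| Total | | 59.1 | 219.37 |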

Let p represent the fail category and q the pass category.

np = 5, nq = 11, n = 16

X̄p = 17.0 / 5 = 3.4, Pp = 5 / 16 = 0.3125

X̄q = 42.1 / 11 ≈ 3.83, Pq = 11 / 16 = 0.6875



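Now determine the height of the unit normal curve ordinate, y, at the point dividing Pp = 0.3125 and Pq = 0.6875. The normal table gives the z-score at the point dividing Pp ≈ 0.31 and Pq ≈ 0.69 as z = 0.49 in absolute value. The ordinate can be read from the same table or computed directly:

$$ y = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} = \frac{1}{\sqrt{2\pi}} e^{-(0.49)^2/2} \approx 0.354 $$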

Now compute the biserial correlation coefficient from these quantities.

The biserial correlation shows a strong association between exam results and GPAs.
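The arithmetic for this example can be checked from the table totals. The sketch below is an illustration under stated assumptions, not the lecture's own working: the divisor n − 1 for s is assumed, and the ordinate y is computed with `statistics.NormalDist` instead of being read from a table. With p as the fail category, rb comes out negative, which only reflects that the fail group has the lower mean GPA.

```python
from math import sqrt
from statistics import NormalDist

# Counts and totals from the computation table of Example 13.20
n_p, n_q = 5, 11                       # fail (p) and pass (q) counts
sum_p = 3.5 + 3.4 + 3.3 + 3.2 + 3.6    # sum of GPAs in the fail group
sum_x, sum_x2 = 59.1, 219.37           # column totals of X and X^2
n = n_p + n_q

mean_p = sum_p / n_p                   # mean GPA of the fail group
mean_q = (sum_x - sum_p) / n_q         # mean GPA of the pass group
p_p, p_q = n_p / n, n_q / n            # 0.3125 and 0.6875

# Assumption: s uses the n - 1 divisor (computational formula from the totals).
s = sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

# Ordinate of the standard normal curve at the point cutting off area p_p
y = NormalDist().pdf(NormalDist().inv_cdf(p_p))

r_b = (mean_p - mean_q) / s * (p_p * p_q) / y
print(f"mean_p={mean_p:.3f} mean_q={mean_q:.3f} s={s:.4f} y={y:.4f} r_b={r_b:.3f}")
```

If the divisor n is used for s instead, the magnitude of rb is slightly larger. Unlike the Pearson coefficient, the biserial coefficient is not guaranteed to stay within −1 and 1.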
Now to test the hypothesis
i. State the null and alternative hypotheses.

H0: ρb = 0 vs. H1: ρb ≠ 0 
ii. The significance level: α = 0.05
iii. The test statistic: the sample size is small (n = 16), so the small-sample t statistic is used.
iv. Reject H0 when |t| > 2.360
v. Computation:
vi. Remarks: The computed t value falls in the rejection region, so the null hypothesis of no association is rejected. It is concluded that there is an association between exam results and GPAs.

Example 13.21: An investigator sought to ascertain whether poverty and self-esteem are associated. Forty participants were categorised as either above or below the poverty line based on their income level. Each participant filled out a 20-item survey measuring self-esteem. The results of the survey are presented in the following table:

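| No. | Poverty Line | Survey Score |
|---|---|---|
| 1 | above | 26 |
| 2 | above | 38 |
| 3 | above | 26 |
| 4 | above | 35 |
| 5 | above | 32 |
| 6 | above | 22 |
| 7 | above | 28 |
| 8 | above | 33 |
| 9 | above | 19 |
| 10 | above | 23 |
| 11 | above | 25 |
| 12 | above | 28 |
| 13 | above | 41 |
| 14 | above | 35 |
| 15 | above | 47 |
| 16 | above | 20 |
| 17 | above | 29 |
| 18 | above | 21 |
| 19 | above | 24 |
| 20 | below | 19 |
| 21 | below | 39 |
| 22 | below | 21 |
| 23 | below | 18 |
| 24 | below | 22 |
| 25 | below | 31 |
| 26 | below | 36 |
| 27 | below | 15 |
| 28 | below | 21 |
| 29 | below | 12 |
| 30 | below | 7 |
| 31 | below | 34 |
| 32 | below | 24 |
| 33 | below | 27 |
| 34 | below | 21 |
| 35 | below | 15 |
| 36 | below | 19 |
| 37 | below | 17 |
| 38 | below | 31 |
| 39 | below | 22 |
| 40 | below | 19 |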


Test the hypothesis that there is no association between survey score and poverty line at the 5% significance level.

Solution: First calculate the biserial correlation and next test the hypothesis.

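| No. | Poverty | X (Survey Score) | X² |
|---|---|---|---|
| 1 | above | 26 | 676 |
| 2 | above | 38 | 1444 |
| 3 | above | 26 | 676 |
| 4 | above | 35 | 1225 |
| 5 | above | 32 | 1024 |
| 6 | above | 22 | 484 |
| 7 | above | 28 | 784 |
| 8 | above | 33 | 1089 |
| 9 | above | 19 | 361 |
| 10 | above | 23 | 529 |
| 11 | above | 25 | 625 |
| 12 | above | 28 | 784 |
| 13 | above | 41 | 1681 |
| 14 | above | 35 | 1225 |
| 15 | above | 47 | 2209 |
| 16 | above | 20 | 400 |
| 17 | above | 29 | 841 |
| 18 | above | 21 | 441 |
| 19 | above | 24 | 576 |
| 20 | below | 19 | 361 |
| 21 | below | 39 | 1521 |
| 22 | below | 21 | 441 |
| 23 | below | 18 | 324 |
| 24 | below | 22 | 484 |
| 25 | below | 31 | 961 |
| 26 | below | 36 | 1296 |
| 27 | below | 15 | 225 |
| 28 | below | 21 | 441 |
| 29 | below | 12 | 144 |
| 30 | below | 7 | 49 |
| 31 | below | 34 | 1156 |
| 32 | below | 24 | 576 |
| 33 | below | 27 | 729 |
| 34 | below | 21 | 441 |
| 35 | below | 15 | 225 |
| 36 | below | 19 | 361 |
| 37 | below | 17 | 289 |
| 38 | below | 31 | 961 |
| 39 | below | 22 | 484 |
| 40 | below | 19 | 361 |
| Total | | 1022 | 28904 |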


The mean and proportion for the first (“p”, above the poverty line) category:

X̄p = 552 / 19 ≈ 29.05, Pp = 19 / 40 = 0.475

The mean and proportion for the second (“q”, below the poverty line) category:

X̄q = 470 / 21 ≈ 22.38, Pq = 21 / 40 = 0.525

The standard deviation s is computed from all 40 survey scores, using the totals ΣX = 1022 and ΣX² = 28904.

The height of the ordinate of the normal curve at the point separating the proportions Pp = 0.475 and Pq = 0.525 is read from the table: y = 0.3982.

The coefficient of biserial correlation is then computed from these quantities.
Now to test the hypothesis
i. State the null and alternative hypotheses.

H0: ρb = 0 vs. H1: ρb ≠ 0 
ii. The significance level: α = 0.05
iii. The test statistic: the sample size is large (n = 40), so the large-sample z statistic is used.
iv. Reject H0 when |z| > 1.96
v. Computation:

vi. Remarks: The computed z value falls in the rejection region, so the null hypothesis of no association is rejected. It is concluded that there is an association between poverty level and survey score.
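The steps of this example can be reproduced from the table totals. This is a sketch under two labelled assumptions, not the lecture's own computation: s is taken with the n − 1 divisor, and the standard error of rb under H0 is taken as √(Pp·Pq) / (y·√n), a commonly quoted large-sample form. The group totals 552 and 470 are the sums of the "above" and "below" scores in the table.

```python
from math import sqrt
from statistics import NormalDist

n_p, n_q = 19, 21                 # above (p) and below (q) the poverty line
sum_p, sum_q = 552, 470           # survey-score totals per group (sum to 1022)
sum_x2 = 28904                    # total of the X^2 column
n = n_p + n_q
sum_x = sum_p + sum_q

mean_p, mean_q = sum_p / n_p, sum_q / n_q
p_p, p_q = n_p / n, n_q / n       # 0.475 and 0.525

# Assumption: s uses the n - 1 divisor.
s = sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

# Ordinate of the standard normal curve separating p_p and p_q
y = NormalDist().pdf(NormalDist().inv_cdf(p_p))

r_b = (mean_p - mean_q) / s * (p_p * p_q) / y

# Assumption: standard error of r_b under H0 is sqrt(p*q) / (y * sqrt(n)).
se = sqrt(p_p * p_q) / (y * sqrt(n))
z = r_b / se
print(f"r_b = {r_b:.3f}, z = {z:.2f}, reject H0: {abs(z) > 1.96}")
```

Under these assumptions |z| exceeds 1.96, which agrees with the decision to reject H0 stated in the remarks.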
