Point Biserial Correlation Lecture 59

 


The point biserial correlation is a statistical measure that assesses the association between a naturally dichotomous variable and a continuous variable. A naturally dichotomous variable has exactly two natural categories, such as 'male / female' or 'yes / no'. The point biserial correlation is a special case of the Pearson correlation and is based on the following assumptions.

i. There are no outliers in the continuous variable.

ii. The continuous variable follows a normal distribution, at least approximately.

iii. The variance of the continuous variable is the same (homogeneous) in both categories of the dichotomous variable.

For example, suppose it is desired to study the association between study hours (continuous variable) and gender (naturally dichotomous variable). This association can be measured by the point biserial correlation.

Coefficient of Point Biserial Correlation

The point biserial correlation coefficient is a numerical quantity that measures the strength of the linear association between a naturally dichotomous variable and a continuous variable. It is denoted by ρpb for the population and by rpb for the sample.

The point biserial correlation between a dichotomous variable, with natural categories "p" and "q", and a continuous variable is given by:

rpb = ((X̄p − X̄q) / s) · √(Pp · Pq)

Where:

X̄p is the mean of the continuous (interval-scale) variable's values in the dichotomous variable's first category.

X̄q is the mean of the continuous variable's values in the dichotomous variable's second category.

s is the standard deviation of all values of the continuous variable.

Pp is the proportion of observations that fall in the dichotomous variable's first category.

Pq is the proportion of observations that fall in the dichotomous variable's second category.

The mean and proportion of the dichotomous variable's first category "p":

X̄p = ΣXp / np,  Pp = np / n

The mean and proportion of the dichotomous variable's second category "q":

X̄q = ΣXq / nq,  Pq = nq / n

The standard deviation "s" can be obtained as:

s = √(ΣX²/n − (ΣX/n)²)
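The quantities defined above can be combined in a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `point_biserial` and its parameters are hypothetical, the toy data are made up for illustration, and `s` is assumed to be the standard deviation with divisor n (the form under which the coefficient coincides with the Pearson correlation on a 0/1 coding).

```python
import math


def point_biserial(values, labels, first_label):
    """Point biserial correlation between `values` and a two-category `labels`.

    `first_label` marks category "p"; every other observation is category "q".
    """
    group_p = [x for x, g in zip(values, labels) if g == first_label]
    group_q = [x for x, g in zip(values, labels) if g != first_label]
    if not group_p or not group_q:
        raise ValueError("both categories need at least one observation")

    n = len(values)
    mean_p = sum(group_p) / len(group_p)   # mean of category p
    mean_q = sum(group_q) / len(group_q)   # mean of category q
    prop_p = len(group_p) / n              # proportion in category p
    prop_q = len(group_q) / n              # proportion in category q

    # Assumption: standard deviation of all values with divisor n.
    mean_all = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean_all ** 2)

    return (mean_p - mean_q) / s * math.sqrt(prop_p * prop_q)


# Illustrative toy data (not from the lecture).
r = point_biserial([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"], "a")
print(round(r, 4))  # -0.8783
```

The sign of the result depends only on which category is labelled "p": swapping the two categories flips the sign but leaves the magnitude unchanged.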

To test H0: ρpb = 0, the following test statistic is used if the sample size is small:

t = rpb · √(n − 2) / √(1 − rpb²), with n − 2 degrees of freedom.

If the sample size is large, the corresponding large-sample test statistic is used.

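For the small-sample case, the test statistic can be sketched in code as below. This assumes the usual t statistic for a correlation coefficient with n − 2 degrees of freedom, which is consistent with the critical value used in the example that follows. The function name `t_statistic` and the input values are hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, given a sample correlation r and size n."""
    if n <= 2:
        raise ValueError("n must exceed 2 so that n - 2 degrees of freedom exist")
    if abs(r) >= 1:
        raise ValueError("|r| must be strictly less than 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Illustrative values (not from the lecture): r = 0.5 observed on n = 18 cases.
t = t_statistic(0.5, 18)
print(round(t, 4))  # 2.3094
```

The computed value is then compared with the two-sided critical value of the t distribution at n − 2 degrees of freedom.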
Example 13.22: A researcher examining gender differences wanted to evaluate how well men and women identify and remember visual features. The researcher recruited 17 individuals, 9 women and 8 men, who were initially unaware of the experiment. They were asked to wait in a room filled with various items. Each participant was then invited to complete a 30-question post-test about features of the room. The post-test results and the participants' genders are displayed in the following table:

Participant   Gender   Score
1             M        7
2             M        19
3             M        8
4             M        10
5             M        7
6             M        15
7             M        6
8             M        13
9             F        14
10            F        11
11            F        18
12            F        23
13            F        17
14            F        20
15            F        14
16            F        24
17            F        22

The researcher wants to know the association between gender and score. Test the null hypothesis that there is no association between gender and score.

Solution: First calculate the point biserial correlation coefficient, then test the hypothesis.

Participant   Gender   X     X²
1             M        7     49
2             M        19    361
3             M        8     64
4             M        10    100
5             M        7     49
6             M        15    225
7             M        6     36
8             M        13    169
9             F        14    196
10            F        11    121
11            F        18    324
12            F        23    529
13            F        17    289
14            F        20    400
15            F        14    196
16            F        24    576
17            F        22    484
Total                  248   4168

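Let p represent the male category and q the female category.

np = 8, nq = 9, n = 17

The mean and proportion of category "p" (male):

X̄p = ΣXp / np = 85 / 8 = 10.625,  Pp = np / n = 8 / 17 = 0.4706

The mean and proportion of category "q" (female):

X̄q = ΣXq / nq = 163 / 9 = 18.111,  Pq = nq / n = 9 / 17 = 0.5294

The standard deviation of the score:

s = √(ΣX²/n − (ΣX/n)²) = √(4168/17 − (248/17)²) = √(245.1765 − 212.8166) = √32.3599 = 5.689

The coefficient of the point biserial correlation is given by:

rpb = ((X̄p − X̄q) / s) · √(Pp · Pq) = ((10.625 − 18.111) / 5.689) × √(0.4706 × 0.5294) = (−1.316)(0.4991) = −0.657

Now test the hypothesis: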
i. State the null and alternative hypotheses:
H0: ρpb = 0 vs. H1: ρpb ≠ 0
ii. The significance level: α = 0.05
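iii. The test statistic: The sample size is small, so the following test statistic is used:
t = rpb · √(n − 2) / √(1 − rpb²), which follows the t distribution with n − 2 = 15 degrees of freedom.
iv. Critical region: Reject H0 when |t| > t(0.025; 15) = 2.131

v. Computation:
t = (−0.657) × √15 / √(1 − (−0.657)²) = −2.545 / 0.7539 = −3.37, so |t| = 3.37 > 2.131.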

vi. Remarks: The calculated t value falls in the rejection region, so H0 is rejected at the 5% level of significance. Thus, it is concluded that there is a relationship between gender and score.
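The whole example can be re-checked with a short script. This is only a sketch under stated assumptions: the standard deviation uses divisor n, the t statistic has n − 2 degrees of freedom, and the critical value 2.131 is taken from step iv rather than computed. All variable names are illustrative.

```python
import math

# Post-test scores from the example table, split by gender.
male = [7, 19, 8, 10, 7, 15, 6, 13]            # category p
female = [14, 11, 18, 23, 17, 20, 14, 24, 22]  # category q
scores = male + female

n = len(scores)
sum_x = sum(scores)                    # column total of X
sum_x2 = sum(x * x for x in scores)    # column total of X squared

mean_p = sum(male) / len(male)
mean_q = sum(female) / len(female)
prop_p = len(male) / n
prop_q = len(female) / n

# Assumption: standard deviation of all scores with divisor n.
s = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

r_pb = (mean_p - mean_q) / s * math.sqrt(prop_p * prop_q)
t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)

critical_value = 2.131  # two-sided 5% critical value quoted in step iv
reject_h0 = abs(t) > critical_value

print(f"sum X = {sum_x}, sum X^2 = {sum_x2}")
print(f"r_pb = {r_pb:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

The negative sign of the coefficient reflects only the choice of males as category p: the male mean is lower than the female mean in this sample.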
