Point Biserial Correlation
Lecture 59
The point biserial correlation is a statistical measure that assesses the association between a natural dichotomous variable and a continuous variable. A natural dichotomous variable has two naturally occurring categories, such as 'male / female' or 'yes / no'. The point biserial correlation is a special case of the Pearson product-moment correlation and is based on the following assumptions.
i. There are no outliers in the continuous variable.
ii. The continuous variable follows a normal distribution, or approximately follows one.
iii. The variance of the continuous variable is homogeneous (equal) across both categories of the natural dichotomous variable.
For example, suppose we want to study the association between study hours (continuous variable) and gender (natural dichotomous variable). This association can be measured by the point biserial correlation.
Point Biserial Correlation Coefficient
The point biserial correlation coefficient is a numerical quantity that measures the strength of the linear association between a natural dichotomous variable and a continuous variable. It is denoted by ρpb for the population and by rpb for the sample.
For a dichotomous variable with natural categories "p" and "q" and a continuous variable, the point biserial correlation coefficient is given by:
$$r_{pb} = \frac{\bar{X}_p - \bar{X}_q}{s}\sqrt{P_p P_q}$$
Where:
X̄p is the mean of the continuous (interval-scale) variable's values in the first category, p, of the dichotomous variable.
X̄q is the mean of the continuous variable's values in the second category, q.
s is the standard deviation of all values of the continuous variable.
Pp is the proportion of observations that fall in the first category, p.
Pq is the proportion of observations that fall in the second category, q, so that Pp + Pq = 1.
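The mean and proportion of the dichotomous variable's first category, "p":
$$\bar{X}_p = \frac{\sum X_p}{n_p}, \qquad P_p = \frac{n_p}{n}$$
The mean and proportion of the dichotomous variable's second category, "q":
$$\bar{X}_q = \frac{\sum X_q}{n_q}, \qquad P_q = \frac{n_q}{n}$$
where n = np + nq is the total number of observations.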
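The standard deviation "s" can be obtained as:
$$s = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$
This is the n-denominator form of the standard deviation, which is the form that matches the √(PpPq) factor in the formula for rpb.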
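The formulas above can be sketched in Python as follows. The function names `point_biserial` and `pearson_r` are hypothetical helpers written for this illustration and do not come from any library. The sketch assumes s is the n-denominator standard deviation, which is what `statistics.pstdev` computes. To check the claim that the point biserial correlation is a special case of Pearson correlation, it also codes category p as 1 and category q as 0 and computes an ordinary Pearson correlation. The two values should agree.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores, groups, p_label):
    """r_pb = (mean_p - mean_q) / s * sqrt(P_p * P_q), with s the n-denominator SD."""
    p = [x for x, g in zip(scores, groups) if g == p_label]
    q = [x for x, g in zip(scores, groups) if g != p_label]
    if not p or not q:
        raise ValueError("both categories need at least one observation")
    n = len(scores)
    s = pstdev(scores)  # same as sqrt(sum(X^2)/n - (sum(X)/n)^2)
    return (mean(p) - mean(q)) / s * sqrt((len(p) / n) * (len(q) / n))


def pearson_r(x, y):
    """Ordinary Pearson correlation, using n-denominator moments throughout."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical toy data, for illustration only.
scores = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]
groups = ["p", "p", "p", "q", "q", "q"]
coded = [1 if g == "p" else 0 for g in groups]  # 1 for category p, 0 for q

print(round(point_biserial(scores, groups, "p"), 4))  # -0.9258
print(round(pearson_r(coded, scores), 4))             # -0.9258
```

The agreement follows from the algebra. With 0/1 coding, the standard deviation of the coded variable is √(PpPq), and its covariance with X is PpPq(X̄p − X̄q). Dividing the covariance by the product of the two standard deviations gives the rpb formula.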
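To test H0: ρpb = 0, the following test statistic is used:
$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}$$
If the sample size is small, this statistic follows a t distribution with n − 2 degrees of freedom. If the sample size is large, the t distribution is close to the standard normal distribution, so the statistic can be compared with the critical values of Z.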
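A minimal Python sketch of the test statistic follows. The function name `t_statistic` is hypothetical, and the values r = 0.5 and n = 18 are illustrative only. The sketch computes the statistic but does not look up the critical value, which still has to come from a t table or a statistics library.

```python
from math import sqrt


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2); compare with a t distribution on n - 2 df."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


# Illustrative values only: r = 0.5 from n = 18 observations (df = 16).
print(round(t_statistic(0.5, 18), 4))  # 2.3094
```

This t is algebraically identical to the pooled-variance two-sample t statistic that compares the two category means. This is one reason the equal-variance assumption (iii) is listed above.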
Example 13.22: A researcher examining gender disparity wanted to evaluate how well men and women identify and remember visual features. The researcher recruited 17 individuals, 9 women and 8 men, who were not initially aware of the experiment. The researcher asked them to wait in a room filled with various items. Each participant was then invited to complete a 30-question post-test about features of the room. The post-test scores and the participants' genders are displayed in the following table:
| Participant | Gender | Score |
|---|---|---|
| 1 | M | 7 |
| 2 | M | 19 |
| 3 | M | 8 |
| 4 | M | 10 |
| 5 | M | 7 |
| 6 | M | 15 |
| 7 | M | 6 |
| 8 | M | 13 |
| 9 | F | 14 |
| 10 | F | 11 |
| 11 | F | 18 |
| 12 | F | 23 |
| 13 | F | 17 |
| 14 | F | 20 |
| 15 | F | 14 |
| 16 | F | 24 |
| 17 | F | 22 |
The researcher wants to know whether gender and score are associated. Test the null hypothesis that there is no association between gender and score.
Solution: First calculate the point biserial correlation coefficient, then test the hypothesis.
| Participant | Gender | X (score) | X² |
|---|---|---|---|
| 1 | M | 7 | 49 |
| 2 | M | 19 | 361 |
| 3 | M | 8 | 64 |
| 4 | M | 10 | 100 |
| 5 | M | 7 | 49 |
| 6 | M | 15 | 225 |
| 7 | M | 6 | 36 |
| 8 | M | 13 | 169 |
| 9 | F | 14 | 196 |
| 10 | F | 11 | 121 |
| 11 | F | 18 | 324 |
| 12 | F | 23 | 529 |
| 13 | F | 17 | 289 |
| 14 | F | 20 | 400 |
| 15 | F | 14 | 196 |
| 16 | F | 24 | 576 |
| 17 | F | 22 | 484 |
| Total | | ΣX = 248 | ΣX² = 4168 |
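The column totals can be reproduced with a short Python sketch. The sketch also gives the per-category sums the solution needs (ΣX for the males and for the females), which the table does not list separately. The list names and the helper `totals` are illustrative choices, not part of the original example.

```python
# Scores from Example 13.22, split by gender (list names are illustrative).
male = [7, 19, 8, 10, 7, 15, 6, 13]
female = [14, 11, 18, 23, 17, 20, 14, 24, 22]


def totals(xs):
    """Return (n, sum of X, sum of X^2) for a list of scores."""
    return len(xs), sum(xs), sum(x * x for x in xs)


for name, xs in (("male", male), ("female", female), ("all", male + female)):
    print(name, totals(xs))
# male (8, 85, 1053)
# female (9, 163, 3115)
# all (17, 248, 4168)
```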
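Let p represent the male category and q the female category.
np = 8, nq = 9, n = 17
The mean and proportion of category "p" (male):
$$\bar{X}_p = \frac{\sum X_p}{n_p} = \frac{85}{8} = 10.625, \qquad P_p = \frac{8}{17} = 0.4706$$
The mean and proportion of category "q" (female):
$$\bar{X}_q = \frac{\sum X_q}{n_q} = \frac{163}{9} = 18.111, \qquad P_q = \frac{9}{17} = 0.5294$$
The standard deviation of the score:
$$s = \sqrt{\frac{4168}{17} - \left(\frac{248}{17}\right)^2} = \sqrt{245.1765 - 212.8166} = \sqrt{32.3599} \approx 5.689$$
The point biserial correlation coefficient is given by:
$$r_{pb} = \frac{10.625 - 18.111}{5.689}\sqrt{0.4706 \times 0.5294} \approx -0.657$$
The negative sign reflects that the male category (p) has the lower mean score.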
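The hand calculation above can be checked with a direct Python transcription of the formulas. It uses the n-denominator standard deviation from the sums ΣX and ΣX², as in the solution. The variable names are illustrative.

```python
from math import sqrt

male = [7, 19, 8, 10, 7, 15, 6, 13]              # category p
female = [14, 11, 18, 23, 17, 20, 14, 24, 22]    # category q

n_p, n_q = len(male), len(female)
n = n_p + n_q
sum_x = sum(male) + sum(female)                  # 248
sum_x2 = sum(x * x for x in male + female)       # 4168

mean_p = sum(male) / n_p                         # 10.625
mean_q = sum(female) / n_q                       # 18.111...
P_p, P_q = n_p / n, n_q / n
s = sqrt(sum_x2 / n - (sum_x / n) ** 2)          # n-denominator SD
r_pb = (mean_p - mean_q) / s * sqrt(P_p * P_q)

print(f"s = {s:.3f}, r_pb = {r_pb:.3f}")
# prints: s = 5.689, r_pb = -0.657
```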
Now test the hypothesis:
i. State the null and alternative hypotheses:
H0: ρpb = 0 vs. H1: ρpb ≠ 0
ii. The significance level: α = 0.05
iii. The test statistic: Since the sample size is small, the following test statistic is used, with n − 2 = 15 degrees of freedom:
$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}$$
iv. Critical region: Reject H0 if |t| > t0.025(15) = 2.131.
v. Computation:
$$t = \frac{-0.6569\sqrt{15}}{\sqrt{1-(-0.6569)^2}} \approx -3.37$$
vi. Remarks: The calculated t value falls in the rejection region (|t| > 2.131), so H0 is rejected at the 5% level of significance. Thus, it is concluded that there is a significant relationship between gender and score.
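The whole test can be reproduced in Python as a check on the hand computation. The sketch recomputes rpb from the raw scores rather than from the rounded value −0.657, because rounding r first shifts the second decimal of t. The critical value 2.131 is copied from the t table used above. The helper `ss` and the pooled two-sample cross-check are illustrative additions, not part of the original solution.

```python
from math import sqrt

# Example 13.22 data: category p = male, category q = female.
male = [7, 19, 8, 10, 7, 15, 6, 13]
female = [14, 11, 18, 23, 17, 20, 14, 24, 22]
scores = male + female
n, n_p, n_q = len(scores), len(male), len(female)
mean_p, mean_q = sum(male) / n_p, sum(female) / n_q

# r_pb with the n-denominator standard deviation, as in the solution.
mean_all = sum(scores) / n
s = sqrt(sum((x - mean_all) ** 2 for x in scores) / n)
r_pb = (mean_p - mean_q) / s * sqrt(n_p * n_q) / n

# Test statistic with n - 2 = 15 degrees of freedom.
t = r_pb * sqrt(n - 2) / sqrt(1 - r_pb ** 2)
t_crit = 2.131  # two-sided critical value for alpha = 0.05, df = 15 (t table)


def ss(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


# Cross-check: pooled-variance two-sample t comparing the two category means.
sp = sqrt((ss(male) + ss(female)) / (n - 2))
t_pooled = (mean_p - mean_q) / (sp * sqrt(1 / n_p + 1 / n_q))

print(f"t = {t:.2f}, pooled t = {t_pooled:.2f}, reject H0: {abs(t) > t_crit}")
# prints: t = -3.37, pooled t = -3.37, reject H0: True
```

The two statistics coincide, so this test of H0: ρpb = 0 gives the same conclusion as an equal-variance two-sample t-test comparing the mean scores of men and women.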