A/B Test Lecture 60

 

The A/B or A/B/.../K test, also known as a bucket or split-run test, is a statistical method used to compare two (A, B) or more versions of the same variable to determine which version is more effective on a specific metric. Version A is called the 'control', and version B is called the 'experimental' version. Businesses and industries widely use A/B tests to make better use of their resources. A commonly used metric is the average revenue per user per unit of time.
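As a minimal sketch of how this metric could be computed for each variant, the Python function below divides each variant's total revenue by its number of users and by the length of the test in days. The function name `arpu_per_day`, its input format and the toy numbers are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def arpu_per_day(revenue_events, users_per_variant, days):
    """Average revenue per user per day for each variant.

    revenue_events: iterable of (variant, amount) pairs (hypothetical format)
    users_per_variant: dict mapping variant -> number of users exposed to it
    days: length of the test period in days (the same for every variant)
    """
    totals = defaultdict(float)
    for variant, amount in revenue_events:
        totals[variant] += amount
    return {
        variant: totals[variant] / (n_users * days)
        for variant, n_users in users_per_variant.items()
    }

# Hypothetical toy data: 4 users in A, 5 users in B, 2-day test.
events = [("A", 10.0), ("A", 6.0), ("B", 12.0), ("B", 8.0), ("B", 10.0)]
print(arpu_per_day(events, {"A": 4, "B": 5}, days=2))  # {'A': 2.0, 'B': 3.0}
```

Dividing by the same number of days for both variants only makes sense because both variants are observed over the same period, as the data-collection rules require.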

A/B testing is a handy technique for verifying whether a variation actually performs better than the conventional (current) version. For instance, altering a movie poster might boost attendance, or altering a product's packaging might boost sales.

Examples:

i. A company creates two versions of an advertisement: version A has a folk song in the background, and version B has fast music in the background. Both versions are released to the audience, and the company checks which one is more popular, or more effective in terms of revenue.

ii. A dress designer develops two brands of a school uniform, shares them with children's parents, and wants to check which brand the parents find more attractive.

Procedure to Perform an A/B or A/B/…/K Test

i. Define your objectives.

ii. Define your variable or variables and metrics.

iii. State your null and alternative hypotheses.

Procedure to Collect Data

i. Suppose you want to analyse two variants, A and B. The observations for both variants are collected randomly and independently.

ii. The observations for both variants are collected during the same time period to control extraneous sources of variation.

iii. Observations for both variants are collected over the same duration and under the same circumstances, so that any observed difference measures the real change.
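The random-assignment step can be sketched in Python as follows. This is one possible approach (complete randomisation: a seeded shuffle followed by dealing users round-robin into the variants), not a prescribed procedure; the function `assign_variants` and its parameters are hypothetical. Drawing `rng.choice(variants)` separately for each user would instead give fully independent assignments, at the cost of unequal group sizes.

```python
import random

def assign_variants(user_ids, variants=("A", "B"), seed=None):
    """Randomly assign each user to exactly one variant with balanced group sizes."""
    rng = random.Random(seed)          # seeded generator so the split is reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)              # random order removes any systematic pattern
    groups = {v: [] for v in variants}
    for i, user in enumerate(shuffled):
        # Deal users in turn, so group sizes differ by at most one.
        groups[variants[i % len(variants)]].append(user)
    return groups

groups = assign_variants(range(10), seed=42)
print({v: len(members) for v, members in groups.items()})  # {'A': 5, 'B': 5}
```

The same function handles an A/B/…/K test by passing more labels in `variants`.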

Test Statistic

The choice of test statistic depends on how you define the hypothesis and the metric. Commonly employed test statistics are:

the z-statistic, the t-statistic, the chi-square statistic and the F statistic (used in ANOVA).

The following forms of hypotheses are paired with a suitable test statistic:

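i. H0: P1 = P2 (two proportions): the two-sample z-statistic for proportions is used.

ii. H0: P1 = P2 = ... = Pk (k proportions): the chi-square statistic is used.

iii. H0: μ1 = μ2 (two means, large samples): the z-statistic is used.

iv. H0: μ1 = μ2 (two means, small samples): the t-statistic is used.

v. H0: μ1 = μ2 = ... = μk (k means): the ANOVA technique (F statistic) will be used.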
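A minimal standard-library sketch of these statistics is shown below. It assumes the usual textbook forms: the pooled two-sample z-statistic for H0: P1 = P2, the Pearson chi-square statistic on the table of successes and failures for H0: P1 = ... = Pk, a large-sample z-statistic that plugs in the sample variances, a small-sample t-statistic that assumes equal population variances (pooled variance, n1 + n2 − 2 degrees of freedom), and the one-way ANOVA F ratio. The function names and data are hypothetical. Two checks are built in: for k = 2 the chi-square value equals z², and for two groups F equals t².

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 divisor

def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for H0: P1 = P2 (x = successes, n = trials), pooled proportion."""
    p = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_k_proportions(successes, trials):
    """Pearson chi-square for H0: P1 = ... = Pk, with k - 1 degrees of freedom."""
    p = sum(successes) / sum(trials)  # common proportion estimated under H0
    chi2 = 0.0
    for x, n in zip(successes, trials):
        # Each variant contributes a "success" cell and a "failure" cell.
        for observed, expected in ((x, n * p), (n - x, n * (1 - p))):
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(trials) - 1

def z_two_means(a, b):
    """Large-sample z-statistic for H0: mu1 = mu2, using the sample variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def pooled_t_two_means(a, b):
    """Small-sample t-statistic for H0: mu1 = mu2, assuming equal variances."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def one_way_anova_f(*groups):
    """F-statistic for H0: mu1 = ... = muk, with (k - 1, N - k) degrees of freedom."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

# Hypothetical data (not from the lecture).
z = two_proportion_z(120, 1000, 150, 1000)        # conversions: 120/1000 in A, 150/1000 in B
chi2, df = chi_square_k_proportions([120, 150], [1000, 1000])
a = [12.0, 15.5, 9.0, 14.0, 11.5]                  # revenue per user, variant A
b = [16.0, 18.5, 13.0, 17.5, 15.0]                 # revenue per user, variant B
t, t_df = pooled_t_two_means(a, b)
f, f_df = one_way_anova_f(a, b)
print(round(z, 3), round(chi2, 3), df)             # chi2 equals z**2 when k = 2
print(round(t, 3), t_df, round(f, 3), f_df)        # F equals t**2 when k = 2
```

Each statistic is then compared with the critical value of its distribution (standard normal, chi-square, t or F) at the chosen significance level.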



Point Biserial Correlation Lecture 59

 


The point biserial correlation is a statistical measure that assesses the association between a natural dichotomous variable and a continuous variable. A natural dichotomous variable has two natural categories, such as 'male / female' or 'yes / no'. The point biserial correlation is a special case of the Pearson correlation and is based on the following assumptions.

i. There are no outliers in the continuous variable.

ii. The continuous variable follows a normal distribution, or approximately so.

iii. The variance of the continuous variable is homogeneous (equal) across both categories of the natural dichotomous variable.

For example, if it is desired to study the association between study hours (continuous variable) and gender (natural dichotomous variable), such an association can be measured by the point biserial correlation.

Coefficient of Point Biserial Correlation

The coefficient of point biserial correlation is a numerical quantity that measures the strength of the linear association between a natural dichotomous variable and a continuous variable. It is denoted by ρpb for the population and by rpb for a sample.

If the dichotomous variable has the natural categories "p" and "q", the point biserial correlation between it and a continuous variable is denoted by rpb and given by:

$$r_{pb} = \frac{\bar{X}_p - \bar{X}_q}{s}\sqrt{P_p P_q}$$

Where:

$\bar{X}_p$ is the mean of the continuous (interval-scale) variable's values in the dichotomous variable's first category "p".

$\bar{X}_q$ is the mean of the continuous variable's values in the dichotomous variable's second category "q".

$s$ is the standard deviation of all values of the continuous variable.

$P_p$ is the proportion of observations that belong to the first category "p".

$P_q$ is the proportion of observations that belong to the second category "q".

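The mean and proportion of the dichotomous variable's first category "p", where $n_p$ is the number of observations in "p" and $n = n_p + n_q$:

$$\bar{X}_p = \frac{\sum X_p}{n_p}, \qquad P_p = \frac{n_p}{n}$$

The mean and proportion of the dichotomous variable's second category "q", where $n_q$ is the number of observations in "q":

$$\bar{X}_q = \frac{\sum X_q}{n_q}, \qquad P_q = \frac{n_q}{n}$$

The standard deviation "s" can be obtained as:

$$s = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$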
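The formula for rpb can be checked numerically. The sketch below computes rpb from the group means and proportions, using the standard deviation with divisor n (the form under which the formula matches Pearson's r exactly), and compares it with the ordinary Pearson correlation after coding category "p" as 1 and category "q" as 0. The function names and the small data set are hypothetical.

```python
import math

def point_biserial(scores, groups, p_label, q_label):
    """r_pb = (mean_p - mean_q) / s * sqrt(P_p * P_q), where s uses divisor n."""
    n = len(scores)
    xp = [x for x, g in zip(scores, groups) if g == p_label]
    xq = [x for x, g in zip(scores, groups) if g == q_label]
    mean_p, mean_q = sum(xp) / len(xp), sum(xq) / len(xq)
    P_p, P_q = len(xp) / n, len(xq) / n
    s = math.sqrt(sum(x * x for x in scores) / n - (sum(scores) / n) ** 2)
    return (mean_p - mean_q) / s * math.sqrt(P_p * P_q)

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a continuous score and a two-category label.
scores = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0, 6.0]
groups = ["yes", "yes", "yes", "no", "no", "no", "no"]
r_formula = point_biserial(scores, groups, "yes", "no")
r_pearson = pearson([1 if g == "yes" else 0 for g in groups], scores)
print(round(r_formula, 4), round(r_pearson, 4))  # the two values agree
```

The agreement illustrates why the point biserial correlation is described as a special case of the Pearson correlation.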

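If it is desired to test H0: ρpb = 0, the following test statistic is used if the sample size is small:

$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}$$

which follows a t-distribution with n − 2 degrees of freedom under H0. If the sample size is large, a z-statistic (standard normal) is used instead.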
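The small-sample test of H0: ρpb = 0 can be sketched as below, assuming the statistic t = rpb√(n − 2)/√(1 − rpb²) with n − 2 degrees of freedom. The critical value is passed in from a t table, since the Python standard library has no t-distribution; the function names and the example values are hypothetical.

```python
import math

def t_for_point_biserial(r_pb, n):
    """t-statistic for H0: rho_pb = 0, with n - 2 degrees of freedom."""
    if not -1 < r_pb < 1:
        raise ValueError("r_pb must lie strictly between -1 and 1")
    t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)
    return t, n - 2

def reject_h0(t, t_critical):
    """Two-sided decision: reject H0 when |t| exceeds the tabulated critical value."""
    return abs(t) > t_critical

# Hypothetical values: r_pb = 0.5 from n = 27 observations; t(0.025, 25) = 2.060 from a t table.
t, df = t_for_point_biserial(0.5, 27)
print(round(t, 3), df, reject_h0(t, 2.060))  # 2.887 25 True
```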
Example 13.22: A researcher was examining gender differences and wanted to evaluate how well men and women could identify and remember visual features. The researcher used 17 individuals, 9 women and 8 men, who were initially unaware of the experiment. The researcher asked them to wait in a room filled with different items and then invited each participant to complete a 30-question post-test about various features in the room. The post-test results and the participants' genders are displayed in the following table:

Participant | Gender | Score
1 | M | 7
2 | M | 19
3 | M | 8
4 | M | 10
5 | M | 7
6 | M | 15
7 | M | 6
8 | M | 13
9 | F | 14
10 | F | 11
11 | F | 18
12 | F | 23
13 | F | 17
14 | F | 20
15 | F | 14
16 | F | 24
17 | F | 22

The researcher wants to know whether gender and score are associated. Test the null hypothesis that there is no association between gender and score.

Solution: First calculate the point biserial correlation and then test the hypothesis.

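Participant | Gender | X | X²
1 | M | 7 | 49
2 | M | 19 | 361
3 | M | 8 | 64
4 | M | 10 | 100
5 | M | 7 | 49
6 | M | 15 | 225
7 | M | 6 | 36
8 | M | 13 | 169
9 | F | 14 | 196
10 | F | 11 | 121
11 | F | 18 | 324
12 | F | 23 | 529
13 | F | 17 | 289
14 | F | 20 | 400
15 | F | 14 | 196
16 | F | 24 | 576
17 | F | 22 | 484
Total | | 248 | 4168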

Let p represent the male category and q the female category.

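$n_p = 8$, $n_q = 9$, $n = 17$, $\sum X_p = 85$, $\sum X_q = 163$

The mean and proportion of category "p" (male):

$$\bar{X}_p = \frac{85}{8} = 10.625, \qquad P_p = \frac{8}{17} = 0.4706$$

The mean and proportion of category "q" (female):

$$\bar{X}_q = \frac{163}{9} = 18.111, \qquad P_q = \frac{9}{17} = 0.5294$$

The standard deviation of the score:

$$s = \sqrt{\frac{4168}{17} - \left(\frac{248}{17}\right)^2} = \sqrt{245.1765 - 212.8166} = \sqrt{32.3599} = 5.689$$

The coefficient of the point biserial correlation is given by:

$$r_{pb} = \frac{10.625 - 18.111}{5.689}\sqrt{0.4706 \times 0.5294} = -0.657$$

The negative sign reflects the coding: the male category "p" has the lower mean score.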
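Now test the hypothesis:

i. State the null and alternative hypotheses:

H0: ρpb = 0 vs. H1: ρpb ≠ 0

ii. The significance level: α = 0.05

iii. The test statistic: the sample size is small, so the following test statistic is used, with n − 2 = 15 degrees of freedom:

$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}$$

iv. Critical region: reject H0 when |t| > t(0.025, 15) = 2.131.

v. Computation (using r_pb = −0.6569 to four decimal places):

$$t = \frac{-0.6569\sqrt{15}}{\sqrt{1-(-0.6569)^2}} = \frac{-2.5442}{0.7540} = -3.37$$

vi. Remarks: Since |t| = 3.37 > 2.131, the calculated t value falls in the rejection region, so H0 is rejected at the 5% level of significance. The sample data provide sufficient evidence that there exists a relationship between gender and score.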
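The arithmetic of this example can be reproduced with a short Python script. It hard-codes the scores from the table and the critical value 2.131 from the text, computes rpb and t in the same way as the solution, and cross-checks t against the pooled (equal-variance) two-sample t-test, which is algebraically equivalent to the test of H0: ρpb = 0. The variable and helper names are illustrative only.

```python
import math

# Scores from the example table: participants 1-8 are men (M), 9-17 are women (F).
men = [7, 19, 8, 10, 7, 15, 6, 13]
women = [14, 11, 18, 23, 17, 20, 14, 24, 22]
scores = men + women
n, n_p, n_q = len(scores), len(men), len(women)

# Point biserial correlation with p = male, q = female (s uses divisor n).
mean_p, mean_q = sum(men) / n_p, sum(women) / n_q
s = math.sqrt(sum(x * x for x in scores) / n - (sum(scores) / n) ** 2)
r_pb = (mean_p - mean_q) / s * math.sqrt((n_p / n) * (n_q / n))

# Small-sample test statistic with n - 2 = 15 degrees of freedom.
t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)

# Cross-check: the pooled two-sample t-test on the same data gives the same t.
def sum_sq_dev(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

sp2 = (sum_sq_dev(men) + sum_sq_dev(women)) / (n - 2)
t_pooled = (mean_p - mean_q) / math.sqrt(sp2 * (1 / n_p + 1 / n_q))

print(f"r_pb = {r_pb:.3f}, t = {t:.2f}, pooled t = {t_pooled:.2f}")  # r_pb = -0.657, t = -3.37, pooled t = -3.37
print("reject H0" if abs(t) > 2.131 else "do not reject H0")       # reject H0
```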
