Two-Proportion Test: Comparing Conversion Rates Between Variants

statistics

A/B testing

conversion optimization

web analytics

Compare add-to-cart conversion between variant A and variant B with a two-proportion z test and report the difference with a confidence interval and effect size.

Author: Mohammed Ali Sharafuddin

Affiliation: FlairMI

Published: June 17, 2024

Keywords: two-proportion test, A/B testing, conversion comparison, effect size, Cohen’s h

TL;DR: Variant A 12.0% vs B 9.5% (n=1,000 each); p=0.071 (ns at α = 0.05), difference +2.5 pp [−0.2, +5.2 pp], Cohen’s h=0.08 (below the conventional small threshold of 0.20); decision: no clear winner, continue testing or explore other factors.

Answer
Method: Two-proportion z-test.
Estimate: 12.0% vs 9.5%; difference +2.5 pp, 95% CI [−0.2, +5.2] pp.
Data: A/B test, variables `variant` and `converted`, n = 2,000 (1,000 per variant).
Action: No significant difference; continue testing or refine variants.

Case

You ran an A/B test on a product page. After two weeks, variant A had 120 conversions from 1,000 visitors (12.0%), while variant B had 95 conversions from 1,000 visitors (9.5%). The business question: Is the 2.5 percentage point difference statistically significant? Should you keep variant A or run a confirmatory test?

Dataset

Synthetic A/B experiment data (Schema C).

Variable	Label	Value
`variant`	Test group	A or B
`converted`	Conversion event	0 or 1
`n`	Sample size	2,000 rows
`x_A`	Conversions in A	120
`x_B`	Conversions in B	95
`n_A`	Visitors in A	1,000
`n_B`	Visitors in B	1,000

Method

We use a two-proportion z test to compare independent proportions (Agresti 2019). Under the null hypothesis of equal proportions, the z statistic is approximately standard normal, so its square approximately follows a chi-squared distribution with 1 degree of freedom. We report the difference with a 95% confidence interval and Cohen’s h as an effect size measure.

The difference in proportions: \[ \hat{p}_A - \hat{p}_B = \frac{x_A}{n_A} - \frac{x_B}{n_B}. \]

Cohen’s h for effect size: \[ h = 2(\arcsin\sqrt{\hat{p}_A} - \arcsin\sqrt{\hat{p}_B}). \]
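
The Method text names the test and the interval but does not write them out. A common textbook form, assumed here rather than taken from the original analysis, pools both samples for the test statistic (because the null hypothesis says the proportions are equal) and uses the unpooled standard error for the Wald confidence interval:

\[ z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad \hat{p} = \frac{x_A + x_B}{n_A + n_B}, \]

\[ (\hat{p}_A - \hat{p}_B) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}. \]

Here \(\hat{p}\) is the pooled conversion rate and \(z_{1-\alpha/2}\) is the standard normal quantile (about 1.96 for a 95% interval). Software defaults can differ from this form, for example by applying a continuity correction.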

Calculation
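
The calculation can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the original analysis code: the function name `two_proportion_ztest` is hypothetical, the test uses the pooled standard error without a continuity correction, and the interval is the unpooled Wald interval.

```python
from math import asin, sqrt
from statistics import NormalDist


def two_proportion_ztest(x_a, n_a, x_b, n_b, conf_level=0.95):
    """Two-sided two-proportion z test with a Wald CI and Cohen's h.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_a - p_b

    # Pooled standard error: used for the test statistic under H0 (p_A = p_B).
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error: used for the Wald confidence interval.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Cohen's h: difference of arcsine-square-root transformed proportions.
    h = 2 * (asin(sqrt(p_a)) - asin(sqrt(p_b)))

    return {"diff": diff, "z": z, "chi2": z ** 2,
            "p_value": p_value, "ci": ci, "h": h}


# Counts from the Dataset table: 120/1,000 in A and 95/1,000 in B.
result = two_proportion_ztest(x_a=120, n_a=1000, x_b=95, n_b=1000)
print(f"difference = {result['diff'] * 100:.1f} pp")
print(f"z = {result['z']:.2f}, chi2(1) = {result['chi2']:.2f}, "
      f"p = {result['p_value']:.3f}")
print(f"95% CI = [{result['ci'][0] * 100:.1f}, {result['ci'][1] * 100:.1f}] pp")
print(f"Cohen's h = {result['h']:.2f}")
```

Run on the counts in the Dataset table, this sketch gives z ≈ 1.80 (χ²(1) ≈ 3.26), p ≈ 0.071, a 95% CI of about [−0.2, 5.2] pp, and Cohen’s h ≈ 0.08. A routine that applies a continuity correction by default will report a slightly larger p-value.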

Visualization

Results and Interpretation

Variant A converted at 12.0% (120/1,000) and variant B at 9.5% (95/1,000). The estimated difference was 2.5 percentage points with a 95% confidence interval of [−0.2, 5.2] percentage points. A two-proportion z test (without continuity correction) did not find a statistically significant difference at α = 0.05, z = 1.80, χ²(1) = 3.26, p = 0.071 (Agresti 2019; R Core Team 2024).

The effect size, Cohen’s h = 0.08, falls below the conventional threshold for a small effect (small: 0.20, medium: 0.50, large: 0.80). Even if the difference is real, the practical impact is modest.

The 95% CI [−0.2, 5.2 pp] includes zero, so the data are compatible with anything from no improvement to a lift of about 5 percentage points.

Decision framework. Variant A shows a higher observed conversion rate than variant B, but the difference is not statistically significant at the 5% level. The confidence interval suggests the true difference lies between −0.2 and 5.2 percentage points. Given the small effect size and modest sample, run a confirmatory test with a larger sample before a full rollout.

Sample Size Planning

To detect a 2.5 percentage point difference (12.0% vs 9.5%) with 80% power at two-sided α = 0.05, you need approximately 2,410 visitors per group (4,820 total). Your current test with 1,000 per group had approximately 44% power to detect this effect size (calculated as the probability that the z statistic exceeds the critical value when the true difference equals the observed one).

For future tests, use the formula: \[ n_{\text{per group}} = \frac{2\bar{p}(1-\bar{p})(z_{1-\alpha/2} + z_{1-\beta})^2}{\delta^2}, \] where \(\bar{p}\) is the average of the two proportions and \(\delta\) is the target difference.
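
As a rough sketch of how the formula can be applied, the following standard-library Python evaluates it for the proportions in this test and estimates the power of a given sample size. The function names `n_per_group` and `approx_power` are hypothetical, and the power calculation uses the same normal approximation and pooled variance as the formula, ignoring the negligible chance of rejecting in the wrong direction.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Visitors needed per group, using the average-proportion formula."""
    p_bar = (p_a + p_b) / 2            # average of the two proportions
    delta = p_a - p_b                  # target difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / delta ** 2
    return ceil(n)                     # round up to a whole visitor


def approx_power(p_a, p_b, n, alpha=0.05):
    """Approximate power of a two-sided test with n visitors per group."""
    p_bar = (p_a + p_b) / 2
    se = sqrt(2 * p_bar * (1 - p_bar) / n)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the z statistic clears the upper critical value.
    return NormalDist().cdf(abs(p_a - p_b) / se - z_alpha)


needed = n_per_group(0.12, 0.095)
achieved = approx_power(0.12, 0.095, n=1000)
print(f"needed per group: {needed}")
print(f"power at n = 1000 per group: {achieved:.2f}")
```

For 12.0% vs 9.5% this yields roughly 2,410 visitors per group and a power of about 0.44 at 1,000 per group. Dedicated power routines may differ slightly because they use unpooled variances or an arcsine transformation.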

Assumptions

The two-proportion test assumes:

Independent samples: Each visitor appears in only one variant (no crossover or contamination between groups)
Random assignment: Visitors were randomly allocated to A or B without systematic bias (e.g., no assignment based on time of day or device type)
Large sample approximation: Each group has at least 5 expected successes and 5 expected failures for the normal approximation to hold (both conditions met: 120 successes, 880 failures in A; 95 successes, 905 failures in B - all values exceed 5)
Stable conversion rates: The true conversion rate for each variant remains constant during the test period (no external events or time trends affecting one group differently)
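
The large-sample condition in the list above can be checked programmatically. The sketch below is one possible variant, not the author's procedure: it uses expected counts under the pooled conversion rate (rather than the observed counts quoted in the list), and the function name `normal_approx_ok` is hypothetical.

```python
def normal_approx_ok(x_a, n_a, x_b, n_b, min_count=5):
    """Check that every expected cell count under H0 is at least min_count."""
    # Under H0 both groups share the pooled conversion rate.
    p_pool = (x_a + x_b) / (n_a + n_b)
    expected = {
        "A successes": n_a * p_pool,
        "A failures": n_a * (1 - p_pool),
        "B successes": n_b * p_pool,
        "B failures": n_b * (1 - p_pool),
    }
    return all(count >= min_count for count in expected.values()), expected


ok, expected = normal_approx_ok(x_a=120, n_a=1000, x_b=95, n_b=1000)
print(ok)
for cell, count in expected.items():
    print(f"{cell}: {count:.1f}")
```

With the counts from this test, every expected cell (107.5 successes and 892.5 failures per group) is far above 5, so the normal approximation is reasonable.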

Limitations

This analysis does not stratify by device type, traffic source, or time of day. Differences in these factors could influence conversion rates. Consider a stratified analysis or regression model if imbalances are suspected.

Use the format below to cite this page

Sharafuddin, M. A. (2024, June 17). Two-proportion test: Comparing conversion rates between variants. Flair Marketing Intelligence (FlairMI). https://flairmi.com/blog/posts/02-two-proportion-test.html

@online{sharafuddin2024-two-prop,
  author = {Sharafuddin, Mohammed Ali},
  title  = {Two-Proportion Test: Comparing Conversion Rates Between Variants},
  year   = {2024},
  date   = {2024-06-17},
  url    = {https://flairmi.com/blog/posts/02-two-proportion-test.html},
  langid = {en}
}

References

Agresti, Alan. 2019. Statistical Methods for the Social Sciences. Boston, MA: Pearson.

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.r-project.org/.
