Welch t-Test: Comparing Average Order Value Between Groups

statistics
A/B testing
revenue optimization
ecommerce
Compare average order value between control and treatment groups with a Welch t-test and report the mean difference with a confidence interval and effect size.
Published

June 24, 2024

Keywords

t-test, Welch t-test, average order value, effect size, Hedges g

TL;DR: Control ₹1,500 vs. Treatment ₹1,580 (n = 200 each); mean difference ₹80, 95% CI [₹32, ₹128], p = 0.001 (significant); Hedges g = 0.29 (small to medium). Decision: the treatment increases AOV; roll out if margin supports it.

Answer
Method: Welch t-test for independent samples.
Estimate: control ₹1,500 vs. treatment ₹1,580; difference ₹80, 95% CI [₹32, ₹128].
Data: A/B test, variables group and order_value, n = 200 per group.
Action: Treatment increases AOV by ₹80; roll out if margin supports.

Case

You are analyzing a merchandising experiment for an e-commerce store. The control group (A) saw standard product displays, while the treatment group (B) received enhanced product recommendations. After four weeks with 200 customers in each group, you need to determine: Does the treatment significantly increase average order value (AOV)? Should you roll out the new merchandising approach?

Dataset

Synthetic sample from e-commerce experiment (Schema A).

Variable   Label                   Value / unit
aov_a      Control group AOV       ₹ (rupees)
aov_b      Treatment group AOV     ₹ (rupees)
n_a        Control sample size     200
n_b        Treatment sample size   200
mean_a     Control mean            ₹1,500
mean_b     Treatment mean          ₹1,580

Method

We use a Welch t-test to compare two independent groups with potentially unequal variances (Welch 1947). This is more robust than Student’s t-test when variances differ. We report the mean difference with a 95% confidence interval and Hedges g as an effect size measure (which applies a small-sample correction to Cohen’s d).
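
For reference, the Welch statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) can be written as follows. The symbols \(s_A\) and \(s_B\) for the group sample standard deviations and \(\nu\) for the degrees of freedom are notation assumed here, not taken from elsewhere in this post: \[ t = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}, \qquad \nu \approx \frac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}\right)^2}{\dfrac{(s_A^2/n_A)^2}{n_A - 1} + \dfrac{(s_B^2/n_B)^2}{n_B - 1}}. \]

The 95% confidence interval for the mean difference then takes the usual form: \[ (\bar{x}_B - \bar{x}_A) \pm t_{0.975,\,\nu} \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}. \]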

The mean difference: \[ \bar{x}_B - \bar{x}_A = \frac{1}{n_B}\sum x_{B,i} - \frac{1}{n_A}\sum x_{A,i}. \]

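Hedges g for effect size: \[ g = J \times \frac{\bar{x}_B - \bar{x}_A}{s_{\text{pooled}}}, \quad J = 1 - \frac{3}{4(n_A + n_B) - 9}, \] where \(s_{\text{pooled}} = \sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}}\) is the pooled standard deviation and \(s_A\), \(s_B\) are the group standard deviations.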
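
As a rough illustration of the Hedges g formula, the sketch below computes it from summary statistics using only the Python standard library. The function name `hedges_g` and its argument names are assumptions made for this example. The inputs are the rounded means, standard deviations, and group sizes quoted in this post, so the result should land close to the reported effect size, though not necessarily to the same second decimal.

```python
import math

def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Hedges g from summary statistics (hypothetical helper for illustration)."""
    df = n_a + n_b - 2
    # Pooled standard deviation, weighting each group's variance by its df
    s_pooled = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    cohens_d = (mean_b - mean_a) / s_pooled
    # Small-sample correction factor J from the formula above
    j = 1 - 3 / (4 * (n_a + n_b) - 9)
    return j * cohens_d

# Rounded summary values quoted in this post
g = hedges_g(mean_a=1500, sd_a=260, n_a=200, mean_b=1580, sd_b=300, n_b=200)
print(f"Hedges g = {g:.3f}")
```

With 200 customers per group the correction factor J is very close to 1, so g and Cohen's d are almost identical here; the correction matters mainly for small samples.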

Calculation
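
The calculation can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code used to produce the results in this post: the function names (`t_pdf`, `t_cdf`, `t_quantile`, `welch_t_test`) are assumptions, the t distribution is evaluated by simple numerical integration rather than a statistics library, and the order values in the usage example are invented toy numbers.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Upper quantile (0.5 < p < 1) by bisection on t_cdf, searching [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_t_test(a, b, conf=0.95):
    """Welch t-test of mean(b) - mean(a) on two lists of order values."""
    n_a, n_b = len(a), len(b)
    va, vb = variance(a) / n_a, variance(b) / n_b  # squared standard errors
    diff = mean(b) - mean(a)
    se = math.sqrt(va + vb)
    t = diff / se
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    p = 2 * (1 - t_cdf(abs(t), df))
    crit = t_quantile(1 - (1 - conf) / 2, df)
    return {"diff": diff, "t": t, "df": df, "p": p,
            "ci": (diff - crit * se, diff + crit * se)}

# Toy order values, invented purely for illustration
control = [10, 12, 14, 16, 18]
treatment = [15, 17, 19, 21, 28]
print(welch_t_test(control, treatment))
```

In practice a library routine is the safer choice: `scipy.stats.ttest_ind(a, b, equal_var=False)` in Python runs a Welch test, and `t.test()` in R uses the Welch variant by default.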

Results and Interpretation

The control group had a mean AOV of ₹1,500 (SD = ₹260), while the treatment group had a mean AOV of ₹1,580 (SD = ₹300). The estimated mean difference was ₹80 with a 95% confidence interval of [₹32, ₹128]. A Welch t-test found a statistically significant difference, t(391) = 3.28, p = 0.001 (Welch 1947; R Core Team 2024).

The effect size, Hedges g = 0.29, falls between the conventional benchmarks for small (0.20) and medium (0.50) effects (large: 0.80). In business terms, an ₹80 lift on a ₹1,500 baseline is roughly a 5% increase in average order value, which can be a practically meaningful improvement at scale.

The result is statistically significant (p = 0.001), but the 95% CI [₹32, ₹128] is wide: the true improvement could be anywhere from modest to substantial. Even the lower bound, a ₹32 increase per order, could translate into meaningful revenue gains at scale, although it is a plausible lower limit rather than a guaranteed minimum.

Decision framework. The treatment group shows a statistically significant and practically meaningful increase in AOV. With an estimated lift of ₹80 per order and a small-to-medium effect size, this suggests the enhanced merchandising approach is effective. Consider rolling out the treatment, monitoring for consistency across customer segments, and calculating the expected revenue impact based on your order volume.
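
One way to turn the estimate into an expected revenue figure is to multiply the per-order lift by order volume, using the confidence limits for a pessimistic and an optimistic scenario. The sketch below assumes a hypothetical volume of 10,000 orders per month, a number chosen for illustration rather than taken from the experiment, and the helper name `monthly_revenue_lift` is likewise an assumption.

```python
def monthly_revenue_lift(orders_per_month, lift_per_order):
    """Extra monthly revenue (in rupees) from a given AOV lift per order."""
    return orders_per_month * lift_per_order

ORDERS_PER_MONTH = 10_000  # hypothetical volume, not from the experiment

# Lower CI bound, point estimate, and upper CI bound reported in this post
scenarios = {"CI lower": 32, "estimate": 80, "CI upper": 128}
for label, lift in scenarios.items():
    extra = monthly_revenue_lift(ORDERS_PER_MONTH, lift)
    print(f"{label:>9}: INR {extra:,} per month")
```

This is revenue, not profit; applying your contribution margin to each scenario shows whether the rollout pays for itself even at the lower bound.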

Sample Size Planning

To detect an ₹80 difference in AOV with 80% power at α = 0.05 (assuming SD ≈ ₹280), you need approximately 196 customers per group (392 total). Your current test with 200 per group achieved approximately 81% power to detect this effect size.

For future tests, use the formula: \[ n_{\text{per group}} = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}, \] where \(d\) is Cohen’s d (mean difference divided by pooled standard deviation).
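
The formula can be sketched with the standard library as follows. The function names `n_per_group` and `approx_power` are assumptions made for this example, and both rely on the normal approximation. The sample size can therefore come out a few customers below figures from t-based power software.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group sample size from the normal-approximation formula above."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

def approx_power(d, n, alpha=0.05):
    """Approximate two-sided power with n customers per group.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    return z.cdf(d * math.sqrt(n / 2) - z.inv_cdf(1 - alpha / 2))

d = 80 / 280  # Rs 80 difference divided by the assumed SD of Rs 280
print("n per group:", n_per_group(d))
print(f"power at n = 200: {approx_power(d, 200):.2f}")
```

Because the required sample size scales with \(1/d^2\), halving the detectable difference roughly quadruples the number of customers needed per group.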

Assumptions

The Welch t-test assumes:

  • Independent observations: Each customer’s order is independent
  • Random assignment: Customers were randomly allocated to control or treatment
  • Approximate normality: AOV distributions are approximately normal, or sample sizes are large enough (n ≥ 30 per group) for the Central Limit Theorem to apply
  • No requirement for equal variances: Welch t-test adjusts degrees of freedom for unequal variances (SD control = ₹260, treatment = ₹300)

Limitations

This analysis does not account for:

  • Skewness: AOV distributions in retail are often right-skewed. Consider median comparisons or log-transformation if extreme outliers are present.
  • Segmentation: Results may vary by customer segment (new vs. returning, device type, traffic source)
  • Time effects: Seasonal patterns or time-of-week effects could influence AOV
  • Multiple testing: If running multiple simultaneous experiments, adjust significance levels accordingly

For highly skewed data, consider the Mann-Whitney U test (Wilcoxon rank-sum test) as a non-parametric alternative.
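
A minimal sketch of the Mann-Whitney U test is shown below, using the large-sample normal approximation. The function name `mann_whitney_u` is an assumption, the variance term omits the tie and continuity corrections that statistical packages usually apply, and the sample values are toy numbers. Treat it as an illustration of the idea rather than a production implementation.

```python
import math
from statistics import NormalDist

def mann_whitney_u(a, b):
    """U statistic for group b vs. group a, with a normal-approximation p-value."""
    n_a, n_b = len(a), len(b)
    # Count pairs where the b value exceeds the a value; ties count one half
    u_b = sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in a for y in b)
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_b - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_b, z, p

# Toy values, invented for illustration
u, z, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

Dividing U by \(n_A n_B\) gives the estimated probability that a randomly chosen treatment order exceeds a randomly chosen control order. Note that this test compares the distributions as a whole, not the means, so it answers a slightly different question from the Welch t-test.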


Use the format below to cite this page:

Sharafuddin, M. A. (2024, June 24). Welch t-test: Comparing average order value between groups. Flair Marketing Intelligence (FlairMI). https://flairmi.com/blog/posts/03-t-test.html
@online{sharafuddin2024-t-test,
  author = {Sharafuddin, Mohammed Ali},
  title  = {Welch t-Test: Comparing Average Order Value Between Groups},
  year   = {2024},
  date   = {2024-06-24},
  url    = {https://flairmi.com/blog/posts/03-t-test.html},
  langid = {en}
}

References

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.r-project.org/.
Welch, Bernard Lewis. 1947. “The Generalization of Student’s Problem When Several Different Population Variances Are Involved.” Biometrika 34 (1-2): 28–35. https://doi.org/10.1093/biomet/34.1-2.28.
