Logistic Regression Basics: Predicting Purchase from Session Count
logistic regression, odds ratio, purchase prediction, conversion, session behavior
TL;DR: Purchase probability rises from ~9% (1 session) to ~50% (8+ sessions) (n=500); OR per session = 1.34 [1.23, 1.46]; McFadden R²=0.11, AIC=423; decision: each extra session increases odds by 34%, engage users across multiple visits.
Answer
Method: Logistic regression for binary outcome.
Estimate: OR = 1.34 per session (95% CI [1.23, 1.46]).
Data: User behavior, variables sessions, purchased, n = 500.
Action: Each extra session increases purchase odds by 34%; encourage repeat visits.
Case
You are analyzing customer behavior for an e-commerce platform and want to understand: Does the number of on-site sessions influence the likelihood of making a purchase? Specifically, how much do purchase odds increase with each additional session? Can you predict purchase probability from session count to identify high-intent customers?
Dataset
Synthetic sample from customer session data (Schema B).
| Variable | Label | Value |
|---|---|---|
| sessions | Number of sessions | Count (0+) |
| purchase | Purchase made | 0 (No), 1 (Yes) |
| n | Number of customers | 1,500 |
| Distribution | Mean sessions | ≈3 (Poisson λ=3) |
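A dataset with this shape can be simulated for experimentation. The sketch below is only an illustration: the table states the Poisson(λ=3) session distribution, but the intercept `b0`, slope `b1`, seed, and the function names are hypothetical choices and not the values behind this post.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_customers(n=1500, lam=3.0, b0=-2.5, b1=0.25, seed=42):
    """Return (sessions, purchase) lists for n simulated customers.

    b0 and b1 are hypothetical log-odds coefficients chosen for illustration.
    """
    rng = random.Random(seed)
    sessions, purchase = [], []
    for _ in range(n):
        s = sample_poisson(lam, rng)
        # Purchase probability from the logistic model, then a Bernoulli draw.
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * s)))
        sessions.append(s)
        purchase.append(1 if rng.random() < p else 0)
    return sessions, purchase

sessions, purchase = simulate_customers()
print(len(sessions), sum(sessions) / len(sessions), sum(purchase) / len(purchase))
```

The printed values are the sample size, the mean session count (which should sit near 3), and the overall purchase rate.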
Method
We use logistic regression to model the relationship between session count (predictor) and purchase outcome (binary response). Unlike linear regression, logistic regression models the log-odds (logit) of the outcome:
\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Sessions}, \]
where \(p\) is the probability of purchase. Rearranging gives the predicted probability:
\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \times \text{Sessions})}}. \]
The odds ratio (OR) for a one-unit increase in sessions is: \[ \text{OR} = e^{\beta_1}. \]
An OR > 1 indicates increased odds of purchase with more sessions; OR < 1 indicates decreased odds. We report 95% Wald confidence intervals for the odds ratio.
Interpretation guide:
- OR = 1.00: No effect
- OR = 1.28: 28% increase in odds per additional session
- OR = 2.00: 100% increase (doubling) in odds
- OR = 0.80: 20% decrease in odds
Calculation
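The model described in the Method section can be fitted by maximum likelihood. The sketch below uses a plain Newton-Raphson loop in standard-library Python as an illustration; it is not the code behind the reported results (the post cites R), and the function name `fit_logistic` and the toy data are assumptions. In the toy data, 10 of 40 one-session customers and 20 of 40 two-session customers purchase, so the fitted odds ratio should come out at 3.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the log-likelihood gradient (g) and information matrix (h).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Hypothetical toy data: 40 customers with 1 session, 40 with 2 sessions.
x = [1] * 40 + [2] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20

b0, b1 = fit_logistic(x, y)
print("beta1:", round(b1, 4), "odds ratio:", round(math.exp(b1), 4))
```

With only two distinct session counts the model reproduces the group purchase rates exactly, which makes the result easy to check by hand: the odds go from 10/30 to 20/20, a ratio of 3.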
Visualization
Results and Interpretation
The logistic regression reveals a statistically significant positive relationship between session count and purchase probability. The estimated slope coefficient is β₁ = 0.247 (p < 0.001), which translates to an odds ratio of 1.28 (95% CI [1.18, 1.40]) (R Core Team 2024).
Odds ratio interpretation: For every additional session, the odds of making a purchase increase by 28%. For example, customers with 2 sessions have 1.28 times the odds of purchasing compared with customers with 1 session. The 95% CI [1.18, 1.40] means the data are consistent with an increase in odds of roughly 18% to 40% per session.
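Because the odds ratio acts on odds rather than on probabilities, it compounds multiplicatively across sessions, and a 28% increase in odds is not a 28% increase in probability. The following sketch works through that arithmetic using the reported OR of 1.28 and the reported 9.3% probability at 1 session; the helper names are hypothetical.

```python
def odds_multiplier(odds_ratio, extra_sessions):
    """Multiplicative change in odds after `extra_sessions` more sessions."""
    return odds_ratio ** extra_sessions

def shift_probability(p_start, odds_ratio, extra_sessions=1):
    """Convert a starting probability to odds, scale the odds, convert back."""
    odds = p_start / (1.0 - p_start)
    new_odds = odds * odds_multiplier(odds_ratio, extra_sessions)
    return new_odds / (1.0 + new_odds)

# Four additional sessions multiply the odds by 1.28 ** 4.
print(round(odds_multiplier(1.28, 4), 2))        # 2.68

# One additional session starting from a 9.3% purchase probability.
print(round(shift_probability(0.093, 1.28), 3))  # 0.116
```

In this example the probability moves from 9.3% to about 11.6%, a relative increase of roughly 25%, slightly less than the 28% increase in odds.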
Probability predictions:
- 1 session: Predicted purchase probability = 9.3%
- 3 sessions (mean): Predicted purchase probability = 15.3%
- 5 sessions: Predicted purchase probability = 25.5%
- 8 sessions: Predicted purchase probability = 44.6%
The model shows purchase probability increases non-linearly with sessions, following a logistic S-curve. Customers with 5+ sessions show substantially higher purchase intent (>25% probability) compared to single-session visitors (~9%).
Model fit: The pseudo R² of 0.056 (5.6%) indicates that session count gives a modest improvement in fit over the intercept-only model. Pseudo R² is based on the log-likelihood and is not a proportion of variance explained, so it tends to run much lower than a linear-regression R². Values of 2-15% are often considered acceptable for individual-level behavioral models, because many factors beyond session count influence purchase decisions. The residual deviance (1643.9) is 97.3 lower than the null deviance (1741.2), indicating that the model improves on the intercept-only baseline.
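The fit statistics can be checked directly from the two reported deviances. The sketch below assumes the pseudo R² is McFadden's and that the deviances come from ungrouped 0/1 outcomes, in which case the deviance equals minus twice the log-likelihood; the function names are hypothetical. It also computes the likelihood-ratio test for adding one predictor.

```python
import math

def mcfadden_r2(null_deviance, residual_deviance):
    """McFadden's pseudo R-squared, assuming deviance = -2 * log-likelihood."""
    return 1.0 - residual_deviance / null_deviance

def lr_test_1df(null_deviance, residual_deviance):
    """Likelihood-ratio statistic and p-value for one added predictor."""
    stat = null_deviance - residual_deviance
    # For a chi-square with 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

r2 = mcfadden_r2(1741.2, 1643.9)
stat, p_value = lr_test_1df(1741.2, 1643.9)
print(round(r2, 3))    # 0.056
print(round(stat, 1))  # 97.3
print(p_value < 0.001)
```

The drop in deviance of 97.3 on 1 degree of freedom is far beyond the 5% critical value of 3.84, so the improvement over the intercept-only model is not a chance artifact even though the pseudo R² is small.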
Statistical significance: The Wald z-test for the slope coefficient is highly significant (p < 0.001), providing strong evidence that session count is a meaningful predictor of purchase probability.
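The Wald test and the confidence interval are both built from the slope and its standard error. The standard error is not reported here, so the sketch below recovers an approximate value from the rounded 95% CI [1.18, 1.40]; that back-calculation, and the helper names, are assumptions for illustration, and the results will differ slightly from the exact model output because of rounding.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_from_or_ci(ci_low, ci_high, z=Z_95):
    """Approximate SE of beta implied by a Wald CI reported on the OR scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)

def wald_test(beta, se, z=Z_95):
    """Return (z statistic, two-sided p-value, OR CI lower, OR CI upper)."""
    z_stat = beta / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value, math.exp(beta - z * se), math.exp(beta + z * se)

se = se_from_or_ci(1.18, 1.40)            # approximate, from rounded CI bounds
z_stat, p_value, ci_low, ci_high = wald_test(0.247, se)
print(round(se, 3), round(z_stat, 1), p_value < 0.001)
print(round(ci_low, 2), round(ci_high, 2))
```

The interval is computed on the log-odds scale and then exponentiated, which is why it is not symmetric around the odds ratio itself.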
Decision framework. The positive relationship between sessions and purchase probability suggests several actionable insights:
- Engagement strategies: Encourage multi-session visits through retargeting, email reminders, or personalized recommendations
- Customer segmentation: Treat 1-session visitors differently from 5+ session visitors in targeting and messaging
- Lead scoring: Use session count as a component of lead quality scores
- Conversion optimization: Focus conversion tactics on high-session users who show the strongest purchase intent (>25% probability)
- Further analysis: Extend the model with additional predictors (device type, traffic source, time on site) to improve predictions and understand interaction effects
Predicted Probabilities by Session Count
Assumptions
Logistic regression assumes:
- Binary outcome: Purchase is coded as 0 (no purchase) or 1 (purchase)
- Independence: Each customer’s purchase decision is independent of others
- Linearity of log-odds: The relationship between sessions and log-odds of purchase is linear (this can be tested by adding polynomial terms)
- No perfect separation: Predictor values do not perfectly separate outcomes (no session count guarantees purchase or non-purchase)
- Large sample size: A common rule of thumb is at least 10 events (observations of the less frequent outcome) per predictor; 1,500 observations with 1 predictor is adequate
Note: Unlike linear regression, logistic regression does not assume normality of residuals or homoscedasticity.
Limitations
This analysis does not account for:
- Confounding variables: Device type, traffic source, time spent per session, page views, product browsing behavior
- Time effects: Session timing (same day vs. spread over weeks), recency effects, seasonality
- Customer heterogeneity: New vs. returning customers, demographics, purchase history
- Non-linear effects: Diminishing returns at very high session counts (may plateau beyond 8-10 sessions)
- Model calibration: Predicted probabilities may be poorly calibrated without validation on held-out data
- Interaction effects: Session impact may vary by customer segment or traffic source
- Causality: Correlation does not prove sessions cause purchases. High-intent customers may naturally browse more.
Recommendations for improvement:
- Add categorical predictors (device type, traffic source) and test interactions with sessions
- Include continuous predictors (average time per session, pages viewed, cart additions)
- Test polynomial terms (sessions²) or splines for non-linear relationships
- Split data into training/test sets to assess out-of-sample prediction accuracy
- Calculate AUC-ROC and calibration curves to evaluate discrimination and calibration
- Consider hierarchical models if sessions are nested within customers over time
- Use propensity score matching or causal inference methods to estimate treatment effects
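As one example of the evaluation steps listed above, AUC-ROC can be computed without any libraries from its pairwise definition. This is a minimal sketch with a hypothetical function name; it compares every purchaser with every non-purchaser, which is simple but quadratic in cost, so a rank-based implementation would be preferable for large samples.

```python
def auc_roc(y_true, scores):
    """AUC as the chance that a random purchaser outranks a random non-purchaser.

    Ties between a purchaser and a non-purchaser count as half a win.
    """
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one purchaser and one non-purchaser")
    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical example: purchase flags and predicted probabilities.
print(auc_roc([0, 1, 0, 1], [0.10, 0.20, 0.30, 0.40]))  # 0.75
```

For a single-predictor model with a positive slope, predicted probability is a monotone function of session count, so passing raw session counts as `scores` gives the same AUC as passing the fitted probabilities.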
Model Diagnostics
Diagnostic interpretation: Panel A compares observed and predicted purchase rates across session counts; points should fall near the diagonal line. Panel B plots the deviance residuals, which should scatter randomly around zero with no systematic pattern. Large deviations suggest model misspecification or influential outliers.
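The quantities behind both panels can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be per-customer lists of session counts, 0/1 purchase flags, and fitted probabilities strictly between 0 and 1.

```python
import math
from collections import defaultdict

def deviance_residual(y, p):
    """Signed square root of one observation's contribution to the deviance."""
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)

def observed_vs_predicted(sessions, purchase, predicted):
    """Group by session count; return {sessions: (n, observed rate, mean predicted)}."""
    groups = defaultdict(list)
    for s, y, p in zip(sessions, purchase, predicted):
        groups[s].append((y, p))
    table = {}
    for s, rows in sorted(groups.items()):
        n = len(rows)
        table[s] = (n, sum(y for y, _ in rows) / n, sum(p for _, p in rows) / n)
    return table

# Hypothetical mini-example with three customers.
print(observed_vs_predicted([1, 1, 2], [0, 1, 1], [0.2, 0.2, 0.4]))
print(round(deviance_residual(1, 0.5), 4), round(deviance_residual(0, 0.5), 4))
```

The squared deviance residuals sum to the residual deviance, so a handful of very large residuals points to the observations that contribute most to lack of fit.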
Use the format below to cite this page:
Sharafuddin, M. A. (2025, October 25). Logistic regression basics: Predicting purchase from session count. Flair Marketing Intelligence (FlairMI). https://flairmi.com/blog/posts/06-logistic-basics.html
@online{sharafuddin2025-logistic,
author = {Sharafuddin, Mohammed Ali},
title = {Logistic Regression Basics: Predicting Purchase from Session Count},
year = {2025},
date = {2025-10-25},
url = {https://flairmi.com/blog/posts/06-logistic-basics.html},
langid = {en}
}