Logistic Regression Basics: Predicting Purchase from Session Count

Categories: statistics, logistic regression, conversion optimization, ecommerce

Model purchase probability as a function of on-site sessions using logistic regression, with odds ratios, confidence intervals, and interpretation guidance for binary outcomes.

Author: Mohammed Ali Sharafuddin

Affiliation: FlairMI

Published: October 25, 2025

Keywords: logistic regression, odds ratio, purchase prediction, conversion, session behavior

TL;DR: Purchase probability rises from ~9% (1 session) to ~45% (8 sessions) (n = 1,500); OR per session = 1.28, 95% CI [1.18, 1.40]; McFadden pseudo R² = 0.056; decision: each extra session is associated with 28% higher purchase odds, so engage users across multiple visits.

Answer
Method: Logistic regression for a binary outcome.
Estimate: OR = 1.28 per session, 95% CI [1.18, 1.40].
Data: Synthetic customer session data; variables `sessions` and `purchase`; n = 1,500.
Action: Each extra session is associated with 28% higher purchase odds; encourage repeat visits.

Case

You are analyzing customer behavior for an e-commerce platform and want to understand: Does the number of on-site sessions influence the likelihood of making a purchase? Specifically, how much do purchase odds increase with each additional session? Can you predict purchase probability from session count to identify high-intent customers?

Dataset

Synthetic sample from customer session data (Schema B).

Variable	Label	Value
`sessions`	Number of sessions	Count (0+)
`purchase`	Purchase made	0 (No), 1 (Yes)
`n`	Number of customers	1,500
Distribution	Mean sessions	≈3 (Poisson λ=3)

Method

We use logistic regression to model the relationship between session count (predictor) and purchase outcome (binary response). Unlike linear regression, logistic regression models the log-odds (logit) of the outcome:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \times \text{Sessions}, \]

where \(p\) is the probability of purchase. Rearranging gives the predicted probability:

\[ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \times \text{Sessions})}}. \]

The odds ratio (OR) for a one-unit increase in sessions is: \[ \text{OR} = e^{\beta_1}. \]

An OR > 1 indicates increased odds of purchase with more sessions; OR < 1 indicates decreased odds. We report 95% Wald confidence intervals for the odds ratio.
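
The formulas above can be sketched in a few lines of Python (used here only for illustration; the analysis itself cites R). The slope 0.247 is the estimate reported later in this post, while the intercept `BETA0 = -2.5` is a hypothetical value chosen for the example, not a reported figure. The loop checks that the ratio of odds for one extra session equals \(e^{\beta_1}\) at every session count, even though the probabilities themselves change by different amounts.

```python
import math

BETA0 = -2.5   # hypothetical intercept, for illustration only
BETA1 = 0.247  # slope estimate reported in the results

def purchase_probability(sessions, beta0=BETA0, beta1=BETA1):
    """Inverse logit: p = 1 / (1 + exp(-(beta0 + beta1 * sessions)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * sessions)))

def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)

# The odds ratio for one extra session is the same at every starting point.
for s in (1, 4, 7):
    ratio = odds(purchase_probability(s + 1)) / odds(purchase_probability(s))
    print(s, round(ratio, 4), round(math.exp(BETA1), 4))
```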

Interpretation guide:
- OR = 1.00: No effect
- OR = 1.28: 28% increase in odds per additional session
- OR = 2.00: 100% increase (doubling) in odds
- OR = 0.80: 20% decrease in odds
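
As a small worked example of the guide above, the percent change in odds is (OR − 1) × 100, and the ratio compounds multiplicatively over several sessions. The helper names below are illustrative, not taken from the original analysis.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative means a decrease)."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_over(odds_ratio, extra_sessions):
    """Odds ratio for several additional sessions: the per-session OR compounds."""
    return odds_ratio ** extra_sessions

for or_value in (1.00, 1.28, 2.00, 0.80):
    print(or_value, round(percent_change_in_odds(or_value), 1))

# Three extra sessions at OR = 1.28 per session: 1.28 ** 3, roughly 2.1 times the odds.
print(round(odds_ratio_over(1.28, 3), 2))
```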

Calculation
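
The original calculation code is not reproduced on this page. In R the model would typically be fitted with `glm(purchase ~ sessions, family = binomial)`. The sketch below shows the same idea in plain Python under stated assumptions: it simulates 1,500 customers with Poisson(3) session counts, uses hypothetical "true" coefficients (−2.5 and 0.25) to generate purchases, and fits the two-parameter model by Newton-Raphson. Because the data are simulated here, the estimates will not match the figures reported in this post.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate(n, beta0, beta1, lam=3.0, seed=42):
    """Simulate session counts and 0/1 purchases from a logistic model."""
    rng = random.Random(seed)
    sessions = [sample_poisson(lam, rng) for _ in range(n)]
    purchase = [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * s))) else 0
        for s in sessions
    ]
    return sessions, purchase

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Wald standard error of the slope, from the inverse information matrix
    # evaluated at the last iterate before the final (negligible) step.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Hypothetical "true" coefficients, used only to generate example data.
sessions, purchase = simulate(n=1500, beta0=-2.5, beta1=0.25)
b0, b1, se_b1 = fit_logistic(sessions, purchase)
print(f"beta0={b0:.3f} beta1={b1:.3f} SE(beta1)={se_b1:.3f} OR={math.exp(b1):.2f}")
```

A statistics package adds convergence checks and full diagnostics, so this loop is meant to show the mechanics rather than to replace `glm`.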

Visualization

Results and Interpretation

The logistic regression reveals a statistically significant positive relationship between session count and purchase probability. The estimated slope coefficient is β₁ = 0.247 (p < 0.001), which translates to an odds ratio of 1.28 (95% CI [1.18, 1.40]) (R Core Team 2024).

Odds ratio interpretation: For every additional session, the odds of making a purchase are multiplied by 1.28, a 28% increase. For example, customers with 2 sessions have 1.28 times the odds of purchasing compared with customers with 1 session, and the same ratio applies to any one-session difference. Because sessions is the only predictor, this is an unadjusted association. The 95% CI [1.18, 1.40] is consistent with a true increase in odds of between 18% and 40% per session.
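
The Wald interval for the odds ratio is obtained by exponentiating β₁ ± 1.96 × SE. The sketch below assumes SE ≈ 0.044, a value back-calculated from the reported interval rather than one stated in this post, so the endpoints may differ from the reported ones in the last digit.

```python
import math

def wald_or_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) by exponentiating beta and beta +/- z * se."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# beta is the reported slope; se=0.044 is an assumed value back-calculated
# from the reported confidence interval, not a figure given in the post.
or_hat, lower, upper = wald_or_ci(beta=0.247, se=0.044)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```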

Probability predictions:
- 1 session: Predicted purchase probability = 9.3%
- 3 sessions (mean): Predicted purchase probability = 15.3%
- 5 sessions: Predicted purchase probability = 25.5%
- 8 sessions: Predicted purchase probability = 44.6%

The model shows purchase probability increases non-linearly with sessions, following a logistic S-curve. Customers with 5+ sessions show substantially higher purchase intent (>25% probability) compared to single-session visitors (~9%).

Model fit: McFadden's pseudo R² of 0.056 indicates a modest improvement in log-likelihood over the intercept-only model; it is not a share of variance explained in the linear-regression sense. Values in the 2-15% range are common for individual-level behavioral models, because many factors beyond session count influence purchase decisions. The residual deviance (1643.9) is 97.3 lower than the null deviance (1741.2) for one added parameter, indicating that the model improves on the intercept-only baseline.
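
The fit statistics can be checked directly from the two reported deviances. For 0/1 outcomes the deviance equals −2 times the log-likelihood, so McFadden's pseudo R² is one minus the ratio of residual to null deviance, and the drop in deviance is a likelihood-ratio statistic with 1 degree of freedom. The function names below are illustrative.

```python
import math

def mcfadden_r2(residual_deviance, null_deviance):
    """McFadden pseudo R-squared for ungrouped binary data."""
    return 1.0 - residual_deviance / null_deviance

def lr_test_1df(residual_deviance, null_deviance):
    """Likelihood-ratio statistic and p-value for one added parameter.

    For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    stat = null_deviance - residual_deviance
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

r2 = mcfadden_r2(1643.9, 1741.2)
stat, p_value = lr_test_1df(1643.9, 1741.2)
print(round(r2, 3), round(stat, 1), p_value < 0.001)
```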

Statistical significance: The Wald z-test for the slope coefficient is highly significant (p < 0.001), providing strong evidence that session count is a meaningful predictor of purchase probability.

Decision framework. The positive relationship between sessions and purchase probability suggests several actions:
(1) Engagement strategies: Encourage multi-session visits through retargeting, email reminders, or personalized recommendations.
(2) Customer segmentation: Treat 1-session visitors differently from 5+ session visitors in targeting and messaging.
(3) Lead scoring: Use session count as a component of lead quality scores.
(4) Conversion optimization: Focus conversion tactics on high-session users, who show the strongest purchase intent (>25% probability).
(5) Further analysis: Extend the model with additional predictors (device type, traffic source, time on site) to improve predictions and understand interaction effects.

Predicted Probabilities by Session Count

Assumptions

Logistic regression assumes:

Binary outcome: Purchase is coded as 0 (no purchase) or 1 (purchase)
Independence: Each customer’s purchase decision is independent of others
Linearity of log-odds: The relationship between sessions and log-odds of purchase is linear (can test with polynomial terms)
No perfect separation: Predictor values do not perfectly separate outcomes (no session count guarantees purchase or non-purchase)
Large sample size: A common rule of thumb is at least 10 events per predictor, counting the rarer of the two outcomes (purchases or non-purchases); 1,500 observations with 1 predictor is adequate

Note: Unlike linear regression, logistic regression does not assume normality of residuals or homoscedasticity.
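
One informal way to inspect the linearity-of-log-odds assumption is to compute the empirical log-odds of purchase at each session count and check whether they rise roughly in a straight line. The sketch below uses a small toy dataset invented for illustration, and adds 0.5 to each cell so that session counts with no buyers (or no non-buyers) do not produce infinite values.

```python
import math
from collections import defaultdict

def empirical_logits(sessions, purchase):
    """Empirical log-odds of purchase at each observed session count.

    Adds 0.5 to both cells so zero counts do not give infinite logits.
    """
    buyers = defaultdict(int)
    totals = defaultdict(int)
    for s, y in zip(sessions, purchase):
        totals[s] += 1
        buyers[s] += y
    return {
        s: math.log((buyers[s] + 0.5) / (totals[s] - buyers[s] + 0.5))
        for s in sorted(totals)
    }

# Toy data, invented for illustration only.
toy_sessions = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
toy_purchase = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
for s, logit in empirical_logits(toy_sessions, toy_purchase).items():
    print(s, round(logit, 2))
```

Session counts with very few customers give noisy values, so sparse counts are usually pooled before judging the shape.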

Limitations

This analysis does not account for:

Confounding variables: Device type, traffic source, time spent per session, page views, product browsing behavior
Time effects: Session timing (same day vs. spread over weeks), recency effects, seasonality
Customer heterogeneity: New vs. returning customers, demographics, purchase history
Non-linear effects: Diminishing returns at very high session counts (may plateau beyond 8-10 sessions)
Model calibration: Predicted probabilities may be poorly calibrated without validation on held-out data
Interaction effects: Session impact may vary by customer segment or traffic source
Causality: Correlation does not prove sessions cause purchases. High-intent customers may naturally browse more.

Recommendations for improvement:
- Add categorical predictors (device type, traffic source) and test interactions with sessions
- Include continuous predictors (average time per session, pages viewed, cart additions)
- Test polynomial terms (sessions²) or splines for non-linear relationships
- Split data into training/test sets to assess out-of-sample prediction accuracy
- Calculate AUC-ROC and calibration curves to evaluate discrimination and calibration
- Consider hierarchical models if sessions are nested within customers over time
- Use propensity score matching or causal inference methods to estimate treatment effects
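
As an illustration of the AUC recommendation, the sketch below computes AUC as the share of (buyer, non-buyer) pairs in which the buyer has the higher score, counting ties as half. With a single predictor and a positive slope, ranking by predicted probability is the same as ranking by session count, so the session count can serve as the score. The toy data are invented for the example, and the pairwise loop is quadratic, so it suits small samples only.

```python
def auc(scores, labels):
    """AUC as the share of (buyer, non-buyer) pairs ranked correctly; ties count 0.5."""
    buyers = [s for s, y in zip(scores, labels) if y == 1]
    non_buyers = [s for s, y in zip(scores, labels) if y == 0]
    if not buyers or not non_buyers:
        raise ValueError("AUC needs at least one buyer and one non-buyer")
    wins = 0.0
    for b in buyers:
        for n in non_buyers:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(buyers) * len(non_buyers))

# Toy data, invented for illustration only: session counts used as scores.
toy_sessions = [1, 2, 2, 5, 3, 8]
toy_purchase = [0, 0, 1, 1, 0, 1]
print(round(auc(toy_sessions, toy_purchase), 3))
```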

Model Diagnostics

Diagnostic interpretation: Panel A compares observed and predicted purchase rates across session counts; points should fall near the diagonal line. Panel B plots deviance residuals, which should be randomly scattered around zero with no systematic pattern. Large deviations suggest model misspecification or influential outliers.
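
The deviance residuals in Panel B can be computed per observation from the outcome and its predicted probability. A minimal sketch, assuming predicted probabilities strictly between 0 and 1 (the example values are invented for illustration):

```python
import math

def deviance_residual(y, p):
    """Signed square root of one observation's contribution to the deviance.

    y is the 0/1 outcome and p the predicted purchase probability (0 < p < 1).
    """
    contribution = -2.0 * (math.log(p) if y == 1 else math.log(1.0 - p))
    return math.copysign(math.sqrt(contribution), y - p)

# A purchase predicted at 10% is a larger surprise than one predicted at 90%.
for y, p in [(1, 0.10), (1, 0.90), (0, 0.10), (0, 0.90)]:
    print(y, p, round(deviance_residual(y, p), 3))
```

Summing the squared residuals over all customers gives the residual deviance of the fitted model.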

Use the format below to cite this page:

Sharafuddin, M. A. (2025, October 25). Logistic regression basics: Predicting purchase from session count. Flair Marketing Intelligence (FlairMI). https://flairmi.com/blog/posts/06-logistic-basics.html

@online{sharafuddin2025-logistic,
  author = {Sharafuddin, Mohammed Ali},
  title  = {Logistic Regression Basics: Predicting Purchase from Session Count},
  year   = {2025},
  date   = {2025-10-25},
  url    = {https://flairmi.com/blog/posts/06-logistic-basics.html},
  langid = {en}
}

References

R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.r-project.org/.
