Clustering of Risk Behaviors and Classification Performance in Modeling Adolescent Risk: The Example of the Association between E-Cigarette Use and Cigarette Smoking
- Alan Gor
- May 16

16 May 2025
Arielle Selya, Ray Niaura, Sooyong Kim
Abstract
Background
Adolescent behavioral risks are highly correlated, which complicates the interpretation of narrowly focused research (e.g., studies of 2-3 variables). We explore methodological issues that arise when interpreting a narrowly focused association (typically within a causal-inference framework) versus taking a wider approach that incorporates many correlated risk factors (within a less common predictive-inference framework), using the currently relevant example of adolescent e-cigarette use and cigarette smoking.
Methods
Data were drawn from the Adolescent Behaviors and Experiences (ABES) study, a national survey of U.S. youth. Behavioral risks were grouped into the following categories: e-cigarette use, cigarette smoking, other tobacco use, alcohol and cannabis use, illicit drug use, mental health, violence, other risky behaviors, and parental monitoring. Three exploratory data analyses (Spearman correlation, non-metric multidimensional scaling, and divisive hierarchical clustering) examined clustering/grouping across variables. Logistic regressions examined 1) the association between e-cigarette use and smoking and 2) the reverse-direction association, after successively adjusting for groups of risk factors. Ten-fold cross-validation was performed to evaluate predictive validity.
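As a rough illustration of the first exploratory step, the sketch below computes Spearman rank correlations (with tied values given average ranks, which matters for ordinal survey items) and converts them into a dissimilarity of the kind that scaling or clustering methods take as input. The variable names and responses are hypothetical toy values, not ABES data, and the choice of 1 - rho as the dissimilarity is an assumption; the abstract does not state which measure was used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal responses (0 = never ... 3 = frequent), illustration only.
responses = {
    "ecig_use": [0, 1, 2, 0, 3, 1, 0, 2],
    "alcohol_use": [0, 1, 2, 1, 3, 1, 0, 2],
    "cigarette_smoking": [0, 0, 1, 0, 2, 0, 0, 1],
}

# Assumed dissimilarity: 1 - rho, so strongly correlated items sit close together.
names = list(responses)
dissimilarity = {
    (a, b): 1 - spearman(responses[a], responses[b]) for a in names for b in names
}
```

A matrix like `dissimilarity` is what a multidimensional scaling or hierarchical clustering routine would then operate on; in practice a statistics library would normally replace the hand-written functions.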
Results
Across the three exploratory data analyses, e-cigarette use and cigarette smoking were correlated, but each was more closely related to other variables: e-cigarette use to alcohol and cannabis use, and cigarette smoking to other tobacco use and illicit drug use. Logistic regression models showed similar odds ratios for the forward- and reverse-direction models, but cross-validation testing showed that the reverse-direction model had better classification performance.
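One way to see how similar odds ratios can coexist with different classification performance is the unadjusted two-variable case, sketched below. For a 2x2 table the odds ratio is the same whichever variable is treated as the outcome, while sensitivity and specificity depend on how common each behavior is. The counts are hypothetical, the behaviors are deliberately labeled only X and Y, and balanced accuracy is an assumed metric; the study's models were adjusted for many covariates and cross-validated, which this sketch does not attempt.

```python
def odds_ratio(both, predictor_only, outcome_only, neither):
    """Cross-product odds ratio of a 2x2 table."""
    return (both * neither) / (predictor_only * outcome_only)


def balanced_accuracy(both, predictor_only, outcome_only, neither):
    """Balanced accuracy of the rule 'predict the outcome when the predictor is present'."""
    sensitivity = both / (both + outcome_only)          # outcome cases that were flagged
    specificity = neither / (neither + predictor_only)  # non-cases that were not flagged
    return (sensitivity + specificity) / 2


# Hypothetical counts for two binary behaviors X and Y (not ABES data).
both, x_only, y_only, neither = 40, 10, 160, 790

# Forward direction: X is the predictor, Y is the outcome.
or_forward = odds_ratio(both, x_only, y_only, neither)
ba_forward = balanced_accuracy(both, x_only, y_only, neither)

# Reverse direction: Y is the predictor, X is the outcome.
or_reverse = odds_ratio(both, y_only, x_only, neither)
ba_reverse = balanced_accuracy(both, y_only, x_only, neither)
```

With these toy counts the odds ratio is 19.75 in both directions, while balanced accuracy is about 0.59 in the forward direction and about 0.82 in the reverse direction. Which direction classifies better depends entirely on the counts chosen, so the sketch shows only that the two summaries can diverge.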
Conclusions
A narrow focus on adolescent risk behaviors with a causal-inference framework can result in erroneous interpretation in the presence of many correlated risk factors. A wider predictive-inference perspective can help inform better screening strategies and potential intervention targets.