Clustering of Risk Behaviors and Classification Performance in Modeling Adolescent Risk: The Example of the Association between E-Cigarette Use and Cigarette Smoking

16 May 2025


Arielle Selya, Ray Niaura, Sooyong Kim



Abstract

Background

Adolescent behavioral risks are highly correlated, which complicates the interpretation of narrowly focused research (e.g., studies of only two or three variables). Using the currently relevant example of adolescent e-cigarette use and cigarette smoking, we explore methodological issues that arise when interpreting a narrowly focused association (typically within a causal-inference framework) versus a wider approach that incorporates many correlated risk factors (within a less common predictive-inference framework).

Methods

Data were drawn from the Adolescent Behaviors and Experiences Survey (ABES), a national survey of U.S. youth. Behavioral risks were grouped into the following categories: e-cigarette use, cigarette smoking, other tobacco use, alcohol and cannabis use, illicit drug use, mental health, violence, other risky behaviors, and parental monitoring. Three exploratory data analyses (Spearman correlation, non-metric multidimensional scaling, and divisive hierarchical clustering) examined how the variables clustered. Logistic regressions examined (1) the association between e-cigarette use and smoking and (2) the reverse-direction association, with successive adjustment for groups of risk factors. Ten-fold cross-validation was used to evaluate predictive validity.
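As a rough illustration of the cross-validation step, the sketch below fits a plain logistic regression on nine folds and scores the held-out fold, averaging over ten folds. Several choices here are assumptions rather than the authors' method: the fitting routine (unregularized batch gradient descent), the performance metric (AUC, computed as the Mann-Whitney probability that a positive case outranks a negative one), and all function names (`fit_logistic`, `cross_validated_auc`, etc.), which are hypothetical. The toy data are simulated and are not ABES data, and the sketch does not reproduce the successive covariate adjustment.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Unregularized logistic regression fit by batch gradient descent (assumed method).

    X: list of feature rows (floats); y: list of 0/1 labels.
    Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        g0, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Gradient of the log-loss: predicted probability minus observed label.
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j in range(p):
                gw[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return b0, w


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a fold contains only one class
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(X, y, k=10, seed=0):
    """Mean held-out AUC over k folds (folds containing only one class are skipped)."""
    aucs = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        b0, w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, X[i]))) for i in test]
        a = auc(scores, [y[i] for i in test])
        if not math.isnan(a):
            aucs.append(a)
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical toy data (NOT ABES): one latent risk drives all three behaviors.
    rng = random.Random(42)
    vape, smoke, other = [], [], []
    for _ in range(300):
        risk = rng.random()
        vape.append(1 if rng.random() < risk else 0)
        other.append(1 if rng.random() < risk else 0)
        smoke.append(1 if rng.random() < 0.5 * risk else 0)
    # Forward: smoking ~ vaping + other risk; reverse: vaping ~ smoking + other risk.
    forward = cross_validated_auc([[v, o] for v, o in zip(vape, other)], smoke)
    reverse = cross_validated_auc([[s, o] for s, o in zip(smoke, other)], vape)
    print(f"forward AUC {forward:.3f}, reverse AUC {reverse:.3f}")
```

Averaging per-fold AUCs (rather than pooling held-out predictions) is one common convention; pooling is an equally reasonable alternative, and in practice a library implementation of the model and the folds would replace these hand-written helpers.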

Results

In all three exploratory data analyses, e-cigarette use and cigarette smoking were correlated, but each was more closely related to other variables (e-cigarette use to alcohol and cannabis use; cigarette smoking to other tobacco use and illicit drug use). Logistic regression models yielded similar odds ratios in the forward and reverse directions, but cross-validation showed that the reverse-direction model had better classification performance.
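Why can the two directions share an odds ratio yet classify differently? A crude (unadjusted) 2x2 table gives some intuition. The sketch below uses hypothetical counts (not ABES data) and hypothetical function names, and it ignores the covariate adjustment and cross-validation used in the study. The odds ratio is algebraically symmetric in the two variables, whereas a simple rule such as "predict the outcome whenever the predictor is present" yields direction-dependent sensitivity, specificity, and positive predictive value (PPV), because the two behaviors have different marginal prevalences.

```python
from fractions import Fraction


def odds_ratios(n11, n10, n01, n00):
    """Crude odds ratios in both directions from a 2x2 table of counts.

    n11: vapes and smokes, n10: vapes only, n01: smokes only, n00: neither.
    """
    # Forward: odds of smoking among vapers vs. non-vapers.
    forward = Fraction(n11, n10) / Fraction(n01, n00)
    # Reverse: odds of vaping among smokers vs. non-smokers.
    reverse = Fraction(n11, n01) / Fraction(n10, n00)
    return forward, reverse


def rule_metrics(tp, fn, fp, tn):
    """Metrics for the rule 'predict the outcome when the predictor is present'."""
    return {
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "ppv": Fraction(tp, tp + fp),
    }


def directional_metrics(n11, n10, n01, n00):
    # Forward: predictor = vaping, outcome = smoking.
    forward = rule_metrics(tp=n11, fn=n01, fp=n10, tn=n00)
    # Reverse: predictor = smoking, outcome = vaping (false negatives and positives swap).
    reverse = rule_metrics(tp=n11, fn=n10, fp=n01, tn=n00)
    return forward, reverse


if __name__ == "__main__":
    table = (40, 160, 10, 790)  # hypothetical counts, not ABES data
    or_fwd, or_rev = odds_ratios(*table)
    print(float(or_fwd), float(or_rev))  # 19.75 19.75
    fwd, rev = directional_metrics(*table)
    for name in ("sensitivity", "specificity", "ppv"):
        print(name, round(float(fwd[name]), 3), round(float(rev[name]), 3))
```

With these made-up counts both odds ratios equal 19.75, yet the rule's sensitivity is 0.8 in the forward direction and 0.2 in the reverse direction (and PPV the other way around). This only illustrates that equal odds ratios need not imply equal classification performance; it does not reproduce the study's adjusted models or its specific finding.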

Conclusions

A narrow focus on adolescent risk behaviors within a causal-inference framework can lead to erroneous interpretations when many correlated risk factors are present. A wider, predictive-inference perspective can help inform better screening strategies and identify potential intervention targets.

