Phenotyping Risk Profiles of Substance Use and Exploring the Dynamic Transitions in Use Patterns: Machine Learning Models using the COMPASS Data
Background: Polysubstance use is on the rise among Canadian youth. Examining risk profiles and understanding how transitions in use patterns occur can inform the design and implementation of polysubstance risk reduction interventions. The COMPASS study is a longitudinal study examining health-related behaviours among Canadian secondary school students, capturing data from multiple sources. Machine learning (ML) techniques can reveal non-linearity and multivariate couplings in population-level longitudinal data to inform public health policies.

Objectives: The overarching goal of this thesis is to identify phenotypes of risk profiles of youth polysubstance use and to examine the dynamic transitions of use patterns over time, using both unsupervised ML methods and a latent variable modelling approach. This thesis also aims to understand how ML techniques are best used to model transitions and discover "hidden" patterns in large, complex population-based health survey data, using the COMPASS dataset as a showcase.

Methods: A linked sample (N = 8824) from three annual waves of COMPASS data, collected starting in the 2016-17 school year, was used. Missing values were handled by multiple imputation. Substance use indicators, including cigarette smoking, e-cigarette use, alcohol drinking, and marijuana consumption, were categorized into "never use," "occasional use," and "current use." To examine phenotypes of risk profiles, hierarchical clustering, partitioning around medoids (PAM), and fuzzy clustering algorithms were applied. The Boruta algorithm was used to identify a subset of features for cluster analysis. Both internal and external indices were used to evaluate clustering validity. A multivariate latent Markov model (LMM) was implemented to explore the dynamic transitions of use patterns over time.
The least absolute shrinkage and selection operator (LASSO) approach was applied to select the covariates entering the LMM. Model selection was based on the Bayesian information criterion (BIC) and the goodness-of-fit test.

Results: The top factors impacting youth polysubstance use included the number of smoking friends, the number of skipped classes, the weekly money available to spend or save, and others. Four risk profiles of polysubstance use were identified across the three waves: low, medium-low, medium-high, and high-risk profiles. The heterogeneity in prevalence and phenotype across these four risk profiles was confirmed. Internal clustering performance, measured by average silhouette width, ranged from 0.51 to 0.55 across the three waves and the different clustering algorithms. The clustering algorithms achieved a relatively high degree of agreement on cluster membership: comparing fuzzy (FANNY) clustering with PAM clustering, the adjusted Rand indices were 0.9698, 0.7676, and 0.6452 for the three waves. Four distinct use patterns were identified: no use (S1), occasional single-use of alcohol (S2), dual-use of e-cigarette and alcohol (S3), and current multi-use (S4). The initial probabilities of these subgroups were 0.5887, 0.2156, 0.1487, and 0.0470, respectively. The marginal probability of S1 decreased, while those of S3 and S4 increased over time, indicating a tendency towards increased substance use as the students grew older. Nevertheless, most students remained in the same subgroup over time, particularly those in S4, which had the highest probability of remaining in the same subgroup (0.8668). Those who did transition typically moved towards a more severe use pattern, e.g., S3 -> S4. The factors influencing initial membership of use patterns and the dynamic transitions were multifaceted and complex, differing across the four use patterns and the three waves.
Not only do use patterns change with time, but so does the evidence in use patterns.

Conclusion: As the first study of its kind to ascertain risk profiles and the dynamics of use patterns in youth polysubstance use by applying ML approaches to the COMPASS dataset, this thesis provides insights into the opportunities and possibilities ahead for ML in public health. Findings from this thesis can benefit practitioners in the field, such as school program managers and policymakers, in developing interventions to prevent or remedy polysubstance use among youth.
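
To illustrate how the marginal distribution of the four use patterns reported in the Results can shift across waves, the following is a minimal sketch, not the model fitted in the thesis. It assumes a time-homogeneous Markov chain without covariates, which is a simplification. The initial probabilities and the S4 self-transition value (0.8668) come from the abstract; every other entry of the transition matrix is a hypothetical placeholder chosen only so that each row sums to 1, and the function names `step` and `marginals` are illustrative.

```python
# Initial probabilities of the latent states S1..S4, as reported in the abstract.
initial = [0.5887, 0.2156, 0.1487, 0.0470]

# HYPOTHETICAL transition matrix (row = current state, column = next state).
# Only the S4 -> S4 entry (0.8668) is taken from the abstract; all other
# values are made-up placeholders for illustration.
transition = [
    [0.8000, 0.1200, 0.0600, 0.0200],  # from S1 (no use)
    [0.0500, 0.7500, 0.1500, 0.0500],  # from S2 (occasional alcohol use)
    [0.0300, 0.0700, 0.7500, 0.1500],  # from S3 (dual use)
    [0.0100, 0.0232, 0.1000, 0.8668],  # from S4 (current multi-use)
]


def step(dist, matrix):
    """Advance a state distribution by one wave: new[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def marginals(start, matrix, n_waves):
    """Return the marginal state distribution at each of n_waves waves."""
    out = [list(start)]
    for _ in range(n_waves - 1):
        out.append(step(out[-1], matrix))
    return out


waves = marginals(initial, transition, 3)
for t, dist in enumerate(waves, start=1):
    print(f"wave {t}: " + ", ".join(f"S{k + 1}={p:.4f}" for k, p in enumerate(dist)))
```

With these placeholder values the share of S1 falls and the shares of S3 and S4 rise from wave to wave, which mirrors the direction described in the Results. The magnitudes are not estimates from the COMPASS data.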
Cite this version of the work
Yang Yang (2021). Phenotyping Risk Profiles of Substance Use and Exploring the Dynamic Transitions in Use Patterns: Machine Learning Models using the COMPASS Data. UWSpace. http://hdl.handle.net/10012/17610