Predicting ACL Injuries Using Machine Learning Models and Tibial Anatomical Predictors
Advisor
Naveen, Chandrashekar
Publisher
University of Waterloo
Abstract
The tibial slope and the tibial depth are well-established risk factors for Anterior
Cruciate Ligament (ACL) injury. As machine learning (ML) continues to progress, it has
become an increasingly reliable tool for clinical screening and risk factor analysis. This
thesis aims to develop and validate an explainable prognostic ML model that predicts ACL
injury outcomes from these Tibial Anatomical Features (TAFs), and to identify the most
predictive features among them.
A dataset comprising Coronal Tibial Slope (CTS), Medial Tibial Slope (MTS), Lateral
Tibial Slope (LTS), Medial Tibial Depth (MTD), and sex was constructed using MRI
scans taken from 104 subjects (44 males: 22 injured, 22 uninjured; 60 females: 27
injured, 33 uninjured). Two distinct ML pipelines were developed: a self-developed pipeline
(including K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest
(RF), XGBoost, CatBoost, Multi-Layer Perceptron (MLP), and TabNet) and an advanced
AutoGluon pipeline (including XGBoost, LightGBM, CatBoost, TabPFN, TabM, TabICL,
MITRA, and their weighted ensembles). Both were designed as end-to-end pipelines that
process the dataset and output predictions with integrated feature importance explanations.
Empirically, the AutoGluon pipeline demonstrated superior performance and training-time
efficiency. The recommended F2-tuned standard ensemble achieved an F2-score of 0.736
on the validation set. On the test set, it achieved a balanced accuracy of 0.955,
F1-score of 0.952, F2-score of 0.980, ROC AUC of 1.000, precision of 0.909, and recall
of 1.000. A second model, the F2-tuned full-dataset ensemble, refitted on the entire
dataset for clinical deployment, achieved a validation F2-score of 0.813. Global feature
importance analyses performed via SHapley Additive exPlanations (SHAP) established
the descending order of influence as MTD, LTS, MTS, CTS, and sex.
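
As a quick consistency check, the test-set F1- and F2-scores quoted above can be recomputed from the reported precision (0.909) and recall (1.000) using the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below is illustrative only: the helper name `f_beta` is hypothetical and is not taken from the thesis pipelines.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily than precision (beta = 2 gives F2).
    Hypothetical helper for illustration; not part of the thesis code.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Test-set precision and recall as reported in the abstract.
precision, recall = 0.909, 1.000

f1 = f_beta(precision, recall, beta=1.0)
f2 = f_beta(precision, recall, beta=2.0)

print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")  # F1 = 0.952, F2 = 0.980
```

Because F2 weights recall more heavily than precision, tuning for it favors models that miss fewer injured cases at the cost of more false positives, which is consistent with the reported recall of 1.000 alongside a lower precision.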
In summary, the study recommends two versions of the F2-tuned prognostic model: a
standard ensemble and a full-dataset ensemble. The former, which demonstrated moderately
high predictive power, was designed for comparison in subsequent research. The latter,
which was refitted on the entire dataset and therefore has no held-out test set for
evaluation, is constructed for maximum robustness and generalization in real-life clinical
deployment. Global feature importance analyses of the standard ensemble identified
decreased MTD, along with increased LTS and MTS, as the features contributing most to
predicted ACL injury. These models serve as both feature attribution tools and clinical
screening tools, and are intended to be integrated into clinical practice as explainable
models that assist clinicians in predicting the likelihood of ACL injury.