Detecting Freezing of Gait Using Wearable Sensors and Machine Learning: Exploring Ternary Freezing of Gait Classification

Hart, Andrew

Files

Hart_Andrew.pdf (4.64 MB)

Date

2023-09-20

Authors

Hart, Andrew

Advisor

Tung, James

Publisher

University of Waterloo

Abstract

This work focuses on Parkinson's disease (PD), a neurodegenerative disease characterized by the production of Lewy bodies in the brain, resulting in the degeneration of dopaminergic nigrostriatal neurons. A common and debilitating symptom of PD is Freezing of Gait (FoG), which is described as a sudden, episodic inability to make forward progress while walking despite the intention to do so. FoG can lead to falls and difficulties in everyday tasks, especially mobility. Conventional PD treatments have a variable impact on mitigating FoG due to large heterogeneity within the freezing population, necessitating active monitoring of an individual's FoG severity. This study aims to aid the development of active FoG severity monitoring using wearable sensors and machine learning. Specifically, it explores the ternary (3-class) domain of FoG classification (akinetic, kinetic, and no FoG), which has not been extensively studied before. The specific objectives of this thesis are: identifying suitable datasets, selecting and optimizing machine learning models, evaluating model performance on participants, and identifying potential applications based on observed results. Two datasets were considered for this study: the Sydney dataset, collected by Goh et al. at the University of Sydney in Australia, and the publicly available MJFF dataset, which comprises multiple collections of data from various groups. The Sydney dataset consists of 10 participants completing the Ziegler protocol in their "ON" and "OFF" medication states while equipped with tri-axial inertial measurement units (IMUs) on the sternum, lumbar region, and both feet. Across this dataset, 24.9% of the time was spent in an akinetic freeze and 8.87% in a kinetic freeze. The MJFF dataset comprises 100 participants completing a similar Ziegler protocol and an alternative DeFOG protocol in the two medication states while wearing a lumbar tri-axial accelerometer.
In total, this dataset contains 833 trials of the Ziegler protocol and 91 trials of the DeFOG protocol, with 1.47% of the combined time spent in an akinetic freeze and 12.39% in a kinetic freeze. Seven classification architectures were considered: six classical models and one deep network model. The classical models received input in the form of feature vectors, whereas the deep model utilized frequency-domain signals along with a convolutional network backbone to extract information. The features were selected by establishing an initial pool and then trimming it using common feature engineering techniques such as Kendall's correlation and Minimum Redundancy Maximum Relevance (mRMR). Additionally, all models underwent a randomized grid search over hyperparameters and architecture parameters to optimize performance on these datasets. Testing the models with the participant data in the Sydney dataset revealed that all classical models and the deep network model encountered challenges in ternary FoG classification compared to results in the current literature. While some models performed well for a subset of participants, mainly severe freezers, the majority of the classifiers struggled to accurately label ternary FoG bouts, with many F1-scores falling below 40%. The top-performing classical model, logistic regression (LR), had difficulty classifying kinetic freezing and achieving temporal accuracy. It was theorized that these difficulties arose from limited frequency-domain features in the final feature set and limited information about neighbouring windows when making inferences. The deep model also struggled to classify the timing of bouts correctly but, to a larger extent, had trouble differentiating between akinetic and kinetic freezing.
This drop in performance is likely attributable to freeze states not achieving steady state, and/or to the large heterogeneity within the population in how akinetic and kinetic freezing manifest (e.g., some akinetic freezes might involve movement, while others are purely akinetic with no movement at all). When FoG onsets and offsets were not considered, both models demonstrated better performance in classifying severity. The LR model predicted the correct severity for seven out of ten individuals with an F1-score of 76% for akinetic freezing, and for six out of ten individuals with an F1-score of 60% for kinetic freezing. The deep model correctly classified the combined total severity (akinetic and kinetic percentages combined) for seven out of ten individuals and achieved an F1-score of 58%. The findings of this thesis indicate that existing models face challenges in automatically detecting ternary FoG labels. Further exploration of feature pools and architectures is warranted to enhance performance in free-living applications. Post-calibration techniques on model outputs or combining models in a majority voting system are recommended. Ultimately, this study suggests that the current use of ternary FoG classification may be better suited for severity estimates or as an annotation tool for clinicians, rather than as a gold standard for free-living labels. More specifically, the models could be used to provide severity estimates in free-living conditions. These estimates could later be combined with in-clinic visits to gain a deeper understanding of an individual's disease progression. Alternatively, actual FoG bout classification can serve as a tool to expedite annotation by flagging areas of interest prior to a manual confirmation process.
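
The two quantities the abstract leans on, percent time spent in each freeze state and per-class F1-score, can be illustrated with a minimal sketch. This is not the thesis implementation: the label names, the function names, and the assumption that every window has equal duration are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical label names for the three classes (not taken from the thesis).
LABELS = ("no_fog", "akinetic", "kinetic")


def percent_time_per_class(window_labels):
    """Percentage of windows assigned to each class.

    Assumes every window covers the same duration, so the share of
    windows equals the share of time.
    """
    total = len(window_labels)
    if total == 0:
        raise ValueError("no windows to summarize")
    counts = Counter(window_labels)
    return {label: 100.0 * counts.get(label, 0) / total for label in LABELS}


def per_class_f1(y_true, y_pred):
    """One-vs-rest F1-score for each class, computed window by window."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    scores = {}
    for label in LABELS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class absent from both sequences gets a score of 0.0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    truth = ["no_fog"] * 6 + ["akinetic"] * 3 + ["kinetic"] * 1
    predicted = ["no_fog"] * 5 + ["akinetic"] * 4 + ["kinetic"] * 1
    print(percent_time_per_class(truth))
    print(percent_time_per_class(predicted))
    print(per_class_f1(truth, predicted))
```

Comparing the two percent-time summaries ignores where each bout starts and ends, which is one plausible reading of evaluating severity without considering onsets and offsets, whereas the window-wise F1-score penalizes every mistimed window.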