On the Design of Efficient Deep Learning Methods for Human Activity Recognition in Resource Constrained Devices

Date

2023-04-05

Authors

Nooruddin, Sheikh

Advisor

Karray, Fakhri

Publisher

University of Waterloo

Abstract

Human Activity Recognition (HAR) is the automatic recognition of Activities of Daily Life (ADL) from human motion data captured in various modalities by wearable and ambient sensors. Advances in deep learning, especially Convolutional Neural Networks (CNNs), have revolutionized intelligent frameworks such as HAR systems by effectively and efficiently inferring human activity from various modalities of data. However, the training and inference of CNNs are often resource-intensive. Recent research has focused on bringing the effectiveness of CNNs to resource constrained edge devices through Tiny Machine Learning (TinyML). TinyML aims to optimize these models in terms of compute and memory requirements, making them suitable for always-on resource constrained devices and reducing communication latency and network traffic for HAR frameworks. In this thesis, we first provide a benchmark to understand the trade-offs among variations of CNN architectures, different training methodologies, and different modalities of data in the context of HAR, TinyML, and edge devices. We tested and reported the performance of CNN and Depthwise Separable Convolutional Neural Network (DSCNN) models as well as two training methodologies, Quantization Aware Training (QAT) and Post Training Quantization (PTQ), on five commonly used benchmark datasets containing image and time-series data: UP-Fall, Fall Detection Dataset (FDD), PAMAP2, UCI-HAR, and WISDM. We also deployed the model-based standalone applications on multiple commonly available resource constrained edge devices and tested their performance in terms of inference time and power consumption. Later, we focus on HAR from video data sources. We proposed a two-stream multi-resolution fusion architecture for HAR from the video data modality.
The context stream takes a resized image as input, and the fovea stream takes the cropped center portion of the resized image as input, reducing the overall dimensionality. The center crop is motivated by camera bias: objects of interest are often situated in the center of the frame. We tested two quantization methods, PTQ and QAT, to optimize these models for deployment on edge devices and evaluated their performance on two challenging video datasets: KTH and UCF11. We performed ablation studies to validate the two-stream model performance. We deployed the proposed architecture on commercial resource constrained devices and monitored its performance in terms of inference latency and power consumption. The results indicate that the proposed architecture outperforms the other relevant single-stream models tested in this work in terms of accuracy, precision, recall, and F1 score while also reducing the overall model size. The experimental results in this thesis demonstrate the effectiveness and feasibility of TinyML for HAR from multimodal data sources on edge devices.
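As a rough illustration of the two inputs described in the abstract, the sketch below builds a context input (the resized frame) and a fovea input (the center crop of that resized frame) with NumPy. It is not the thesis implementation: the function names, the 64-pixel target size, the 0.5 crop fraction, and the nearest-neighbour resize are all assumptions chosen for the example.

```python
import numpy as np


def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array.

    Stand-in for a proper image resizer; assumed here for self-containment.
    """
    in_h, in_w = frame.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return frame[rows[:, None], cols]


def two_stream_inputs(frame, size=64, crop_fraction=0.5):
    """Return (context, fovea) inputs for one video frame.

    context: the frame resized to size x size.
    fovea:   the center crop of that resized frame.
    size and crop_fraction are hypothetical values, not taken from the thesis.
    """
    context = resize_nearest(frame, size, size)
    crop = int(size * crop_fraction)
    start = (size - crop) // 2
    fovea = context[start:start + crop, start:start + crop]
    return context, fovea


# Example with a dummy 120 x 160 RGB frame.
frame = np.arange(120 * 160 * 3, dtype=np.int64).reshape(120, 160, 3)
context, fovea = two_stream_inputs(frame)
```

With these assumed settings the fovea input holds a quarter of the pixels of the context input, which is one way a center crop can reduce the dimensionality the second stream has to process.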

Keywords

Tiny Machine Learning (TinyML), Human Activity Recognition (HAR), Deep Learning (DL), Computer Vision (CV), Image Processing
