Show simple item record

dc.contributor.author Gebotys, Brennan
dc.description.abstract Recent advances in machine learning strategies have led to improved results across a variety of fields. One field that would benefit greatly from improved machine learning strategies is video analytics: the analysis of video data. Two important applications are pose estimation, which aims to identify the pose of a person in a video, and action recognition, which aims to identify the action performed in a video. However, key problems remain, such as how to train a pose estimation model with a small number of annotations and how to design an action recognition model that achieves the highest possible accuracy. This thesis explores how effectively leveraging motion information can enable strategies that solve both of these problems. The first problem is that pose estimation models require a large number of pose annotations to achieve high accuracy, and these annotations can be expensive to collect. While a naive approach is to annotate a single frame at a time, researchers have investigated how modifying the model training and generating more annotations can reduce the number of annotations required. However, these approaches still impose requirements that make annotation collection difficult. This thesis introduces a motion-aware pose annotation strategy called POse annotation using Optical Flow (POOF), which explores how motion information can reduce the number of annotations required without any additional constraints. We show that, starting from only a small number of annotations, training on POOF's generated annotations achieves a +52% improvement in accuracy compared to training on the small set of manual annotations alone. By reducing the number of annotations required, POOF should enable pose estimation models to be applied more easily to many more real-world problems.
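The core idea the abstract describes — propagating a small set of manual pose annotations to neighbouring frames via optical flow — can be sketched as follows. This is an illustrative sketch only, not the thesis's POOF implementation: the function name, the nearest-neighbour flow lookup, and the dense per-frame flow input format are all assumptions.

```python
import numpy as np

def propagate_keypoints(keypoints, flows):
    """Propagate annotated 2D keypoints through subsequent frames
    using per-frame dense optical flow fields.

    keypoints: (K, 2) array of (x, y) positions in the annotated frame.
    flows: list of (H, W, 2) dense flow fields; flows[t] maps frame t to t+1.
    Returns a list of (K, 2) arrays, one per subsequent frame.
    """
    propagated = []
    pts = keypoints.astype(float)
    for flow in flows:
        h, w = flow.shape[:2]
        # Look up the flow vector at each keypoint (nearest-neighbour sample).
        xs = np.clip(np.round(pts[:, 0]).astype(int), 0, w - 1)
        ys = np.clip(np.round(pts[:, 1]).astype(int), 0, h - 1)
        # Move each keypoint along its sampled flow vector.
        pts = pts + flow[ys, xs]
        propagated.append(pts.copy())
    return propagated
```

In practice the dense flow fields would come from an off-the-shelf optical flow estimator; the propagated keypoints then serve as pseudo-annotations for training.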
The second problem is that, because there is such a large number of possible design choices, it is difficult to design an action recognition architecture that achieves the highest possible accuracy. While state-of-the-art attention mechanisms are a popular choice and have achieved accurate results, a key shortcoming is that they do not leverage any motion information. Motivated by this, this thesis explores how motion can be leveraged within attention-based architectures by introducing a Motion-Aware Attention mechanism, called M2A, which explicitly leverages both attention and motion information. We show that incorporating motion mechanisms into attention mechanisms using the proposed M2A mechanism leads to a +15% to +26% improvement in top-1 accuracy across different backbone architectures, with only a small increase in computational complexity. By better understanding how motion mechanisms can be both accurate and efficient, M2A should enable action recognition solutions to be applied to real-world problems sooner.
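The abstract's notion of combining temporal attention with explicit motion information can be illustrated with a minimal sketch. This is not the thesis's M2A design: the additive fusion of the two branches, the identity projections, and the frame-difference motion branch are simplifying assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_aware_attention(feats):
    """Illustrative fusion of a motion branch with temporal self-attention.

    feats: (T, C) per-frame feature vectors.
    Motion branch: frame-to-frame feature differences (explicit motion cue).
    Attention branch: scaled dot-product self-attention over the T frames,
    with identity query/key/value projections for simplicity.
    """
    T, C = feats.shape
    # Motion cues: difference between consecutive frames (first frame zero).
    motion = np.zeros_like(feats)
    motion[1:] = feats[1:] - feats[:-1]
    # Temporal self-attention over frames.
    attn = softmax(feats @ feats.T / np.sqrt(C), axis=-1) @ feats
    # Sum the two branches so motion information is injected explicitly.
    return attn + motion
```

A real motion-aware block would sit inside a backbone network with learned projections; the point of the sketch is only that motion (frame differences) and attention (temporal weighting) are computed as separate branches and then combined.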
dc.publisher University of Waterloo
dc.subject video action recognition
dc.subject pose estimation
dc.title Novel Motion-Aware Strategies for Efficient and Accurate Video Analytics
dc.type Master Thesis
dc.pending false
uws-etd.degree.department Design Engineering
uws-etd.degree.discipline Design Engineering
uws-etd.degree.grantor University of Waterloo
uws-etd.degree Master of Applied Science
uws.contributor.advisorWong, Alexander
uws.contributor.advisorClausi, David
uws.contributor.affiliation1 Faculty of Engineering


