|dc.description.abstract||Human pose estimation and action recognition in ice hockey are among the biggest challenges in computer vision-driven sports analytics, owing to difficulties such as bulky hockey equipment, color similarity between the ice rink and player jerseys, and the presence of additional gear such as hockey sticks. As a result, deep neural network architectures that perform well for sports such as baseball, soccer, and track and field perform poorly when applied to hockey. This research involves the design and implementation of deep neural networks for both pose estimation and action recognition that can effectively evaluate the pose and actions of a hockey player.
First, a pre-trained convolutional neural network, known as the stacked hourglass network, is used to estimate a hockey player's body configuration in video frames, a task known as pose estimation. The proposed method provides a tool for analyzing the pose of a hockey player from broadcast video, which aids in the eventual assessment of a hockey player's speed, shot accuracy, and other metrics. The algorithm proved successful, identifying on average 81.56% of a hockey player's joints on a set of test images.
Furthermore, inspired by the idea that modeling the pose of the hockey stick can improve hockey player pose estimation, a novel deep learning computer vision architecture known as the HyperStackNet has been designed and implemented for joint player and stick pose estimation. In addition to improving player pose estimation, the HyperStackNet architecture enables improved transfer learning from pre-trained stacked hourglass networks trained on a different domain. Experimental results demonstrate that when the HyperStackNet is trained to detect 18 different joint positions on a hockey player (including the hockey stick), it achieves 98.8% accuracy on the test dataset, demonstrating its efficacy for the complex task of joint player and stick pose estimation from video.
Building on pose estimation, this research also develops an algorithm for accurate recognition of hockey actions. To perform this action recognition, a convolutional neural network unifies latent pose estimation and action recognition. The resulting action recognition hourglass network, or ARHN, is designed to interpret player actions in ice hockey video using estimated pose. ARHN has three components: the first is a latent pose estimator, the second transforms latent features to a common frame of reference, and the third performs action recognition. Since no benchmark dataset for pose estimation or action recognition was available for hockey players, such an annotated dataset first had to be generated. Experimental results show an action recognition accuracy of 65% for four types of hockey actions. When similar poses are merged into three and two classes, the accuracy increases to 71% and 78% respectively, demonstrating the potential of the methodology for automated action recognition in hockey.||en