Robust 3D Human Modeling for Baseball Sports Analytics
Date
2024-08-12
Authors
Advisor
Zelek, John
Publisher
University of Waterloo
Abstract
In the fast-paced world of baseball, maximizing pitcher performance while minimizing runs relies on understanding subtle variations in mechanics. Traditional analysis methods, reliant on pre-recorded offline numerical data, struggle in the dynamic flow of live games. Although seemingly ideal, broadcast video analysis faces significant challenges due to motion blur, occlusion, and low resolution. This research proposes a novel 3D human modeling technique and a pitch statistics identification system that are robust to the aforementioned challenges.
Specifically, we propose Distribution and Depth-Aware Human Mesh Recovery (D2A-HMR), a depth- and distribution-aware 3D human mesh recovery technique that extracts pseudo-depth from each frame and uses a transformer network with self- and cross-attention to regress a 3D mesh from which the 3D pose coordinates are obtained. The network is regularized with several loss functions, including a silhouette loss, joint reprojection losses, and a distribution loss that uses a normalizing flow to learn the deviation between the predicted and ground-truth distributions. Furthermore, we propose a focused augmentation strategy specifically designed to address the motion blur caused by fast motion.
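As a rough illustration of the attention-based fusion described above, the following PyTorch sketch combines image tokens with pseudo-depth tokens using self- and cross-attention. The token dimensions, layer counts, and module names are assumptions for illustration, not the thesis implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative fusion of image tokens and pseudo-depth tokens with
    self- and cross-attention; dimensions and layer counts are assumed."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.img_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.dep_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, depth_tokens):
        # Refine each modality independently with self-attention.
        img, _ = self.img_self_attn(img_tokens, img_tokens, img_tokens)
        dep, _ = self.dep_self_attn(depth_tokens, depth_tokens, depth_tokens)
        # Image queries attend over pseudo-depth keys/values (cross-attention).
        fused, _ = self.cross_attn(img, dep, dep)
        return self.norm(fused + img)

fusion = CrossModalFusion()
img_tokens = torch.randn(2, 196, 256)    # e.g. 14x14 patch tokens from the frame
depth_tokens = torch.randn(2, 196, 256)  # tokens from an assumed pseudo-depth map
print(fusion(img_tokens, depth_tokens).shape)  # torch.Size([2, 196, 256])
```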
Following that, we introduce the PitcherNet system, built upon D2A-HMR and the motion-blur augmentation strategy. PitcherNet is an automated analysis system that analyzes pitcher kinematics directly from live broadcast video, providing valuable pitch statistics (pitch velocity, release point, pitch position, release extension, and pitch handedness). The system relies solely on broadcast video as its input and leverages computer vision and pattern recognition to generate reliable pitch statistics from the game. First, PitcherNet isolates the pitcher and batter in each frame using a role classification network. Next, it extracts the kinematic information representing the pitcher’s joints and surface using a refined version of the D2A-HMR model.
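The per-frame flow described in this paragraph can be sketched as a simple loop. The function names, joint count, and data shapes below are hypothetical placeholders standing in for the role classification network and the refined D2A-HMR model.

```python
import numpy as np

def classify_roles(frame):
    """Placeholder for the role classification network: returns an assumed
    bounding box (x, y, w, h) for the pitcher in the frame."""
    return (100, 50, 200, 400)

def recover_mesh(frame, box):
    """Placeholder for the refined D2A-HMR model: returns 3D joint
    coordinates for the cropped pitcher (17 joints assumed)."""
    return np.zeros((17, 3))

def extract_joint_sequence(frames):
    """Per-frame flow: isolate the pitcher, recover 3D joints, and stack them
    into the (frames, joints, 3) sequence consumed by the later stages."""
    joints = [recover_mesh(f, classify_roles(f)) for f in frames]
    return np.stack(joints)

video = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(120)]
print(extract_joint_sequence(video).shape)  # (120, 17, 3)
```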
We further enhance the generalizability of the 3D human model by incorporating additional in-the-wild high-resolution videos from the Internet. Finally, PitcherNet employs a Temporal Convolutional Network (TCN) and kinematic-driven heuristics to capture the pitch statistics, which can then be used to analyze baseball pitchers.
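As one possible reading of the TCN-based statistic extraction, the sketch below maps a sequence of recovered 3D joints to a per-pitch label such as handedness. Channel sizes, kernel sizes, the joint count, and the class count are assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn

class PitchStatTCN(nn.Module):
    """Illustrative temporal convolutional head mapping a sequence of 3D joints
    to a per-pitch label (e.g. handedness); sizes and class count are assumed."""
    def __init__(self, num_joints=17, num_classes=2, channels=64, kernel_size=3):
        super().__init__()
        in_ch = num_joints * 3
        self.tcn = nn.Sequential(
            nn.Conv1d(in_ch, channels, kernel_size, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(channels, num_classes)

    def forward(self, joints):                 # joints: (batch, frames, joints, 3)
        x = joints.flatten(2).transpose(1, 2)  # -> (batch, joints*3, frames)
        return self.head(self.tcn(x).squeeze(-1))

model = PitchStatTCN()
logits = model(torch.randn(4, 120, 17, 3))   # 4 pitches, 120 frames each
print(logits.shape)                          # torch.Size([4, 2])
```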
Keywords
computer vision, sports analytics, human pose estimation, 3d human modeling, vision transformers