
From Far-Field Dynamics to Close-Up Confidence: Action Recognition Across Varying Camera Distances

dc.contributor.author: Buzko, Kseniia
dc.date.accessioned: 2025-09-22T13:07:10Z
dc.date.available: 2025-09-22T13:07:10Z
dc.date.issued: 2025-09-22
dc.date.submitted: 2025-09-18
dc.description.abstract: Human action recognition (HAR) refers to the task of identifying and classifying human actions within videos or sequences of images. The field has gained significant importance due to its diverse applicability across domains such as sports analytics, human-computer interaction, surveillance, and interpersonal communication. Accurate action recognition becomes especially difficult when the camera distance changes, because the cues that matter shift with scale. For instance, a close-up hinges on facial emotion (such as smiles and eye gaze), whereas a medium shot relies on hand gestures or the objects being manipulated.

In the context of HAR, we distinguish two primary scenarios that illustrate this challenge. The first is the far-field setting, characterized by subjects positioned at a distance and often exhibiting rapid movement, which leads to frequent occlusions; this scenario is common in sports broadcasts, where capturing the game’s dynamics is essential. In contrast, the near-field setting involves subjects that are nearby and tend to remain relatively static, enabling the capture of subtle yet informative gestures, as in presenter-focused videos. Although most studies treat these regimes separately, modern media (films, replays, vlogs) cut or zoom fluidly between them. An effective recognizer must therefore decide dynamically which cues to prioritize: facial emotion in tight close-ups, hand or torso motion in medium shots, and full-body dynamics in wide views.

Despite substantial progress, current HAR pipelines rarely adapt across that zoom continuum. This thesis therefore asks: What scale-specific hurdles confront human action recognition in far-field, near-field, and zoom-mixed scenarios, and how can insights from separate case studies keep recognition robust when the camera sweeps from full-body scenes to tight close-ups and back again? To answer, we contribute three scale-aware systems:

1. Hockey Action Identification and Keypose Understanding (HAIKYU), far-field. For hockey broadcasts, we introduce temporal bounding-box normalization, which removes camera-induced scale jitter, and a 15-keypoint skeleton that adds stick endpoints. Combined with normalization, this improves Top-1 accuracy from 31% to 64%, showing that stick cues are indispensable for ice-hockey actions.

2. Confidence Fostering Identity-preserving Dynamic Transformer (CONFIDANT), near-field. We curate a 38-class micro-gesture dataset and train an upper-body action recognizer that flags unconfident cues such as folding arms, crossing fingers, and clasping hands. A diffusion-based video editor then rewrites these segments into confident counterparts, serving as a downstream demonstration of fine-grained recognition.

3. Scale-aware routing framework for mixed-zoom action recognition (Zoom-Gate), zoom-mixed. A lightweight zoom score derived from the bounding-box area and the density of detected keypoints routes each tracklet to the specialist model best suited to that scale. Experiments confirm that this scale-aware routing, combined with context-specific skeletons, delivers robust performance across mixed-zoom datasets.

Collectively, these contributions demonstrate that coupling scale-aware preprocessing with context-specific skeletons can maintain pose-centric HAR reliability across the zoom spectrum.
The resulting frameworks open avenues for real-time segmentation, multi-view fusion, and ultimately a unified, scale-invariant action understanding pipeline.
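To make the temporal bounding-box normalization idea in contribution 1 concrete, the sketch below maps per-frame keypoints into a box-relative frame whose size is smoothed over time. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name temporal_bbox_normalize, the (T, K, 2) array layout, and the 5-frame moving average are all hypothetical choices.

```python
import numpy as np

def temporal_bbox_normalize(keypoints: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Normalize per-frame keypoints by a temporally smoothed bounding box.

    keypoints: (T, K, 2) array of (x, y) joint coordinates per frame.
    boxes:     (T, 4) array of per-frame boxes as (x_min, y_min, x_max, y_max).

    Returns keypoints expressed in box-relative coordinates, which suppresses
    the scale jitter introduced by camera zooming and panning.
    """
    # Smooth box size over time so a momentary zoom does not rescale the pose.
    # The 5-frame moving average is an illustrative assumption.
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    kernel = np.ones(5) / 5.0
    widths = np.convolve(widths, kernel, mode="same")
    heights = np.convolve(heights, kernel, mode="same")

    # Express each joint relative to the box origin, divided by smoothed size.
    origin = boxes[:, None, :2]                               # (T, 1, 2)
    size = np.stack([widths, heights], axis=-1)[:, None, :]   # (T, 1, 2)
    return (keypoints - origin) / np.maximum(size, 1e-6)

if __name__ == "__main__":
    T, K = 30, 15                      # 30 frames, 15-keypoint skeleton
    rng = np.random.default_rng(0)
    kpts = rng.uniform(100.0, 200.0, size=(T, K, 2))
    boxes = np.tile(np.array([90.0, 90.0, 210.0, 210.0]), (T, 1))
    print(temporal_bbox_normalize(kpts, boxes).shape)  # (30, 15, 2)
```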
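The zoom-score routing of contribution 3 can likewise be sketched from the abstract's description: combine relative bounding-box size with detected-keypoint density, then dispatch the tracklet to a far-field or near-field specialist. The equal weighting, the 0.5 threshold, and the stub models standing in for the HAIKYU- and CONFIDANT-style recognizers are illustrative assumptions, not the thesis's calibrated design.

```python
def far_field_model(tracklet):
    return "far-field action label"    # placeholder for a HAIKYU-style model

def near_field_model(tracklet):
    return "near-field action label"   # placeholder for a CONFIDANT-style recognizer

def zoom_score(box_area: float, frame_area: float,
               num_detected_keypoints: int, total_keypoints: int = 15) -> float:
    """Blend relative box size with keypoint density into a [0, 1] score.

    Small, sparsely detected subjects score low (far-field); large subjects
    with densely detected keypoints score high (near-field).
    """
    size_term = min(box_area / frame_area, 1.0)
    density_term = num_detected_keypoints / total_keypoints
    return 0.5 * size_term + 0.5 * density_term  # equal weighting is assumed

def route(tracklet, score: float, threshold: float = 0.5):
    """Send the tracklet to the specialist best suited to its scale."""
    return near_field_model(tracklet) if score >= threshold else far_field_model(tracklet)

if __name__ == "__main__":
    # A small subject with few visible keypoints routes to the far-field model.
    s = zoom_score(box_area=4_000, frame_area=2_073_600, num_detected_keypoints=6)
    print(route("tracklet-0", s))  # far-field action label
```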
dc.identifier.uri: https://hdl.handle.net/10012/22496
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: human action recognition
dc.subject: action classification
dc.subject: video understanding
dc.subject: skeleton-based recognition
dc.title: From Far-Field Dynamics to Close-Up Confidence: Action Recognition Across Varying Camera Distances
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: Systems Design Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Clausi, David
uws.contributor.advisor: Chen, Yuhao
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Buzko_Kseniia.pdf
Size: 18.84 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission