Towards Adaptable and Deployable Wildlife Detection from Aerial Imagery

Hsiao, Jayden

Files

Hsiao_Jayden.pdf (34.1 MB)

Date

2026-04-21

Authors

Hsiao, Jayden

Advisor

Clausi, David A.
Xu, Lincoln

Publisher

University of Waterloo

Abstract

Reliable wildlife monitoring is critical for biodiversity conservation, yet large-scale aerial surveys remain constrained by labour-intensive manual counting and limited model generalization. While camera trap–based approaches are used for species identification and behavioural monitoring at fixed locations, aerial imagery acquired from drones and aircraft enables large-area coverage and is commonly used to estimate population abundance, species distributions, and temporal trends through repeated surveys. However, these workflows remain heavily reliant on manual annotation. Although recent advances in computer vision enable automated wildlife detection, many existing approaches rely on species-specific training data, transfer poorly across environments, and lack mechanisms for incorporating expert corrections into iterative model improvement. These limitations hinder scalability and long-term deployment in operational conservation workflows. We propose OpenWildlife, an open-vocabulary, multi-species wildlife detection framework for RGB aerial imagery captured from drones or aircraft, together with a human-in-the-loop (HITL) annotation system that supports incremental model refinement. The framework adapts a language-grounded detection architecture so that species can be specified through natural language prompts, enabling flexible identification across terrestrial and marine environments without retraining for each new category. Trained on 15 publicly available wildlife datasets, the model achieves up to 0.981 mAP50 (mean Average Precision at an intersection-over-union threshold of 0.5, a standard object detection metric) under fine-tuning and 0.597 mAP50 in zero-shot settings on seven datasets containing novel species, demonstrating generalization to unseen species across diverse vertebrate groups, including mammals and birds.
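For readers unfamiliar with the mAP50 metric quoted above, the following is a minimal single-class sketch of how average precision at a 0.5 overlap threshold is commonly computed. It is not the evaluation code used in the thesis: the box format, the greedy matching rule, the all-point interpolation, and the function names `iou`, `match_at_iou50`, and `average_precision_50` are all assumptions made for illustration. mAP50 would then be the mean of this value over classes.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_at_iou50(preds: List[Tuple[Box, float]], truths: List[Box],
                   threshold: float = 0.5) -> List[bool]:
    """Greedily match predictions (highest score first) to ground truth.

    Returns one flag per prediction, in descending score order:
    True for a true positive, False for a false positive.
    """
    matched = [False] * len(truths)
    flags = []
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if matched[idx]:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= threshold:
            matched[best_idx] = True
            flags.append(True)
        else:
            flags.append(False)
    return flags


def average_precision_50(preds: List[Tuple[Box, float]],
                         truths: List[Box]) -> float:
    """Area under the precision-recall curve at IoU >= 0.5 (one class)."""
    if not truths:
        return 0.0
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(match_at_iou50(preds, truths), start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(truths))
    # Make precision non-increasing as recall grows (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for precision, rec in zip(precisions, recalls):
        ap += (rec - prev_recall) * precision
        prev_recall = rec
    return ap
```

Benchmark toolkits differ in details such as interpolation and how crowded or ignored regions are handled, so values from this sketch need not match a specific toolkit exactly.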
To support practical deployment, the detection model is integrated into a HITL workflow that combines automated pre-labelling, regional human correction, and incremental fine-tuning. While these components are individually well-established, the proposed system integrates them into a unified workflow tailored to aerial wildlife imagery. This design avoids exhaustive full-image annotation in dense scenes and enables expert feedback to directly improve subsequent model iterations. A case study conducted with the Arctic Eider Society on high-resolution aerial surveys of eider ducks in Arctic Canada shows the practical impact: the system achieves 77.6% recall with a 22.2% counting error while reducing annotation time by 87.4% compared to fully manual labelling, supporting its use for semi-automated population abundance estimation from aerial surveys. These results indicate that combining open-vocabulary detection with human-in-the-loop learning provides a scalable and adaptable approach for wildlife monitoring, enabling efficient and consistent large-area surveys across diverse species and environments.
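The three case-study figures (recall, counting error, and annotation-time reduction) can be read against the following minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: recall as matched animals over all annotated animals, counting error as the absolute difference between predicted and true counts relative to the true count, and time reduction as the fraction of manual annotation time saved. The function names and the numbers in the usage lines are hypothetical and are not taken from the thesis.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of annotated animals that the detector found."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def relative_counting_error(predicted_count: int, true_count: int) -> float:
    """Assumed definition: |predicted - true| / true."""
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    return abs(predicted_count - true_count) / true_count


def time_reduction(manual_minutes: float, assisted_minutes: float) -> float:
    """Fraction of fully manual annotation time saved by the assisted workflow."""
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return 1.0 - assisted_minutes / manual_minutes


# Hypothetical survey tile, for illustration only.
print(f"recall: {recall(true_positives=80, false_negatives=20):.1%}")
print(f"counting error: {relative_counting_error(90, 100):.1%}")
print(f"time saved: {time_reduction(manual_minutes=200, assisted_minutes=50):.1%}")
```

Recall and counting error measure different things: false positives can offset missed animals in the total count, so a count can be close to the truth even when recall is moderate.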