Modern Object and Visual Relationship Detection in Images from a Critical, Cognitive and Data Perspective
Date
2023-04-27
Authors
Abou Chacra, David
Advisor
Zelek, John
Publisher
University of Waterloo
Abstract
Deep learning has dominated computer vision for the past decade. Deep learning networks are the top performers on many computer vision challenges (e.g., object detection and image segmentation) and on the most popular datasets. They outperform other approaches by a large margin, and each new network introduces its own techniques to improve upon its predecessors. However, recent research highlights several shortcomings of deep learning approaches, from poor generalization to the difficulty of understanding the rationale behind their decisions. More nuanced, human-like tasks such as visual relationship detection also remain difficult for deep learning networks.
In this thesis, we tackle the problem of scene graph generation: the task of generating a directed graph that describes the relationships between detected objects in an image. We empirically identify, highlight and discuss the shortcomings of modern deep learning approaches to this task, along with the reasons behind these failures. Scene graph generation relies on both object detection and visual relationship detection. Our experiments first study object detection in isolation (through the more advanced task of instance segmentation), then explore visual relationship detection, starting with its data and moving on to deep-learning-based approaches. Finally, we propose and implement Topological Relationship Fields, a novel approach that represents and grounds relationships purely visually. We use this representation in a scene graph generation approach that builds upon our findings and tackles the problem in a radically different way from current standard approaches.
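To make the output of scene graph generation concrete, the following is a minimal Python sketch of a scene graph as a directed graph whose nodes are detected objects and whose edges are (subject, predicate, object) relationships. It assumes the common triple-based representation; the class names, the (x1, y1, x2, y2) box format and the toy detections are hypothetical and are not taken from the thesis. It illustrates only the data structure, not the thesis's method or Topological Relationship Fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DetectedObject:
    """One node of the scene graph: an object produced by a detector (hypothetical format)."""
    obj_id: int
    label: str
    box: tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) pixel box


@dataclass
class SceneGraph:
    """Directed graph: nodes are detected objects, edges are
    (subject_id, predicate, object_id) relationships."""
    nodes: dict[int, DetectedObject] = field(default_factory=dict)
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj: DetectedObject) -> None:
        self.nodes[obj.obj_id] = obj

    def add_relation(self, subj_id: int, predicate: str, obj_id: int) -> None:
        # Edges are directed: "person riding horse" differs from "horse riding person".
        if subj_id not in self.nodes or obj_id not in self.nodes:
            raise KeyError("both endpoints must be detected objects")
        self.edges.append((subj_id, predicate, obj_id))

    def triples(self) -> list[tuple[str, str, str]]:
        """Return human-readable (subject, predicate, object) triples."""
        return [(self.nodes[s].label, p, self.nodes[o].label) for s, p, o in self.edges]


# Toy example with made-up detections (not the output of any real model).
graph = SceneGraph()
graph.add_object(DetectedObject(0, "person", (40.0, 20.0, 120.0, 200.0)))
graph.add_object(DetectedObject(1, "horse", (10.0, 90.0, 220.0, 310.0)))
graph.add_object(DetectedObject(2, "hat", (60.0, 10.0, 100.0, 40.0)))
graph.add_relation(0, "riding", 1)
graph.add_relation(0, "wearing", 2)
print(graph.triples())
# [('person', 'riding', 'horse'), ('person', 'wearing', 'hat')]
```

Storing edges as identifier triples rather than label triples keeps two objects with the same label (e.g., two people) distinct, which is why the detection step must precede relationship prediction.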
Keywords
artificial intelligence, machine learning, computer vision, visual relationship detection, scene graphs, human cognition, dataset understanding, adversarial attacks, deep learning, model explainability, statistical modelling, instance segmentation, object detection, network generalization