
Modern Object and Visual Relationship Detection in Images from a Critical, Cognitive and Data Perspective

Date

2023-04-27

Authors

Abou Chacra, David

Publisher

University of Waterloo

Abstract

Deep learning has dominated the landscape of computer vision for the past decade. Deep learning networks are the top performers on a slew of computer vision challenges (e.g., object detection or image segmentation) and on the most popular datasets. They outperform other approaches by a large margin, each armed with its own tricks to improve upon its predecessors. However, recent research highlights several shortcomings of deep learning approaches, from poor generalization performance to the difficulty of understanding the rationale behind the decisions they make. More nuanced, human-like tasks such as visual relationship detection also remain difficult for deep learning networks. In this thesis, we tackle the problem of scene graph generation: the task of generating a directed graph that describes the relationships between detected objects in an image. We empirically identify, highlight, and discuss the shortcomings of modern deep learning approaches to this task, along with the reasoning behind these failures. Scene graph generation relies on both object detection and visual relationship detection. Our experiments first tackle object detection (through its more advanced task of instance segmentation) in isolation, then explore visual relationship detection, starting with its data and moving on to its deep-learning-based approaches. Finally, we propose and implement Topological Relationship Fields, a novel approach that allows for representing and grounding relationships purely visually. We utilize this representation for a scene graph generation approach that builds upon our findings and tackles the problem in a radically different way from the current standard approaches.
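
As an illustration of the scene graph definition in the abstract (a directed graph whose nodes are detected objects and whose edges are relationships), the following is a minimal sketch of such a structure. It is not taken from the thesis: the class names `DetectedObject` and `SceneGraph`, the bounding-box format, and the "person riding horse" example are all assumptions chosen for illustration, and the sketch does not represent Topological Relationship Fields.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DetectedObject:
    """A graph node. Hypothetical fields: a class label and a box (x1, y1, x2, y2)."""
    label: str
    box: Tuple[float, float, float, float]


@dataclass
class SceneGraph:
    """Directed graph: nodes are detected objects, edges are labelled relationships."""
    objects: List[DetectedObject] = field(default_factory=list)
    # Each edge is (subject index, predicate, object index); direction matters.
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj: DetectedObject) -> int:
        """Store a detected object and return its node index."""
        self.objects.append(obj)
        return len(self.objects) - 1

    def add_relationship(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed edge from the subject node to the object node."""
        for index in (subject, obj):
            if not 0 <= index < len(self.objects):
                raise IndexError(f"no detected object with index {index}")
        self.edges.append((subject, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return the edges as readable (subject, predicate, object) label triples."""
        return [
            (self.objects[s].label, predicate, self.objects[o].label)
            for s, predicate, o in self.edges
        ]


def build_example() -> SceneGraph:
    """Build a two-object graph for an imagined image (illustrative values only)."""
    graph = SceneGraph()
    person = graph.add_object(DetectedObject("person", (10.0, 20.0, 110.0, 300.0)))
    horse = graph.add_object(DetectedObject("horse", (40.0, 150.0, 320.0, 400.0)))
    graph.add_relationship(person, "riding", horse)
    return graph
```

Calling `build_example().triples()` yields the relationships as label triples. In a full pipeline, the objects would come from an object detector and the predicates from a visual relationship detector, the two components the abstract says scene graph generation relies on.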

Keywords

artificial intelligence, machine learning, computer vision, visual relationship detection, scene graphs, human cognition, dataset understanding, adversarial attacks, deep learning, model explainability, statistical modelling, instance segmentation, object detection, network generalization
