Self-supervised Video Representation Learning by Exploiting Video Speed Changes


Authors

Chen, Lizhe

Advisor

Veksler, Olga

Publisher

University of Waterloo

Abstract

Recent self-supervised video representation learning methods have improved by exploiting temporal properties of videos, such as playback speed and temporal order. These works inspire us to exploit a new artificial supervision signal for self-supervised representation learning: the change of video playback speed. Specifically, we formulate two novel speediness-related pretext tasks, speediness change classification and speediness change localization, that jointly supervise a shared backbone for video representation learning. By solving both tasks together, this self-supervised approach encourages the backbone network to learn local and long-range motion and context representations. It outperforms prior methods on multiple downstream tasks, including action recognition, video retrieval, and action localization.
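The abstract does not say how speed changes are simulated. As a minimal sketch only, assuming that playback speed is emulated by the frame-sampling stride, that a clip contains at most one speed change, and that roughly half of the clips keep a constant speed, a training sample for the two pretext tasks could be built as follows. The function name `build_speed_change_sample`, the stride set `(1, 2, 4)`, and the label encoding are hypothetical illustrations, not the thesis's actual procedure.

```python
import random
from typing import List, Optional, Tuple


def build_speed_change_sample(
    num_frames: int,
    clip_len: int,
    speeds: Tuple[int, ...] = (1, 2, 4),  # hypothetical frame strides
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int, int]:
    """Return (frame_indices, change_label, change_position).

    Assumed labels (illustrative only):
      change_label    -> 1 if the stride changes inside the clip, else 0
                         (target for speediness change classification)
      change_position -> output index where the second stride starts,
                         or -1 for a constant-speed clip
                         (target for speediness change localization)
    """
    if clip_len < 3:
        raise ValueError("clip_len must be >= 3 so both segments show a stride")
    if len(set(speeds)) < 2:
        raise ValueError("need at least two distinct speeds")
    rng = rng or random.Random()

    s1 = rng.choice(speeds)
    # About half of the samples keep a constant speed (negative examples).
    if rng.random() < 0.5:
        s2 = s1
    else:
        s2 = rng.choice([s for s in speeds if s != s1])
    changed = int(s1 != s2)

    # Change point k: frames [0, k) use stride s1, frames [k, clip_len) use s2.
    # k >= 2 and k <= clip_len - 1 so each stride appears at least once.
    k = rng.randint(2, clip_len - 1) if changed else clip_len

    # Temporal span covered by the clip in the source video.
    span = (k - 1) * s1 + (clip_len - k) * s2
    if span > num_frames - 1:
        raise ValueError("video too short for this clip length and stride")
    t0 = rng.randint(0, num_frames - 1 - span)

    indices = [t0 + i * s1 for i in range(k)]
    last = indices[-1]
    indices += [last + j * s2 for j in range(1, clip_len - k + 1)]
    return indices, changed, (k if changed else -1)
```

A shared backbone would then encode the frames at `indices`, with one head predicting `change_label` and another predicting `change_position`; sampling the change point at random is what would force the representation to attend to where motion statistics shift, not just whether they do.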
