Self-supervised Video Representation Learning by Exploiting Video Speed Changes
Recent self-supervised video representation learning methods have improved by exploiting temporal properties of video, such as playback speed and temporal order. These works inspire us to exploit a new artificial supervision signal for self-supervised representation learning: the change of video playback speed. Specifically, we formulate two novel speediness-related pretext tasks, i.e., speediness change classification and speediness change localization, which jointly supervise a shared backbone for video representation learning. Solving both tasks together encourages the backbone network to learn local and long-range motion and context representations. The approach outperforms prior methods on multiple downstream tasks, including action recognition, video retrieval, and action localization.
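The two pretext tasks can be illustrated with a minimal sketch of how one training sample might be constructed. This is not the thesis's procedure. The speed set `SPEEDS`, the function `sample_speed_change_clip`, the treatment of playback speed as an integer frame stride, and the encoding of the classification label as a (speed before, speed after) pair are all hypothetical assumptions. A clip is sampled at one stride up to a random change point and at another stride afterwards. The class label would supervise change classification and the change position would supervise change localization.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical playback speeds expressed as frame strides (1 = normal, 2 = 2x, ...).
# The thesis abstract does not specify which speeds are used.
SPEEDS: Tuple[int, ...] = (1, 2, 4)


def sample_speed_change_clip(
    num_frames: int,
    clip_len: int,
    rng: random.Random,
    speeds: Sequence[int] = SPEEDS,
) -> Tuple[List[int], int, Optional[int]]:
    """Sample frame indices whose playback speed may change once (illustrative only).

    Returns (frame_indices, class_label, change_position):
      - class_label encodes the (speed_before, speed_after) pair as i * len(speeds) + j;
        i == j means the speed stays constant (no change).
      - change_position is the index in the clip (1..clip_len-1) of the first frame
        reached with the new stride, or None when there is no change.
    """
    n = len(speeds)
    i, j = rng.randrange(n), rng.randrange(n)
    # Number of clip frames played at the initial speed before the change.
    change = None if i == j else rng.randrange(1, clip_len)
    before = clip_len if change is None else change
    s1, s2 = speeds[i], speeds[j]

    # Offsets relative to the clip's first frame: stride s1 until the change, then s2.
    offsets = []
    pos = 0
    for k in range(clip_len):
        offsets.append(pos)
        pos += s1 if k + 1 < before else s2

    # The whole clip must fit inside the source video.
    max_start = num_frames - offsets[-1]
    if max_start <= 0:
        raise ValueError("video too short for this clip length and speed pair")
    start = rng.randrange(max_start)
    return [start + o for o in offsets], i * n + j, change
```

For example, `sample_speed_change_clip(300, 16, random.Random(0))` returns 16 frame indices into a 300-frame video together with both labels. Under these assumptions, a shared backbone would encode the clip and two separate heads would predict the class label and the change position.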
Cite this version of the work
Lizhe Chen (2022). Self-supervised Video Representation Learning by Exploiting Video Speed Changes. UWSpace. http://hdl.handle.net/10012/18208