Self-supervised Video Representation Learning by Exploiting Video Speed Changes
Abstract
In recent research, self-supervised video representation learning methods have achieved notable improvements by exploiting videos' temporal properties, such as playback speed and temporal order. These works inspire us to exploit a new artificial supervision signal for self-supervised representation learning: the change of video playback speed. Specifically, we formulate two novel speediness-related pretext tasks, i.e., speediness change classification and speediness change localization, which jointly supervise a shared backbone for video representation learning. Solving both tasks together encourages the backbone network to learn both local and long-range motion and context representations. The approach outperforms prior art on multiple downstream tasks, such as action recognition, video retrieval, and action localization.
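The two pretext tasks described above can be sketched as a data-construction step: resample a frame sequence so the playback speed changes at a random point, then keep the ordered speed pair as the classification label and the change point as the localization label. The following is a minimal illustrative sketch; the function name, the candidate speed set, and the label encoding are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

SPEEDS = [1, 2, 4]  # assumed candidate playback speeds (frame strides)

def make_speed_change_sample(frames, clip_len=16, rng=None):
    """Build one training sample: a clip whose playback speed changes once.

    Returns (clip, speed_pair_label, change_index), where the label encodes
    the ordered (before, after) speed pair and change_index marks where in
    the output clip the speed switches. All names here are hypothetical.
    """
    rng = np.random.default_rng(rng)
    # Pick two distinct speeds and a change point strictly inside the clip.
    s1, s2 = rng.choice(len(SPEEDS), size=2, replace=False)
    change = int(rng.integers(1, clip_len - 1))
    # Sample source-frame indices: stride SPEEDS[s1] before the change
    # point, stride SPEEDS[s2] after it.
    idx, t = [], 0
    for i in range(clip_len):
        idx.append(t)
        t += SPEEDS[s1] if i < change else SPEEDS[s2]
    clip = frames[np.asarray(idx)]
    # Encode the ordered speed pair as a single class id for the
    # classification head; the localization head regresses/classifies
    # the change index.
    pair_label = int(s1) * len(SPEEDS) + int(s2)
    return clip, pair_label, change
```

A shared backbone would then consume `clip` while two heads are trained on `pair_label` and `change`, which is one plausible way to realize the joint supervision the abstract describes.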
Cite this version of the work:
Lizhe Chen (2022). Self-supervised Video Representation Learning by Exploiting Video Speed Changes. UWSpace. http://hdl.handle.net/10012/18208