Self-supervised Video Representation Learning by Exploiting Video Speed Changes
Abstract
In recent research, self-supervised video representation learning methods have achieved notable improvements by exploiting videos' temporal properties, such as playback speed and temporal order. These works inspire us to exploit a new artificial supervision signal for self-supervised representation learning: the change of video playback speed. Specifically, we formulate two novel speediness-related pretext tasks, i.e., speediness change classification and speediness change localization, which jointly supervise a shared backbone for video representation learning. Solving both tasks together encourages the backbone network to learn both local and long-range motion and context representations. The approach outperforms prior art on multiple downstream tasks, such as action recognition, video retrieval, and action localization.
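The two pretext tasks described above can be sketched as a data-construction step: resample a frame sequence so the playback speed changes at a random point, then keep the ordered speed pair as the classification label and the change point as the localization label. The following is a minimal illustrative sketch; the function name, the candidate speed set, and the label encoding are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

SPEEDS = [1, 2, 4]  # assumed candidate playback speeds (frame strides)

def make_speed_change_sample(frames, clip_len=16, rng=None):
    """Build one training sample: a clip whose playback speed changes once.

    Returns (clip, speed_pair_label, change_index), where the label encodes
    the ordered (before, after) speed pair and change_index marks where in
    the output clip the speed switches. All names here are hypothetical.
    """
    rng = np.random.default_rng(rng)
    # Pick two distinct speeds and a change point strictly inside the clip.
    s1, s2 = rng.choice(len(SPEEDS), size=2, replace=False)
    change = int(rng.integers(1, clip_len - 1))
    # Sample source-frame indices: stride SPEEDS[s1] before the change
    # point, stride SPEEDS[s2] after it.
    idx, t = [], 0
    for i in range(clip_len):
        idx.append(t)
        t += SPEEDS[s1] if i < change else SPEEDS[s2]
    clip = frames[np.asarray(idx)]
    # Encode the ordered speed pair as a single class id for the
    # classification head; the localization head regresses/classifies
    # the change index.
    pair_label = int(s1) * len(SPEEDS) + int(s2)
    return clip, pair_label, change
```

A shared backbone would then consume `clip` while two heads are trained on `pair_label` and `change`, which is one plausible way to realize the joint supervision the abstract describes.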
Cite this version of the work:
Lizhe Chen (2022). Self-supervised Video Representation Learning by Exploiting Video Speed Changes. UWSpace. http://hdl.handle.net/10012/18208