Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping

Lian, Long; Wu, Zhirong; Yu, Stella X.

arXiv:2304.08025 (cs)

[Submitted on 17 Apr 2023]

Abstract: We study learning object segmentation from unlabeled videos. Humans can easily segment moving objects without knowing what they are. The Gestalt law of common fate, i.e., what moves at the same speed belongs together, has inspired unsupervised object discovery based on motion segmentation. However, common fate is not a reliable indicator of objectness: parts of an articulated or deformable object may not move at the same speed, whereas shadows and reflections of an object always move with it but are not part of it.
Our insight is to bootstrap objectness by first learning image features from relaxed common fate and then refining them based on visual appearance grouping within the image itself and across images statistically. Specifically, we learn an image segmenter first in the loop of approximating optical flow with constant segment flow plus small within-segment residual flow, and then by refining it for more coherent appearance and statistical figure-ground relevance.
On unsupervised video object segmentation, using only ResNet and convolutional heads, our model surpasses the state of the art by absolute gains of 7/9/5% on DAVIS16 / STv2 / FBMS59, respectively, demonstrating the effectiveness of our ideas. Our code is publicly available.
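
The idea of approximating optical flow with a constant flow per segment plus a small within-segment residual can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `max_residual` bound, and the use of clipping are assumptions made for illustration only. In particular, the residual here is computed directly from the input flow and clipped, whereas the abstract describes a learned segmenter trained in this approximation loop.

```python
import numpy as np


def segment_flow(flow, masks, eps=1e-8):
    """Piecewise-constant flow from soft segment masks (illustrative helper).

    flow:  (H, W, 2) optical flow field.
    masks: (K, H, W) soft segment assignments, assumed to sum to 1 over K.
    Returns the per-segment mean flow (K, 2) and the constant-flow
    reconstruction (H, W, 2).
    """
    weights = masks.sum(axis=(1, 2))                      # (K,) mass of each segment
    mean_flow = np.einsum("khw,hwc->kc", masks, flow)     # mask-weighted flow sums
    mean_flow = mean_flow / (weights[:, None] + eps)      # mask-weighted averages
    const = np.einsum("khw,kc->hwc", masks, mean_flow)    # paint each mean back
    return mean_flow, const


def relaxed_reconstruction(flow, masks, max_residual=1.0):
    """Constant segment flow plus a bounded per-pixel residual.

    max_residual is a hypothetical bound that keeps the residual "small",
    so the segment-level flow still has to explain most of the motion.
    """
    mean_flow, const = segment_flow(flow, masks)
    residual = np.clip(flow - const, -max_residual, max_residual)
    return const + residual, mean_flow


if __name__ == "__main__":
    # Toy scene: the left half moves right by 2 px, the right half is static.
    flow = np.zeros((4, 4, 2))
    flow[:, :2, 0] = 2.0
    flow[0, 0, 0] = 2.8  # one pixel deviates, e.g. an articulated part

    masks = np.zeros((2, 4, 4))
    masks[0, :, :2] = 1.0
    masks[1, :, 2:] = 1.0

    _, const = segment_flow(flow, masks)
    relaxed, _ = relaxed_reconstruction(flow, masks, max_residual=0.5)
    print("constant-only error:", np.abs(flow - const).max())
    print("relaxed error:      ", np.abs(flow - relaxed).max())
```

In this toy setup, the bounded residual absorbs part of the deviation of the single outlier pixel, so the relaxed reconstruction fits the flow more closely than the strictly constant one while the segment assignment stays unchanged.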

Comments:	Accepted by CVPR 2023. An extension of preprint 2212.08816. 19 pages, 11 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2304.08025 [cs.CV]
	(or arXiv:2304.08025v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.08025

Submission history

From: Long Lian
[v1] Mon, 17 Apr 2023 07:18:21 UTC (10,309 KB)
