How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing

Jin, Shutong; Wang, Ruiyu; Zahid, Muhammad; Pokorny, Florian T.

arXiv:2310.02044 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 28 Aug 2024 (this version, v4)]

Abstract: As model and dataset sizes continue to scale in robot learning, the need to understand how the composition and properties of a dataset affect model performance becomes increasingly urgent to ensure cost-effective data collection and model performance. In this work, we empirically investigate how physics attributes (color, friction coefficient, shape) and scene background characteristics, such as the complexity and dynamics of interactions with background objects, influence the performance of Video Transformers in predicting planar pushing trajectories. We investigate three primary questions: How do physics attributes and background scene characteristics influence model performance? What kind of changes in attributes are most detrimental to model generalization? What proportion of fine-tuning data is required to adapt models to novel scenarios? To facilitate this research, we present CloudGripper-Push-1K, a large real-world vision-based robot pushing dataset comprising 1278 hours and 460,000 videos of planar pushing interactions with objects with different physics and background attributes. We also propose Video Occlusion Transformer (VOT), a generic modular video-transformer-based trajectory prediction framework which features 3 choices of 2D-spatial encoders as the subject of our case study. The dataset and source code are available at this https URL.

Comments:	IEEE/RSJ IROS 2024
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.02044 [cs.RO]
	(or arXiv:2310.02044v4 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2310.02044

Submission history

From: Shutong Jin
[v1] Tue, 3 Oct 2023 13:35:49 UTC (2,045 KB)
[v2] Wed, 11 Oct 2023 09:21:23 UTC (2,045 KB)
[v3] Sun, 17 Mar 2024 10:37:08 UTC (5,097 KB)
[v4] Wed, 28 Aug 2024 09:34:33 UTC (5,097 KB)
