Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Learning to predict activity progress by self-supervised video alignment

G Donahue, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …

Put myself in your shoes: Lifting the egocentric perspective from exocentric videos

M Luo, Z Xue, A Dimakis, K Grauman - European Conference on Computer …, 2025 - Springer
We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-
person (egocentric) view of an actor based on a video recording that captures the actor from …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Retrieval-augmented egocentric video captioning

J Xu, Y Huang, J Hou, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding human actions from first-person view videos poses significant challenges.
Most prior approaches explore representation learning on egocentric videos only while …

FinePseudo: Improving pseudo-labelling through temporal-alignability for semi-supervised fine-grained action recognition

IR Dave, MN Rizve, M Shah - European Conference on Computer Vision, 2025 - Springer
Real-life applications of action recognition often require a fine-grained understanding of
subtle movements, e.g., in sports analytics, user interactions in AR/VR, and surgical videos …

Synchronization is all you need: Exocentric-to-egocentric transfer for temporal action segmentation with unlabeled synchronized video pairs

C Quattrocchi, A Furnari, D Di Mauro… - … on Computer Vision, 2025 - Springer
We consider the problem of transferring a temporal action segmentation system initially
designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras …

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

H Yun, R Gao, I Ananthabhotla, A Kumar… - … on Computer Vision, 2025 - Springer
Egocentric videos provide comprehensive contexts for user and scene understanding,
spanning multisensory perception to behavioral interaction. We propose Spherical World …

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric Views of Procedural Activities in the Real World

Y Huang, G Chen, J Xu, M Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Being able to map the activities of others into one's own point of view is a fundamental
human skill even from a very early age. Taking a step toward understanding this human …

Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views

Z Zhao, Y Wang, C Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
As wearable cameras become more popular, an important question emerges: how to identify
camera wearers within the perspective of conventional static cameras. The drastic difference …