Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Learning to predict activity progress by self-supervised video alignment

G Donahue, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …

Put myself in your shoes: Lifting the egocentric perspective from exocentric videos

M Luo, Z Xue, A Dimakis, K Grauman - European Conference on Computer …, 2025 - Springer
We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-
person (egocentric) view of an actor based on a video recording that captures the actor from …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Retrieval-augmented egocentric video captioning

J Xu, Y Huang, J Hou, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Understanding human actions from first-person view videos poses significant challenges.
Most prior approaches explore representation learning on egocentric videos only while …

FinePseudo: Improving pseudo-labelling through temporal-alignability for semi-supervised fine-grained action recognition

IR Dave, MN Rizve, M Shah - European Conference on Computer Vision, 2025 - Springer
Real-life applications of action recognition often require a fine-grained understanding of
subtle movements, e.g., in sports analytics, user interactions in AR/VR, and surgical videos …

Synchronization is all you need: Exocentric-to-egocentric transfer for temporal action segmentation with unlabeled synchronized video pairs

C Quattrocchi, A Furnari, D Di Mauro… - … on Computer Vision, 2025 - Springer
We consider the problem of transferring a temporal action segmentation system initially
designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras …

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

H Yun, R Gao, I Ananthabhotla, A Kumar… - … on Computer Vision, 2025 - Springer
Egocentric videos provide comprehensive contexts for user and scene understanding,
spanning multisensory perception to behavioral interaction. We propose Spherical World …

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric Views of Procedural Activities in the Real World

Y Huang, G Chen, J Xu, M Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Being able to map the activities of others into one's own point of view is a fundamental
human skill even from a very early age. Taking a step toward understanding this human …

Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views

Z Zhao, Y Wang, C Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
As wearable cameras become more popular, an important question emerges: how to identify
camera wearers within the perspective of conventional static cameras. The drastic difference …