Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

Kothandaraman, Divya; Lin, Ming; Manocha, Dinesh

arXiv:2209.09194 (cs)

[Submitted on 15 Sep 2022 (v1), last revised 5 Oct 2022 (this version, v2)]

Abstract: We present a learning algorithm for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras and contain a human actor along with background motion. Typically, the human actors occupy less than one-tenth of the spatial resolution. Our approach simultaneously harnesses the benefits of frequency-domain representations, a classical analysis tool in signal processing, and data-driven neural networks. We build a differentiable static-dynamic frequency mask prior to model the salient static and dynamic pixels in the video, which are crucial for the underlying task of action recognition. We use this differentiable mask prior to enable the neural network to learn disentangled feature representations within its layers via an identity loss function. Further, we propose a cost function encapsulating temporal relevance and spatial content to sample the most important frame within uniformly spaced video segments. We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset and demonstrate relative improvements of 5.72% - 13.00% over the state-of-the-art and 14.28% - 38.05% over the corresponding baseline model.
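The abstract does not say how the frequency mask or the sampling cost is computed. The sketch below is an illustration only, not the authors' formulation. It assumes that the "static" part of a pixel corresponds to the zero-frequency (DC) term of its temporal Fourier transform and the "dynamic" part to the remaining terms, and that a per-frame importance score is already available. The names `temporal_dft`, `dynamic_weight`, and `pick_frames` are hypothetical.

```python
import cmath
import math


def temporal_dft(series):
    """Discrete Fourier transform of one pixel's intensity over time."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n)
    ]


def dynamic_weight(series, eps=1e-8):
    """Soft weight in [0, 1): share of spectral magnitude outside the DC term.

    Assumption (not from the paper): static content lives in the DC term,
    dynamic content in all other temporal frequencies. The ratio is a smooth
    function of the input, so an autodiff framework could backpropagate
    through an equivalent implementation.
    """
    spectrum = temporal_dft(series)
    static_energy = abs(spectrum[0])
    dynamic_energy = sum(abs(c) for c in spectrum[1:])
    return dynamic_energy / (static_energy + dynamic_energy + eps)


def pick_frames(scores, num_segments):
    """Pick the highest-scoring frame index inside each uniform segment.

    `scores` stands in for the paper's cost function, whose exact form
    is not given in the abstract.
    """
    n = len(scores)
    picks = []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        if start == end:  # more segments than frames: skip empty segment
            continue
        picks.append(max(range(start, end), key=lambda i: scores[i]))
    return picks


# A constant pixel is purely static; a flickering pixel is partly dynamic.
still_pixel = dynamic_weight([5.0, 5.0, 5.0, 5.0])
moving_pixel = dynamic_weight([0.0, 10.0, 0.0, 10.0])

# One frame per segment from six frames split into three segments.
selected = pick_frames([0.1, 0.9, 0.3, 0.2, 0.8, 0.4], 3)
```

Under these assumptions, the constant pixel receives a weight close to zero, the flickering pixel a clearly larger weight, and one frame index is returned per segment.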

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2209.09194 [cs.CV]
	(or arXiv:2209.09194v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2209.09194

Submission history

From: Divya Kothandaraman
[v1] Thu, 15 Sep 2022 22:16:52 UTC (2,124 KB)
[v2] Wed, 5 Oct 2022 01:06:04 UTC (2,125 KB)
