Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

Pan, Jiancheng; Liu, Yanxing; Fu, Yuqian; Ma, Muyuan; Li, Jiahao; Paudel, Danda Pani; Van Gool, Luc; Huang, Xiaomeng

arXiv:2408.09110 (cs)

[Submitted on 17 Aug 2024 (v1), last revised 13 Feb 2025 (this version, v2)]

Abstract: Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth science applications such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, trained primarily on natural-world images, struggle to generalize to remote sensing images because of a significant data domain gap. This paper therefore aims to advance open-vocabulary object detection in the remote sensing community. To achieve this, we first reformulate the task as Locate Anything on Earth (LAE), with the goal of detecting any novel concept on Earth. We then develop the LAE-Label Engine, which collects, auto-annotates, and unifies up to 10 remote sensing datasets to create LAE-1M, the first large-scale remote sensing object detection dataset with broad category coverage. Using LAE-1M, we further propose and train the novel LAE-DINO model, the first open-vocabulary foundation object detector for the LAE task, featuring Dynamic Vocabulary Construction (DVC) and Visual-Guided Text Prompt Learning (VisGT) modules. DVC dynamically constructs the vocabulary for each training batch, while VisGT maps visual features to the semantic space to enhance the text features. We conduct comprehensive experiments on the established remote sensing benchmarks DIOR and DOTAv2.0, as well as on our newly introduced 80-class LAE-80C benchmark. The results demonstrate the advantages of the LAE-1M dataset and the effectiveness of the LAE-DINO method.

Comments:	15 pages, 11 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2408.09110 [cs.CV]
	(or arXiv:2408.09110v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.09110

Submission history

From: Jiancheng Pan
[v1] Sat, 17 Aug 2024 06:24:43 UTC (3,680 KB)
[v2] Thu, 13 Feb 2025 18:01:16 UTC (16,228 KB)
