LidarCLIP or: How I Learned to Talk to Point Clouds

Hess, Georg; Tonderski, Adam; Petersson, Christoffer; Åström, Kalle; Svensson, Lennart

arXiv:2212.06858 (cs)

[Submitted on 13 Dec 2022 (v1), last revised 2 May 2023 (this version, v3)]

Abstract: Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, hindered by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at this https URL.
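The abstract describes three operations that share one embedding space: training a lidar encoder to match frozen CLIP image embeddings, retrieving scenes with a text query, and fusing image and lidar features. The sketch below illustrates them on precomputed embeddings. It is not the authors' implementation. It assumes embeddings are NumPy arrays, a mean-squared-error distillation loss (the paper's exact objective may differ), and fusion by averaging unit-normalized image and lidar embeddings. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (near) unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def distillation_loss(lidar_emb, image_emb):
    """Assumed training objective: mean squared error between the point cloud
    encoder's output and the frozen CLIP image embedding of the paired frame.
    Only the lidar encoder would be trained; image_emb is a fixed target."""
    return float(np.mean((lidar_emb - image_emb) ** 2))


def fuse(image_emb, lidar_emb):
    """Hypothetical fusion rule: sum the unit-normalized image and lidar
    embeddings and renormalize, i.e. an equal-weight average direction."""
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(lidar_emb))


def retrieve(text_emb, scene_embs, k):
    """Indices of the k scenes most cosine-similar to a text embedding, best first."""
    sims = l2_normalize(scene_embs) @ l2_normalize(text_emb)
    return np.argsort(-sims)[:k]


def zero_shot_classify(scene_emb, class_text_embs):
    """Index of the class prompt whose text embedding is closest to the scene."""
    sims = l2_normalize(class_text_embs) @ l2_normalize(scene_emb)
    return int(np.argmax(sims))


if __name__ == "__main__":
    # Toy 4-D embeddings stand in for real CLIP vectors.
    scenes = np.eye(4)
    query = np.array([0.1, 0.9, 0.2, 0.0])
    print(retrieve(query, scenes, k=2))  # → [1 2]
```

Normalizing before every dot product makes retrieval and classification depend only on embedding direction, which is how CLIP compares text and image features. A lidar encoder trained this way can therefore be queried with CLIP text embeddings without any text-lidar pairs.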

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2212.06858 [cs.CV]
	(or arXiv:2212.06858v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.06858

Submission history

From: Adam Tonderski
[v1] Tue, 13 Dec 2022 19:02:35 UTC (30,450 KB)
[v2] Thu, 9 Mar 2023 16:00:00 UTC (30,865 KB)
[v3] Tue, 2 May 2023 13:53:40 UTC (30,865 KB)