Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark

Williams, Evan M.; Carley, Kathleen M.

arXiv:2405.06634 (cs)

[Submitted on 10 May 2024 (v1), last revised 10 Jun 2024 (this version, v2)]

Abstract: We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on five tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph-theoretic concepts, and all of them can be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.
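
To make the three concepts concrete, here is a minimal Python sketch of how ground-truth answers could be computed by counting on an edge list. It is an illustration only, not the authors' benchmark code: the function names, the edge-list representation, and the +1/-1 sign encoding are assumptions, and balance is taken in the standard structural-balance sense (a triad is balanced when the product of its three edge signs is positive). The abstract does not specify the exact form of the five tasks.

```python
from collections import defaultdict


def max_degree_nodes(edges):
    """Return the sorted list of nodes that share the highest degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    top = max(degree.values())
    return sorted(n for n, d in degree.items() if d == top)


def is_balanced_triad(signs):
    """Assumed encoding: each edge sign is +1 or -1.

    A triad is balanced when the product of its three signs is positive,
    i.e. it has an even number of negative edges.
    """
    a, b, c = signs
    return a * b * c > 0


def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


# Hypothetical toy graph: a star on nodes 0-3, one extra edge, one isolated node.
toy_nodes = list(range(7))
toy_edges = [(0, 1), (0, 2), (0, 3), (4, 5)]
```

On this toy graph, `max_degree_nodes(toy_edges)` returns `[0]`, `count_components(toy_nodes, toy_edges)` returns `3`, and `is_balanced_triad((1, 1, -1))` returns `False`.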

Comments:	11 pages, 3 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2405.06634 [cs.CV]
	(or arXiv:2405.06634v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.06634

Submission history

From: Evan Williams
[v1] Fri, 10 May 2024 17:51:35 UTC (1,635 KB)
[v2] Mon, 10 Jun 2024 15:28:16 UTC (1,635 KB)