Multimodal Large Language Models (MLLMs) are evaluated on a wide range of benchmarks, such as image captioning, visual question answering, and reasoning. We propose LIME (Less Is More for MLLM Evaluation), a refined and efficient benchmark curated using a semi-automated pipeline. Our experiments indicate that LIME reduces the number of samples by 76% and evaluation time by 77%, while also providing a more effective means of distinguishing the capabilities of different models.
The pipeline filters samples in two ways. The Semi-Automated Screening Process removes samples that cannot distinguish between models' capabilities, synthesizing the answers of various MLLMs and manually evaluating the remaining candidates. The Eliminate Answer Leakage module then removes samples whose answers can be inferred without the image. The result is LIME-M, a benchmark that evaluates MLLMs with substantially less data than traditional benchmarks.
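To make the two filters concrete, here is a minimal sketch in Python. It assumes each MLLM is a plain callable from a (question, optional image) pair to a string answer, and that correctness is exact match against the gold answer; the model pool, the scoring details, and the manual-evaluation half of the screening step in the actual pipeline are all simplified away.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Sample:
    question: str
    image: Optional[bytes]  # raw image bytes; pass None to withhold it
    answer: str             # gold answer

# Assumed interface: an MLLM maps (question, image-or-None) to an answer.
MLLM = Callable[[str, Optional[bytes]], str]

def passes_screening(sample: Sample, models: List[MLLM]) -> bool:
    """Automated half of the screening: keep only samples that separate
    model capabilities, i.e. drop those every model answers correctly
    (too easy) or every model misses (too hard or likely mislabeled)."""
    correct = [m(sample.question, sample.image) == sample.answer for m in models]
    return any(correct) and not all(correct)

def leaks_answer(sample: Sample, models: List[MLLM]) -> bool:
    """Answer leakage: a sample leaks if some model recovers the gold
    answer without ever seeing the image."""
    return any(m(sample.question, None) == sample.answer for m in models)

def curate(samples: List[Sample], models: List[MLLM]) -> List[Sample]:
    """Apply both filters; the survivors form the curated benchmark."""
    return [
        s for s in samples
        if passes_screening(s, models) and not leaks_answer(s, models)
    ]
```

In the real pipeline, samples that survive the automated pass would still go through manual review rather than being accepted outright.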
LIME: LESS IS MORE FOR MLLM EVALUATION
Announcement. [2024-10-01] We have released both the dataset and the data curation pipeline!
For a quick start with LIME-M, we recommend following the lmms-eval tutorial to deploy the evaluation environment. You can also install it by following ...
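Once the environment is deployed, an evaluation run through lmms-eval might look like the sketch below. This is a minimal sketch, not the repository's documented command: the "lime" task id and the llava model with its checkpoint are assumed placeholders, so consult the LIME repo for the exact task and model names.

```python
import subprocess

# Minimal sketch of launching an lmms-eval run; the "lime" task id and
# the llava checkpoint are assumptions -- substitute the names documented
# in the LIME repository.
subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
        "--tasks", "lime",        # hypothetical task id for LIME
        "--batch_size", "1",
        "--output_path", "./logs/",
    ],
    check=True,  # raise if the evaluation process exits with an error
)
```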