InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. from aclanthology.org
In this work, we present InfiMM, an advanced Multimodal Large Language Model that adapts to intricate vision-language tasks.
Mar 3, 2024 · Our work introduces InfiMM-HD, a novel architecture specifically designed for processing images of different resolutions with low computational overhead.
InfiMM, inspired by the Flamingo architecture, sets itself apart with unique training data and diverse large language models (LLMs).
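For readers unfamiliar with the Flamingo design referenced here: Flamingo injects visual features into a frozen language model through gated cross-attention layers whose tanh gates are initialized at zero, so training starts from the unmodified LM's behavior. Below is a minimal PyTorch sketch of such a block; the class name, dimensions, and layer layout are illustrative assumptions, not InfiMM's actual implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Flamingo-style gated cross-attention (illustrative sketch).

    Text hidden states attend to visual tokens; tanh gates start at
    zero, so the block is an identity at initialization and the frozen
    LM's original behavior is preserved at the start of training.
    """

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # tanh(0) == 0, so both residual branches contribute nothing initially.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ffn_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # Text queries attend over visual keys/values.
        attn_out, _ = self.cross_attn(self.norm(text), visual, visual)
        text = text + self.attn_gate.tanh() * attn_out
        text = text + self.ffn_gate.tanh() * self.ffn(text)
        return text

# Usage: fuse 64 visual tokens into a 16-token text sequence.
block = GatedCrossAttentionBlock()
text = torch.randn(2, 16, 768)    # (batch, text tokens, d_model)
visual = torch.randn(2, 64, 768)  # (batch, visual tokens, d_model)
out = block(text, visual)         # same shape as `text`
```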
Sep 23, 2024 · InfiMM-Eval: Complex open-ended reasoning evaluation for multi-modal large language models. Language is not all you need: Aligning perception with language models.
Sep 19, 2024 · We introduce InfiMM-WebMath-40B, a high-quality dataset of interleaved image-text documents. It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens.
The introduction of a large-scale, open-source multimodal dataset specifically for mathematical reasoning is a notable advancement. It addresses a critical ...
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao ...
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. from github.com
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning.