Mar 3, 2024 · Our work introduces InfiMM-HD, a novel architecture specifically designed for processing images of different resolutions with low computational overhead.
Sep 23, 2024 · InfiMM-Eval: Complex open-ended reasoning evaluation for multi-modal large language models. Language is not all you need: Aligning ...
Sep 19, 2024 · We introduce InfiMM-WebMath-40B, a high-quality dataset of interleaved image-text documents. It comprises 24 million web pages, 85 million associated image ...
Aug 22, 2023 · InfiMM, inspired by the Flamingo architecture, sets itself apart with unique training data and diverse large language models (LLMs).
The introduction of a large-scale, open-source multimodal dataset specifically for mathematical reasoning is a notable advancement. · It addresses a critical ...
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao ...