InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. from aclanthology.org
In this work, we present InfiMM, an advanced Multimodal Large Language Model that adapts to intricate vision-language tasks.
Mar 3, 2024 · Our work introduces InfiMM-HD, a novel architecture specifically designed for processing images of different resolutions with low computational overhead.
InfiMM, inspired by the Flamingo architecture, sets itself apart with unique training data and diverse large language models (LLMs).
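For readers unfamiliar with the Flamingo design referenced here: Flamingo injects visual features into a frozen language model through gated cross-attention layers whose tanh gates are initialized at zero, so training starts from the unmodified LM's behavior. Below is a minimal PyTorch sketch of such a block; the class name, dimensions, and layer layout are illustrative assumptions, not InfiMM's actual implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Flamingo-style gated cross-attention (illustrative sketch).

    Text hidden states attend to visual tokens; tanh gates start at
    zero, so the block is an identity at initialization and the frozen
    LM's original behavior is preserved at the start of training.
    """

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # tanh(0) == 0, so both residual branches contribute nothing initially.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ffn_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # Text queries attend over visual keys/values.
        attn_out, _ = self.cross_attn(self.norm(text), visual, visual)
        text = text + self.attn_gate.tanh() * attn_out
        text = text + self.ffn_gate.tanh() * self.ffn(text)
        return text

# Usage: fuse 64 visual tokens into a 16-token text sequence.
block = GatedCrossAttentionBlock()
text = torch.randn(2, 16, 768)    # (batch, text tokens, d_model)
visual = torch.randn(2, 64, 768)  # (batch, visual tokens, d_model)
out = block(text, visual)         # same shape as `text`
```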
Sep 23, 2024 · InfiMM-Eval: Complex open-ended reasoning evaluation for multi-modal large language models. Language is not all you need: Aligning perception with language models.
Sep 19, 2024 · We introduce InfiMM-WebMath-40B, a high-quality dataset of interleaved image-text documents. It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens.
The introduction of a large-scale, open-source multimodal dataset specifically for mathematical reasoning is a notable advancement. It addresses a critical ...
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao ...
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. from github.com
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning.