Google Scholar

TH Rafi, R Mahjabin, E Ghosh, YW Ko… - Artificial Intelligence …, 2024 - Springer

Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …

Cited by 1

Unveiling deception in arabic: optimization of deceptive text detection across formal and informal genres

F Alhayan, H Himdi, B Alharbi - IEEE Access, 2024 - ieeexplore.ieee.org

In recent years, social media has significantly influenced how we share information and
exchange messages. However, a significant issue arises from the fast dissemination of …

Cited by 1

What Makes Multimodal In-Context Learning Work?

FB Baldassini, M Shukor, M Cord… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large Language Models have demonstrated remarkable performance across
various tasks exhibiting the capacity to swiftly acquire new skills such as through In-Context …

Cited by 8

Rethinking the evaluation protocol of domain generalization

H Yu, X Zhang, R Xu, J Liu, Y He… - Proceedings of the …, 2024 - openaccess.thecvf.com

Domain generalization aims to solve the challenge of Out-of-Distribution (OOD)
generalization by leveraging common knowledge learned from multiple training domains to …

Cited by 4

Auto-Encoding Morph-Tokens for Multimodal LLM

K Pan, S Tang, J Li, Z Fan, W Chow, S Yan… - arXiv preprint arXiv …, 2024 - arxiv.org

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …

Cited by 9

Many-Shot In-Context Learning in Multimodal Foundation Models

Y Jiang, J Irvin, JH Wang, MA Chaudhry… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models are well-known to be effective at few-shot in-context learning (ICL).
Recent advancements in multimodal foundation models have enabled unprecedentedly …

Cited by 13

Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding

X Jiang, Z Wei, S Li, X Xu, J Song… - Proceedings of the 32nd …, 2024 - dl.acm.org

Temporal Sentence Grounding (TSG), which aims to localize events in untrimmed videos
with a given language query, has been widely studied in the last decades. However …

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

C Li, C Zhang, H Zhou, N Collier, A Korhonen… - arXiv preprint arXiv …, 2024 - arxiv.org

Top-view perspective denotes a typical way in which humans read and reason over different
types of maps, and it is vital for localization and navigation of humans as well as of non …

Cited by 4

In-context learning in presence of spurious correlations

H Harutyunyan, R Darbinyan, S Karapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models exhibit a remarkable capacity for in-context learning, where they
learn to solve tasks given a few examples. Recent work has shown that transformers can be …

A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization

H Liu, C Liu, BA Prakash - arXiv preprint arXiv:2411.06018, 2024 - arxiv.org

Large language models (LLMs), with demonstrated reasoning abilities across multiple
domains, are largely underexplored for time-series reasoning (TsR), which is ubiquitous in …

On the out-of-distribution generalization of multimodal large language models

Domain generalization for semantic segmentation: a survey
