Domain generalization for semantic segmentation: a survey

TH Rafi, R Mahjabin, E Ghosh, YW Ko… - Artificial Intelligence …, 2024 - Springer
Deep neural networks (DNNs) have proven explicit contributions in making autonomous
driving cars and related tasks such as semantic segmentation, motion tracking, object …

Unveiling deception in Arabic: optimization of deceptive text detection across formal and informal genres

F Alhayan, H Himdi, B Alharbi - IEEE Access, 2024 - ieeexplore.ieee.org
In recent years, social media has significantly influenced how we share information and
exchange messages. However, a significant issue arises from the fast dissemination of …

What Makes Multimodal In-Context Learning Work?

FB Baldassini, M Shukor, M Cord… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Language Models have demonstrated remarkable performance across
various tasks exhibiting the capacity to swiftly acquire new skills such as through In-Context …

Rethinking the evaluation protocol of domain generalization

H Yu, X Zhang, R Xu, J Liu, Y He… - Proceedings of the …, 2024 - openaccess.thecvf.com
Domain generalization aims to solve the challenge of Out-of-Distribution (OOD)
generalization by leveraging common knowledge learned from multiple training domains to …

Auto-Encoding Morph-Tokens for Multimodal LLM

K Pan, S Tang, J Li, Z Fan, W Chow, S Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …

Many-Shot In-Context Learning in Multimodal Foundation Models

Y Jiang, J Irvin, JH Wang, MA Chaudhry… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are well-known to be effective at few-shot in-context learning (ICL).
Recent advancements in multimodal foundation models have enabled unprecedentedly …

Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding

X Jiang, Z Wei, S Li, X Xu, J Song… - Proceedings of the 32nd …, 2024 - dl.acm.org
Temporal Sentence Grounding (TSG), which aims to localize events in untrimmed videos
with a given language query, has been widely studied in the last decades. However …

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

C Li, C Zhang, H Zhou, N Collier, A Korhonen… - arXiv preprint arXiv …, 2024 - arxiv.org
Top-view perspective denotes a typical way in which humans read and reason over different
types of maps, and it is vital for localization and navigation of humans as well as of non …

In-context learning in presence of spurious correlations

H Harutyunyan, R Darbinyan, S Karapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models exhibit a remarkable capacity for in-context learning, where they
learn to solve tasks given a few examples. Recent work has shown that transformers can be …

A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization

H Liu, C Liu, BA Prakash - arXiv preprint arXiv:2411.06018, 2024 - arxiv.org
Large language models (LLMs), with demonstrated reasoning abilities across multiple
domains, are largely underexplored for time-series reasoning (TsR), which is ubiquitous in …