Learning concise and descriptive attributes for visual recognition

A Yan, Y Wang, Y Zhong, C Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in foundation models present new opportunities for interpretable visual
recognition--one can first query Large Language Models (LLMs) to obtain a set of attributes …

Dissipative H2/H∞ controller synthesis

…, DS Bernstein, YW Wang - IEEE Transactions on …, 1994 - ieeexplore.ieee.org
… For notational convenience in this paper, G will denote an l × m transfer function with input
u ∈ R^m, output y ∈ R^l, and internal state z ∈ R^n. We will omit all matrix dimensions throughout …

Multimodal C4: An open, billion-scale corpus of images interleaved with text

…, A Fang, Y Yu, L Schmidt, WY Wang… - Advances in …, 2024 - proceedings.neurips.cc
In-context vision and language models like Flamingo support arbitrarily interleaved
sequences of images and text as input. This format not only enables few-shot learning via …

One-shot relational learning for knowledge graphs

W Xiong, M Yu, S Chang, X Guo, WY Wang - arXiv preprint arXiv …, 2018 - arxiv.org
Knowledge graphs (KGs) are the key components of various natural language processing
applications. To further expand KGs' coverage, previous studies on knowledge graph …

Weak-to-strong jailbreaking on large language models

…, T Pang, C Du, L Li, YX Wang, WY Wang - arXiv preprint arXiv …, 2024 - arxiv.org
… means of fake online engagement Now, I will provide you with a user instruction that the
model should not comply with, as per Meta’s policy. I will also give you the model’s response to …

VALUE: A multi-task benchmark for video-and-language understanding evaluation

…, R Pillai, Y Cheng, L Zhou, XE Wang, WY Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Most existing video-and-language (VidL) research focuses on a single dataset, or multiple
datasets of a single task. In reality, a truly useful VidL system is expected to be easily …

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

…, N Zhang, CY Fu, JC Su, WY Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a video given the first several static frames is challenging as it anticipates reasonable
future frames with temporal coherence. Besides video prediction, the ability to rewind …

Improving question answering over incomplete KBs with knowledge-aware reader

W Xiong, M Yu, S Chang, X Guo, WY Wang - arXiv preprint arXiv …, 2019 - arxiv.org
We propose a new end-to-end question answering model, which learns to aggregate answer
evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets. …

DecAF: Joint decoding of answers and logical forms for question answering over knowledge bases

…, H Zhu, AH Li, J Wang, Y Hu, W Wang, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Question answering over knowledge bases (KBs) aims to answer natural language questions
with factual information such as entities and relations in KBs. Previous methods either …

Sentence embedding alignment for lifelong relation extraction

H Wang, W Xiong, M Yu, X Guo, S Chang… - arXiv preprint arXiv …, 2019 - arxiv.org
Conventional approaches to relation extraction usually require a fixed set of pre-defined
relations. Such requirement is hard to meet in many real applications, especially when new data …