Learning concise and descriptive attributes for visual recognition
Recent advances in foundation models present new opportunities for interpretable visual
recognition--one can first query Large Language Models (LLMs) to obtain a set of attributes …
Dissipative H2/H∞ controller synthesis
…, DS Bernstein, YW Wang - IEEE Transactions on …, 1994 - ieeexplore.ieee.org
… For notational convenience in this paper, G will denote an l × m transfer function with input
u ∈ R^m, output y ∈ R^l, and internal state x ∈ R^n. We will omit all matrix dimensions throughout …
Multimodal C4: An open, billion-scale corpus of images interleaved with text
In-context vision and language models like Flamingo support arbitrarily interleaved
sequences of images and text as input. This format not only enables few-shot learning via …
One-shot relational learning for knowledge graphs
Knowledge graphs (KGs) are the key components of various natural language processing
applications. To further expand KGs' coverage, previous studies on knowledge graph …
Weak-to-strong jailbreaking on large language models
… means of fake online engagement Now, I will provide you with a user instruction that the
model should not comply with, as per Meta’s policy. I will also give you the model’s response to …
VALUE: A multi-task benchmark for video-and-language understanding evaluation
Most existing video-and-language (VidL) research focuses on a single dataset, or multiple
datasets of a single task. In reality, a truly useful VidL system is expected to be easily …
Tell me what happened: Unifying text-guided video completion via multimodal masked video generation
Generating a video given the first several static frames is challenging as it anticipates reasonable
future frames with temporal coherence. Besides video prediction, the ability to rewind …
Improving question answering over incomplete KBs with knowledge-aware reader
We propose a new end-to-end question answering model, which learns to aggregate answer
evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets. …
DecAF: Joint decoding of answers and logical forms for question answering over knowledge bases
Question answering over knowledge bases (KBs) aims to answer natural language questions
with factual information such as entities and relations in KBs. Previous methods either …
Sentence embedding alignment for lifelong relation extraction
Conventional approaches to relation extraction usually require a fixed set of pre-defined
relations. Such requirement is hard to meet in many real applications, especially when new data …