Mar 27, 2023 · We propose a sample and computation-efficient model, named Seer, by inflating the pretrained text-to-image (T2I) stable diffusion models along the ...
This repository is the official PyTorch implementation for Seer introduced in the paper: Seer: Language Instructed Video Prediction with Latent Diffusion ...
Nov 21, 2023 · This paper introduces the Seer model, a Language-Instructed Video Prediction with Latent Diffusion approach, for the text-conditioned video ...
With the well-designed architecture, Seer makes it possible to generate high-fidelity, coherent, and instruction-aligned video frames by fine-tuning a few ...
For the visual model, we extend the 2D latent diffusion model (Rombach et al., 2022) to a data- and computation-efficient 3D network to model spatial dependencies ...
Seer: Language Instructed Video Prediction with Latent Diffusion Models ... It is a highly challenging ... Figure 1: Seer is an efficient video diffusion model that ...
Seer: Language instructed video prediction with latent diffusion models. X Gu, C Wen, W Ye, J Song, Y Gao. arXiv preprint arXiv:2303.14897, 2023.
Seer: Language Instructed Video Prediction with Latent Diffusion Models. Xianfan Gu, Chuan Wen, Jiaming Song, Yang Gao. ICLR 2024.