Mar 27, 2023 · We propose a sample- and computation-efficient model, named Seer, by inflating the pretrained text-to-image (T2I) stable diffusion models along the temporal ...
This repository is the official PyTorch implementation for Seer introduced in the paper: Seer: Language Instructed Video Prediction with Latent Diffusion ...
Nov 21, 2023 · This paper introduces the Seer model, a Language-Instructed Video Prediction with Latent Diffusion approach, for the text-conditioned video ...
With the well-designed architecture, Seer makes it possible to generate high-fidelity, coherent, and instruction-aligned video frames by fine-tuning a few ...
For the visual model, we extend the 2D latent diffusion model (Rombach et al., 2022) to a data- and computation-efficient 3D network to model spatial dependencies ...
Seer: Language Instructed Video Prediction with Latent Diffusion Models ... It is a highly challenging ... Figure 1: Seer is an efficient video diffusion model that ...
Seer: Language instructed video prediction with latent diffusion models. X Gu, C Wen, W Ye, J Song, Y Gao. arXiv preprint arXiv:2303.14897, 2023.
Seer: Language Instructed Video Prediction with Latent Diffusion Models Xianfan Gu, Chuan Wen, Jiaming Song, Yang Gao. ICLR 2024 PDF/Website/Arxiv.