A Collection of Video Generation Studies
This GitHub repository summarizes papers and resources related to the video generation task.
If you have any suggestions about this repository, please feel free to open a new issue or submit a pull request.
Recent news about this repository is listed below.
[Jun. 17th] All NeurIPS 2023 papers and references are updated.
[Apr. 26th] Added a new direction: Personalized Video Generation.
[Mar. 28th] The official AAAI 2024 paper list is released! Official versions of PDFs and BibTeX references are updated accordingly.
Contents
To-Do Lists
Products
Papers
Survey Papers
Text-to-Video Generation
Year 2024
Year 2023
Year 2022
Year 2021
Image-to-Video Generation
Year 2024
Year 2023
Year 2022
Personalized Video Generation
Year 2024
Year 2023
Video Editing
Year 2023
Audio-to-Video Generation
Year 2024
Year 2023
Datasets
Q&A
References
Star History
To-Do Lists
Latest Papers
Update ECCV 2024 Papers
Update CVPR 2024 Papers
Update PDFs and References of ⚠️ Papers
Update Published Versions of References
Update AAAI 2024 Papers
Update PDFs and References of ⚠️ Papers
Update Published Versions of References
Update ICLR 2024 Papers
Update NeurIPS 2023 Papers
Previously Published Papers
Update Previous CVPR papers
Update Previous ICCV papers
Update Previous ECCV papers
Update Previous NeurIPS papers
Update Previous ICLR papers
Update Previous AAAI papers
Update Previous ACM MM papers
Regular Maintenance of Preprint arXiv Papers and Missed Papers
Products
| Name | Organization | Year | Research Paper | Website | Specialties |
| --- | --- | --- | --- | --- | --- |
| Sora | OpenAI | 2024 | link | link | - |
| Lumiere | Google | 2024 | link | link | - |
| VideoPoet | Google | 2023 | - | link | - |
| W.A.L.T | Google | 2023 | link | link | - |
| Gen-2 | Runway | 2023 | - | link | - |
| Gen-1 | Runway | 2023 | - | link | - |
| Animate Anyone | Alibaba | 2023 | link | link | - |
| Outfit Anyone | Alibaba | 2023 | - | link | - |
| Stable Video | StabilityAI | 2023 | link | link | - |
| Pixeling | HiDream.ai | 2023 | - | link | - |
| DomoAI | DomoAI | 2023 | - | link | - |
| Emu | Meta | 2023 | link | link | - |
| Genmo | Genmo | 2023 | - | link | - |
| NeverEnds | NeverEnds | 2023 | - | link | - |
| Moonvalley | Moonvalley | 2023 | - | link | - |
| Morph Studio | Morph | 2023 | - | link | - |
| Pika | Pika | 2023 | - | link | - |
| PixelDance | ByteDance | 2023 | link | link | - |
Papers
Survey Papers
Year 2024
arXiv
Video Diffusion Models: A Survey [Paper]
Year 2023
arXiv
A Survey on Video Diffusion Models [Paper]
Text-to-Video Generation
Year 2024
CVPR
Vlogger: Make Your Dream A Vlog [Paper] [Code]
Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [Paper] [Code] [Project]
MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
ICLR
VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
AAAI
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
arXiv
Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
Others
Sora: Video Generation Models as World Simulators [Paper]
Year 2023
CVPR
Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [Paper] [Project] [Reproduced code]
Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators