A Collection of Video Generation Studies
This GitHub repository summarizes papers and resources related to the video generation task.
If you have any suggestions about this repository, please feel free to open a new issue or submit a pull request.
Recent news about this repository is listed below.
[Jun. 17th] All NeurIPS 2023 papers and references are updated.
[Apr. 26th] Added a new direction: Personalized Video Generation.
[Mar. 28th] The official AAAI 2024 paper list is released! Official versions of PDFs and BibTeX references are updated accordingly.
Contents
To-Do Lists
Products
Papers
Survey Papers
Text-to-Video Generation
Year 2024
Year 2023
Year 2022
Year 2021
Image-to-Video Generation
Year 2024
Year 2023
Year 2022
Personalized Video Generation
Year 2024
Year 2023
Video Editing
Year 2023
Audio-to-Video Generation
Year 2024
Year 2023
Datasets
Q&A
References
Star History
To-Do Lists
Latest Papers
Update ECCV 2024 Papers
Update CVPR 2024 Papers
Update PDFs and References of ⚠️ Papers
Update Published Versions of References
Update AAAI 2024 Papers
Update PDFs and References of ⚠️ Papers
Update Published Versions of References
Update ICLR 2024 Papers
Update NeurIPS 2023 Papers
Previously Published Papers
Update Previous CVPR papers
Update Previous ICCV papers
Update Previous ECCV papers
Update Previous NeurIPS papers
Update Previous ICLR papers
Update Previous AAAI papers
Update Previous ACM MM papers
Regular Maintenance of Preprint arXiv Papers and Missed Papers
Products
| Name | Organization | Year | Research Paper | Website | Specialties |
| --- | --- | --- | --- | --- | --- |
| Sora | OpenAI | 2024 | link | link | - |
| Lumiere | Google | 2024 | link | link | - |
| VideoPoet | Google | 2023 | - | link | - |
| W.A.L.T | Google | 2023 | link | link | - |
| Gen-2 | Runway | 2023 | - | link | - |
| Gen-1 | Runway | 2023 | - | link | - |
| Animate Anyone | Alibaba | 2023 | link | link | - |
| Outfit Anyone | Alibaba | 2023 | - | link | - |
| Stable Video | StabilityAI | 2023 | link | link | - |
| Pixeling | HiDream.ai | 2023 | - | link | - |
| DomoAI | DomoAI | 2023 | - | link | - |
| Emu | Meta | 2023 | link | link | - |
| Genmo | Genmo | 2023 | - | link | - |
| NeverEnds | NeverEnds | 2023 | - | link | - |
| Moonvalley | Moonvalley | 2023 | - | link | - |
| Morph Studio | Morph | 2023 | - | link | - |
| Pika | Pika | 2023 | - | link | - |
| PixelDance | ByteDance | 2023 | link | link | - |
Papers
Survey Papers
Year 2024
arXiv
Video Diffusion Models: A Survey [Paper]
Year 2023
arXiv
A Survey on Video Diffusion Models [Paper]
Text-to-Video Generation
Year 2024
CVPR
Vlogger: Make Your Dream A Vlog [Paper] [Code]
Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [Paper] [Code] [Project]
MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
ICLR
VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
AAAI
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
arXiv
Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
Others
Sora: Video Generation Models as World Simulators [Paper]
Year 2023
CVPR
Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [Paper] [Project] [Reproduced code]
Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators