Maria

It is said that “video2video” is much simpler than “text2video”. I would add that basic editing and semantic alignment are also simpler than text2video.

Similar models are listed below, since video generation models are usually multimodal:

Maria: A Visual Experience Powered Conversational Agent (found incidentally)

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Gen-2 by Runway Research, with paper

CogVideo: able to process Chinese and English input

Make-A-Video in PyTorch: text-to-video generation

NUWA: text-to-video generation

vidshow: a simple CLI to generate slideshow videos with native FFmpeg

Twitch-Best-Of: creates best-of videos from Twitch without a token

Ningyov: galgame effects