2024-03-10

AI-Assisted Content Creation, Gameplay Video Recording, Trending Topics

code to video

https://github.com/redotvideo/revideo

fishaudio voice cloning

omniparse data serialization

Video understanding and video embedding can be achieved with ViViT (in huggingface).

Video generation agent tutorial

MoneyPrinterTurbo

Mini Gemini

Use enhancr for frame interpolation, super resolution and scaling. The pro version contains faster models.

The app is built using electron forge.

Interpolation quality degrades at higher resolutions, which is why I wouldn't upscale first.

enhancr is built upon the following models:

Interpolation

RIFE (NCNN) - megvii-research/ECCV2022-RIFE - powered by styler00dollar/VapourSynth-RIFE-NCNN-Vulkan

RIFE (TensorRT) - megvii-research/ECCV2022-RIFE - powered by AmusementClub/vs-mlrt & styler00dollar/VSGAN-tensorrt-docker

GMFSS - Union (PyTorch/TensorRT) - 98mxr/GMFSS_Union - powered by HolyWu/vs-gmfss_union

GMFSS - Fortuna (PyTorch/TensorRT) - 98mxr/GMFSS_Fortuna - powered by HolyWu/vs-gmfss_fortuna

CAIN (NCNN) - myungsub/CAIN - powered by mafiosnik777/vsynth-cain-NCNN-vulkan (unreleased)

CAIN (DirectML) - myungsub/CAIN - powered by AmusementClub/vs-mlrt

CAIN (TensorRT) - myungsub/CAIN - powered by HubertSotnowski/cain-TensorRT

Upscaling

ShuffleCUGAN (NCNN) - styler00dollar/VSGAN-tensorrt-docker - powered by AmusementClub/vs-mlrt

ShuffleCUGAN (TensorRT) - styler00dollar/VSGAN-tensorrt-docker - powered by AmusementClub/vs-mlrt

RealESRGAN (NCNN) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt

RealESRGAN (DirectML) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt

RealESRGAN (TensorRT) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt

RealCUGAN (TensorRT) - bilibili/ailab/Real-CUGAN - powered by AmusementClub/vs-mlrt

SwinIR (TensorRT) - JingyunLiang/SwinIR - powered by mafiosnik777/SwinIR-TensorRT (unreleased)

Restoration

DPIR (DirectML) - cszn/DPIR - powered by AmusementClub/vs-mlrt

DPIR (TensorRT) - cszn/DPIR - powered by AmusementClub/vs-mlrt

SCUNet (TensorRT) - cszn/SCUNet - powered by mafiosnik777/SCUNet-TensorRT (unreleased)

Kdenlive has many video editing features, such as automatic scene splitting and video stabilization.

To extract existing hard-coded subtitles from videos, use videosubfinder, which is used in Cradle, a Red Dead Redemption II agent.

To check whether audio was actually recorded, inspect its amplitude instead of listening to it.

ffprobe -f lavfi -i "amovie=<audio_or_video_filepath>,astats=metadata=1:reset=1" -show_entries frame=pkt_pts_time:frame_tags=lavfi.astats.Overall.RMS_level -of default=noprint_wrappers=1:nokey=1 -sexagesimal -v error
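
The same amplitude check can be sketched in Python for cases where the recording is already an uncompressed 16-bit PCM WAV file. This is a minimal sketch using only the standard library; the function names `rms_dbfs` and `has_audio` and the -60 dB threshold are illustrative assumptions, and unlike the ffprobe command above (which prints one RMS value per frame) it reports a single overall level.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """Return the overall RMS level of a 16-bit PCM WAV file in dBFS.

    Returns -inf for digital silence or an empty file.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is full scale for signed 16-bit samples.
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def has_audio(path, threshold_db=-60.0):
    """Treat anything louder than threshold_db (an assumed cutoff) as recorded audio."""
    return rms_dbfs(path) > threshold_db
```

Calling `has_audio("recording.wav")` on a hypothetical file returns `False` when the track is silent, which is usually enough to catch a muted or missing input device.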

AI toolbox: a comprehensive content creation toolbox with links to related projects

Use Streamlit to build interactive interfaces for video labeling, editing, registration, and viewer-count tracking.

Gridded images can be used for image-selection prompting and for image condensation: putting multiple images together to save processing power during tasks like video rating.
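
A minimal sketch of the gridding step, in pure Python so it runs without any imaging library. Images are represented as 2D lists of pixel values, all tiles are assumed to have the same size, and the helper names `grid_layout` and `make_grid` are hypothetical. In practice the same offsets could be passed to an image library's paste function.

```python
import math


def grid_layout(n_images, tile_w, tile_h, cols=None):
    """Return (canvas_w, canvas_h, offsets) for packing n tiles row by row."""
    if n_images <= 0:
        raise ValueError("need at least one image")
    if cols is None:
        cols = math.ceil(math.sqrt(n_images))  # aim for a roughly square grid
    rows = math.ceil(n_images / cols)
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_images)]
    return cols * tile_w, rows * tile_h, offsets


def make_grid(tiles, cols=None, fill=0):
    """Paste equally sized tiles (2D lists of pixels) onto one canvas."""
    tile_h, tile_w = len(tiles[0]), len(tiles[0][0])
    canvas_w, canvas_h, offsets = grid_layout(len(tiles), tile_w, tile_h, cols)
    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for tile, (x0, y0) in zip(tiles, offsets):
        for dy, row in enumerate(tile):
            canvas[y0 + dy][x0:x0 + tile_w] = row
    return canvas
```

The index of each tile in `offsets` doubles as the label a model can be asked to return when the grid is used for selection prompting.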

When you play video games on low-end devices, you can lower the resolution and image quality to sustain 30 FPS.

If you change screen resolution during screen recording, you might lose your view.

Train a video grading system on recent and relevant video grades, and at evaluation time put that grading context into the prompt so the system generalizes better.
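
One way to put grading context into the prompt is to pick a few previously graded videos that are both relevant and recent, and list them before the video to be graded. The sketch below assumes relevance is measured by tag overlap and that grades run from 1 to 10; the `GradedVideo` record, the function names, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GradedVideo:
    title: str
    tags: frozenset
    grade: int  # assumed scale: 1 (worst) to 10 (best)
    graded_on: date


def select_context(history, target_tags, k=3):
    """Pick the k graded videos with the most tag overlap, newest first on ties."""
    target = set(target_tags)
    ranked = sorted(
        history,
        key=lambda v: (len(v.tags & target), v.graded_on),
        reverse=True,
    )
    return ranked[:k]


def build_grading_prompt(history, title, tags, k=3):
    """Build a prompt that shows recent, relevant grades before the new video."""
    lines = [
        "Grade the following video from 1 to 10.",
        "Recent, relevant grades for reference:",
    ]
    for v in select_context(history, tags, k):
        tag_text = ", ".join(sorted(v.tags))
        lines.append(f"- {v.title} [{tag_text}] ({v.graded_on.isoformat()}): {v.grade}")
    lines.append(f"Video to grade: {title} [{', '.join(sorted(tags))}]")
    lines.append("Grade:")
    return "\n".join(lines)
```

The resulting string can be sent to whichever model does the grading; refreshing `history` with new grades updates the context without retraining.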

Collect the system's predicted labels for video content and train a label predictor on them. The predictor supplies the necessary context about a test video, which improves the grading system's accuracy.

Taskmatrix is a multimodal agent framework suitable for multiple types of image editing, using diffusion models.

You can learn what viewers crave via recommendation engines, dynamic posts, and the latest bangumi releases.

Post the same content across multiple platforms to increase view counts.

2022-12-08

Make-A-Video And Its Related Text To Video Projects

It is said that “video2video” is much simpler than “text2video”. I would add that basic editing and semantic alignment are also simpler than text2video.

Similar models (video generation models are usually multimodal):

maria, A Visual Experience Powered Conversational Agent, suggested by incident

OFA Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

GEN-2 by runway research with paper

according to its paper, it’s been compared to a range of models

cogvideo able to process chinese and english input

make a video in pytorch text to video generation

make a video in tensorflow

nuwa text to video generation

mocogan

mocogan-hd

tgan-pytorch

There are also some video generator projects that involve little deep learning:

redditube

Automatic-Youtube-Reddit-Text-To-Speech-Video-Generator-and-Uploader

tools for slideshow, video effects, presentations

phenomenon

vidshow Simple CLI to generate slideshow video with native FFMPEG

Twitch-Best-Of create best-of videos on twitch without token

Ningyov galgame effects

2022-11-09

Generate Noise Image, Noise Video, Noise Audio With FFmpeg For Testing

simulating tv noise

ffmpeg -f lavfi -i nullsrc=s=1280x720 -filter_complex \
"geq=random(1)*255:128:128;aevalsrc=-2+random(0)" \
-t 5 output.mkv

ffmpeg -f rawvideo -video_size 1280x720 -pixel_format yuv420p -framerate 25 \
-i /dev/urandom -ar 48000 -ac 2 -f s16le -i /dev/urandom -codec:a copy \
-t 5 output.mkv
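
When FFmpeg is not available, similar test fixtures can be produced with the Python standard library alone. This is a sketch rather than an equivalent of the commands above: it writes a still noise image in binary PPM format and a noise track as 16-bit PCM WAV, not a muxed video. The function names and default sizes are assumptions chosen for illustration.

```python
import os
import wave


def write_noise_ppm(path, width=320, height=240):
    """Write a binary PPM (P6) image filled with random RGB bytes."""
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        f.write(os.urandom(width * height * 3))


def write_noise_wav(path, seconds=1.0, rate=48000, channels=2):
    """Write a 16-bit PCM WAV file whose samples are random bytes."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(os.urandom(n_frames * channels * 2))
```

Both files can then be fed to FFmpeg or any other tool under test as ordinary inputs.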

2022-10-09

Video Generation/Modification (Vfx) From Text

Sora is the new SOTA video generation model from OpenAI.

Follow-up projects:

DAMO Academy has released a text-to-video generation model that supports English input.

huggingface space

model weights:

weight path	weight size	model name	author
text-to-video-ms-1.7b	unknown	unknown	damo-vilab
modelscope-damo-text-to-video-synthesis	unknown	unknown	damo-vilab
text-to-video-ms-1.7b-legacy	unknown	unknown	damo-vilab

The model can also be used via modelscope:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
p = pipeline('text-to-video-synthesis', 'damo/text-to-video-synthesis')

PAIR has released Text2Video-Zero, which leverages existing Stable Diffusion models to generate video. They also released a number of ControlNet DreamBooth weights.

lucidrains is a workaholic on transformer implementations. we should scrape all the repos and index them. there are faster language models to train.

Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 minutes in length, in Pytorch

dreamix (not open-source)

instruct-pix2pix requires 16GB+ VRAM

text2live: modify video via a text prompt (such as adding fire to a mouth)

recurrent-interface-network-pytorch using diffusion to generate images and video

high quality! imagegen-video code with demo and paper

When copying videos, timing matters: check whether it is better to copy videos from a year ago or ones that were just published.

In any one published video, copy at most two or three qualifying clips from a given author.

use editly smooth/slick transitions and subtitles to beat the copy-detection algorithm, also consider color change in ffmpeg

Dynamic posts and columns can be copied too.

make-a-video

Google's AI singer arrives: AudioLM can compose and write songs after listening for just a few seconds. https://www.kuxai.com/article/398

Text to Video/Music to video generator GAN

https://www.youtube.com/watch?v=V8MlYa_yhF0

https://netease-gameai.github.io/ChoreoMaster/Paper.pdf

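The system can generate different types of dance animation, such as jazz, anime-style, and street dance, according to the music style. Given a piece of music, it automatically generates a high-quality dance motion sequence that follows the style, rhythm, and structure of the input music. To achieve this, the authors introduce a novel choreography-oriented choreomusical embedding framework, which constructs a unified choreomusical embedding space for the style and rhythm relationships between music and dance phrases.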

https://www.youtube.com/watch?v=VrVsAcgFK_4

This method proposes a cross-modal transformer-based architecture and a new 3D dance dataset containing 3D motion reconstructed from real dancers.

Project page: https://google.github.io/aichoreographer

Dataset: https://google.github.io/aistplusplus_dataset/

video generation using music based on bigGAN:

https://github.com/Remideza/MichelAI/

bigGAN Large Scale GAN Training for High Fidelity Natural Image Synthesis:

https://github.com/ajbrock/BigGAN-PyTorch

dance video generation self-supervised:

https://github.com/xrenaa/Music-Dance-Video-Synthesis

show me what and tell me how based on openai clip by snap research with pretrained models, able to generate arbitrary video based on text description:

https://github.com/snap-research/MMVID

text to video generator based on vqgan and clip with primitive colab notebooks by kapwing the online video editor:

https://www.kapwing.com/ai-video-generator