code to video
https://github.com/redotvideo/revideo
fishaudio voice cloning
omniparse data serialization
Video understanding and video embedding can be achieved with ViViT (in huggingface).
Video generation agent tutorial
Use enhancr for frame interpolation, super resolution and scaling. The pro version contains faster models.
The app is built using electron forge.
Interpolation gets worse with higher resolution, that’s why I wouldn’t upscale first.
enhancr is built upon the following models:
Interpolation
RIFE (NCNN) - megvii-research/ECCV2022-RIFE - powered by styler00dollar/VapourSynth-RIFE-NCNN-Vulkan
RIFE (TensorRT) - megvii-research/ECCV2022-RIFE - powered by AmusementClub/vs-mlrt & styler00dollar/VSGAN-tensorrt-docker
GMFSS - Union (PyTorch/TensorRT) - 98mxr/GMFSS_Union - powered by HolyWu/vs-gmfss_union
GMFSS - Fortuna (PyTorch/TensorRT) - 98mxr/GMFSS_Fortuna - powered by HolyWu/vs-gmfss_fortuna
CAIN (NCNN) - myungsub/CAIN - powered by mafiosnik/vsynth-cain-NCNN-vulkan (unreleased)
CAIN (DirectML) - myungsub/CAIN - powered by AmusementClub/vs-mlrt
CAIN (TensorRT) - myungsub/CAIN - powered by HubertSotnowski/cain-TensorRT
Upscaling
ShuffleCUGAN (NCNN) - styler00dollar/VSGAN-tensorrt-docker - powered by AmusementClub/vs-mlrt
ShuffleCUGAN (TensorRT) - styler00dollar/VSGAN-tensorrt-docker - powered by AmusementClub/vs-mlrt
RealESRGAN (NCNN) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt
RealESRGAN (DirectML) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt
RealESRGAN (TensorRT) - xinntao/Real-ESRGAN - powered by AmusementClub/vs-mlrt
RealCUGAN (TensorRT) - bilibili/ailab/Real-CUGAN - powered by AmusementClub/vs-mlrt
SwinIR (TensorRT) - JingyunLiang/SwinIR - powered by mafiosnik777/SwinIR-TensorRT (unreleased)
Restoration
DPIR (DirectML) - cszn/DPIR - powered by AmusementClub/vs-mlrt
DPIR (TensorRT) - cszn/DPIR - powered by AmusementClub/vs-mlrt
SCUNet (TensorRT) - cszn/SCUNet - powered by mafiosnik777/SCUNet-TensorRT (unreleased)
Kdenlive has many video editing features, like automatic scene split, video stabilzation.
To extract existing hard-coded subtitles in videos, use videosubfinder, which is used in Cradle, an Red Dead Redemption II agent.
To check if audio is recorded, we can view amplitude instead of hearing.
1 | ffprobe -f lavfi -i "amovie=<audio_or_video_filepath>,astats=metadata=1:reset=1" -show_entries frame=pkt_pts_time:frame_tags=lavfi.astats.Overall.RMS_level -of default=noprint_wrappers=1:nokey=1 -sexagesimal -v error |
AI toolbox: a comprehensive content creation toolbox with links to related projects
Use streamlit
to write interactive interfaces for video labeling, editing and registration, tracking viewer counts.
Grided image can be used for image selection prompting and image condensation, putting multiple images together to save processing power during tasks like video rating.
When you play video games on low end devices, you can tune down the resolution and image quality, to ensure 30 FPS.
If you change screen resolution during screen recording, you might lose your view.
Train a video grading system with recent and relevant video grades, and when evaluating put grading context into the prompt, thus generalize the system.
Get system predicted labels of video content to train a label predictor out of it, providing necessary context of test video for improving the grading system accuracy.
Taskmatrix is a multimodal agent framework suitable for multiple types of image editing, using diffusion models.
You can learn what the viewers are craving about via recommendation engines, dynamic posts and latest bangumi releases.
Post the same content across multiple platforms to increase view counts.