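Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos. In this work, we investigate the challenging task of controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder, to better preserve frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. The framework is compatible with existing StyleGAN-based image toonification models, extending them to video toonification, and inherits their appealing features for flexible style control over color and intensity. This work presents two instantiations of VToonify, built upon Toonify and DualStyleGAN, for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of the proposed VToonify framework over existing methods in generating high-quality and temporally coherent artistic portrait videos with flexible style controls.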

all in one colab text to talking face generation, also consider paddlespeech example:

https://github.com/ChintanTrivedi/ask-fake-ai-karen

available from paddlegan as an example used in paddlespeech, the artificial host.

lip-sync accurate wav2lip:

https://github.com/Rudrabha/Wav2Lip

lipgan generate realistic lip-sync talking head animation(fully_pythonic branch or google colab notebook):

https://github.com/Rudrabha/LipGAN

google’s lipsync implementation, using tensorflow facemesh:

https://github.com/google/lipsync

https://lipsync.withyoutube.com/

https://github.com/tensorflow/tfjs-models/tree/master/facemesh

network reverse engineering for wombo.ai:

https://github.com/the-garlic-os/wombo-reverse-engineering

matamata using vosk models; recommended to use the gentle lip-sync method:

https://github.com/AI-Spawn/Auto-Lip-Sync

https://github.com/Matamata-Animator/Matamata-Core

https://github.com/Yey007/Auto-Lip-Sync

ai-based lip reading might be irrelevant to lip-sync video generation:

https://github.com/eflood23/lipsync

2022-05-13

Attractive Dynamic Plus Attractive Video

Some content goes viral with users. Combining it with a related video or essay should add extra views.

May apply the same rule to other platforms. Must select the items with the largest view counts, or those verified by trained grading models. Native language only, or we have to translate and verify/convey it into native form. Post it to QQ and other platforms in the form of pictures or links.

Anime smile detection/ segmentation

when an anime head is detected, cut it out and create a dataset with labels. may augment it with grayscale or edge detection.
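The grayscale and edge augmentation mentioned above could look roughly like the following. This is a minimal sketch in plain Python on nested lists, assuming each crop is a grid of (r, g, b) tuples; `edge_map` uses simple forward differences as a stand-in for a real edge detector, and the function names and the label value are hypothetical.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of luma integers."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def edge_map(gray):
    """Crude edge strength: sum of absolute forward differences, capped at 255."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = gray[y][x + 1] - gray[y][x] if x + 1 < w else 0
            dy = gray[y + 1][x] - gray[y][x] if y + 1 < h else 0
            edges[y][x] = min(255, abs(dx) + abs(dy))
    return edges


def augment(image, label):
    """Return the original crop plus grayscale and edge variants, same label."""
    gray = to_grayscale(image)
    return [(image, label), (gray, label), (edge_map(gray), label)]


# Hypothetical 2x2 head crop: black left column, white right column.
crop = [[(0, 0, 0), (255, 255, 255)],
        [(0, 0, 0), (255, 255, 255)]]
samples = augment(crop, "smile")
print(len(samples))
```

In practice an image library would do this on real arrays; the point here is only that each labeled crop yields several training samples.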

segmentation using labeled data and train it on pretrained models. using anime head detection as double verification. no double heads.

pose recognition may be applied without further training, or else.

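My analysis needs YOLO to locate the characters, a CNN to classify clothing type and character gender, OCR to read subtitles, and audio analysis to identify tone, gender and music type. Then use seq2seq to summarize all of these outputs into my description.

Or check whether there is a text-to-keywords model.

If possible, add pose estimation for anime characters.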

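About optical flow algorithms:

Entropy is the standard deviation of the gradient.

The entropy of a time range is the standard deviation of the entropy from the start time to the end,

or the standard deviation of the gradient from start to end.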
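A minimal sketch of the second variant (standard deviation of the gradient over a range), assuming the input is one motion value per frame, such as a mean optical-flow magnitude. Note that this follows the informal definition in these notes, not Shannon entropy; the function names and the sample values are hypothetical.

```python
from statistics import pstdev


def gradient(values):
    """Finite differences between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]


def note_entropy(values):
    """Informal 'entropy' from the notes: standard deviation of the gradient."""
    grads = gradient(values)
    return pstdev(grads) if grads else 0.0


def range_entropy(values, start, end):
    """'Entropy' of the samples in the half-open index range [start, end)."""
    return note_entropy(values[start:end])


# Hypothetical per-frame mean optical-flow magnitudes.
flow = [0.0, 1.0, 2.0, 3.0, 10.0, 3.0, 2.0]
print(range_entropy(flow, 0, 4))  # steady motion, constant gradient
print(range_entropy(flow, 3, 7))  # a sudden spike gives a larger value
```

A steadily changing signal scores zero under this measure, so it highlights abrupt motion changes rather than the amount of motion.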

Beat-synced music recognition, funny video collection

now we have audioFlux, alternative to librosa, but faster

audioowl for tempo, beat and notes identification:

https://github.com/dodiku/AudioOwl

cnn based audio segmentation toolkit that detects speech, music and speaker gender:

https://github.com/ina-foss/inaSpeechSegmenter

speech music detection using keras:

https://github.com/qlemaire22/speech-music-detection

awesome deep learning music:

https://github.com/ybayle/awesome-deep-learning-music

music genre classification/ Music Classification/ Music Recommendation/ Music search

https://github.com/mlachmish/MusicGenreClassification

https://github.com/kristijanbartol/Deep-Music-Tagger

https://github.com/tae-jun/resemul

https://github.com/Insiyaa/Music-Genre-Classification

music recognition service:

audioid soundhound

maybe you should consider some chinese tools? none there.

music radar recognize music:

https://github.com/keshavbhatt/music-radar

mousai using free audd api to recognize music:

https://github.com/SeaDve/Mousai

music emotion recognition:

https://github.com/SeungHeonDoh/Music_Emotion_Recognition

music tagging and recognition, using acoustic ids and a community based music database:

https://github.com/metabrainz/picard

https://musicbrainz.org/doc/AcoustID

mixingbear (similar to neuralmix):

https://github.com/dodiku/MixingBear

madmom

https://github.com/CPJKU/madmom

http://madmom.readthedocs.org

General-purpose audio analysis package for music classification

pyaudioanalysis

mathematica audio silence removal segmentation:

https://zhuanlan.zhihu.com/p/43165678

music21 for music recognition:

https://zhuanlan.zhihu.com/p/35140033

music21 for midi analysis:

https://pypi.org/project/music21/

https://music21.readthedocs.io/en/latest

https://zhuanlan.zhihu.com/p/73564852

sound recognition and localization:

https://reality.ai/automotive-sound-recognition-localization/

urbansound8k dataset ( 6gb ):

https://www.kaggle.com/datasets/chrisfilo/urbansound8k

fourier transform cat meow detection:

https://github.com/EricDavidWells/MeowDetector

building sound event classifier:

https://ignitarium.com/building-an-ai-based-sound-event-classifier/

real time continuous sound event classification(usually via silence detection):

https://medium.com/@chathuranga.15/real-time-sound-event-classification-83e892cf187e

https://medium.com/@chathuranga.15/sound-event-classification-using-machine-learning-8768092beafc
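The silence detection step that usually feeds continuous sound event classification can be sketched as below. This is an illustrative energy-threshold splitter, not the method from the linked articles; the frame length, threshold, and sample values are assumptions, and each returned span would be passed to a classifier.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of each full frame; a trailing partial frame is ignored."""
    return [math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_on_silence(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample index pairs of non-silent regions."""
    segments, start = [], None
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        if rms >= threshold and start is None:
            start = idx * frame_len              # a sound event begins
        elif rms < threshold and start is not None:
            segments.append((start, idx * frame_len))  # event ends at silence
            start = None
    if start is not None:                        # audio ended during an event
        segments.append((start, (len(samples) // frame_len) * frame_len))
    return segments


# Hypothetical signal: silence, a burst, silence, a louder burst, silence.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.8] * 4 + [0.0] * 4
print(split_on_silence(signal))
```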

cry detection:

https://www.amberou.com/cry-detection

https://github.com/umangkk5/Infant-Cry-Detection-System/blob/master/site-packages/soundfile.py

urbansound classifier:

https://github.com/awln/urban8k-audio-classifier

laugh detection:

https://github.com/ideo/LaughDetection

gun shot detection:

https://github.com/hasnainnaeem/Gunshot-Detection-in-Audio

dog bark detector:

https://github.com/t04glovern/dog-bark-detection

https://devopstar.com/2020/04/13/dog-bark-detector-machine-learning-model

https://dsp.stackexchange.com/questions/23466/detect-dog-barks

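Get a music recognition API, preferably QQ Music recognition or another domestic (Chinese) recognition engine.

If the music cannot be recognized, check whether the video description mentions the BGM.

Beat sync: BPM detection already exists in the earlier autoup project. Check whether other analysis software offers it. Premiere one-click beat-sync plugins may be backed by open source libraries.

Existing beat-synced videos can be cut into text-free clips, split into start, middle, climax and other parts according to the music structure, and classified by music genre tags.

For funny videos, pure laughter works best, with large motion and no dialogue. Use reverse image search on screenshots to collect similar videos.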
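Once a beat tracker has produced beat timestamps, the BPM and candidate cut points for beat-synced editing follow from simple arithmetic. The sketch below assumes the timestamps (in seconds) come from some external tool; the function names and the four-beats-per-cut default are hypothetical.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo from beat timestamps via the median beat interval."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beats")
    return 60.0 / median(intervals)


def cut_points(beat_times, beats_per_cut=4):
    """Pick every Nth beat as a candidate video cut point."""
    return beat_times[::beats_per_cut]


# Hypothetical beat timestamps for a steady track.
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
print(estimate_bpm(beats))
print(cut_points(beats))
```

The median is used instead of the mean so that a few missed or doubled beats do not skew the tempo estimate.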

2022-05-10

Video Cutting With Captioners, Video Classifiers, Audio Classifier, Audio Categorizer

you can cut based on video highlights, usually generated by counting “replay overlaps”, available from youtube and bilibili. again, this needs supervised learning to recognize patterns and emit the signals we want
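Counting "replay overlaps" can be sketched as a sweep over replayed segments. This assumes the replay data is available as (start, end) pairs in seconds, which is a hypothetical input format and not how either platform exposes it; `min_count` is an assumed threshold.

```python
def overlap_counts(segments):
    """Sweep over (start, end) replay segments; return [(time, count)] steps."""
    events = []
    for start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, ends sort before starts
    steps, count = [], 0
    for time, delta in events:
        count += delta
        if steps and steps[-1][0] == time:
            steps[-1] = (time, count)  # merge events at the same timestamp
        else:
            steps.append((time, count))
    return steps


def highlights(segments, min_count):
    """Return (start, end) spans where at least min_count replays overlap."""
    spans, start = [], None
    for time, count in overlap_counts(segments):
        if count >= min_count and start is None:
            start = time
        elif count < min_count and start is not None:
            spans.append((start, time))
            start = None
    return spans


# Hypothetical replayed segments from four viewers.
replays = [(0, 10), (5, 15), (8, 12), (30, 40)]
print(highlights(replays, 2))
```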

COCA using vit and palm for video captioning

audio classifier tutorial

audio tagger visualize how audio classifier works

need to identify sounds like dog bark and gun shots, sobs, laughs. Open sourced.

May use sound analyzers.

audio2midi:

https://gist.github.com/natowi/d26c7e97443ec97e8032fb7e7596f0b0

Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)

https://github.com/BShakhovsky/PolyphonicPianoTranscription

A python program which performs an FFT on an audio file and produces a MIDI file from the results

https://github.com/NFJones/audio-to-midi

Extract the melody from an audio file and export to MIDI

https://github.com/justinsalamon/audio_to_midi_melodia

Performs pitch detection on a polyphonic audio source and outputs to MIDI

https://github.com/corbanbrook/spectrotune

Program to detect pitch from wav files and write in time quantized MIDI

https://github.com/vaibhavnayel/Audio-to-MIDI-converter

A CNN which converts piano audio to a simplified MIDI format

https://github.com/hartmetzls/audio_to_midi

An application of vocal melody extraction.

https://github.com/bill317996/Audio-to-midi

Transcribes polyphonic piano pieces from audio (MP3, WAV, etc.) into MIDI-files

https://github.com/BShakhovsky/PianoAudioToMidi

Polyphonic pitch tracking in real time using machine learning algorithms

https://github.com/jaym910/polyphonic_track

Audio to MIDI converter

https://github.com/sbaeunker/audioToMidiConverter

Explore Transcribing Techniques to auto convert audio to midi

https://github.com/Goldspear/audio2midi

PitchToMIDI

https://github.com/KatoIppei/PitchToMIDI See releases

Piano & Drums

https://github.com/magenta/magenta/tree/master/magenta/models/onsets_frames_transcription

Tony: a tool for melody transcription

https://www.sonicvisualiser.org/tony/ https://github.com/sonic-visualiser/tony https://code.soundsoftware.ac.uk/projects/tony (https://github.com/mikulas-mrva/tony2max)

MusicTranscription

https://github.com/ClaraBing/CS229-MusicTranscription

pYIN

https://code.soundsoftware.ac.uk/projects/pyin https://github.com/ronggong/pypYIN (python)

Onsets and Frames Transcription (Piano & Drums)

https://github.com/magenta/magenta/tree/master/magenta/models/onsets_frames_transcription https://piano-scribe.glitch.me/

WaoN

https://sourceforge.net/projects/waon/

audio2midi conversion works great with prior source separation https://github.com/deezer/spleeter or others like https://github.com/rgcda/Musisep
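Whatever pitch detector is used, the final step of an audio to MIDI converter is mapping a detected frequency to a MIDI note number. A minimal sketch using the standard equal-temperament relation (A4 = 440 Hz = note 69) follows; the function names are illustrative and not taken from any of the projects listed above.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Nearest MIDI note number for a frequency in Hz."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def midi_to_name(note):
    """Note name with octave, using the convention that note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


for freq in (440.0, 261.63, 880.0):
    note = freq_to_midi(freq)
    print(freq, note, midi_to_name(note))
```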

Video Anticensor For Bilibili Tarot

paddlegan coloring images

could use p5 to do part of the job.

video:

style transfer

glitch

picture to sketch -> ai painting

grayscale -> ai coloring

dithering

chroma shift(hue)

(gradient/video) overlay

dashing/filtering, could be done in 2 frames or more

random pixel noise
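The chroma shift (hue) item in the video list can be sketched per pixel with the standard library. This is a toy example on a single RGB tuple; a real pipeline would apply the same rotation to whole frames with an array library, and the function name is hypothetical.

```python
import colorsys


def shift_hue(rgb, degrees):
    """Rotate the hue of an (r, g, b) pixel (0-255 channels) by `degrees`."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360.0) % 1.0  # hue wraps around the color wheel
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


print(shift_hue((255, 0, 0), 120))    # red rotated by a third of the wheel
print(shift_hue((128, 128, 128), 90))  # gray has no hue to rotate
```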

text:

inverted canny edge

handwritten font

italic

pixelize, blur

boxing texts

slashing texts

rotating texts (30 degree?)

coloring texts

different font size

(randomly) censor words into letters

reshape (decrease height or width)

audio:

vocoder

style change

pitch change

Copilot/Codex alternative

use chatgpt instead, when it is free.

tsinghua (again!) introduced a similar open source model called codegeex, which outperforms incoder (by meta) and codegen, has vscode plugin support, and can generate and translate code. the info was found on tuna events, where you can download videos/scripts for some events. it is evaluated on the humaneval-x benchmark for code generation. it also provides a blog and podcast

Codegen

https://github.com/salesforce/CodeGen

copilot self-hosted powered by codegen (lots of vram, maybe for mac studio 128gb, however it only supports nvidia gpu):

https://github.com/moyix/fauxpilot

code autocomplete

https://github.com/shibing624/code-autocomplete

codegpt python token completion

https://huggingface.co/mrm8488/CodeGPT-small-finetuned-python-token-completion

codegpt

https://huggingface.co/microsoft/CodeGPT-small-py-adaptedGPT2

https://huggingface.co/microsoft/CodeGPT-small-py

https://github.com/microsoft/CodeXGLUE/issues/75