2022-08-08

Identifying the Language of a Video

SpeechBrain covers speech recognition, speaker recognition, speech enhancement, speech processing, multi-microphone processing, and text-to-speech. It also supports spoken language understanding, language modeling, diarization, speech translation, language identification, voice activity detection, sound classification, grapheme-to-phoneme conversion, and more.

Overview

The language in a video comes in two forms: subtitles rendered onto the frames, and the words people speak.

The two problems involved are therefore language classification of text in images and language classification of audio.

Audio recognition

online speech recognition

pip install SpeechRecognition

offline recognition; the language ID must be supplied:

https://pypi.org/project/automatic-speech-recognition/

use PaddleSpeech where possible, for Chinese and English

Image language identification

use google cloud to detect language type in image:

https://github.com/deduced/ml-ocr-lang-detection

Detects and Recognizes text and font language in an image

https://github.com/JAIJANYANI/Language-Detection-in-Image

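Language classification of text in images can be done with easyocr by loading several models, for example Chinese plus English plus Japanese. Other languages are probably not very popular on Bilibili anyway; at most, add Korean.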

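Sentences can be extracted from the video's description, title, and links, and each sentence classified by language to decide which OCR model to use. The language of the description may not match the language of the text in the video frames.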
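A minimal sketch of the metadata-driven model selection described above, assuming a simple Unicode-range heuristic stands in for a real language classifier. The helper names (`split_sentences`, `guess_script`, `ocr_language_lists`) and the mapping table are hypothetical. The codes `ch_sim`, `ja`, `ko`, and `en` are EasyOCR language codes; each list pairs one CJK language with English on the assumption that EasyOCR does not accept Chinese, Japanese, and Korean together in a single reader.

```python
import re

# Hypothetical mapping from a detected script to an EasyOCR language list.
SCRIPT_TO_OCR_LANGS = {
    "hangul": ["ko", "en"],
    "kana": ["ja", "en"],
    "han": ["ch_sim", "en"],
    "latin": ["en"],
}


def split_sentences(text):
    """Split metadata text on CJK/Latin sentence enders and newlines."""
    return [s.strip() for s in re.split(r"[。！？!?\n]+", text) if s.strip()]


def guess_script(sentence):
    """Guess the dominant script of a sentence from character ranges."""
    counts = {
        "hangul": len(re.findall(r"[\uac00-\ud7af]", sentence)),
        "kana": len(re.findall(r"[\u3040-\u30ff]", sentence)),
        "han": len(re.findall(r"[\u4e00-\u9fff]", sentence)),
        "latin": len(re.findall(r"[A-Za-z]", sentence)),
    }
    # Any kana signals Japanese, even when Han characters dominate.
    if counts["kana"]:
        return "kana"
    best = max(counts, key=counts.get)
    return best if counts[best] else None


def ocr_language_lists(metadata_text):
    """Return the distinct OCR language lists needed, in order of appearance."""
    lists = []
    for sentence in split_sentences(metadata_text):
        script = guess_script(sentence)
        if script and SCRIPT_TO_OCR_LANGS[script] not in lists:
            lists.append(SCRIPT_TO_OCR_LANGS[script])
    return lists


print(ocr_language_lists("今天天气很好。Hello world!"))
```

Each returned list could then be handed to its own `easyocr.Reader(lang_list)`. Because the description language may differ from the on-screen text, the result is better treated as a first guess than as a filter.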

Wolfram Language provides an image classifier whose results can be quite interesting. It could be combined with Apple's image attention-region generator.

ImageIdentify[pictureObj]

This function also supports subcategory classification and multiple outputs; see the documentation for details.

https://www.imageidentify.com/about/how-it-works

Wolfram supports cloud deployment to the Wolfram Cloud, though that may not work for this purpose.

Text language identification and classification

lingua performs well on short text and can be used from Java or Kotlin

detecting several languages within one text:

cld2 can return vectors describing which text span belongs to which language; example with the Python binding:

>>> import pycld2 as cld2
>>> text_content = """ A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française.
Pour d’autres usages du nom France, Pour une aide rapide et effective, veuiller trouver votre aide dans le menu ci-dessus.
Welcome, to this world of Data Scientist. Today is a lovely day."""
>>> _, _, _, detected_language = cld2.detect(text_content,  returnVectors=True)
>>> print(detected_language)
((0, 323, 'FRENCH', 'fr'), (323, 64, 'ENGLISH', 'en'))
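A small sketch of how those vectors might be used to cut the text into per-language pieces. It assumes each vector is `(offset, length, language_name, language_code)` and that offsets and lengths count UTF-8 bytes rather than characters, which is how the pycld2 documentation describes them. The function name `split_by_language` is hypothetical, and the vectors in the demo are built by hand instead of coming from a real `cld2.detect` call.

```python
def split_by_language(text, vectors):
    """Slice `text` into (language_code, chunk) pairs using cld2-style vectors."""
    raw = text.encode("utf-8")  # offsets are assumed to count UTF-8 bytes
    segments = []
    for offset, length, _name, code in vectors:
        chunk = raw[offset:offset + length].decode("utf-8", errors="ignore")
        segments.append((code, chunk))
    return segments


# Hand-built vectors for illustration only (not real cld2 output).
french = "Bonjour à tous. "
english = "Hello everyone."
french_bytes = len(french.encode("utf-8"))
english_bytes = len(english.encode("utf-8"))
vectors = (
    (0, french_bytes, "FRENCH", "fr"),
    (french_bytes, english_bytes, "ENGLISH", "en"),
)

for code, chunk in split_by_language(french + english, vectors):
    print(code, repr(chunk))
```

Slicing the encoded bytes matters for text with accents or CJK characters, where byte offsets and character offsets diverge.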

original cld3 is designed for chromium and it relies on chromium code to run

official cld3 python bindings

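additional Python language-detection libraries, from GeeksforGeeks:

textblob is a natural language processing toolkit (its detect_language() relied on the Google Translate API and was deprecated in later releases, so the snippet may require an older version)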

from textblob import TextBlob
text = "это компьютерный портал для гиков. It was a beautiful day ."
lang = TextBlob(text)
print(lang.detect_language())
# ru

langid performs well on short text

textcat (r package)

google language detection library in python: langdetect

javascript:

https://github.com/wooorm/franc

python version of franc:

pyfranc

whatlang.org provides whatlang-rs as a Rust package, and also whatlang-py as Python bindings
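Since several detectors are listed and each has its own weak spots, one option is to run more than one and take a majority vote. The sketch below is an assumption about how they could be combined, not something the notes prescribe: `vote_language` is a hypothetical helper, and detectors are passed in as plain callables so that any of the libraries above can be wrapped.

```python
from collections import Counter


def vote_language(text, detectors):
    """Return the language code most detectors agree on, or None."""
    votes = []
    for detect in detectors:
        try:
            code = detect(text)
        except Exception:
            continue  # a detector that fails on this text simply abstains
        if code:
            votes.append(code)
    if not votes:
        return None
    # On a tie, the code that was voted for first wins.
    return Counter(votes).most_common(1)[0][0]


# Stand-in detectors for illustration; real ones could be wrapped like this:
#   import langid, langdetect
#   detectors = [lambda t: langid.classify(t)[0], langdetect.detect]
fake_detectors = [lambda t: "en", lambda t: "fr", lambda t: "en"]
print(vote_language("It was a beautiful day.", fake_detectors))
```

Detectors that raise on very short or empty input are skipped rather than aborting the vote, which suits titles and other short strings.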

Cut Music Segments With Lyrics and BPM

def compare(a,b,reverse=False):

seg_low, seg_high = get_allowed_segments(bpm, low, high, tolerance=0.8)  # tolerance is checked with a shared helper called compare, which can be configured to return only ratios >= 1, or only ratios <= 1.

candidates = sorted_lyrics_nearby_bpm_candidates + sorted_remained_bpm_candidates  # prioritize lyrics candidates.
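The notes do not show the bodies of `compare` or `get_allowed_segments`, so the following is only one possible reading, not the original implementation. It assumes `compare` returns the ratio of two positive numbers (at most 1 by default, at least 1 with `reverse=True`), that `low` and `high` are clip lengths in seconds, and that the function snaps each bound to a whole number of beats as long as the snapped value stays within the tolerance ratio.

```python
def compare(a, b, reverse=False):
    """Ratio of two positive numbers: <= 1 by default, >= 1 when reverse=True."""
    small, large = sorted((float(a), float(b)))
    if small <= 0:
        raise ValueError("compare expects positive numbers")
    return large / small if reverse else small / large


def get_allowed_segments(bpm, low, high, tolerance=0.8):
    """Snap the low/high clip lengths (seconds) to whole beats when close enough."""
    beat = 60.0 / bpm  # duration of one beat in seconds

    def snap(duration):
        beats = max(1, round(duration / beat))
        snapped = beats * beat
        # Keep the beat-aligned length only if it stays close to the request.
        return snapped if compare(snapped, duration) >= tolerance else duration

    return snap(low), snap(high)


seg_low, seg_high = get_allowed_segments(120, 2.1, 4.9, tolerance=0.8)
print(seg_low, seg_high)
```

Concatenating the lyrics-adjacent candidates before the remaining ones, as in the line above, means any consumer that takes candidates from the front sees lyric-aligned cuts first.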

Beat-Synced Music Recognition and Funny Video Collection

audioFlux is now available as a faster alternative to librosa

audioowl for tempo, beat and notes identification:

https://github.com/dodiku/AudioOwl

CNN-based audio segmentation toolkit that detects speech, music, and speaker gender:

https://github.com/ina-foss/inaSpeechSegmenter

speech music detection using keras:

https://github.com/qlemaire22/speech-music-detection

awesome deep learning music:

https://github.com/ybayle/awesome-deep-learning-music

music genre classification / music classification / music recommendation / music search:

https://github.com/mlachmish/MusicGenreClassification

https://github.com/kristijanbartol/Deep-Music-Tagger

https://github.com/tae-jun/resemul

https://github.com/Insiyaa/Music-Genre-Classification

music recognition services:

audioid soundhound

Chinese tools may be worth considering too; none are listed here.

music-radar recognizes music:

https://github.com/keshavbhatt/music-radar

Mousai recognizes music using the free AudD API:

https://github.com/SeaDve/Mousai

music emotion recognition:

https://github.com/SeungHeonDoh/Music_Emotion_Recognition

music tagging and recognition, using acoustic IDs and a community-based music database:

https://github.com/metabrainz/picard

https://musicbrainz.org/doc/AcoustID

MixingBear (similar to Neural Mix):

https://github.com/dodiku/MixingBear

madmom

https://github.com/CPJKU/madmom

http://madmom.readthedocs.org

Comprehensive audio analysis package for music classification

pyaudioanalysis

Mathematica audio silence removal and segmentation:

https://zhuanlan.zhihu.com/p/43165678

music21 for music recognition:

https://zhuanlan.zhihu.com/p/35140033

music21 for midi analysis:

https://pypi.org/project/music21/

https://music21.readthedocs.io/en/latest

https://zhuanlan.zhihu.com/p/73564852

sound recognition and localization:

https://reality.ai/automotive-sound-recognition-localization/

UrbanSound8K dataset (6 GB):

https://www.kaggle.com/datasets/chrisfilo/urbansound8k

fourier transform cat meow detection:

https://github.com/EricDavidWells/MeowDetector

building sound event classifier:

https://ignitarium.com/building-an-ai-based-sound-event-classifier/

real-time continuous sound event classification (usually via silence detection):

https://medium.com/@chathuranga.15/real-time-sound-event-classification-83e892cf187e

https://medium.com/@chathuranga.15/sound-event-classification-using-machine-learning-8768092beafc
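The silence-detection step mentioned above can be sketched with a plain energy threshold: frames whose mean squared amplitude falls below the threshold count as silence, and each run of louder frames becomes one candidate event to pass to a classifier. This is a generic illustration, not the method from the linked articles; the function name, frame size, and threshold are all assumptions.

```python
def non_silent_segments(samples, frame_size, threshold):
    """Return (start, end) sample indices for runs of frames above the energy threshold."""
    segments = []
    start = None
    for begin in range(0, len(samples), frame_size):
        frame = samples[begin:begin + frame_size]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy >= threshold:
            if start is None:
                start = begin  # a new sound event begins here
        elif start is not None:
            segments.append((start, begin))  # silence closes the event
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # event runs to the end
    return segments


# Toy signal: silence, a burst, silence, a shorter burst.
signal = [0.0] * 4 + [0.5] * 4 + [0.0] * 4 + [0.5] * 2
print(non_silent_segments(signal, frame_size=2, threshold=0.01))
```

In a real-time setting the same loop would run over incoming audio buffers, and each closed segment would be sent to the classifier as soon as the trailing silence is seen.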

cry detection:

https://www.amberou.com/cry-detection

https://github.com/umangkk5/Infant-Cry-Detection-System/blob/master/site-packages/soundfile.py

urbansound classifier:

https://github.com/awln/urban8k-audio-classifier

laugh detection:

https://github.com/ideo/LaughDetection

gunshot detection:

https://github.com/hasnainnaeem/Gunshot-Detection-in-Audio

dog bark detector:

https://github.com/t04glovern/dog-bark-detection

https://devopstar.com/2020/04/13/dog-bark-detector-machine-learning-model

https://dsp.stackexchange.com/questions/23466/detect-dog-barks

Get a music recognition API, preferably QQ Music's recognition, i.e. a recognition engine based in China.

If the music cannot be recognized, check whether the video description mentions the BGM.
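A rough sketch of that fallback, assuming uploaders write something like `BGM: track name` or `BGM：track name` on one line of the description. The pattern and the function name `find_bgm` are hypothetical, and real descriptions will need more variants.

```python
import re

# Hypothetical pattern: "BGM", optional separator, then the track name up to the line end.
BGM_PATTERN = re.compile(r"BGM\s*[:：\-]?\s*(.+)", re.IGNORECASE)


def find_bgm(description):
    """Return the first BGM name mentioned in a description, or None."""
    for line in description.splitlines():
        match = BGM_PATTERN.search(line)
        if match:
            name = match.group(1).strip()
            if name:
                return name
    return None


print(find_bgm("素材来自网络\nBGM：Summer - 久石让\n感谢观看"))
```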

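Beat syncing and BPM: the earlier autoup project already has this; check whether other analysis software offers it too. Premiere one-click beat-sync plugins may be backed by open-source libraries.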
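For reference, a minimal sketch of beat-aligned cut points. It uses librosa for beat tracking, which is an assumption (the autoup code is not shown here), and imports it lazily so the cut-point helper works without it. The function names and the choice of one cut every four beats are illustrative.

```python
def cut_points_from_beats(beat_times, beats_per_cut=4):
    """Pick every Nth beat time (seconds) as a video cut point."""
    if beats_per_cut < 1:
        raise ValueError("beats_per_cut must be at least 1")
    return list(beat_times[::beats_per_cut])


def detect_beats(path):
    """Estimate tempo and beat times with librosa (imported only when called)."""
    import librosa

    y, sr = librosa.load(path)
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return tempo, list(beat_times)


# Beat times as they might come back from detect_beats on a 120 BPM track.
example_beats = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
print(cut_points_from_beats(example_beats, beats_per_cut=4))
```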

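Existing beat-synced videos can be cut into text-free clips, split by music structure into parts such as climax, opening, and middle, and grouped by music genre tags.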

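For funny videos, pure laughter works best, with large movements and no dialogue. Use reverse image search on screenshots to collect similar videos.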

2022-05-10

Video Cutting With Captioners, Video Classifiers, Audio Classifier, Audio Categorizer

You can cut based on video highlights, usually generated by counting “replay overlaps”, available from YouTube and Bilibili. Again, this needs supervised learning to recognize patterns and emit the signals we want.
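Before any supervised model is trained, a plain threshold over the replay data gives a baseline. The sketch below is that baseline, not the supervised approach. It assumes the replay data has already been fetched and reduced to a time-sorted list of `(start_seconds, end_seconds, intensity)` bins; that format, the function name, and the threshold are all hypothetical.

```python
def highlight_segments(heatmap, threshold):
    """Merge adjacent bins whose intensity reaches the threshold into (start, end) segments."""
    segments = []
    for start, end, intensity in heatmap:
        if intensity < threshold:
            continue
        if segments and segments[-1][1] == start:
            # This bin continues the previous highlight, so extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Hypothetical replay bins: (start_seconds, end_seconds, normalized intensity).
replay_bins = [
    (0, 10, 0.1),
    (10, 20, 0.8),
    (20, 30, 0.9),
    (30, 40, 0.2),
    (40, 50, 0.7),
]
print(highlight_segments(replay_bins, threshold=0.6))
```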

COCA using vit and palm for video captioning

audio classifier tutorial

audio tagger visualize how audio classifier works

Need to identify sounds such as dog barks, gunshots, sobs, and laughs, using open-source tools.

May use sound analyzers.

audio2midi:

https://gist.github.com/natowi/d26c7e97443ec97e8032fb7e7596f0b0

Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)

https://github.com/BShakhovsky/PolyphonicPianoTranscription

A python program which performs an FFT on an audio file and produces a MIDI file from the results

https://github.com/NFJones/audio-to-midi
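The core conversion behind any FFT-to-MIDI approach is mapping a peak frequency to a MIDI note number. The sketch below shows the standard equal-temperament mapping with A4 = 440 Hz as note 69; how the linked project selects peaks or handles timing is not covered here, and the function name is illustrative.

```python
import math


def frequency_to_midi(frequency_hz):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(frequency_hz / 440.0))


for freq in (261.63, 440.0, 880.0):
    print(freq, frequency_to_midi(freq))
```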

Extract the melody from an audio file and export to MIDI