2022-08-08
Identifying the Language of a Video

SpeechBrain provides speech recognition, speaker recognition, speech enhancement, speech processing, multi-microphone processing and text-to-speech, and also supports spoken language understanding, language modeling, diarization, speech translation, language identification, voice activity detection, sound classification, grapheme-to-phoneme conversion, and more.

Overview

The language in a video comes in two forms: subtitles rendered onto the frames, and speech.

The two corresponding problems are language classification of text in images, and language classification of audio.

Audio recognition

Online speech recognition:

pip install SpeechRecognition

Offline; the language ID must be provided:

https://pypi.org/project/automatic-speech-recognition/

Use PaddleSpeech where possible, for Chinese and English.

Image language identification

Use Google Cloud to detect the language of text in an image:

https://github.com/deduced/ml-ocr-lang-detection

Detects and recognizes text and font language in an image:

https://github.com/JAIJANYANI/Language-Detection-in-Image

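Language classification of text in images can be done with EasyOCR by loading several models, for example Chinese plus English plus Japanese. Other languages are probably not very popular on Bilibili; at most, add Korean.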

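Sentences can be extracted from the video description, title and links, and each sentence classified by language to decide which OCR models to use. The description language and the language of the on-screen text may still differ.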
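As a rough sketch of that selection step, the snippet below guesses which OCR models to load from the Unicode scripts found in the title and description sentences. It is a script heuristic, not a real language detector: `scripts_in` and `guess_ocr_langs` are hypothetical helper names, the codes `ch_sim`, `en`, `ja` and `ko` are assumed to follow EasyOCR's naming, and EasyOCR itself is not called.

```python
import unicodedata


def scripts_in(sentence):
    """Return coarse script tags ('kana', 'han', 'hangul', 'latin') found in a sentence."""
    tags = set()
    for ch in sentence:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            tags.add("kana")
        elif name.startswith("CJK UNIFIED"):
            tags.add("han")
        elif name.startswith("HANGUL"):
            tags.add("hangul")
        elif name.startswith("LATIN"):
            tags.add("latin")
    return tags


def guess_ocr_langs(sentences):
    """Map each sentence's scripts to OCR language codes (assumed EasyOCR-style)."""
    langs = set()
    for sentence in sentences:
        tags = scripts_in(sentence)
        if "kana" in tags:
            # Kana marks the sentence as Japanese, so its kanji are not counted as Chinese.
            langs.add("ja")
        elif "han" in tags:
            # Han characters without kana: assume Chinese (kanji-only Japanese is misjudged).
            langs.add("ch_sim")
        if "hangul" in tags:
            langs.add("ko")
        if "latin" in tags:
            # Latin script is treated as English, which is a simplification.
            langs.add("en")
    return sorted(langs)
```

A proper text language classifier (see the libraries listed further down) would replace `scripts_in` for Latin-script languages, which this heuristic cannot tell apart.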

Wolfram Language provides an image classifier whose results can be quite interesting. It can be combined with Apple's image attention-region generator.

ImageIdentify[pictureObj]

This function also supports subcategory classification and multiple outputs; see the documentation for details.

https://www.imageidentify.com/about/how-it-works

Wolfram supports cloud deployment to Wolfram Cloud, but that may not work for this use case.

Text language identification and classification

lingua performs well on short text and can be used from Java or Kotlin.

Detecting several languages within one text:

cld2 (Python binding: pycld2) can return vectors describing the text span covered by each detected language:

>>> import pycld2 as cld2
>>> text_content = """ A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française.
Pour d’autres usages du nom France, Pour une aide rapide et effective, veuiller trouver votre aide dans le menu ci-dessus.
Welcome, to this world of Data Scientist. Today is a lovely day."""
>>> _, _, _, detected_language = cld2.detect(text_content, returnVectors=True)
>>> print(detected_language)
((0, 323, 'FRENCH', 'fr'), (323, 64, 'ENGLISH', 'en'))
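To turn such vectors back into text spans, a minimal sketch follows. It assumes each vector is `(offset, length, language_name, language_code)` with offset and length counted in UTF-8 bytes rather than characters, which is how the pycld2 README describes them; check this against the installed version. `split_by_vectors` is a hypothetical helper name, and pycld2 itself is not called here.

```python
def split_by_vectors(text, vectors):
    """Slice text into (language_code, span) pairs using cld2-style byte vectors."""
    raw = text.encode("utf-8")
    spans = []
    for offset, length, _name, code in vectors:
        # Slice the encoded bytes, then decode; slicing the str directly would
        # drift as soon as the text contains multi-byte characters.
        chunk = raw[offset:offset + length].decode("utf-8", errors="replace")
        spans.append((code, chunk))
    return spans
```

With the output above, `split_by_vectors(text_content, detected_language)` would give one `('fr', ...)` span and one `('en', ...)` span, each of which can then be routed to a language-specific model.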

The original cld3 was designed for Chromium and relies on Chromium code to run.

official cld3 python bindings

Additional Python language-detection libraries, from GeeksforGeeks:

textblob is a natural language processing toolkit

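from textblob import TextBlob
# Note: detect_language() relied on the Google Translate API and was deprecated
# in later TextBlob releases, so this may fail on a current install.
text = "это компьютерный портал для гиков. It was a beautiful day ."
lang = TextBlob(text)
print(lang.detect_language())
# ru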

langid performs well on short text

textcat (R package)

Python port of Google's language-detection library: langdetect

JavaScript:

https://github.com/wooorm/franc

Python version of franc:

pyfranc

whatlang.org provides whatlang-rs as a Rust package, with whatlang-py as Python bindings

2022-07-08
Cut Music Scenes With Lyrics And Bpm

Cut Music Segments With Lyrics and BPM

def compare(a,b,reverse=False):

seg_low, seg_high = get_allowed_segments(bpm, low, high, tolerance=0.8) # the tolerance check uses a shared helper called compare, which can be customized to return only values >= 1, or only values <= 1.

candidates = sorted_lyrics_nearby_bpm_candidates + sorted_remained_bpm_candidates # prioritize lyrics candidates.
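The body of `compare` is not given in these notes, so the following is only one possible reading, offered as a sketch: `compare` returns the ratio of two positive values folded to be at most 1 (or at least 1 with `reverse=True`), and the candidate lists are merged so that lyric-aligned cuts come first. `within_tolerance` and `prioritize` are hypothetical helpers, and `get_allowed_segments` is deliberately left out because its behavior is not described.

```python
def compare(a, b, reverse=False):
    """Ratio of a and b folded to <= 1, or to >= 1 when reverse is True (assumed semantics)."""
    if a <= 0 or b <= 0:
        raise ValueError("both values must be positive")
    ratio = a / b
    folded = min(ratio, 1 / ratio)
    return 1 / folded if reverse else folded


def within_tolerance(length, target, tolerance=0.8):
    """Accept a segment length whose folded ratio to the target is at least tolerance."""
    return compare(length, target) >= tolerance


def prioritize(lyric_candidates, bpm_candidates):
    """Put lyric-aligned candidates first, then the remaining BPM candidates, without duplicates."""
    seen = set()
    ordered = []
    for candidate in list(lyric_candidates) + list(bpm_candidates):
        if candidate not in seen:
            seen.add(candidate)
            ordered.append(candidate)
    return ordered
```

For instance, `within_tolerance(3.5, 4.0)` accepts a 3.5 s cut against a 4 s target because the folded ratio is 0.875.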

2022-05-11
Beat Syncing, Music Recognition

Beat syncing, music recognition, funny video collection

audioFlux is now available as a faster alternative to librosa


AudioOwl for tempo, beat and note identification:

https://github.com/dodiku/AudioOwl

CNN-based audio segmentation toolkit that detects speech, music and speaker gender:

https://github.com/ina-foss/inaSpeechSegmenter

speech/music detection using Keras:

https://github.com/qlemaire22/speech-music-detection

awesome deep learning music:

https://github.com/ybayle/awesome-deep-learning-music

music genre classification, music classification, music recommendation, music search:

https://github.com/mlachmish/MusicGenreClassification

https://github.com/kristijanbartol/Deep-Music-Tagger

https://github.com/tae-jun/resemul

https://github.com/Insiyaa/Music-Genre-Classification

music recognition services:

audioid, SoundHound

Chinese tools may be worth considering, but none are listed here.

music-radar recognizes music:

https://github.com/keshavbhatt/music-radar

Mousai uses the free AudD API to recognize music:

https://github.com/SeaDve/Mousai

music emotion recognition:

https://github.com/SeungHeonDoh/Music_Emotion_Recognition

music tagging and recognition, using acoustic IDs and a community-based music database:

https://github.com/metabrainz/picard

https://musicbrainz.org/doc/AcoustID

MixingBear (similar to Neural Mix):

https://github.com/dodiku/MixingBear

madmom

https://github.com/CPJKU/madmom

http://madmom.readthedocs.org

Music classification, general-purpose audio analysis package

pyaudioanalysis

Mathematica audio silence removal and segmentation:

https://zhuanlan.zhihu.com/p/43165678

music21 for music recognition:

https://zhuanlan.zhihu.com/p/35140033

music21 for midi analysis:

https://pypi.org/project/music21/

https://music21.readthedocs.io/en/latest

https://zhuanlan.zhihu.com/p/73564852

sound recognition and localization:

https://reality.ai/automotive-sound-recognition-localization/

urbansound8k dataset ( 6gb ):

https://www.kaggle.com/datasets/chrisfilo/urbansound8k

fourier transform cat meow detection:

https://github.com/EricDavidWells/MeowDetector

building sound event classifier:

https://ignitarium.com/building-an-ai-based-sound-event-classifier/

real-time continuous sound event classification (usually via silence detection):

https://medium.com/@chathuranga.15/real-time-sound-event-classification-83e892cf187e

https://medium.com/@chathuranga.15/sound-event-classification-using-machine-learning-8768092beafc
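The silence-detection idea mentioned above can be sketched in plain Python: split the sample stream into frames, compute each frame's RMS energy, and keep the runs of frames above a threshold as candidate events for a classifier. This is a minimal illustration, not the method from the linked articles; `active_segments`, the frame size and the threshold are all assumed values chosen for the example.

```python
import math


def active_segments(samples, frame_size=4, threshold=0.1):
    """Return (start_sample, end_sample) ranges whose frame RMS exceeds the threshold."""
    segments = []
    start = None
    n_frames = len(samples) // frame_size  # a trailing partial frame is ignored
    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        loud = rms > threshold
        if loud and start is None:
            start = f * frame_size  # a sound event begins
        elif not loud and start is not None:
            segments.append((start, f * frame_size))  # silence closes the event
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_size))
    return segments
```

Each returned range would then be passed to the sound event classifier; in a real-time setting the same loop runs over incoming audio buffers instead of a complete list.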

cry detection:

https://www.amberou.com/cry-detection

https://github.com/umangkk5/Infant-Cry-Detection-System/blob/master/site-packages/soundfile.py

urbansound classifier:

https://github.com/awln/urban8k-audio-classifier

laugh detection:

https://github.com/ideo/LaughDetection

gun shot detection:

https://github.com/hasnainnaeem/Gunshot-Detection-in-Audio

dog bark detector:

https://github.com/t04glovern/dog-bark-detection

https://devopstar.com/2020/04/13/dog-bark-detector-machine-learning-model

https://dsp.stackexchange.com/questions/23466/detect-dog-barks

Get a music recognition API, preferably QQ Music recognition or another recognition engine based in China.

If the music cannot be recognized, analyze the video description to see whether a BGM is mentioned.

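Beat syncing: BPM detection already exists in the earlier autoup project. Check whether other analysis software offers it; the one-click beat-sync plugins for Premiere may be backed by open-source libraries.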
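Once a BPM is known, the beat grid that cuts are aligned to follows from simple arithmetic: one beat lasts 60 / BPM seconds. The sketch below assumes a constant tempo and an already known first-beat offset; `beat_times` and `snap_to_beat` are hypothetical helpers, not functions from autoup or any library named here.

```python
def beat_times(bpm, duration, first_beat=0.0):
    """List beat timestamps in seconds from first_beat up to duration, at constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    interval = 60.0 / bpm
    times = []
    i = 0
    # The small epsilon keeps a beat that lands exactly on the end of the track.
    while first_beat + i * interval <= duration + 1e-9:
        times.append(round(first_beat + i * interval, 6))
        i += 1
    return times


def snap_to_beat(t, bpm, first_beat=0.0):
    """Move a cut time to the nearest beat on the grid."""
    interval = 60.0 / bpm
    return first_beat + round((t - first_beat) / interval) * interval
```

At 120 BPM a beat falls every 0.5 s, so a cut planned at 1.3 s would snap to 1.5 s. Tracks with tempo changes need per-beat timestamps from a beat tracker instead of this fixed grid.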

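From existing beat-synced videos, cut out the segments without on-screen text, use the music structure to distinguish the climax, opening, middle and other parts, and group the videos by music genre tag.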

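For funny videos, clips with pure laughter work best, with large movements and no dialogue. Use reverse image search on screenshots to collect similar videos.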

2022-05-10
Video Cutting With Captioners, Video Classifiers, Audio Classifier, Audio Categorizer

You can cut based on video highlights, usually derived from counting “replay overlaps” (available from YouTube and Bilibili). This again needs supervised learning to recognize the patterns and emit the signals we want.
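Before training anything, a plain threshold baseline can show what such a highlight signal looks like. The sketch below assumes a hypothetical per-second list of replay counts (how YouTube or Bilibili expose that data is not covered here) and returns the runs of seconds that sit well above the average; `highlight_segments`, `ratio` and `min_length` are illustrative names and values, and this is not the supervised approach the note calls for.

```python
def highlight_segments(replay_counts, ratio=1.5, min_length=2):
    """Return (start_sec, end_sec) runs where the count is at least ratio times the mean."""
    if not replay_counts:
        return []
    mean = sum(replay_counts) / len(replay_counts)
    if mean == 0:
        return []  # no replays at all, so nothing stands out
    cutoff = ratio * mean
    segments = []
    start = None
    for second, count in enumerate(replay_counts):
        hot = count >= cutoff
        if hot and start is None:
            start = second
        elif not hot and start is not None:
            if second - start >= min_length:  # drop spikes shorter than min_length
                segments.append((start, second))
            start = None
    if start is not None and len(replay_counts) - start >= min_length:
        segments.append((start, len(replay_counts)))
    return segments
```

The segments it returns could also serve as weak labels when collecting training data for a learned highlight detector.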

CoCa, using ViT and PaLM, for video captioning

audio classifier tutorial

audio tagger that visualizes how an audio classifier works

Need to identify sounds such as dog barks, gunshots, sobs and laughs, using open-source tools.

May use sound analyzers.

audio2midi:

https://gist.github.com/natowi/d26c7e97443ec97e8032fb7e7596f0b0

Recurrent Neural Network for generating piano MIDI-files from audio (MP3, WAV, etc.)

https://github.com/BShakhovsky/PolyphonicPianoTranscription

A python program which performs an FFT on an audio file and produces a MIDI file from the results

https://github.com/NFJones/audio-to-midi

Extract the melody from an audio file and export to MIDI

https://github.com/justinsalamon/audio_to_midi_melodia

Performs pitch detection on a polyphonic audio source and outputs to MIDI

https://github.com/corbanbrook/spectrotune

Program to detect pitch from wav files and write in time quantized MIDI

https://github.com/vaibhavnayel/Audio-to-MIDI-converter

A CNN which converts piano audio to a simplified MIDI format

https://github.com/hartmetzls/audio_to_midi

An application of vocal melody extraction.

https://github.com/bill317996/Audio-to-midi

Transcribes polyphonic piano pieces from audio (MP3, WAV, etc.) into MIDI-files

https://github.com/BShakhovsky/PianoAudioToMidi

Polyphonic pitch tracking in real time using machine learning algorithms

https://github.com/jaym910/polyphonic_track

Audio to MIDI converter

https://github.com/sbaeunker/audioToMidiConverter

Explore Transcribing Techniques to auto convert audio to midi

https://github.com/Goldspear/audio2midi

PitchToMIDI

https://github.com/KatoIppei/PitchToMIDI See releases

Piano & Drums

https://github.com/magenta/magenta/tree/master/magenta/models/onsets_frames_transcription

Tony: a tool for melody transcription

https://www.sonicvisualiser.org/tony/ https://github.com/sonic-visualiser/tony https://code.soundsoftware.ac.uk/projects/tony (https://github.com/mikulas-mrva/tony2max)

MusicTranscription

https://github.com/ClaraBing/CS229-MusicTranscription

pYIN

https://code.soundsoftware.ac.uk/projects/pyin https://github.com/ronggong/pypYIN (python)

Onsets and Frames Transcription (Piano & Drums)

https://github.com/magenta/magenta/tree/master/magenta/models/onsets_frames_transcription https://piano-scribe.glitch.me/

WaoN

https://sourceforge.net/projects/waon/

audio2midi conversion works well when preceded by source separation, for example with https://github.com/deezer/spleeter or others like https://github.com/rgcda/Musisep
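Several of the tools listed above detect pitch and write it out as MIDI. The conversion at the heart of that step is the standard equal-temperament mapping from frequency to MIDI note number, m = 69 + 12 * log2(f / 440). The sketch below shows only that mapping; `freq_to_midi` and `midi_to_name` are illustrative helpers, and the octave naming assumes the common convention where note 60 is C4.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 69)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))


def midi_to_name(note):
    """Name a MIDI note number, using the convention that note 60 is C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"
```

A detected pitch of 261.63 Hz maps to note 60 (C4). A real transcriber still needs onset detection and time quantization around this mapping, which is where the listed projects differ.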
