SpeechBrain covers speech recognition, speaker recognition, speech enhancement, speech processing, multi-microphone processing, and text-to-speech. It also supports spoken language understanding, language modeling, diarization, speech translation, language identification, voice activity detection, sound classification, grapheme-to-phoneme conversion, and many other tasks.
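Overview
Language appears in a video in two forms: subtitles rendered onto the frames, and speech.
This gives two separate problems: classifying the language of text in images, and classifying the language of audio.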
Audio recognition
Online speech recognition:
pip install SpeechRecognition
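A minimal sketch of the online route, assuming the SpeechRecognition package's Google Web Speech backend and a WAV file on disk. The `GOOGLE_TAGS` table and both helper names are illustrative and not part of the library; the language still has to be chosen by the caller.

```python
# Hypothetical mapping from short language codes to the BCP-47 tags
# passed as the `language` argument of recognize_google.
GOOGLE_TAGS = {"zh": "zh-CN", "en": "en-US", "ja": "ja-JP", "ko": "ko-KR"}


def to_google_tag(code):
    """Map a short code such as 'zh' to 'zh-CN'; pass unknown codes through."""
    return GOOGLE_TAGS.get(code.lower(), code)


def transcribe_online(wav_path, code="en"):
    """Send one audio file to the Google Web Speech API and return the text."""
    import speech_recognition as sr  # imported lazily; needs network access

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=to_google_tag(code))
    except sr.UnknownValueError:
        return ""  # the service could not understand the audio
```

Network or quota failures surface as `sr.RequestError`, which this sketch deliberately leaves to the caller.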
Offline speech recognition; the language ID has to be supplied:
https://pypi.org/project/automatic-speech-recognition/
Use PaddleSpeech where possible for Chinese and English.
Image language identification
Use Google Cloud to detect the language of text in an image:
https://github.com/deduced/ml-ocr-lang-detection
Detect and recognize text and font language in an image:
https://github.com/JAIJANYANI/Language-Detection-in-Image
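Classifying the language of text in images can be done with easyocr by loading multiple models, for example Chinese plus English plus Japanese. Videos in other languages are probably not very popular on Bilibili, so at most add Korean.
Sentences can be extracted from the video description, title, and links, and each sentence classified by language to decide which OCR model to use. Note that the language of the description may not match the language of the text in the video frames.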
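The metadata-driven choice of OCR models could be sketched as follows. The script-based `guess_language` is only a crude stand-in for a real language classifier, and `OCR_LANGS`, `split_sentences`, `choose_ocr_languages`, and `build_readers` are hypothetical names. EasyOCR appears to allow `ch_sim`, `ja`, and `ko` only alongside English in a single Reader, so the sketch assumes one Reader per script rather than one combined model.

```python
import re

# Hypothetical mapping from a guessed language to an EasyOCR language list.
# Each CJK model is paired only with English (assumed EasyOCR restriction).
OCR_LANGS = {
    "zh": ["ch_sim", "en"],
    "ja": ["ja", "en"],
    "ko": ["ko", "en"],
    "en": ["en"],
}


def guess_language(sentence):
    """Crude script-based guess; a placeholder for a real classifier."""
    if re.search(r"[\u3040-\u30ff]", sentence):  # hiragana or katakana
        return "ja"
    if re.search(r"[\uac00-\ud7af]", sentence):  # hangul syllables
        return "ko"
    if re.search(r"[\u4e00-\u9fff]", sentence):  # CJK ideographs
        return "zh"
    return "en"


def split_sentences(text):
    """Split on common Chinese and Western sentence terminators."""
    parts = re.split(r"[。！？!?.\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def choose_ocr_languages(metadata_text):
    """Return the distinct EasyOCR language lists suggested by the metadata."""
    chosen = []
    for sentence in split_sentences(metadata_text):
        langs = OCR_LANGS[guess_language(sentence)]
        if langs not in chosen:
            chosen.append(langs)
    return chosen


def build_readers(metadata_text):
    """Create one easyocr.Reader per chosen language list (downloads models)."""
    import easyocr  # imported lazily; heavy dependency

    return [easyocr.Reader(langs) for langs in choose_ocr_languages(metadata_text)]
```

Because the description language may differ from the on-screen text, the result is better treated as a first guess than as a final decision.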
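Wolfram Language provides an image classifier whose results can be quite interesting. It can be used together with Apple's image attention-region generator.
ImageIdentify[pictureObj]
This function also supports subcategory classification and multiple outputs; see the documentation for details.
https://www.imageidentify.com/about/how-it-works
Wolfram supports cloud deployment to the Wolfram Cloud, though that may not work here.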
Text language identification
lingua performs well on short text and can be used from Java or Kotlin.
Detecting several languages within one text:
cld2, through its Python binding, can return vectors describing the text span of each detected language.
import pycld2 as cld2
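A sketch of how the span vectors might be used, assuming `cld2.detect(text, returnVectors=True)` returns a fourth element of `(byte_offset, byte_length, language_name, language_code)` tuples measured in UTF-8 bytes. `detect_spans` and `spans_from_vectors` are hypothetical helper names.

```python
def spans_from_vectors(text, vectors):
    """Slice the UTF-8 bytes of `text` using the byte offsets in `vectors`."""
    raw = text.encode("utf-8")
    spans = []
    for offset, length, _name, code in vectors:
        chunk = raw[offset:offset + length].decode("utf-8", errors="replace")
        spans.append((code, chunk))
    return spans


def detect_spans(text):
    """Return (language_code, text_span) pairs for a mixed-language text."""
    import pycld2 as cld2  # imported lazily so the helper above stays standalone

    # The first three return values (reliability flag, bytes found,
    # per-language details) are ignored here; only the vectors are used.
    _reliable, _n_bytes, _details, vectors = cld2.detect(text, returnVectors=True)
    return spans_from_vectors(text, vectors)
```

Slicing is done on the encoded bytes rather than on the string because the offsets count bytes, which differ from character indices for Chinese or Japanese text.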
The original cld3 was designed for Chromium and relies on Chromium code to run.
Additional Python language-related libraries, from GeeksforGeeks:
textblob is a natural language processing toolkit.
from textblob import TextBlob
langid performs well on short text.
langdetect is a Python port of Google's language-detection library.
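A small sketch of per-sentence classification with langdetect, written so the detector can be swapped out. `classify_sentences` and `_default_detect` are hypothetical names; the fixed seed is there because langdetect results can otherwise vary between runs.

```python
from collections import Counter


def _default_detect(text):
    """Detect one language code with langdetect (imported lazily)."""
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # make repeated runs give the same answer
    return detect(text)


def classify_sentences(sentences, detect=_default_detect):
    """Count detected language codes over the non-empty sentences."""
    counts = Counter()
    for sentence in sentences:
        if sentence.strip():  # skip blank lines
            counts[detect(sentence)] += 1
    return counts
```

To try langid instead, something like `detect=lambda s: langid.classify(s)[0]` should work, since `langid.classify` returns a `(code, score)` pair. Input with no usable letters (digits only, for instance) may make langdetect raise an exception, which this sketch does not handle.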
JavaScript:
https://github.com/wooorm/franc
Python version of franc:
pyfranc
whatlang.org provides whatlang-rs as a Rust package, with whatlang-py as Python bindings.