2022-08-08
识别视频语言

speechbrain has features of Speech Recognition, Speaker Recognition, Speech Enhancement, Speech Processing, Multi Microphone Processing, Text-to-Speech, and also supports Spoken Language Understanding, Language Modeling, Diarization, Speech Translation, Language Identification, Voice Activity Detection, Sound classification, Grapheme-to-Phoneme, and many others.

概述

视频里面的语言分为图片上面打出来的字幕以及人说的话

涉及到的问题分别为: 图片文字的语言分类 以及音频语言分类

音频识别

online speech recognition

pip install SpeechRecognition

offline, need to provide language id:

https://pypi.org/project/automatic-speech-recognition/

use paddlespeech if possible, for chinese and english

图片语言识别

use google cloud to detect language type in image:

https://github.com/deduced/ml-ocr-lang-detection

Detects and Recognizes text and font language in an image

https://github.com/JAIJANYANI/Language-Detection-in-Image

图片语言文字分类 可以用easyocr实现 加载多个模型 比如 中文加英文加日语 b站其他语言的可能也不怎么受欢迎 最多再加韩语

可以从视频简介 标题 链接里面提取出句子 每个句子进行语言分类 确定要使用的OCR模型 也有可能出现描述语言和视频图片文字语言不一致的情况

wolfram language提供了一个图片分类器 分类出来的结果可能很有意思 可以结合苹果的图片关注区域生成器来结合使用

ImageIdentify[pictureObj]

这个方法还支持subcategory分类 支持多输出 具体看文档

https://www.imageidentify.com/about/how-it-works

wolfram支持cloud deploy 到wolfram cloud不过那样可能不行

文本语言识别分类

lingua performs good in short text, can be used in java or kotlin

supporting detecting different languages:

cld2 containing useful vectors containing text spans python binding

1
2
3
4
5
6
7
8
>>> import pycld2 as cld2
>>> text_content = """ A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française.
Pour d’autres usages du nom France, Pour une aide rapide et effective, veuiller trouver votre aide dans le menu ci-dessus.
Welcome, to this world of Data Scientist. Today is a lovely day."""
>>> _, _, _, detected_language = cld2.detect(text_content, returnVectors=True)
>>> print(detected_language)
((0, 323, 'FRENCH', 'fr'), (323, 64, 'ENGLISH', 'en'))

original cld3 is designed for chromium and it relies on chromium code to run

official cld3 python bindings

additional Python language related library from geeksforgeeks:

textblob is a natural language processing toolkit

1
2
3
4
5
6
from textblob import TextBlob
text = "это компьютерный портал для гиков. It was a beautiful day ."
lang = TextBlob(text)
print(lang.detect_language())
# ru

langid performs good in short text

textcat (r package)

google language detection library in python: langdetect

javascript:

https://github.com/wooorm/franc

python version of franc:

pyfranc

wlatlang.org provides whatlang-rs as rust package, also whatlang-py as python bindings

Read More

2022-02-16
Media Repurpose Tool

Consider recording the media before real-time processing.

Scan the object via taobao streaming and make it dance.

Transplant lolita pictures to bilibili.

Share dialogs/info from soul/qq/wechat.

Repurpose a wide range of streaming platforms.

use ocr to filter out text info

find new title from danmaku or comments

我们都知道,视频主要由画面和音频组成。但还有一个元素同样包含了巨大的信息量 —— 字幕。结合现有的自然语言处理模型,我们便能实现对话抽取式的自动剪辑。PS: 软件图标是我妹妹Hannah设计滴~

项目开源地址:https://github.com/COLOR-SKY/DialogueExtractor

学业之余我会陆续更新工具教程。

Read More