Speech to Text (STT)

speech-to-text
STT tools
APIs
online STT services
offline STT options
Zishuo (字说) app API
Sogou input method
free STT services
metavoiceio
pyannote
speechbrain
VOSK
PaddleSpeech
Google USM
whisper.cpp
whisperX
whisper GUI Buzz
OpenAI’s whisper
multilingual support
noise and music detection
translation capabilities
time accuracy through forced alignment
This article surveys speech-to-text tools and APIs, both online and offline. The online options include the Zishuo (字说) app's API, the Sogou input method's API, Microsoft STT, and a collection of free STT services. The offline options include metavoiceio, pyannote, speechbrain, VOSK, PaddleSpeech, Google USM (Universal Speech Model), whisper.cpp, whisperX, the whisper GUI Buzz, and OpenAI's whisper. Features noted along the way include multilingual support, robustness to background noise and music, translation, and improved timestamp accuracy through forced alignment.
Published

September 17, 2022


Speech to text (ASR, STT)

online

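API of the Zishuo (字说) app

Reverse engineering the Sogou input method: bypassing signature verification

API of the Sogou input method APK

Microsoft STT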

https://github.com/cuberwr/bilibiliSTT

Multiple free STT services

https://github.com/1c7/Translate-Subtitle-File

offline

https://github.com/metavoiceio/metavoice-src

pyannote: segments audio by speaker (speaker diarization) and detects voice activity
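To make "voice activity detection" concrete, here is a toy sketch in plain Python. It is not pyannote's method (pyannote uses trained neural models); it is a simple RMS-energy threshold, and the function names, the 30 ms frame size, and the 0.1 threshold are all illustrative assumptions.

```python
import math

def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def detect_voice(samples, sample_rate, frame_ms=30, threshold=0.1):
    """Return (start_sec, end_sec) spans whose frame RMS exceeds the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    spans = []
    span_start = None
    for i, energy in enumerate(energies):
        t = i * frame_len / sample_rate
        if energy > threshold and span_start is None:
            span_start = t                      # a loud frame opens a span
        elif energy <= threshold and span_start is not None:
            spans.append((span_start, t))       # a quiet frame closes it
            span_start = None
    if span_start is not None:                  # signal ended while still loud
        spans.append((span_start, len(energies) * frame_len / sample_rate))
    return spans

# Synthetic test signal: 0.3 s of silence, 0.3 s of a 440 Hz tone, 0.3 s of silence.
RATE = 8000
silence = [0.0] * 2400
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(2400)]
spans = detect_voice(silence + tone + silence, RATE)
print(spans)
```

A threshold like this breaks down with background music or noise, which is the case the model-based tools in this list are meant to handle.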

speechbrain: a comprehensive speech AI library covering most speech-related tasks

VOSK
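A minimal sketch of offline transcription with the VOSK Python bindings, assuming `pip install vosk`, a model directory that has already been downloaded, and a 16-bit mono PCM WAV input. The helper names `join_results` and `transcribe_wav` are made up for this example; `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, and `FinalResult` come from the vosk package.

```python
import json
import wave

def join_results(json_chunks):
    """Concatenate the "text" field of Vosk JSON result strings."""
    texts = (json.loads(chunk).get("text", "") for chunk in json_chunks)
    return " ".join(t for t in texts if t)

def transcribe_wav(wav_path, model_dir):
    """Transcribe a 16-bit mono PCM WAV file with a local Vosk model."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    chunks = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                chunks.append(rec.Result())
        chunks.append(rec.FinalResult())  # flush whatever is still buffered
    return join_results(chunks)
```

Usage would look like `transcribe_wav("speech.wav", "path/to/model")`, where both paths are placeholders.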

PaddleSpeech


Paper on Google USM (Universal Speech Model), part of Google's effort to support 1,000 languages


whisper.cpp: fast speech-to-text inference that runs on the CPU rather than the GPU

whisperX: improves timestamp accuracy with forced alignment

Buzz: a GUI for whisper

whisper by OpenAI: multilingual, with translation available; can transcribe speech over background music and noise, and in audio containing silence
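A minimal sketch of using whisper from Python to produce subtitles, assuming `pip install openai-whisper` and ffmpeg on the PATH. The helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are made up for this example; `whisper.load_model` and `model.transcribe` are the package's own calls, and the result dict carries a `segments` list with `start`, `end`, and `text` fields.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Turn whisper-style segments (start, end, text) into SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{times}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

def transcribe_to_srt(audio_path, model_name="base", translate=False):
    """Transcribe (or translate to English) an audio file and return SRT text."""
    # Imported lazily so the formatting helpers work without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return segments_to_srt(result["segments"])
```

Passing `translate=True` asks whisper to emit English text regardless of the spoken language. Segment timestamps from plain whisper can drift, which is the gap whisperX's forced alignment addresses.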