Speech to Text (STT)
speech-to-text
STT tools
APIs
online STT services
offline STT options
Zishuo (字说) app API
Sogou input method
free STT services
metavoiceio
pyannote
speechbrain
VOSK
PaddleSpeech
Google USM
whisper.cpp
whisperX
whisper GUI Buzz
OpenAI’s whisper
multilingual support
recognition under background noise and music
translation capabilities
time accuracy through forced alignment
This article surveys speech-to-text (STT) tools and APIs, both online and offline. Online options include the API of the Zishuo (字说) app, the API of the Sogou Input Method APK, Microsoft STT, and a collection of free STT services. Offline options include metavoiceio, pyannote, speechbrain, VOSK, PaddleSpeech, Google USM (Universal Speech Model), whisper.cpp, whisperX, the whisper GUI Buzz, and OpenAI’s whisper. Highlighted features include multilingual support, recognition under background noise and music, translation, and more accurate timestamps through forced alignment.
Speech to text (ASR, STT)
online
API of the Zishuo (字说) app
API of the Sogou Input Method APK
Microsoft STT
https://github.com/cuberwr/bilibiliSTT
Multiple free STT services
https://github.com/1c7/Translate-Subtitle-File
offline
https://github.com/metavoiceio/metavoice-src
pyannote: segments audio by speaker and detects voice activity
speechbrain: a very advanced speech AI library that covers almost everything related to speech
vosk
paddlespeech
paper on Google USM (Universal Speech Model), part of Google's effort to support 1,000 languages
whisper.cpp: fast speech-to-text on the CPU rather than the GPU
whisperx: improves timestamp accuracy with forced alignment
whisper by OpenAI: multilingual, with translation available; can recognize speech in audio that contains background music, noise, and silence
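
As a rough illustration of how the offline whisper entry might be used, the sketch below turns timestamped segments into SRT subtitle text. It assumes the `openai-whisper` Python package (installed with `pip install openai-whisper`, with `ffmpeg` on the PATH) and relies on `transcribe()` returning a `segments` list whose items carry `start`, `end`, and `text`. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are made up for this example and are not part of any library listed above.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments (mappings with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Consecutive SRT cues are separated by a blank line.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base",
                      translate: bool = False) -> str:
    """Transcribe (or translate to English) a file with openai-whisper."""
    # Imported lazily so the pure-Python helpers above work without the package.
    import whisper

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("talk.mp3", translate=True)` (the file name is a placeholder) would return English subtitles for non-English speech. For tighter timestamps, the same `segments_to_srt` helper could in principle be fed segments from a forced-alignment tool such as whisperx, provided they expose the same three fields.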