Speech to Text (STT)
This article lists speech-to-text tools and APIs, both online and offline. Online options include the API of the Zishuo (字说) app, the API of the Sogou Input Method APK, Microsoft STT, and several free STT services. Offline options include metavoice-src, pyannote, speechbrain, VOSK, PaddleSpeech, Google USM (Universal Speech Model), whisper.cpp, whisperX, the Whisper GUI Buzz, and OpenAI's Whisper. Notable features include multilingual support, translation, robustness to background noise and music, and more accurate timestamps through forced alignment.
speech to text (ASR, STT)
online
API of the Zishuo (字说) app
API of the Sogou Input Method (搜狗输入法) APK
Microsoft STT
https://github.com/cuberwr/bilibiliSTT
multiple free STT services
https://github.com/1c7/Translate-Subtitle-File
offline
https://github.com/metavoiceio/metavoice-src
pyannote: segments audio by speaker (speaker diarization) and detects voice activity
speechbrain: a very advanced speech AI library covering almost every speech-related task
vosk
paddlespeech
paper on Google USM (Universal Speech Model), which performs ASR in 100+ languages as part of Google's 1,000 Languages Initiative
whisper.cpp: fast speech-to-text inference on the CPU rather than the GPU
whisperx: improves timestamp accuracy with forced alignment
whisper by OpenAI: multilingual transcription and translation available; handles speech over background music and noise, and audio containing silence
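As a rough illustration of the last entry, the following is a minimal sketch of transcribing (or translating) a file with Whisper and writing the result as SRT subtitles. It assumes the `openai-whisper` Python package and `ffmpeg` are installed. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are made up for this example and are not part of Whisper; only `whisper.load_model` and `model.transcribe` come from the library.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base",
                      translate: bool = False) -> str:
    """Run Whisper on audio_path and return the result as SRT text.

    Hypothetical helper: the model size and the translate switch are
    choices made for this sketch, not recommendations from the notes above.
    """
    # Imported lazily so the pure helpers above work without the package.
    import whisper

    model = whisper.load_model(model_name)
    # task="translate" asks Whisper to output English instead of the source language.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return segments_to_srt(result["segments"])
```

For example, `transcribe_to_srt("talk.mp3", translate=True)` would return English subtitles for a hypothetical file `talk.mp3`. The segment timestamps produced this way are only approximate, which is the gap whisperx aims to close with forced alignment.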