This article lists speech-to-text (STT) tools and APIs, both online and offline. Online options include the 字说 app's API, the Sogou input method's API, Microsoft STT, and several free STT services. Offline options include metavoice-src, pyannote, speechbrain, VOSK, PaddleSpeech, Google USM (Universal Speech Model), whisper.cpp, whisperX, the whisper GUI Buzz, and OpenAI's whisper. The notes highlight features such as multilingual support, robustness to background noise and music, translation, and improved timestamp accuracy through forced alignment.

Speech to text (ASR, STT)

online

API of the 字说 app

Reverse engineering the Sogou input method, bypassing signature verification

API of the Sogou input method APK

Microsoft STT

https://github.com/cuberwr/bilibiliSTT

Several free STT services

https://github.com/1c7/Translate-Subtitle-File

offline

https://github.com/metavoiceio/metavoice-src

pyannote: segments audio by speaker (speaker diarization) and detects voice activity
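
A minimal sketch of what speaker segmentation with pyannote can look like, assuming the `pyannote.audio` package (3.x API) is installed and a Hugging Face access token with access to the gated pipeline is available. The pipeline name `pyannote/speaker-diarization-3.1`, the `use_auth_token` keyword (newer releases may expect `token` instead), and the helper names `diarize` and `merge_turns` are assumptions for illustration, not something these notes specify.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def diarize(path: str, hf_token: str) -> List[Turn]:
    """Run a pretrained pyannote pipeline and return the speaker turns."""
    # Imported lazily so merge_turns() below works without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumption: pipeline name and keyword argument as in pyannote.audio 3.x.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    annotation = pipeline(path)
    return [
        (segment.start, segment.end, speaker)
        for segment, _, speaker in annotation.itertracks(yield_label=True)
    ]


def merge_turns(turns: List[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive turns of the same speaker separated by a short gap."""
    merged: List[Turn] = []
    for start, end, speaker in sorted(turns):
        same_speaker = bool(merged) and merged[-1][2] == speaker
        if same_speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged
```

`merge_turns` is plain post-processing: it only smooths the output so that short pauses inside one speaker's turn do not split it into several segments.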

speechbrain: an advanced speech AI library covering almost every speech-related task

vosk

paddlespeech


Paper on Google USM (Universal Speech Model), a step toward supporting 1000 languages


whisper.cpp: runs fast speech-to-text on the CPU rather than the GPU
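
A rough sketch of driving the whisper.cpp command-line tool from Python, assuming it has already been built and a ggml model file has been downloaded. The binary path (`./main` in older builds, renamed in newer ones), the model path, and the function names are placeholders; the input is assumed to be a 16 kHz WAV file, which is what whisper.cpp expects.

```python
import subprocess
from typing import List


def build_command(binary: str, model: str, wav: str,
                  threads: int = 4, language: str = "auto") -> List[str]:
    """Build a whisper.cpp command line (all paths are placeholders)."""
    return [
        binary,
        "-m", model,         # ggml model file
        "-f", wav,           # input audio, expected as 16 kHz WAV
        "-t", str(threads),  # number of CPU threads
        "-l", language,      # spoken language, or "auto" to detect it
        "-otxt",             # also write the transcript to a text file
    ]


def transcribe(binary: str, model: str, wav: str) -> str:
    """Run whisper.cpp and return whatever it prints to stdout."""
    cmd = build_command(binary, model, wav)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

Other audio formats would need to be converted to 16 kHz WAV first, for example with ffmpeg.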

whisperx: improves timestamp accuracy with forced alignment
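
A minimal sketch of the transcribe-then-align flow, assuming the `whisperx` package is installed and following the usage pattern from its README as I understand it. The model size, device, compute type, batch size, and the helper names `transcribe_aligned` and `word_timings` are assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Tuple


def transcribe_aligned(path: str, device: str = "cpu") -> List[Dict[str, Any]]:
    """Transcribe with whisperX, then refine timestamps by forced alignment."""
    # Imported lazily so word_timings() below works without whisperx installed.
    import whisperx

    # Assumption: a small model with int8 compute, suitable for CPU use.
    model = whisperx.load_model("small", device, compute_type="int8")
    audio = whisperx.load_audio(path)
    result = model.transcribe(audio, batch_size=8)

    # Load an alignment model for the detected language and align the segments.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    aligned = whisperx.align(
        result["segments"], align_model, metadata, audio, device
    )
    return aligned["segments"]


def word_timings(segments: List[Dict[str, Any]]) -> List[Tuple[str, float, float]]:
    """Flatten aligned segments into (word, start, end) tuples."""
    timings = []
    for segment in segments:
        for word in segment.get("words", []):
            # Some tokens (for example numerals) may come back without timestamps.
            if "start" in word and "end" in word:
                timings.append((word["word"], word["start"], word["end"]))
    return timings
```

The alignment step is what yields word-level timestamps; plain whisper output is only timed per segment by default.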

Buzz: a GUI for whisper

whisper by OpenAI: multilingual, with translation available; can detect speech under background music and noise, and through silence
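
A short sketch of transcription and translation with the `openai-whisper` Python package, assuming it and ffmpeg are installed. The model name `base` and the helper names `transcribe`, `srt_timestamp`, and `to_srt` are assumptions for illustration; the SRT conversion is ordinary post-processing rather than part of whisper itself.

```python
from typing import Any, Dict, Iterable


def transcribe(path: str, model_name: str = "base",
               translate: bool = False) -> Dict[str, Any]:
    """Transcribe an audio file, or translate it to English if requested."""
    # Imported lazily so the helpers below work without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    return model.transcribe(path, task=task)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Dict[str, Any]]) -> str:
    """Turn whisper segments (start, end, text) into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

With whisper installed, `to_srt(transcribe("audio.mp3")["segments"])` would produce subtitle text for a hypothetical `audio.mp3`; passing `translate=True` asks the model for English output instead of the source language.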
