Speech to Text (STT)

speech-to-text

STT tools

APIs

online STT services

offline STT options

字说 (Zishuo) app’s API

Sogou input method

free STT services

metavoiceio

pyannote

speechbrain

VOSK

PaddleSpeech

Google USM

whisper.cpp

whisperX

whisper GUI Buzz

OpenAI’s whisper

multilingual support

noise and music detection

translation capabilities

time accuracy through forced alignment

This article surveys speech-to-text (STT) tools and APIs, both online and offline. Online options include the API of the 字说 (Zishuo) app, the Sogou input method API, Microsoft STT, and several free STT services. Offline options include metavoiceio, pyannote, speechbrain, VOSK, PaddleSpeech, Google USM (Universal Speech Model), whisper.cpp, whisperX, the whisper GUI Buzz, and OpenAI’s whisper. Features noted along the way include multilingual support, translation, robustness to background noise and music, and improved timestamp accuracy through forced alignment.

Published

September 17, 2022

Speech to text (ASR, STT)

online

API of the 字说 (Zishuo) app

Reverse engineering the Sogou input method to bypass signature verification

API of the Sogou input method APK

Microsoft STT

https://github.com/cuberwr/bilibiliSTT

Multiple free STT services

https://github.com/1c7/Translate-Subtitle-File

offline

https://github.com/metavoiceio/metavoice-src

pyannote: segments audio by speaker (speaker diarization) and detects voice activity
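As a rough illustration of speaker segmentation, the sketch below assumes pyannote.audio 3.x, a Hugging Face access token, and the pretrained pipeline name `pyannote/speaker-diarization-3.1`. None of these come from the notes above, and the API differs between versions. The helper `merge_turns` is hypothetical; it only joins consecutive turns by the same speaker.

```python
from typing import Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def merge_turns(turns: Iterable[Turn], max_gap: float = 0.5) -> List[Turn]:
    """Merge consecutive turns of one speaker separated by at most max_gap seconds."""
    merged: List[Turn] = []
    for start, end, speaker in sorted(turns):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


def diarize(path: str, hf_token: str) -> List[Turn]:
    """Run a pretrained diarization pipeline (assumption: pyannote.audio 3.x)."""
    from pyannote.audio import Pipeline  # lazy import: heavy optional dependency

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = pipeline(path)
    turns = [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
    return merge_turns(turns)
```

The merged turns can then be cut out of the audio and passed one by one to any of the recognizers listed here.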

speechbrain: an advanced speech AI library covering a wide range of speech-related tasks

vosk
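A minimal sketch of offline recognition with vosk, assuming the `vosk` Python package is installed and a model has been downloaded to a local directory (the `model_dir` argument is a placeholder). The input is assumed to be a 16-bit mono PCM WAV file. The streaming loop is kept separate from the model loading so it can be exercised with any object exposing the same three methods.

```python
import json
import wave


def transcribe_wave(recognizer, wav: wave.Wave_read, chunk_frames: int = 4000) -> str:
    """Feed PCM chunks to a Vosk-style recognizer and join the recognized text."""
    pieces = []
    while True:
        data = wav.readframes(chunk_frames)
        if not data:
            break
        # AcceptWaveform returns True once an utterance has been finalized.
        if recognizer.AcceptWaveform(data):
            pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever is still buffered in the recognizer.
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def transcribe_file(wav_path: str, model_dir: str) -> str:
    """Transcribe a WAV file offline (assumption: `pip install vosk`)."""
    from vosk import KaldiRecognizer, Model  # lazy import: optional dependency

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        return transcribe_wave(recognizer, wav)
```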

paddlespeech

Paper on Google USM (Universal Speech Model), aimed at supporting 1,000 languages

whisper.cpp: performs fast speech-to-text on the CPU rather than the GPU
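A hedged sketch of driving the whisper.cpp command-line tool from Python. It assumes `ffmpeg` is on the PATH, that the CLI has already been built, and that a ggml model file has been downloaded; the binary path and model path are placeholders, since the executable name differs between whisper.cpp versions. The audio is first converted to 16 kHz mono 16-bit WAV, the format the CLI expects.

```python
import subprocess
from typing import List


def ffmpeg_to_wav_cmd(src: str, dst: str) -> List[str]:
    """Command converting an audio file to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", dst]


def whisper_cpp_cmd(binary: str, model: str, wav: str,
                    threads: int = 4, language: str = "auto") -> List[str]:
    """Command running the whisper.cpp CLI with text-file output enabled."""
    return [binary, "-m", model, "-f", wav, "-t", str(threads),
            "-l", language, "-otxt"]


def transcribe(src: str, binary: str, model: str) -> None:
    """Convert the input, then run whisper.cpp on it (paths are placeholders)."""
    wav = src + ".16k.wav"
    subprocess.run(ffmpeg_to_wav_cmd(src, wav), check=True)
    # With -otxt the transcript is expected to land next to the WAV file.
    subprocess.run(whisper_cpp_cmd(binary, model, wav), check=True)
```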

whisperx: improves timestamp accuracy with forced alignment
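To make the forced-alignment step concrete, here is a sketch assuming the `whisperx` package is installed; the model size (`small`), `int8` compute type, and batch size are illustrative choices, not recommendations from these notes. The hypothetical helper `word_timings` flattens the aligned output into word-level timestamps and skips words that could not be aligned (for example, some numerals).

```python
from typing import Dict, List, Tuple


def word_timings(aligned: Dict) -> List[Tuple[str, float, float]]:
    """Flatten aligned segments into (word, start, end), skipping unaligned words."""
    out = []
    for segment in aligned.get("segments", []):
        for w in segment.get("words", []):
            if "start" in w and "end" in w:
                out.append((w["word"], w["start"], w["end"]))
    return out


def transcribe_aligned(path: str, device: str = "cpu") -> Dict:
    """Transcribe, then refine timestamps with a language-specific alignment model."""
    import whisperx  # lazy import: optional dependency

    model = whisperx.load_model("small", device, compute_type="int8")
    audio = whisperx.load_audio(path)
    result = model.transcribe(audio, batch_size=8)
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(result["segments"], align_model, metadata, audio, device)
```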

whisper gui buzz

whisper by OpenAI: multilingual transcription and translation are available, and it can recognize speech under background music and noise, and in audio containing silence
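A minimal sketch of transcription and translation with whisper, assuming the `openai-whisper` package and `ffmpeg` are installed; the model size `base` is an arbitrary choice. The helpers `srt_timestamp` and `segments_to_srt` are hypothetical additions that turn the returned segments into SRT subtitles.

```python
from typing import Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Render segments carrying 'start', 'end', and 'text' keys as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(path: str, model_name: str = "base", translate: bool = False) -> str:
    """Transcribe audio, or translate it into English when translate=True."""
    import whisper  # lazy import: optional dependency

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    return segments_to_srt(result["segments"])
```

The spoken language is detected automatically unless a `language` argument is passed to `transcribe`.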