Project structure of: james4ever0/whisper

whisper Open-source speech recognition library and tools
- CHANGELOG.md Latest openai/whisper project updates, fixes, and features.
- data
  - README.md Notes on converting and preprocessing the datasets (timings, labels), including multilingual sets.
- model-card.md Model card for Whisper: speech recognition, translation, and known challenges.
- notebooks Scripts for ASR, WER calculation, and multilingual tasks.
  - LibriSpeech.py Initialize LibriSpeech, load test-clean split, perform inference, calculate WER.
  - Multilingual_ASR.py Multilingual ASR and translation script: installs packages and runs inference.
- pyproject.toml Python code formatting settings
- README.md Whisper overview: a multitask Transformer model for multilingual speech recognition.
- requirements.txt Python packages and Triton version requirements.
- setup.py Set up Python package with requirements.
- tests Test audio, normalization, timing, and transcription functionality.
  - conftest.py Imports libraries, sets seed.
  - test_audio.py Test audio loading, spectrogram calculation, and consistency.
  - test_normalizer.py Tests number, date, and text normalization classes.
  - test_timing.py Tests dynamic time warping and median filtering on CPU and GPU.
  - test_tokenizer.py Test tokenizer functionality across languages.
  - test_transcribe.py Tests Whisper transcription functionality.
- whisper Core package: audio-to-text transcription and language detection.
  - __init__.py Package entry point: model download, checksum verification, and listing of available models.
  - __main__.py Runs the Whisper transcription CLI.
  - audio.py Audio loading and preprocessing for NumPy arrays or Torch tensors.
  - decoding.py Decoding logic and audio language detection.
  - model.py Audio-to-text model: encoder-decoder, convolutional layers, attention
  - normalizers Text normalizers for multiple languages.
    - __init__.py Text normalizer initialization.
    - basic.py Normalizes text, removes symbols and diacritics.
    - english.py English language number and text normalizer
  - timing.py Median filtering, dynamic time warping, word-level text-to-audio alignment.
  - tokenizer.py Text tokenization built on the tiktoken library.
  - transcribe.py Speech transcription with the Whisper model, with configurable options.
  - triton_ops.py Triton kernels for parallel dynamic time warping and a CUDA median filter.
  - utils.py Utilities: compression ratio, timestamp formatting, and result writers such as WriteTSV and WriteJSON.
  - version.py Sets version of "whisper" to 20231117
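
Two of the utilities named in the utils.py entry, the compression ratio and timestamp formatting, can be illustrated with a minimal standalone sketch. The function names, signatures, and the HH:MM:SS.mmm layout below are assumptions chosen for illustration; they are not copied from the repository and may differ from it.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 byte length to zlib-compressed length (illustrative)."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Render a non-negative time in seconds as HH:MM:SS<marker>mmm (assumed layout)."""
    if seconds < 0:
        raise ValueError("expected a non-negative timestamp")
    total_ms = round(seconds * 1000.0)
    # Peel off hours, minutes, seconds, leaving the millisecond remainder.
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


if __name__ == "__main__":
    print(format_timestamp(3661.5))
    print(compression_ratio("the same phrase again " * 50))
```

Highly repetitive text compresses well and so yields a large ratio, which makes the ratio a cheap signal for degenerate, looping output. The `decimal_marker` parameter lets the same helper emit a comma instead of a period for subtitle formats that expect one.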