Discover the best open-source text-to-speech, speech-to-text, voice cloning, voice activity detection, and large language models you can self-host, each with its GitHub stars, license, and a link to the repo.
Real-Time Voice Cloning — Clone a voice in 5 seconds to generate arbitrary speech in real-time
Fish Audio — Open-source AI voice platform with high-quality TTS and voice cloning in multiple languages.
DeepSpeech — An open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
NVIDIA NeMo — A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
SpeechBrain — A PyTorch-based Speech Toolkit
pyannote.audio — Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding
Silero VAD — Pre-trained enterprise-grade voice activity detector