Open Source Models for Voice AI — TTS, STT, VAD & LLMs

DeepSeek-V3

DeepSeek-V3 — DeepSeek-V3 — open source.

Large Language Models (LLM)MIT 103,785

View repo

Whisper

Whisper — Robust Speech Recognition via Large-Scale Weak Supervision

Speech-to-Text / TranscriptionMIT 103,363

View repo

Real-Time Voice Cloning

Real-Time Voice Cloning — Clone a voice in 5 seconds to generate arbitrary speech in real-time

Voice Cloning 59,929

View repo

Llama

Llama — Inference code for Llama models

Large Language Models (LLM) 59,461

View repo

VibeVoice

VibeVoice — Open-Source Frontier Voice AI

Text-to-Speech (TTS)MIT 49,535 5,528

View repo

Coqui TTS

Coqui TTS — 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Text-to-Speech (TTS)MPL-2.0 45,594 6,120

View repo

ChatTTS

ChatTTS — A generative speech model for daily dialogue.

Text-to-Speech (TTS)AGPL-3.0 39,492

View repo

OpenVoice

Myshell — Instant voice cloning by MIT and MyShell. Audio foundation model.

Voice CloningMIT 36,763

View repo

Voicebox

Voicebox — Open Source Voice Cloning Desktop App for Mac, Windows, and Linux.

Voice Agents & Assistants 31,738

View repo

fish-speech

fish-speech — SOTA Open Source TTS

Text-to-Speech (TTS) 30,892 2,636

View repo

Fish Audio

Fish Audio — Open-source AI voice platform with high-quality TTS and voice cloning in multiple languages.

Voice Agents & Assistants 30,892

View repo

DeepSpeech

DeepSpeech — DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Speech-to-Text / TranscriptionMPL-2.0 26,754

View repo

chatterbox

Resemble AI — SoTA open-source TTS

Text-to-Speech (TTS)MIT 25,170 3,344

View repo

CosyVoice

CosyVoice — Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Text-to-Speech (TTS)Apache-2.0 21,777

View repo

LiveKit

LiveKit — Open source platform for building, testing, deploying, and scaling voice, video, and physical AI agents.

Voice Agents & Assistants 19,345

View repo

NVIDIA NeMo

NVIDIA NeMo — A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Speech-to-Text / TranscriptionApache-2.0 17,445

View repo

Kaldi

Kaldi — kaldi-asr/kaldi is the official location of the Kaldi project.

Speech-to-Text / Transcription 15,418

View repo

Vosk

Vosk — Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Speech-to-Text / TranscriptionApache-2.0 14,872

View repo

F5-TTS

F5-TTS — Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Text-to-Speech (TTS)MIT 14,791

View repo

Pipecat

Pipecat — Open source framework for building multi-modal conversational AI with voice capabilities.

Voice Agents & Assistants 12,950

View repo

SpeechBrain

SpeechBrain — A PyTorch-based Speech Toolkit

Speech-to-Text / TranscriptionApache-2.0 11,640

View repo

pyannote.audio

pyannote.audio — Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Voice Activity Detection (VAD)MIT 10,154

View repo

ESPnet

ESPnet — End-to-End Speech Processing Toolkit

Speech-to-Text / TranscriptionApache-2.0 9,869

View repo

Silero VAD

Silero VAD — Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Voice Activity Detection (VAD)MIT 9,381

View repo

Moonshine

Moonshine — Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

Speech-to-Text / Transcription 8,518

View repo

kokoro

kokoro — https://hf.co/hexgrad/Kokoro-82M

Text-to-Speech (TTS)Apache-2.0 7,576 822

View repo

MeloTTS

Myshell — High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Text-to-Speech (TTS)MIT 7,504 1,048

View repo

Zonos

Zonos — Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers.

Text-to-Speech (TTS)Apache-2.0 7,225

View repo

neutts

neutts — On-device TTS model by Neuphonic

Text-to-Speech (TTS) 6,005 639

View repo

Gemma

Gemma — Gemma open-weight LLM library, from Google DeepMind

Large Language Models (LLM)Apache-2.0 5,462

View repo