Technology

DeepSpeech

An open-source speech-to-text engine utilizing Baidu's Deep Speech architecture and implemented via TensorFlow.

DeepSpeech transforms audio into text using a production-ready model trained on thousands of hours of voice data. It leverages a recurrent neural network (RNN) to map spectrograms directly to character sequences, bypassing traditional phonetic engineering. Developed by Mozilla, the engine supports real-time transcription on hardware ranging from Raspberry Pi 4 devices to high-end NVIDIA GPUs. Developers integrate the technology through Python, C, and Java bindings to build private, offline voice interfaces without relying on proprietary cloud providers.

https://github.com/mozilla/DeepSpeech

1 project · 1 city

Related technologies

Amazon Transcribe 2 BERT 179 BLOOM 115 CMU Sphinx 2 Database 8 Google Cloud Speech-to-Text 2 GPT-3 191 GPT-4 528 IBM Watson Speech to Text 2 Kaldi 2 Llama-2 227 PaLM 2 116 RoBERTa 118 Vision models 1 wav2vec 2 1 Whisper 25 YouTube Shorts 1

Recent Talks & Demos

Showing 1-1 of 1

Members-Only

YT shorts finder

Austin Sep 12

Vision models Whisper