Elder Czech STT Performance
Discover which of five common STT providers performs best on elderly Czech voices, with surprising findings on how volume leveling affects accuracy.
Results of an analysis of five common real-time STT providers (Cartesia, Deepgram, ElevenLabs, Google, Speechmatics) on approximately 1 hour of audio from 20 different elderly Czech voices (Paměť národa). The talk is mostly about the results; the code itself is fairly trivial…
- Cartesia: Cartesia develops Sonic, a generative AI model delivering high-fidelity text-to-speech with sub-100ms latency for real-time voice applications. Cartesia specializes in real-time multimodal intelligence, led by its flagship Sonic model. The system achieves industry-leading speeds (under 100 milliseconds of latency) to provide natural, expressive voice synthesis across dozens of languages. Developers use the Cartesia API to power interactive agents, gaming NPCs, and accessibility tools with lifelike audio quality. By prioritizing performance and scale, the platform eliminates the lag found in traditional speech synthesis.
- Deepgram: Deepgram is the end-to-end Voice AI platform, delivering real-time, highly accurate Speech-to-Text (STT), Text-to-Speech (TTS), and conversational Voice Agents via a developer-first API. Deepgram is your enterprise-grade Voice AI platform, built on a proprietary end-to-end deep learning network for unmatched speed and accuracy. Our core APIs (STT, TTS, and the unified Voice Agent API) handle everything from real-time transcription to synthesizing natural speech. Specifically, models like Nova-3 and Flux deliver 2-4x better accuracy on alphanumeric data than competitors, and batch transcription processes one hour of audio in less than 30 seconds (120x real-time). We support flexible deployment (public cloud, private cloud, or self-hosted) and offer advanced features like diarization (up to 10 speakers) and custom model training, ensuring your voice applications, from contact centers to conversational AI, are fast, precise, and scalable.
- ElevenLabs: ElevenLabs delivers emotionally rich, human-like AI voice synthesis: text-to-speech, professional voice cloning, and AI dubbing across 30+ languages. ElevenLabs is the premier AI voice platform (founded 2022 by Piotr Dąbkowski and Mati Staniszewski), leveraging deep learning for superior audio. Key offerings include the expressive Eleven v3 text-to-speech model, professional Voice Cloning from minimal audio, and AI Dubbing for translating content into 30+ languages while preserving the original voice. The low-latency API (e.g., Flash v2.5 at 75ms) powers diverse applications: audiobooks, video voiceovers, and conversational AI agents for over a million users.
- Cloud Speech-to-Text API: Google’s API converts audio to text instantly across 125+ languages using advanced neural network models. This tool applies Google’s machine learning to transcribe audio files or live streams with precision. It supports 125+ languages (including specific dialects) and filters background noise for clear results in crowded settings. Use it to automate subtitles, power voice-controlled hardware, or process call center data for insights. Key features include automatic punctuation and speaker diarization, a utility that identifies different voices in a single conversation. It integrates directly with Google Cloud Storage for batch processing large datasets.
- Speechmatics: Speechmatics provides high-accuracy autonomous speech recognition (ASR) across 50+ languages through a single, unified API. This engine powers global brands (Deloitte, Vonage) by training on 1.1 million hours of unlabelled audio to master diverse accents and noisy environments. Its Ursa model consistently beats Big Tech benchmarks in Word Error Rate (WER) across global dialects. You can use it for real-time transcription, media subtitling, and sentiment analysis. Deployment is flexible: choose between secure on-premises containers or a low-latency cloud API.
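Word Error Rate (WER), mentioned above, is the usual metric for comparing an STT transcript against a reference. The following is a minimal sketch of how it can be computed, not the code used in the talk. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions made for illustration; a real evaluation would likely also normalize punctuation and numerals.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative helper (hypothetical name). Tokenization is plain
    lowercasing plus whitespace splitting, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

With a function like this, each provider's transcript of the same recording could be scored against one human reference, and the scores compared per speaker or per provider. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.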