Xing Xing: Efficient AI Music
Presenting efficient AI music generation via vocal-conditioned diffusion and showcasing a scalable karaoke app built with various off-the-shelf AI tools.
I will be presenting my work on AI music. First, I will demonstrate that AI music generation can be made efficient with a low parameter count. This comes from my recent publication on music diffusion conditioned on vocals. Second, I will showcase my project Xing Xing, a karaoke singing app that separates tracks and provides live transcription. It uses various off-the-shelf AI tools and models, and it demonstrates scalability and different ways that musical AI could be applied.
Lightweight latent diffusion model using Soft Alignment Attention for vocal-conditioned music generation.
- PyTorch: An open-source machine learning framework that provides a Python-first tensor library with strong GPU acceleration and a dynamic computation graph for building deep neural networks. Originally developed by Meta AI, PyTorch is widely used in both research and production. Its core is a tensor library (similar to NumPy) optimized for GPU acceleration, which can deliver speedups of 50x or more on suitable workloads. Its key differentiator is a "Pythonic" design with a dynamic computation graph (eager execution), which allows rapid prototyping and simpler debugging than static-graph frameworks. Using its Autograd system for automatic differentiation, practitioners build and train models for computer vision and NLP; companies such as Tesla (Autopilot) and Microsoft use PyTorch for critical AI applications.
- Demucs: An AI model for music source separation that uses a Hybrid Transformer architecture to isolate individual audio stems. Demucs (Deep Extractor for Music Sources) is an open-source model developed by Meta AI (Facebook Research) for high-fidelity audio source separation. The original model operates directly on the raw waveform rather than on a spectrogram, which helps minimize artifacts. The latest version, Hybrid Transformer Demucs (HTDemucs), uses a dual-domain (waveform and spectrogram) U-Net with a cross-domain Transformer and reaches 9.20 dB SDR on the MUSDB HQ test set, a benchmark for separating music into four stems: vocals, drums, bass, and other (the remaining accompaniment). This makes it a common choice for musicians and researchers who need clean, fast stem extraction for remixing or analysis.
- FastAPI: A modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation. It is built on Starlette (for async capabilities) and Pydantic (for data validation and serialization). Using standard Python type hints, the framework automatically generates interactive API documentation (Swagger UI and ReDoc) and enforces data validation. The project's own estimates put the reduction in developer-induced errors at about 40% and the increase in feature development speed at 200% to 300%, and describe its performance as on par with Node.js and Go. It is production-ready and supports the OpenAPI and JSON Schema standards for API specifications.
- Next.js: A full-stack React framework that delivers high-performance web applications through hybrid rendering and Rust-based tooling. It lets you build full-stack web applications with little or no configuration. Its hybrid rendering approach (Server-Side Rendering, Static Site Generation, and Incremental Static Regeneration) targets fast page loads and good SEO. Key features include React Server Components, Server Actions for running server code directly, and the App Router for advanced routing and nested layouts. Developed by Vercel, it uses Rust-based tools such as Turbopack and the Speedy Web Compiler (SWC) for fast builds and a smoother developer experience.
- OpenAI Whisper: OpenAI's open-source, general-purpose Automatic Speech Recognition (ASR) system. It was trained on 680,000 hours of multilingual, multitask audio data, which makes it robust to accents, background noise, and technical language. The model is a Transformer sequence-to-sequence architecture that handles several tasks: multilingual transcription, speech-to-English translation, and language identification. Developers use the open-source code and the range of model sizes (tiny, base, small, medium, large) to trade transcription speed against accuracy, which approaches human level on some benchmarks.
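To give a feel for what an SDR figure such as the 9.20 dB quoted for Demucs means, here is a minimal sketch of the basic signal-to-distortion ratio, SDR = 10 · log10(‖s‖² / ‖s − ŝ‖²), in plain Python. The function names `sdr_db` and `db_to_power_ratio` and the toy signals are illustrative assumptions. Published MUSDB scores come from a dedicated evaluation toolkit with a more elaborate definition and aggregation, so this shows the idea rather than reproducing the benchmark.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in decibels.

    reference: samples of the true stem (e.g. the isolated vocals)
    estimate:  samples of the separated stem produced by a model
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_power == 0:
        return math.inf   # perfect reconstruction
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_power / error_power)


def db_to_power_ratio(db):
    """Convert decibels to a plain signal-to-error power ratio."""
    return 10.0 ** (db / 10.0)


# Toy signals (hypothetical): a sine wave and an estimate that is 10% too quiet.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
estimate = [0.9 * r for r in reference]

print(round(sdr_db(reference, estimate), 2))  # 20.0
print(round(db_to_power_ratio(9.20), 2))      # 8.32
```

Read this way, 9.20 dB corresponds to the target stem carrying roughly eight times more power than the residual error, under the simplified definition used here.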
Related projects
Building Community in Toronto
Toronto
Learn how a 300-person Toronto founder community was built, featuring an app using vector similarity and agentic web…
Mindfulness without a Mind
Toronto
Experience custom guided meditations generated in real-time using voice AI. This exhibit showcases practical applications of generative AI…
3D Traffic Twins with Video Generation with RAP
Toronto
Reconstruct messy traffic camera feeds into interactive 3D twins, infer flows, and generate "what-if" videos showing safer city…
Uplift Entrepreneurs & Businesses in ways only dreamed possible.
Toronto
This presentation details a tool designed to address significant privacy and capability gaps for startups and existing businesses,…
DeepSeek-ing a Needle in a Haystack
Toronto
Learn how to use DeepSeek R1 agentic workflows and temporal prompting to filter, rank, and retrieve the most relevant…
JobsYo: Building an AI-based but Human-driven Job Search, Research and Apply Ecosystem
Toronto
See a live demo of an AI job search platform featuring multi-model API routing, context engineering, agentic job…