Matryoshka Embeddings: AI Filtering
Learn how to overcome "Insight Blindness" in AI generation with Matryoshka embeddings. This demo shows a production pipeline filtering noise for usable long context and high-signal synthesis.
We realized that standard “Persistence” (chat history) is broken. In a production environment with thousands of chats and documents, context windows turn into noise unless the user is hyper-conscious of their usage. We call this “Insight Blindness.” I will demo the pipeline we built to fix this. We are moving beyond standard RAG by implementing Nomic Embed v1.5 with Matryoshka Representation Learning. I will show how we use adaptive, variable-density embeddings to filter noise before it ever hits the context window, and then pipe the high-signal clusters into Google Gemini for synthesis. No slides, just a walkthrough of the ingestion pipeline, the Matryoshka embedding layer, and the synthesis engine.
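To make the filtering idea concrete, here is a minimal sketch of one common way to use Matryoshka embeddings: screen candidates cheaply on a short prefix of each vector, then rank the survivors on the full vector. This is an illustration under stated assumptions, not the pipeline shown in the demo. The function names, the toy 4-dimensional vectors, the `coarse_min` threshold, and `top_k` are all hypothetical, and a real embedding model may need extra preprocessing before truncation, so check its model card.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (the Matryoshka prefix) and rescale to unit length."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return [0.0] * len(prefix)
    return [x / norm for x in prefix]

def cosine(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

def filter_chunks(query_vec, chunks, coarse_dim=2, coarse_min=0.5, top_k=2):
    """Two-stage filter: cheap prefix screen, then full-vector ranking of survivors."""
    q_coarse = truncate_and_normalize(query_vec, coarse_dim)
    q_full = truncate_and_normalize(query_vec, len(query_vec))
    survivors = []
    for text, vec in chunks:
        # Stage 1: drop chunks whose short prefix is already dissimilar to the query.
        if cosine(q_coarse, truncate_and_normalize(vec, coarse_dim)) >= coarse_min:
            # Stage 2: score the remaining chunks with the full-width vector.
            score = cosine(q_full, truncate_and_normalize(vec, len(vec)))
            survivors.append((score, text))
    survivors.sort(reverse=True)
    return [text for _, text in survivors[:top_k]]

# Toy vectors stand in for real model output (hypothetical data).
QUERY = [1.0, 0.0, 0.5, 0.0]
CHUNKS = [
    ("refund policy", [0.9, 0.1, 0.4, 0.0]),
    ("refund timeline", [0.8, 0.2, 0.0, 0.6]),
    ("lunch menu", [0.0, 1.0, 0.0, 0.3]),
]
selected = filter_chunks(QUERY, CHUNKS)
```

In this toy run the off-topic chunk fails the prefix screen and never reaches the ranking stage, so only the two refund chunks would be passed on for synthesis.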
- Nomic Embed v1: An open-source, 8192-context text embedding model that beats OpenAI on MTEB benchmarks using Matryoshka dimensionality. Nomic Embed v1 delivers top-tier performance on the MTEB benchmark, outclassing OpenAI's text-embedding-3-small in retrieval accuracy. It handles long-form content via an 8192-token context window and supports Matryoshka embeddings for flexible vector sizes (64 to 768 dimensions). The model carries an Apache 2.0 license and provides full training data transparency. It is the go-to choice for developers building high-efficiency RAG systems and document search tools.
- Google Gemini: Google's most capable multimodal AI model, which reasons across text, code, audio, image, and video. Gemini is Google's foundational, multimodal AI model, engineered to natively understand and combine text, code, image, audio, and video inputs. The technology is optimized across three sizes: Ultra (for highly complex tasks), Pro (for broad task scaling), and Nano (for efficient on-device performance). Gemini Ultra, for example, achieved a 90.0% score on the MMLU benchmark, surpassing human experts. It functions as a powerful AI assistant, integrated across Google services like Gmail and Maps, and features advanced tools like Deep Research and custom AI experts (Gems). Its Pro version offers a long context window, handling up to 1,500 pages or 30k lines of code simultaneously.
- Python: The high-level, general-purpose language built for readability, powering everything from web backends to advanced machine learning models. Python is the high-level, general-purpose language prioritizing clear, readable syntax (via significant indentation), ensuring rapid development for any team. Its ecosystem is massive: use it for robust web development with frameworks like Django and Flask, or leverage its power in data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) provides thousands of community-contributed modules, offering immediate solutions for tasks from network programming to GUI creation. The language is actively maintained by the Python Software Foundation (PSF), with the stable release currently at Python 3.14.0 (as of November 2025).
- FastAPI: A modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation. FastAPI is a robust, high-speed Python web framework: it is built on Starlette (for async capabilities) and Pydantic (for data validation and serialization). Leveraging standard Python 3.8+ type hints, the framework automatically generates interactive API documentation (Swagger UI/ReDoc) and enforces data validation, effectively reducing developer-induced errors by an estimated 40%. This architecture delivers performance on par with Node.js and Go, significantly increasing feature development speed (up to 300% faster). It is production-ready, fully supporting OpenAPI and JSON Schema standards for all API specifications.
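The 64 to 768 dimension range listed for Nomic Embed above has a direct storage consequence, which the short calculation below illustrates. It is a back-of-the-envelope sketch that assumes raw float32 storage with no index overhead; the one-million-vector corpus and the intermediate widths (128, 256, 512) are hypothetical values chosen for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as uncompressed float32

def index_size_mb(num_vectors, dim):
    """Raw storage for `num_vectors` embeddings of width `dim`, in megabytes (10**6 bytes)."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1_000_000

# Hypothetical corpus of one million chunks at several Matryoshka widths.
sizes = {dim: index_size_mb(1_000_000, dim) for dim in (64, 128, 256, 512, 768)}

for dim, megabytes in sizes.items():
    print(f"{dim:>4} dims: {megabytes:,.0f} MB")
```

Under these assumptions the 64-dimension prefix needs one twelfth of the storage of the full 768-dimension vector, which is why a short prefix is attractive for a first-pass filter.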
Related projects
Building an AI on-call engineer
Dhaka
Explore Aster, an AI on-call engineer. Learn its current functionality, architectural choices, and practical LLM engineering patterns from…
Vibe Coding to Production: A PM’s LLM Pipeline for OE Data Cleaning
Dhaka
Learn how to build a real OE data cleaning system using Claude, GPT-4 mini, and Apps Script, moving…
The art of inferencing everywhere, how to embed and run language models natively.
Dhaka
Learn to embed and run language models natively in apps and websites for enhanced security and reduced latency,…
Building a Persistent Memory & Stateful Second Brain AI Agent
Dhaka
Learn how to build a stateful AI agent with persistent memory using Obsidian and context engineering. This talk…
How to build good skills for LLMs
Dhaka
Learn to build effective LLM skills that enhance capabilities without overwhelming context, preventing hallucinations. This talk shares practical…
AI in Education
Pune
The talk explains Sahay, an AI platform using RAG and LLMs to deliver personalized learning paths, career counseling,…