fastworkflow: SOTA with Small Models
See live demos showing fastworkflow outperforming large models on Tau Bench using 7B and 14B models, plus a roadmap.
During the last talk a few months ago, we demoed fastworkflow results on the Tau Bench retail and airline benchmarks that were comparable to GPT-4o. With our latest refinements, fastworkflow beats ALL models, including Sonnet 4.5! And it does this using models as small as 7B and 14B parameters. I will demonstrate the results and talk about the refinements that made this possible, as well as our roadmap for the next 6 months.
fastworkflow builds deterministic, fault-tolerant, conversational AI workflows using Python, DSPy, and LLMs.
- GPT-4: GPT-4 is OpenAI’s large multimodal model: it processes both text and image inputs, delivering human-level performance on complex professional and academic benchmarks. This is OpenAI’s latest milestone in scaling deep learning: a large multimodal model accepting both text and image inputs. It demonstrates a significant capability leap over its predecessor, scoring in the top 10% on a simulated bar exam (GPT-3.5 scored in the bottom 10%). The model handles nuanced instructions and long-form content, supporting context windows up to 32,768 tokens (32K model). This capacity allows processing up to 25,000 words in a single, complex prompt. GPT-4 is engineered for enhanced reliability, steerability, and advanced reasoning across diverse tasks.
- Claude-3: Claude-3 is Anthropic's state-of-the-art multimodal model family (Opus, Sonnet, Haiku), setting new industry benchmarks for intelligence, speed, and vision capabilities. Claude-3, developed by Anthropic, is a powerful family of three generative AI models: Opus, Sonnet, and Haiku. Opus, the flagship, excels in complex reasoning, outperforming peers on key benchmarks (MMLU, GPQA) and supporting a 200,000-token context window. Sonnet offers an optimal balance for enterprise workloads, delivering performance that is 2x faster than its predecessor, Claude 2.1. Haiku is the fastest and most cost-effective option, capable of processing a 10,000-token research paper (including charts) in under three seconds. All three models are multimodal, featuring strong vision capabilities for analyzing charts, diagrams, and PDFs alongside text, enabling advanced data extraction and analysis.
- Llama-2: Llama 2 is Meta AI's powerful, openly accessible family of large language models (LLMs), featuring models from 7B to 70B parameters for research and commercial applications. Llama 2 is Meta AI's next-generation LLM family, released for free research and commercial use. The collection includes both pre-trained foundation models and instruction-tuned 'Chat' variants, scaling from 7 billion (7B) up to 70 billion (70B) parameters. Key technical upgrades over Llama 1 involve training on 2 trillion tokens (40% more data) and doubling the context length to 4096 tokens. The Llama-2-chat models were rigorously aligned using Reinforcement Learning from Human Feedback (RLHF), positioning them as a top-tier, openly available option for developers building advanced generative AI solutions.
- LangChain: The open-source framework for building and deploying reliable, data-aware Large Language Model (LLM) applications. LangChain is the essential framework for engineering LLM-powered applications: it simplifies connecting models (like GPT-4 or Claude) to external data, computation, and APIs. The platform provides a modular set of components (Chains, Agents, Tools, and Memory), allowing developers to quickly build complex workflows like Retrieval-Augmented Generation (RAG) pipelines and sophisticated conversational agents. Its Python and JavaScript libraries, combined with LangChain Expression Language (LCEL), offer a standardized interface for rapid prototyping and moving applications to production with confidence.
- Transformers: The deep learning architecture that revolutionized sequence modeling (NLP, vision) by replacing recurrent units with a parallelizable multi-head self-attention mechanism. The Transformer is a neural network architecture introduced in the landmark 2017 paper, "Attention Is All You Need." It eliminated the sequential processing bottleneck of prior Recurrent Neural Networks (RNNs) by relying solely on self-attention, enabling massive parallelization and significantly faster training (up to 10x faster) on modern hardware. This efficiency allowed for the creation of large-scale pre-trained models: BERT (encoder-only) and the generative GPT series (decoder-only). The architecture is now foundational to all modern Large Language Models (LLMs) and drives the current state-of-the-art in AI.
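As a rough illustration of the self-attention mechanism mentioned in the Transformers entry, here is a minimal single-head sketch in plain Python. It is a simplification, not how any of the models above are implemented: queries, keys, and values are all taken to be the input vectors themselves (an assumption; real Transformers use learned projections and multiple heads), and the names `softmax`, `self_attention`, and `tokens` are made up for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: Q = K = V = x (no learned projections).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Score this position against every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Three toy "token" vectors of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row depends only on the inputs, not on the previous output row, so the rows could be computed independently. That independence is what makes the mechanism parallelizable, in contrast to the step-by-step recurrence of an RNN.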
Related projects
Full Mamba (SSM) with Agent Attention and Fast Feed Forward Sparse Activations
Los Angeles
Explore a custom Mamba implementation integrating agent attention and sparse feed-forward activations, demonstrating faster language modeling and promising…
Groq at the speed of light
Houston
Explore Groq's LPU™ technology for fast, affordable AI inference. See a voice-powered customer service "phone tree" proof of…
No-code front-ends with AI assisted backend
Seattle
This talk demonstrates building a no-code React frontend with a custom Python/FastAPI AI backend to create animal mashup…
Extracting RFC 5545 RRULE Compliant Schedule Data in valid JSON with only 0.6B Parameters
Seattle
This talk demonstrates using the small Osmosis 0.6B model to extract complex, unstructured RFC 5545 schedule data into…
AI speed dating
Seattle
Learn how an AI host uses QR‑linked WhatsApp, function calls, Airtable matching, and RAG to register participants, create…
DeepSeek-ing a Needle in a Haystack
Toronto
Learn how to use DeepSeek R1 agentic workflows and temporal prompting to filter, rank, and retrieve the most relevant…