OpenCode Local Models on DGX
See OpenCode agents run with local LLMs on a DGX Spark, comparing performance metrics against OpenRouter and Anthropic models.
I’ll run my OpenCode setup with multiple agents that use LLMs hosted locally on an NVIDIA DGX Spark. I’ll have the DGX Spark with me.
- I’ll show GPU usage and other metrics while the models are working, to show how fast or slow this setup is
- I’ll also run some of the agents on LLMs hosted by OpenRouter and Anthropic to show the difference in quality
- OpenCode
  OpenCode is the open-source AI coding agent (CLI tool), integrating LLMs like GPT-5 and Claude Sonnet 4 directly into the terminal for fast, context-aware development.
  OpenCode is the open-source AI coding agent, built for terminal-first developers who demand speed and privacy. It connects your local files, Git history, and a choice of LLMs (e.g., OpenAI's GPT-5 Nano, Anthropic's Claude Sonnet 4) to execute complex tasks directly from the command line. The tool bypasses IDE and browser dependencies, allowing developers to triage issues, fix errors, or implement features with commands like `opencode fix error in main.go`. With over 26,000 GitHub stars by October 2025, OpenCode delivers a secure, context-aware coding partner that keeps your code local and your workflow efficient.
- NVIDIA DGX Spark
  The desktop AI supercomputer: DGX Spark delivers 1 petaFLOP of FP4 performance via the GB10 Grace Blackwell Superchip.
  This is the DGX Spark: your personal AI supercomputer, built for serious local development. It packs the GB10 Grace Blackwell Superchip (20-core Arm CPU, Blackwell GPU) and 128GB of unified memory into a compact desktop form factor (1.2 kg). You can prototype, fine-tune, or run inference on models of up to 200 billion parameters right at your desk. It ships ready with DGX OS and the full NVIDIA AI software stack (CUDA, TensorRT), ensuring a seamless path from local work to data center deployment.
- Olmo-3
  The Allen Institute for AI’s latest open-source language model, featuring 440 billion tokens of training and full pipeline transparency.
  OLMo-2 (the architecture driving the Olmo-3 series) delivers a state-of-the-art open language model framework built by the Allen Institute for AI (AI2). This iteration prioritizes data integrity and reproducibility, providing the full training code, weights, and the Dolma dataset (3 trillion tokens). By utilizing a 7-billion parameter dense architecture, it matches or exceeds Llama 3 performance on benchmarks like MMLU and GSM8K while remaining entirely accessible for academic and commercial audit.
- GLM-4
  Zhipu AI’s flagship large language model, featuring a 128k context window and performance metrics rivaling GPT-4.
  GLM-4 is Zhipu AI’s high-performance foundational model designed to compete directly with GPT-4. It handles a 128,000-token context window (enough for a 300-page document) and executes complex tasks via its All Tools framework: browsing, code execution, and image generation. Performance metrics on MMLU and GSM8K confirm its top-tier status in reasoning and mathematics. The ecosystem includes specialized versions like GLM-4-9B for edge deployment and the full-scale API for enterprise applications. It remains the leading choice for bilingual Chinese-English deployments requiring precision and scale.
- GPT-OSS:20b
  A 20-billion parameter open-source language model optimized for high-throughput inference and transparent architectural auditing.
  GPT-OSS:20b delivers a robust alternative to proprietary systems by utilizing a 20B parameter dense transformer architecture. It balances computational efficiency with deep reasoning capabilities (ideal for complex coding tasks and long-form content generation). Built on open datasets, this model allows developers to self-host on enterprise hardware like the NVIDIA A100 (80GB) while maintaining full control over data privacy and fine-tuning weights.
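
The DGX Spark entry above pairs 128GB of unified memory with models of up to 200 billion parameters. A rough back-of-the-envelope check of that pairing is sketched below. It counts weight storage only and assumes decimal gigabytes; KV cache, activations, and runtime overhead are deliberately ignored, so real headroom is smaller. The function name `weight_footprint_gb` is hypothetical and chosen only for this illustration.

```python
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same precision and
    nothing else (KV cache, activations, runtime) is counted.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


UNIFIED_MEMORY_GB = 128  # figure taken from the DGX Spark description

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_footprint_gb(200e9, bits)
        verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
        print(f"200B params at {bits}-bit: {gb:.0f} GB of weights "
              f"({verdict} in {UNIFIED_MEMORY_GB} GB)")
```

Under these assumptions, only the 4-bit case stays below 128GB, which is consistent with the listing quoting the device's performance in FP4 terms.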
Related projects
Building a Product Powered by Local LLM's Only - Using LLM's in Cursor
Seattle
This talk covers building privacy-focused applications using locally hosted LLMs, challenges faced, creative solutions, and insights from a…
AI oncall engineer that actually works
Seattle
This talk demonstrates OncallNinja, an AI-powered oncall engineer that helps teams resolve software incidents efficiently, with insights from…
Building Computers for Agents
Seattle
Learn about a new microVM for safely sandboxing AI agents locally, mitigating risks from their potentially adverse actions…
AI speed dating
Seattle
Learn how an AI host uses QR‑linked WhatsApp, function calls, Airtable matching, and RAG to register participants, create…
Talk to your Obsidian Notebooks
Seattle
Demonstrates building a private Obsidian plugin that uses local LLMs to query personal notes, covering architecture, Claude integration,…
No-code front-ends with AI assisted backend
Seattle
This talk demonstrates building a no-code React frontend with a custom Python/FastAPI AI backend to create animal mashup…