Agents Hill-Climbing with Data Science
Dozens of coding agents attempt to reverse engineer a Spotify color assignment algorithm within a custom environment, showcasing data science thinking for agent improvement.
In this project, I give dozens of coding agents access to an environment where they try to reverse engineer an algorithm created by Spotify that assigns a tasteful background color to an image of an album cover. The task seems trivial, but the correct solution requires a complex set of techniques, heuristics, and parameters. I let dozens of different models on different coding platforms loose on the task and share some interesting findings.
The environment includes some ‘training’ data and the model’s task is to code up a good solution. The agent has access to a set of scripts to run predictions, analyze results, and view failing samples individually. As a task, it tests an agent’s ability to ideate and analyze results over a long conversation.
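As a rough illustration of what an "analyze results" script in such an environment might look like, here is a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the `Sample` shape, the `analyze` function, the tolerance, and the use of Euclidean distance in RGB space are hypothetical and are not taken from the actual project.

```typescript
// Hypothetical types and names; the talk does not publish its scoring code.
type RGB = [number, number, number];

interface Sample {
  id: string;
  expected: RGB;  // reference background color for the album cover
  predicted: RGB; // color produced by the agent's current solution
}

interface Report {
  meanError: number;                       // average distance over all samples
  accuracy: number;                        // share of samples within the tolerance
  worst: { id: string; error: number }[];  // largest errors first
}

// Euclidean distance in RGB space (an assumed metric, chosen for simplicity).
function colorDistance(a: RGB, b: RGB): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Summarize a prediction run and surface the worst samples for inspection.
function analyze(samples: Sample[], tolerance = 10, topK = 3): Report {
  if (samples.length === 0) {
    return { meanError: 0, accuracy: 0, worst: [] };
  }
  const scored = samples.map((s) => ({
    id: s.id,
    error: colorDistance(s.expected, s.predicted),
  }));
  const meanError = scored.reduce((sum, s) => sum + s.error, 0) / scored.length;
  const accuracy = scored.filter((s) => s.error <= tolerance).length / scored.length;
  const worst = [...scored].sort((a, b) => b.error - a.error).slice(0, topK);
  return { meanError, accuracy, worst };
}
```

An agent could call a script like this after each change, read the aggregate numbers, and then open the samples listed in `worst` one by one to form its next hypothesis.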
The purpose of the talk is not just to show off this particular project, but to showcase how proper environment setup and ‘data science thinking’ can enable coding agents to hill-climb towards better solutions faster. These ideas are relevant far beyond clearly defined X -> Y tasks. I use similar techniques regularly when building and benchmarking agentic systems.
Rough demo plan:
- Introduce the task and set off a coding agent live to come up with a solution.
- Walk through the environment, show off some interesting results and analysis of previous runs.
- Talk about the general idea of building hill-climbing environments for LLMs.
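The hill-climbing idea mentioned in the plan can be sketched generically. The TypeScript below is a minimal sketch under stated assumptions: `propose` and `score` are hypothetical stand-ins for "the agent revises its solution" and "the environment evaluates it", and lower scores are assumed to be better. It is not the author's implementation.

```typescript
// Generic hill-climbing loop: keep a candidate only when it scores strictly better.
// `propose` and `score` are hypothetical callbacks supplied by the caller.
function hillClimb<T>(
  initial: T,
  propose: (current: T, step: number) => T,
  score: (candidate: T) => number, // assumed convention: lower is better
  steps: number,
): { best: T; bestScore: number; history: number[] } {
  let best = initial;
  let bestScore = score(initial);
  const history = [bestScore]; // best score after each step, for later analysis

  for (let step = 0; step < steps; step++) {
    const candidate = propose(best, step);
    const candidateScore = score(candidate);
    if (candidateScore < bestScore) {
      best = candidate;
      bestScore = candidateScore;
    }
    history.push(bestScore);
  }
  return { best, bestScore, history };
}

// Toy usage: nudge a single number toward a target of 5.
const result = hillClimb(
  0,
  (current, step) => current + (step % 2 === 0 ? 1 : -1),
  (x) => Math.abs(x - 5),
  10,
);
console.log(result.best, result.bestScore);
```

The point of the sketch is the shape of the loop: a fast, trustworthy `score` and a recorded `history` are what let an agent tell whether a change was real progress.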
Benchmarking agentic LLMs reversing Spotify's album color assignment via iterative optimization.
- Claude Code: Anthropic's agentic coding tool, which turns complex, hours-long workflows into a single command from your terminal or IDE. It operates natively within your terminal, IDE (VS Code, JetBrains), or via a web interface, allowing you to delegate complex tasks like feature building, bug fixing, and codebase navigation. The agent plans, edits files, executes commands, and creates commits, maintaining awareness of your entire project structure. Internally, Anthropic engineers using Claude Code reported a 67% increase in productivity, demonstrating its capacity to deliver significant gains for Pro and Max plan users.
- OpenCode: The open-source AI coding agent (CLI tool), integrating LLMs like GPT-5 and Claude Sonnet 4 directly into the terminal for fast, context-aware development. Built for terminal-first developers who demand speed and privacy, it connects your local files, Git history, and a choice of LLMs (e.g., OpenAI's GPT-5 Nano, Anthropic's Claude Sonnet 4) to execute complex tasks directly from the command line. The tool bypasses IDE and browser dependencies, allowing developers to triage issues, fix errors, or implement features with commands like `opencode fix error in main.go`. With over 26,000 GitHub stars by October 2025, OpenCode delivers a secure, context-aware coding partner that keeps your code local and your workflow efficient.
- Vercel AI Gateway: A unified API endpoint for accessing over 100 large language models (LLMs), including top providers like OpenAI and Anthropic. This centralized interface eliminates the overhead of managing multiple API keys and provider accounts, streamlining your production AI workloads. Key features include intelligent failover for increased uptime, built-in observability for detailed usage and cost tracking, and automatic prompt caching. The Bring Your Own Key (BYOK) option offers a 0% markup on token pricing.
- Cursor: The AI-native code editor designed for high-velocity development through deep LLM integration. Cursor is a fork of VS Code that embeds AI directly into the development workflow while maintaining full extension compatibility. It leverages models like Claude 3.5 Sonnet and GPT-4o to power features such as Cmd+K for inline edits and Cmd+L for codebase-wide chat. By indexing local files, Cursor provides precise context for its predictive 'Tab' completions and multi-file 'Composer' mode. This setup allows engineers to move from high-level intent to functional code without leaving the editor or losing context.
- TypeScript: An open-source superset of JavaScript, developed by Microsoft, that adds static typing and compiles to clean, standards-based JavaScript. Its static type system enables compile-time type checking, catching errors before runtime (a critical benefit for large-scale applications). The TypeScript compiler (`tsc`) transpiles code into standards-based JavaScript that runs in any browser or host environment such as Node.js.