Cloudflare: Per-Student AI Agents
Learn how to build an AI tutoring platform with per-student stateful agents using Cloudflare Durable Objects and local SQLite, featuring a multi-model extraction pipeline and async batch processing.
How we built mcq.sg - an AI tutoring platform where every student gets their own stateful agent.
The defining infrastructure choice: Cloudflare Durable Objects.
Each student has a dedicated DO with built-in SQLite storing mastery data, attempt history, and recommendation queue. The AI agent runs inside the DO - no centralized database, no bottleneck, no cold starts for returning students.
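As a rough illustration of the per-student layout described above, here is a minimal sketch, not the mcq.sg code. It assumes a SQLite-backed Durable Object whose `ctx.storage.sql` handle is passed in. The table and column names, the `StudentStore` class, the `studentObjectName` helper, and the 0.8/0.2 smoothing weights are all hypothetical. `SqlLike` is a structural stand-in for the runtime's SQL handle so the sketch can run outside Workers.

```typescript
// Structural stand-in for the `ctx.storage.sql` handle of a SQLite-backed
// Durable Object; declared here so the sketch runs outside the Workers runtime.
interface SqlLike {
  exec(query: string, ...bindings: unknown[]): unknown;
}

// Hypothetical schema: table and column names are illustrative only.
const SCHEMA: string[] = [
  `CREATE TABLE IF NOT EXISTS mastery (
     topic TEXT PRIMARY KEY,
     score REAL NOT NULL DEFAULT 0,
     updated_at INTEGER NOT NULL
   )`,
  `CREATE TABLE IF NOT EXISTS attempts (
     id INTEGER PRIMARY KEY AUTOINCREMENT,
     question_id TEXT NOT NULL,
     topic TEXT NOT NULL,
     correct INTEGER NOT NULL,
     attempted_at INTEGER NOT NULL
   )`,
  `CREATE TABLE IF NOT EXISTS recommendations (
     question_id TEXT PRIMARY KEY,
     priority REAL NOT NULL
   )`,
];

class StudentStore {
  private readonly sql: SqlLike;

  constructor(sql: SqlLike) {
    this.sql = sql;
    // Idempotent migration: safe to run every time the object is constructed.
    for (const statement of SCHEMA) this.sql.exec(statement);
  }

  // Append one attempt, then move the topic's mastery score toward 0 or 1.
  recordAttempt(questionId: string, topic: string, correct: boolean, now: number): void {
    const outcome = correct ? 1 : 0;
    this.sql.exec(
      "INSERT INTO attempts (question_id, topic, correct, attempted_at) VALUES (?, ?, ?, ?)",
      questionId, topic, outcome, now,
    );
    // Upsert: the first attempt seeds the score, later ones are smoothed
    // with assumed weights (0.8 old score, 0.2 new outcome).
    this.sql.exec(
      `INSERT INTO mastery (topic, score, updated_at) VALUES (?, ?, ?)
       ON CONFLICT(topic) DO UPDATE SET
         score = 0.8 * score + 0.2 * excluded.score,
         updated_at = excluded.updated_at`,
      topic, outcome, now,
    );
  }
}

// One object per student: a stable name derived from the student id.
function studentObjectName(studentId: string): string {
  return `student:${studentId}`;
}
```

Inside a Durable Object class this would be constructed with `this.ctx.storage.sql`, and a Worker would route to the right object with `idFromName(studentObjectName(id))` on the namespace binding, whatever that binding is called in the project.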
I’ll show the code for:
- Per-student DO with local SQLite schema
- Multi-model extraction pipeline (Moondream for detection, Claude for reasoning)
- Async batch processing for 50% cost savings on non-realtime tasks
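For the async batch item, one plausible shape is to collect non-realtime jobs and submit them through Anthropic's Message Batches API, which is billed at a discount to the realtime Messages API. The sketch below is an assumption about how that could look, not the talk's code. `InsightJob`, `buildBatch`, and `submitBatch` are hypothetical names, and the model id is left as a caller-supplied parameter.

```typescript
// One entry in a Message Batches request body.
interface BatchRequest {
  custom_id: string;
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user" | "assistant"; content: string }[];
  };
}

// Hypothetical non-realtime job, e.g. generating an insight for one student.
interface InsightJob {
  studentId: string;
  prompt: string;
}

// Pure helper: turn queued jobs into a batch payload.
// Assumes studentId contains only letters, digits, "_" or "-", because
// custom_id is restricted to that character set and must be unique per batch.
function buildBatch(
  jobs: InsightJob[],
  model: string,
  maxTokens = 512,
): { requests: BatchRequest[] } {
  return {
    requests: jobs.map((job, i): BatchRequest => ({
      custom_id: `insight-${job.studentId}-${i}`,
      params: {
        model,
        max_tokens: maxTokens,
        messages: [{ role: "user", content: job.prompt }],
      },
    })),
  };
}

// Submit the payload and return the batch id for later polling.
async function submitBatch(
  body: { requests: BatchRequest[] },
  apiKey: string,
): Promise<string> {
  const res = await fetch("https://api.anthropic.com/v1/messages/batches", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`batch submit failed: ${res.status}`);
  const data = (await res.json()) as { id: string };
  return data.id;
}
```

Results arrive asynchronously, so this pattern only suits work that can wait, such as the insight generation between practice sessions mentioned below.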
Built by 2 people. Previously led engineering on ParkingSG and RedeemSG at Open Government Products.
Adaptive assessment tracks response patterns, generating AI insights to guide subsequent practice sessions.
- Cloudflare Durable Objects: Globally unique, single-threaded serverless instances providing strongly consistent state for distributed applications. Durable Objects are a special kind of Cloudflare Worker: they combine compute with transactional, strongly consistent storage. Each Object instance is globally unique by ID, acting as a single point of coordination for multiple clients, a critical feature for real-time applications like chat or collaborative editing. The platform guarantees only one instance exists worldwide for a given ID, ensuring strong consistency without the complexity of distributed locks. They scale elastically, support WebSockets and the Alarms API for scheduled tasks, and offer SQLite-backed storage options, effectively delivering stateful serverless at the edge.
- Claude API: Access Anthropic's state-of-the-art Claude models (Opus, Sonnet, Haiku) via the RESTful Messages API, integrating advanced AI capabilities directly into your applications. The Claude API is Anthropic's direct developer interface for integrating their powerful large language models (LLMs) like Claude 3.5 Sonnet and Opus into production applications. It utilizes a robust Messages API for all conversational and generative interactions, supporting a massive 200,000-token context window for deep document analysis and sustained, complex reasoning. Developers leverage its Constitutional AI framework for built-in safety and utilize key features like Tool Use (function calling) and the Message Batches API for cost-efficient, high-volume processing. This is the direct, pay-as-you-go route for full feature control and cutting-edge model access.
- Moondream API: Blazingly fast, multi-function visual intelligence: query images, detect objects, and generate captions with a single VLM. This is your direct access to a high-speed, multi-function Visual Language Model (VLM). The Moondream API provides five core endpoints: use `/query` for Visual Question Answering (VQA), `/detect` for bounding box identification, `/point` for precise object coordinates, `/caption` for descriptive summaries, and `/segment` for SVG path masks. Integration is straightforward: utilize the `https://api.moondream.ai/v1/` base URL, authenticate with your API key, and execute specific tasks like asking, 'What is in this image?' or identifying an object's location. Deploy via Moondream Cloud or run locally with Moondream Station for maximum flexibility.
- Cloudflare Workers: Deploy serverless code instantly on Cloudflare's global network, executing with zero cold starts via V8 isolates for ultra-low latency. Cloudflare Workers is your serverless compute platform: run JavaScript, TypeScript, or WebAssembly directly on Cloudflare's global edge network. Leveraging the V8 engine and its isolate architecture ensures near-instant startup (zero cold starts), delivering ultra-low latency to users in over 330 cities. Use Workers to deploy fast edge logic (from custom API gateways and request routing to modifying responses and running AI inference) without managing a single server.
- SQLite: A C-language library providing a self-contained, serverless, zero-configuration SQL database engine embedded directly into the application process. SQLite is the world's most deployed database engine, functioning as a compact, C-language library (under 900KiB with all features) that eliminates the need for a separate server process. It operates as a serverless, zero-configuration system, storing the entire database (up to 281 terabytes) in a single, cross-platform file. This architecture makes it ideal for countless applications: it is built into all major mobile phones, web browsers, and desktop operating systems. The engine guarantees high reliability, supporting full ACID transactions, and its source code is freely available in the public domain for any use.
- Durable Objects: Globally unique, single-threaded compute instances with strongly consistent, transactional storage, enabling stateful serverless applications. Durable Objects (DOs) are a core primitive of the Cloudflare Workers platform: they combine compute with isolated, persistent storage. Each DO instance is globally addressable by a unique ID, guaranteeing that only one instance of the object executes at any given time, which completely eliminates race conditions and the need for distributed locks. This single-threaded Actor model simplifies building complex stateful systems. Use DOs for real-time coordination: think collaborative editing, multiplayer game sessions, or managing millions of individual user workspaces. The attached storage is fast, transactional, and strongly consistent, supporting up to 10 GB of data or a SQLite backend, all without managing any infrastructure.
- D1: Cloudflare’s native serverless SQL database built on SQLite for global, low-latency data storage. D1 brings the reliability of SQLite to the Cloudflare edge, allowing developers to deploy relational databases alongside Workers without the overhead of traditional connection pooling. It handles the heavy lifting of global replication and point-in-time recovery (PITR) automatically. You get a 10GB storage limit per database on the standard plan and the ability to query data using familiar SQL syntax. By keeping data physically close to the user, D1 eliminates the typical round-trip latency found in centralized database architectures.
- R2: Cloudflare's S3-compatible object storage that delivers zero egress fees and global performance. R2 is Cloudflare's global object storage solution, engineered to eliminate the costly egress fees common with providers like AWS S3. It offers an S3-compatible API, ensuring seamless integration with existing tools and workflows. Developers store unstructured data (e.g., web assets, data lakes) at a competitive rate: Standard Storage starts at $0.015/GB-month. R2 integrates natively with Cloudflare Workers, allowing you to build and deploy high-performance, edge-based applications across Cloudflare's network of 330+ data centers.
- Queues: Decouple microservices and scale asynchronously with high-throughput message buffers. Queues act as the shock absorbers of distributed systems: they ingest bursts of traffic and protect downstream services from overload. By using a producer-consumer pattern (like AWS SQS or RabbitMQ), you eliminate the need for immediate synchronous responses. This architecture helps the system keep accepting work even during database maintenance or API outages. Whether you are processing 1,000 image uploads per second or scheduling delayed email notifications, queues provide the persistence and retry logic necessary to prevent data loss.
- Browser Rendering: The critical process of converting HTML, CSS, and JavaScript into interactive visual pixels via the DOM, CSSOM, and layout engines. Browser rendering transforms raw code into a functional UI through a multi-step pipeline called the Critical Rendering Path. It starts with parsing HTML into the DOM and CSS into the CSSOM, merging them into a Render Tree to calculate exact geometry (Layout). The browser then executes Painting and Compositing to draw pixels on the screen, often targeting a 60fps refresh rate. Modern engines like Blink (Chrome) and Gecko (Firefox) optimize this by offloading heavy tasks to the GPU and using asynchronous decoding to prevent main-thread jank.
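The two-model split named in the talk outline (Moondream for detection, Claude for reasoning) can be pictured as a detect-then-reason loop. The sketch below is an assumption about that orchestration, not the speakers' pipeline. Both model calls are injected as plain functions so no SDK or HTTP details are implied. The `"question"` label, the function names, and the assumption that boxes arrive as normalised 0..1 coordinates are all hypothetical.

```typescript
// Bounding box in normalised coordinates (0..1), assumed detector output.
interface Box { x_min: number; y_min: number; x_max: number; y_max: number }

// The same region expressed in image pixels.
interface PixelRect { left: number; top: number; width: number; height: number }

// Injected model calls: a detector (e.g. a `/detect` request) and a reasoner
// (e.g. a Messages API request that is told which region to read).
type Detector = (image: Uint8Array, label: string) => Promise<Box[]>;
type Reasoner = (image: Uint8Array, region: PixelRect) => Promise<string>;

// Convert a normalised box into a pixel rectangle for a given image size.
function toPixels(box: Box, imgW: number, imgH: number): PixelRect {
  const left = Math.round(box.x_min * imgW);
  const top = Math.round(box.y_min * imgH);
  return {
    left,
    top,
    width: Math.round(box.x_max * imgW) - left,
    height: Math.round(box.y_max * imgH) - top,
  };
}

// Stage 1 finds candidate regions cheaply; stage 2 spends the more
// expensive reasoning model only on those regions.
async function extractQuestions(
  image: Uint8Array,
  imgW: number,
  imgH: number,
  detect: Detector,
  reason: Reasoner,
): Promise<string[]> {
  const boxes = await detect(image, "question"); // hypothetical label
  // Sort top to bottom so results follow reading order on the page.
  const rects = boxes
    .map((box) => toPixels(box, imgW, imgH))
    .sort((a, b) => a.top - b.top);
  const results: string[] = [];
  for (const rect of rects) {
    results.push(await reason(image, rect));
  }
  return results;
}
```

Keeping the two stages behind function types also makes it straightforward to swap the reasoning stage for a batched call when the result is not needed in real time.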
Related projects
Give Your Agent Keys, Not the Kingdom
Singapore
See a live demo of Smithery Connect, simplifying AI agent tool connections. Learn to set up secure, scoped…
Oracle & NVIDIA AI‑Q: A Blueprint for High‑Performance Research Automation
Singapore
See a deep-dive demo of an AI Research Assistant using Oracle Database 26ai and NVIDIA's AI stack for…
Multiplayer AI on the Edge
London
Learn how to build persistent, reactive AI assistants using Maggie Appleton’s AI Daemons design, deployed on Cloudflare Workers…
CodeDB: Building a Code Intelligence Server That Cuts Agent Token Usage
Singapore
See how CodeDB indexes codebases to provide structured answers, drastically reducing agent token usage and latency for efficient…
Building a Real-time Voice Agent with Cloudflare's Edge Stack
Montreal
See a live demo of a real-time sales coaching app using Durable Objects, Workers AI, Vectorize, and R2…
From Chatbots to Sandboxes: Why AI Needs an Execution Layer
Singapore
Learn how to move beyond AI demos to build systems that reliably execute real-world tasks, addressing limitations in…