Members-Only
Recent Talks & Demos are for members only
You must be an AI Tinkerers active member to view these talks and demos.
Social Engineering LLMs in Game
Explore how a realistic game uses LLM-driven NPCs to demonstrate social engineering attacks, guard‑rail bypasses, and practical strategies for LLM security.
Previously I wrote a free hacking game where you chat to NPCs to get password info, clues, etc … then I upgraded it to use an LLM for each NPC, and now you use social engineering techniques to trick the LLMs into giving you their passwords, their personal email addresses, etc.
Legacy GPT-4o worked great. GPT-5 caused me some problems … LLMs get ‘too clever’ and the NPC’s narrate how they’re “walking down the hallway to go talk to IT” when you’re trying to get them to give you a password (player pretending to be the IT dept).
(everything is fake - obviously! This is a game! NO real servers were harmed!)
- GPT-5OpenAI's GPT-5: The unified, multimodal foundation model delivering PhD-level reasoning and state-of-the-art coding performance.Copy that: GPT-5 is OpenAI's flagship multimodal model, launched August 7, 2025, as the successor to GPT-4. This is a major architectural shift: it unifies advanced reasoning capabilities (like the 'o-series' models) and rapid response times into a single system, eliminating the need for manual model switching (Source: OpenAI, August 2025). The model demonstrates state-of-the-art performance across technical benchmarks (math, programming, finance) and features a massive 272,000-token context window (Source: Jagran Josh, Voiceflow). Developers access it via the API in variants—including `gpt-5-mini` and `gpt-5-nano`—optimized for latency and cost trade-offs, making this frontier intelligence accessible across all ChatGPT tiers (Source: Botpress, Jagran Josh, Voiceflow).
- GPT-4xGPT-4x: OpenAI's flagship multimodal model, delivering human-level performance on professional benchmarks and processing text, audio, and vision in real-time.This is the GPT-4x system: a next-generation multimodal transformer from OpenAI. It sets a new standard for intelligence, achieving top-10% scores on simulated exams like the Bar Exam, a significant leap from GPT-3.5’s bottom-10% performance. The 'x' denotes its advanced capabilities, specifically its omni-model (GPT-4o) architecture, which processes and generates across text, audio, and image modalities with near-human latency (e.g., 320 milliseconds for audio response). We’ve engineered it for superior reliability, steerability, and complex instruction-following, making it the core engine for advanced AI applications and real-time conversational agents.
Related projects
Lessons from building an LLM-first framework
London
Explore how Tonk enables non‑coders to quickly update multiplayer applets by limiting context to the frontend and using…
Jailbreak RUN - How to gamify LLM security education
Poland
Participants collaboratively attempt to jailbreak an LLM‑powered bot using prompt engineering and RAG manipulation, revealing security flaws and…
Automating the art of deception for the greater good
Dublin
Learn how large language models can automatically generate and customize phishing emails and HTML templates at scale, without…
LLM powered email phishing simulation
Dublin
This talk demonstrates how LLM-powered phishing simulations generate personalized, scalable email lures using external APIs and structured, visual…
One Embedding to Rule Them All: Bending LLMs to Your Will
Dublin
Live demonstration of altering a single embedding vector to direct a language model's output without retraining, showing how…
It’s Just a Game… Until the AI Starts Asking Questions
Prague
Live demo of an experimental choice‑based game where you act as a newly trained LLM, navigating alignment tests,…