Deceiving LLMs in a videogame into surrendering passwords | London

October 29, 2025 · London

Social Engineering LLMs in Game

Explore how a realistic game uses LLM-driven NPCs to demonstrate social engineering attacks, guardrail bypasses, and practical strategies for LLM security.

Overview
Tech stack
  • GPT-5
    OpenAI's GPT-5: The unified, multimodal foundation model delivering PhD-level reasoning and state-of-the-art coding performance.
    GPT-5 is OpenAI's flagship multimodal model, launched August 7, 2025, as the successor to GPT-4. It marks a major architectural shift: it combines the advanced reasoning of the 'o-series' models with rapid response times in a single system, eliminating the need for manual model switching (Source: OpenAI, August 2025). The model reportedly achieves state-of-the-art results on technical benchmarks (math, programming, finance) and offers a 272,000-token context window (Source: Jagran Josh, Voiceflow). Developers access it via the API in several variants, including `gpt-5-mini` and `gpt-5-nano`, which trade capability for lower latency and cost; the model is also available across all ChatGPT tiers (Source: Botpress, Jagran Josh, Voiceflow).
  • GPT-4x
    GPT-4x: OpenAI's GPT-4-series multimodal models, delivering human-level performance on professional benchmarks and processing text, audio, and vision in real time.
    GPT-4x refers here to OpenAI's GPT-4 generation of multimodal transformer models. GPT-4 scored in roughly the top 10% of test takers on simulated exams such as the bar exam, a significant leap from GPT-3.5’s bottom-10% performance. The 'x' covers the family's variants, in particular the omni model (GPT-4o), which processes and generates text, audio, and images with near-human latency (e.g., an average of 320 milliseconds for audio responses). The models are designed for reliability, steerability, and complex instruction-following, which makes them a common engine for advanced AI applications and real-time conversational agents.

Related projects