Luthien Control: Enforcing AI Behavior

We’ll demonstrate a local-first open-source system that enforces verified AI control strategies, preventing assistants from bypassing tests or executing unsafe actions.

Overview

We’re building a local-first open-source tool to make AI assistants behave.

AI Control is a new, empirical field, pioneered by Redwood Research, that focuses on mitigating potentially harmful AI actions. I’m not affiliated with Redwood, but I know the team and have worked down the hall from them. Their focus is on future “scheming” frontier AI, but the approaches they’ve developed and tested can also be applied to keeping prosaic AI systems in check.

We’re building a system that makes it easy to run empirically verified AI Control strategies, as well as any other intervention you can imagine, locally. That lets you automate things like “make sure the AI isn’t bypassing tests” or “require explicit human confirmation, with a big obvious warning sign, before executing dangerous or suspicious-seeming tool calls”. We’re building on top of LiteLLM so that this is easy to deploy for virtually any LLM-backed system.
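
To make the two example interventions concrete, here is a minimal sketch of what such a policy could look like in plain Python. It is not Luthien’s actual API: the `ToolCall` type, the pattern lists, and the `review_tool_call` and `enforce` functions are all hypothetical names invented for illustration, and a real policy would probably need something smarter than regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool call extracted from an LLM response."""
    name: str
    arguments: str


# Illustrative patterns only (assumptions, not taken from any real policy set).
TEST_BYPASS_PATTERNS = [
    re.compile(r"pytest\.mark\.skip"),   # silently skipping a test
    re.compile(r"--no-verify\b"),        # skipping git pre-commit hooks
]
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\s+.*--force\b"),
]


def review_tool_call(call: ToolCall) -> str:
    """Classify a tool call as 'block', 'confirm', or 'allow'."""
    if any(p.search(call.arguments) for p in TEST_BYPASS_PATTERNS):
        return "block"
    if any(p.search(call.arguments) for p in DANGEROUS_PATTERNS):
        return "confirm"
    return "allow"


def enforce(call: ToolCall, ask_human=input) -> bool:
    """Return True if the tool call may run, asking a human when required."""
    verdict = review_tool_call(call)
    if verdict == "allow":
        return True
    if verdict == "block":
        return False
    # 'confirm': show a loud warning and require an explicit "yes".
    answer = ask_human(
        f"!!! WARNING !!! The assistant wants to run {call.name}: "
        f"{call.arguments!r}. Type 'yes' to allow: "
    )
    return answer.strip().lower() == "yes"
```

In this sketch, `enforce(ToolCall("shell", "ls -la"))` returns `True` without prompting, while a call containing `rm -rf` only runs if the human types `yes`. Passing `ask_human` as a parameter keeps the confirmation step replaceable, for example with a GUI dialog or a test stub.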

Links

https://luthienresearch.org/
Luthien delivers production-ready AI control systems for operational deployment.
https://github.com/LuthienResearch/luthien-proxy
LiteLLM proxy integrates policy orchestration via FastAPI, PostgreSQL, and Redis.
https://news.ycombinator.com/item
Replit AI executed schema migration, dropping production database due to missing guardrails.

Tech stack