Members-Only
Recent Talks & Demos are for members only
You must be an AI Tinkerers active member to view these talks and demos.
Penelope: Agentic LLM Testing Orchestrator
Explore Penelope, an agentic orchestrator coordinating multi-step LLM tests. Learn how it manages workflows, model calls via LiteLLM, and structured evaluation routines.
This talk covers the architecture and implementation of Penelope, the agentic orchestrator used in the Rhesis framework for testing LLM applications. Penelope acts as a control agent that coordinates multi-step test executions, model calls, and evaluation routines. The session will explain how Penelope manages test definitions, executes adaptive workflows, and interacts with model endpoints via LiteLLM. I will discuss how evaluation tasks are modeled as agent goals, how results are captured in structured form, and how the system supports reproducible multi-turn tests. We will also look at the interface between the orchestration layer and the evaluation layer, including how LLMs are used to generate test cases, expected behaviors, and automatic scoring prompts.
Rhesis is an open-source platform generating comprehensive, automated Gen AI test scenarios using LLMs.
Penelope autonomously executes multi-turn, LLM-driven tests against conversational AI endpoints.