Ensemble LLM-as-a-Judge at Scale | San Francisco

October 31, 2025 · San Francisco

Ensemble LLM Judge Bias Reduction

Demonstrates ensemble LLM judging of ELI5 abstracts, combining direct and pairwise evaluations across 1.6M runs to reduce judge bias; GPT‑OSS emerged as both the top-performing and the strictest judge.
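The talk summary does not describe how the judges' verdicts are combined, so what follows is only a minimal sketch under stated assumptions. Each judge is a plain Python function standing in for an LLM call, and the toy judges (`lenient`, `strict`, `first_biased`, `length_pref`) plus the mean / majority-vote aggregation are hypothetical illustrations, not the speaker's pipeline. The sketch shows two common bias-reduction ideas: averaging direct scores across several judges, and asking each pairwise judge twice with the candidates swapped so that a preference for whichever answer is shown first cancels out.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Sequence

# Hypothetical judge interfaces; a real judge would wrap an LLM API call.
DirectJudge = Callable[[str], float]       # text -> score (e.g. 1-10)
PairwiseJudge = Callable[[str, str], str]  # (first, second) -> "A" or "B"


def ensemble_direct(text: str, judges: Sequence[DirectJudge]) -> float:
    """Average the direct scores several judges assign to one text."""
    return mean(judge(text) for judge in judges)


def ensemble_pairwise(a: str, b: str, judges: Sequence[PairwiseJudge]) -> str:
    """Majority vote over judges, querying each judge in both orders."""
    votes = Counter()
    for judge in judges:
        # Original order: "A" means candidate a wins.
        votes["a" if judge(a, b) == "A" else "b"] += 1
        # Swapped order: "A" now means candidate b wins.
        votes["b" if judge(b, a) == "A" else "a"] += 1
    if votes["a"] == votes["b"]:
        return "tie"
    return "a" if votes["a"] > votes["b"] else "b"


# Toy (hypothetical) judges used only to exercise the functions above.
def lenient(text: str) -> float:
    return 8.0


def strict(text: str) -> float:
    return 5.0 if "jargon" in text else 7.0


def first_biased(first: str, second: str) -> str:
    return "A"  # always prefers whichever candidate is shown first


def length_pref(first: str, second: str) -> str:
    return "A" if len(first) <= len(second) else "B"  # prefers the shorter one


if __name__ == "__main__":
    print(ensemble_direct("simple words", [lenient, strict]))  # 7.5
    # A purely position-biased judge cancels itself out.
    print(ensemble_pairwise("short", "much longer text", [first_biased]))  # tie
    # Adding a judge with a real preference breaks the tie.
    print(ensemble_pairwise("short", "much longer text",
                            [first_biased, length_pref]))  # a
```

Swapping the candidate order is a cheap way to neutralize position bias without knowing which judges suffer from it, at the cost of doubling the number of pairwise calls.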
