Beyond “It Feels Better”: A Reproducible Playbook for Evaluating Fine-Tuned LLMs

November 10, 2025 · Toronto

Instruct Lab LLM Evaluation Playbook

A reproducible workflow that generates synthetic CS data, fine-tunes LLMs, and evaluates the resulting models with perplexity, token-level precision/recall/F1, exact match, SBERT similarity, and length diagnostics.
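
As a rough illustration of the metrics named above, the sketch below computes exact match, token-level precision/recall/F1, perplexity, and a simple length ratio using only the Python standard library. It is not the talk's implementation: the function names are hypothetical, the lowercase-and-whitespace tokenization is an assumption, and perplexity is computed from per-token log-probabilities that a model would have to supply. SBERT similarity is left out because it needs an external embedding model.

```python
import math
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def exact_match(prediction, reference):
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_prf(prediction, reference):
    """Token-level precision, recall, and F1 from multiset token overlap."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def length_ratio(prediction, reference):
    """Predicted length over reference length, in tokens; flags over/under-generation."""
    return len(normalize(prediction)) / max(len(normalize(reference)), 1)
```

For example, `token_prf("the cat sat", "the cat sat down")` gives a precision of 1.0 and a recall of 0.75, since every predicted token appears in the reference but one reference token is missing from the prediction.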
