Predicting Human Decisions in Agentic Workflows | Toronto

November 10, 2025 · Toronto

Centaur: Agentic Decision Prediction

This talk presents an open-source framework for integrating Centaur's human decision prediction model into agentic workflows to reduce reviewer cognitive load.
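One way to picture the idea, as a minimal sketch rather than the framework presented in the talk: if a model can estimate how likely a human reviewer is to approve an agent's proposed action, only the uncertain cases need to reach the reviewer. The names below (`ProposedAction`, `route`, `stub_predictor`, the 0.9 threshold) are hypothetical, and the predictor is a hard-coded stub standing in for a call to a decision-prediction model such as Centaur.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProposedAction:
    """An action an agent wants to take (hypothetical type for this sketch)."""
    description: str


def route(
    actions: List[ProposedAction],
    predict_approval: Callable[[ProposedAction], float],
    threshold: float = 0.9,
) -> Tuple[List[ProposedAction], List[ProposedAction], List[ProposedAction]]:
    """Split actions into (auto_approved, auto_rejected, needs_review).

    predict_approval returns the estimated probability that a human
    reviewer would approve the action. Only actions whose prediction is
    not confidently high or confidently low are sent to the reviewer.
    """
    auto_approved, auto_rejected, needs_review = [], [], []
    for action in actions:
        p = predict_approval(action)
        if p >= threshold:
            auto_approved.append(action)
        elif p <= 1.0 - threshold:
            auto_rejected.append(action)
        else:
            needs_review.append(action)
    return auto_approved, auto_rejected, needs_review


def stub_predictor(action: ProposedAction) -> float:
    """Stand-in for a real model call; the scores are invented for illustration."""
    scores = {
        "fix typo in README": 0.97,
        "refactor billing module": 0.55,
        "delete production database": 0.03,
    }
    return scores.get(action.description, 0.5)


if __name__ == "__main__":
    queue = [
        ProposedAction("fix typo in README"),
        ProposedAction("refactor billing module"),
        ProposedAction("delete production database"),
    ]
    approved, rejected, review = route(queue, stub_predictor)
    print("needs human review:", [a.description for a in review])
```

The threshold sets the trade-off: raising it sends more items to the reviewer, lowering it saves more reviewer attention at the cost of trusting the prediction more often.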

Overview
Tech stack
  • Centaur LLM
    Centaur is a Llama 3-based LLM fine-tuned for cognitive fidelity: it simulates human decision-making, biases included, with a reported correlation of r=0.86 against human responses, for *in silico* social science.
    Centaur is a specialized Large Language Model (LLM) from a research team led by Helmholtz Munich, with collaborators at institutions including Princeton and Google DeepMind; its goal is cognitive fidelity, not factual accuracy. It is built on Meta's Llama 3.1 70B and fine-tuned on the Psych-101 dataset: over 10 million human responses from 160 psychological experiments. This training lets Centaur reproduce the distribution of human answers, including known cognitive biases such as the conjunction fallacy. The model is reported to reach a correlation of r=0.86 with human response patterns on held-out tests. For social scientists, this enables rapid *in silico* experiments on human decision-making.
  • Psych-101 Dataset
    Psych-101 is a dataset of transcripts from human psychological experiments, recording over 10 million trial-by-trial choices for cognitive modeling.
    Psych-101 is a collection of natural-language transcripts from human psychological experiments. It aggregates trial-by-trial data from 160 experiments, capturing the decisions of 60,092 participants across 10,681,650 individual choices. Researchers can use it to train foundation models, study human decision-making, and benchmark new cognitive architectures against recorded human behavior.
  • Claude API
    Access Anthropic's Claude models (Opus, Sonnet, Haiku) through the REST-based Messages API to add their capabilities to your applications.
    The Claude API is Anthropic's developer interface for integrating its large language models (LLMs), such as Claude 3.5 Sonnet and Opus, into production applications. Conversational and generative requests go through the Messages API, which supports a 200,000-token context window for long-document analysis and sustained reasoning. The models are trained with Anthropic's Constitutional AI approach to safety, and the API offers features such as Tool Use (function calling) and the Message Batches API for cost-efficient, high-volume processing. It is the direct, pay-as-you-go route to the full feature set and the newest models.
  • LangChain
    An open-source framework for building and deploying data-aware Large Language Model (LLM) applications.
    LangChain is a framework for engineering LLM-powered applications: it simplifies connecting models (such as GPT-4 or Claude) to external data, computation, and APIs. It provides modular components (Chains, Agents, Tools, and Memory) that let developers build workflows such as Retrieval-Augmented Generation (RAG) pipelines and conversational agents. Its Python and JavaScript libraries, together with the LangChain Expression Language (LCEL), offer a standardized interface for prototyping and for moving applications to production.
  • Replit
    Replit is an AI-powered, cloud-based development environment: go from a natural-language idea to a deployed full-stack application in minutes, with no local setup.
    Replit is a cloud-based development environment (CDE) that lets developers and teams skip local setup. It supports many languages (e.g., Python, Node.js, C++) and offers real-time, Google Docs-style collaboration for pair programming. Its main differentiator is the integrated Replit Agent: an AI developer that scaffolds, writes, and debugs full-stack applications from natural-language prompts, shortening the idea-to-app cycle to minutes. Projects get built-in version control (Git/GitHub integration) and one-click deployment, often on Google Cloud infrastructure.
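
The r=0.86 figure quoted for Centaur in the list above is a correlation coefficient. As a small illustration of what such a statistic measures, and not a reproduction of the reported evaluation, the sketch below computes a Pearson correlation between predicted and observed choice proportions. It assumes the quoted figure is a Pearson r; the function name `pearson_r` and all numbers are made up for illustration.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented example: the share of participants choosing option A in four
# tasks, as predicted by a model and as observed in human data.
predicted = [0.10, 0.35, 0.60, 0.80]
observed = [0.15, 0.30, 0.65, 0.75]

if __name__ == "__main__":
    print(f"r = {pearson_r(predicted, observed):.2f}")
```

A value near 1 means the model's predictions rise and fall with the human data across tasks; it does not by itself mean the predicted proportions match the observed ones exactly.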

Related projects