Hyperspeed Inference: New AI Apps

Explore how 1000 tokens/s inference enables new AI app designs, such as on-demand OS tools, with a discussion of agentic AI and MCP.

Video

Overview

I will be expanding on a project I started a few months ago using the Cerebras inference engine, which offers speeds of up to 1000 tokens/s. The goal is to demonstrate how AI models running at this rate require new ways of thinking about AI app design and UX. The demo will show how an OS might work when AI tools are generated on demand.
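
To give a rough sense of why the decode rate matters for on-demand tools, here is a minimal back-of-the-envelope sketch. The 800-token tool size and the 50 tokens/s comparison rate are illustrative assumptions, not figures from the talk, and the calculation ignores network and prompt-processing latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to produce `output_tokens` at a steady decode rate.

    Ignores network round trips and prompt processing, so real
    end-to-end latency would be somewhat higher.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical size of a small generated tool (an assumption for illustration).
TOOL_TOKENS = 800

# 50 tokens/s is an assumed baseline; 1000 tokens/s is the rate cited above.
for rate in (50, 1000):
    seconds = generation_seconds(TOOL_TOKENS, rate)
    print(f"{rate} tokens/s: {seconds:.1f} s")
```

Under these assumptions the same tool takes many seconds at the slower rate and under a second at 1000 tokens/s, which is roughly the difference between waiting on a build and generating an interface as part of the interaction itself.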

Links

Tech stack