You must be an AI Tinkerers active member to view these talks and demos.
LLM Probing for Immediate Inference
This talk describes using novel LLM heads to measure a model’s internal states, accelerating inference by bypassing token generation: a serious technique for improving performance.
‘Training’ remains an ongoing challenge, but ‘Inference’ will be the dominant performance challenge for AI going forward, as signalled by Nvidia’s acquisition of Groq. Autoregressive generation, which produces output one token at a time, is slow and expensive, and it is now the dominant bottleneck. ‘Probing’, i.e. adding novel architectures (heads) onto an LLM, can accelerate inference by measuring the LLM’s internal ‘state’ directly, side-stepping the need to generate tokens.
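The talk’s lab notes are not public, so what follows is only a minimal sketch of the general probing idea, not the author’s method. It assumes the simplest possible head, a linear (logistic-regression) probe trained on frozen hidden-state vectors. The hidden states are synthetic stand-ins produced by a hypothetical `fake_hidden_state` helper, and `LinearProbe` and `forward_passes` are likewise illustrative names. The sketch illustrates two points from the paragraph above: a small head can read a property straight from the model’s state, and doing so replaces one sequential pass per generated token with a single pass over the prompt.

```python
import math
import random

DIM = 16  # hypothetical hidden-state width; real LLM layers are far wider


def fake_hidden_state(label: int, rng: random.Random) -> list[float]:
    """Hypothetical stand-in for an LLM hidden state (synthetic, not real activations).

    The first coordinate is shifted by the label so the 'state' is linearly readable.
    """
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    vec[0] += 3.0 if label == 1 else -3.0
    return vec


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearProbe:
    """Assumed probing head: logistic regression on a frozen hidden state."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict_proba(self, h: list[float]) -> float:
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return sigmoid(z)

    def fit(self, xs, ys, lr: float = 0.1, epochs: int = 50) -> "LinearProbe":
        # Plain SGD on log-loss; only the head is trained, the "LLM" stays frozen.
        for _ in range(epochs):
            for h, y in zip(xs, ys):
                err = self.predict_proba(h) - y  # d(log-loss)/dz
                self.w = [wi - lr * err * hi for wi, hi in zip(self.w, h)]
                self.b -= lr * err
        return self


def forward_passes(answer_tokens: int, use_probe: bool) -> int:
    """Count sequential model passes needed to get an answer (assumes a KV cache)."""
    if use_probe:
        return 1  # prefill only; the probe reads the resulting hidden state
    return answer_tokens  # prefill yields token 1, each further token needs one more pass


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.randint(0, 1) for _ in range(400)]
    states = [fake_hidden_state(y, rng) for y in labels]
    probe = LinearProbe(DIM).fit(states[:300], labels[:300])
    held_out = list(zip(states[300:], labels[300:]))
    acc = sum((probe.predict_proba(h) >= 0.5) == bool(y) for h, y in held_out) / len(held_out)
    print(f"probe accuracy on held-out states: {acc:.2f}")
    print("sequential passes for a 20-token answer:",
          forward_passes(20, use_probe=False), "vs", forward_passes(20, use_probe=True))
```

In a real setup the vectors would come from a chosen layer of the LLM (for example via `output_hidden_states=True` in Hugging Face Transformers) and the labels from whatever state you want to read out. The pass count ignores the head’s own cost, which here is a single dot product.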
I don’t have a super fancy presentation or a clean GitHub repo yet; it’s just lab notes and a demo.
It actually works. This is serious, not just a toy.