vLLM: Guided Recommendations
Demonstrates building a recommendation engine with vLLM and an open‑weight model, using guided decoding to restrict outputs, all on Google Colab without finetuning.
I will show how to use vLLM and an open-weight model to build a simple recommendation engine, using guided decoding to limit the LLM's output to the allowed items only. No finetuning is needed, and it runs on Google Colab, so essentially no hardware of your own is needed either. The code I shared is just a draft; the final version will include a bit more.
vLLM generates constrained JSON output using Pydantic schemas for information retrieval.
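The idea described above, restricting the model to a fixed set of items through guided decoding, might look roughly like the sketch below. It is not the speaker's code. The catalogue, the prompt wording, the helper names, and the model name `Qwen/Qwen2.5-1.5B-Instruct` are all assumptions made for illustration. The schema is written by hand as a plain dictionary; a Pydantic model with a `Literal` field should produce an equivalent schema through `model_json_schema()`. The vLLM calls use `GuidedDecodingParams` and the `guided_decoding` argument of `SamplingParams`; newer vLLM releases expose the same feature under different names, so check the version installed on Colab.

```python
import json

# Hypothetical catalogue; the talk does not list the actual items.
ALLOWED_ITEMS = ["The Matrix", "Inception", "Spirited Away"]


def build_recommendation_schema(allowed_items):
    """Return a JSON schema whose `item` field must be one of `allowed_items`."""
    return {
        "type": "object",
        "properties": {
            "item": {"type": "string", "enum": list(allowed_items)},
            "reason": {"type": "string"},
        },
        "required": ["item", "reason"],
        "additionalProperties": False,
    }


def build_prompt(user_history, allowed_items):
    """Assumed prompt format: list what the user liked and the catalogue."""
    return (
        "The user liked: " + ", ".join(user_history) + ".\n"
        "Recommend exactly one item from this catalogue: "
        + ", ".join(allowed_items) + ".\n"
        "Answer as JSON with the keys 'item' and 'reason'."
    )


def parse_recommendation(raw_output, allowed_items):
    """Parse the model output and double-check that the item is allowed."""
    data = json.loads(raw_output)
    if data.get("item") not in allowed_items:
        raise ValueError(f"Item not in catalogue: {data.get('item')!r}")
    return data


def recommend(user_history, allowed_items=ALLOWED_ITEMS,
              model_name="Qwen/Qwen2.5-1.5B-Instruct"):  # assumed model
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.sampling_params import GuidedDecodingParams

    llm = LLM(model=model_name)
    # Guided decoding only lets the model emit text that fits the schema,
    # so `item` cannot be anything outside the catalogue.
    guided = GuidedDecodingParams(json=build_recommendation_schema(allowed_items))
    params = SamplingParams(temperature=0.0, max_tokens=128,
                            guided_decoding=guided)
    outputs = llm.generate([build_prompt(user_history, allowed_items)], params)
    return parse_recommendation(outputs[0].outputs[0].text, allowed_items)
```

Calling `recommend(["Blade Runner", "Interstellar"])` on a GPU runtime would return a dictionary with an `item` drawn from the catalogue and a free-text `reason`. The extra check in `parse_recommendation` is redundant when guided decoding works as intended, but it keeps the helper safe to reuse on unconstrained output.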