Technology
WebLLM
WebLLM is a high-performance, open-source inference engine that runs large language models directly in the browser with WebGPU acceleration.
WebLLM is a high-performance, open-source inference engine from the MLC-AI team, designed to run Large Language Models (LLMs) entirely within the client's web browser. It leverages WebGPU for hardware acceleration and WebAssembly (WASM) for efficient CPU computations, delivering near-native performance without relying on a backend server. This architecture ensures enhanced user privacy (data stays local) and eliminates cloud API costs, while maintaining full compatibility with the OpenAI API for functionalities like streaming and JSON-mode generation.
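Because WebLLM mirrors the OpenAI chat-completions API, using it looks much like calling a hosted endpoint, except the model is downloaded and run locally. A minimal sketch (the model ID is illustrative; any model from WebLLM's prebuilt list can be substituted, and this must run in a WebGPU-capable browser):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Download (or load from cache) the model weights and compile the
  // WebGPU kernels; progress is reported via the callback.
  const engine = await CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC", // assumed model ID from the prebuilt list
    { initProgressCallback: (p) => console.log(p.text) }
  );

  // OpenAI-style streaming chat completion, executed entirely in-browser.
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
    stream: true,
  });

  for await (const chunk of chunks) {
    // Each chunk carries an incremental delta, as in the OpenAI API.
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();
```

Since the request/response shapes match the OpenAI API, existing client code can often be pointed at WebLLM with little more than swapping the engine object for the OpenAI SDK client.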
Related technologies
Recent Talks & Demos