Technology
WebLLM
WebLLM is a high-performance, open-source inference engine that runs large language models directly in the browser with WebGPU acceleration.
WebLLM is a high-performance, open-source inference engine from the MLC-AI team, designed to run Large Language Models (LLMs) entirely within the client's web browser. It leverages WebGPU for hardware acceleration and WebAssembly (WASM) for efficient CPU computations, delivering near-native performance without relying on a backend server. This architecture ensures enhanced user privacy (data stays local) and eliminates cloud API costs, while maintaining full compatibility with the OpenAI API for functionalities like streaming and JSON-mode generation.
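Because WebLLM mirrors the OpenAI chat-completions API, using it looks much like calling a hosted endpoint, except the model is downloaded and run locally. A minimal sketch (the model ID is illustrative; any model from WebLLM's prebuilt list can be substituted, and this must run in a WebGPU-capable browser):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Download (or load from cache) the model weights and compile the
  // WebGPU kernels; progress is reported via the callback.
  const engine = await CreateMLCEngine(
    "Llama-3.1-8B-Instruct-q4f32_1-MLC", // assumed model ID from the prebuilt list
    { initProgressCallback: (p) => console.log(p.text) }
  );

  // OpenAI-style streaming chat completion, executed entirely in-browser.
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
    stream: true,
  });

  for await (const chunk of chunks) {
    // Each chunk carries an incremental delta, as in the OpenAI API.
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();
```

Since the request/response shapes match the OpenAI API, existing client code can often be pointed at WebLLM with little more than swapping the engine object for the OpenAI SDK client.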
Related technologies
Recent Talks & Demos