No servers, no API keys, no data leaves your device. Powered by WebLLM, Transformers.js, and MediaPipe — everything runs locally on your hardware.
Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.
Used by
SmolLM2 360M, SmolLM2 1.7B, Qwen3 4B, Phi-3.5 Mini, Llama 3.2 1B
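The WebLLM flow described above is typically driven through its OpenAI-compatible chat API. A minimal sketch, assuming the `@mlc-ai/web-llm` npm package; the model ID and prompt are illustrative, and this only runs in a browser with WebGPU:

```javascript
// Sketch of typical WebLLM usage (assumes the @mlc-ai/web-llm package;
// the model ID below is illustrative).
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the precompiled weights and WebGPU shaders on first use,
// then serves them from the browser cache.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (p) => console.log(p.text),
});

// OpenAI-style chat completion, running entirely on the local GPU.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);
```

Because the weights are compiled ahead-of-time per model, the engine is created with a specific model ID rather than a generic weights file.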
Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, falling back to WebAssembly on hardware without WebGPU support.
Used by
Qwen3.5 0.8B, Qwen3.5 2B, Qwen3.5 4B
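The load-time graph interpretation above maps onto the Transformers.js `pipeline` API. A minimal sketch, assuming the `@huggingface/transformers` npm package; the model repo and quantization setting are illustrative, and the code requires a browser environment:

```javascript
// Sketch of Transformers.js text generation (assumes the
// @huggingface/transformers package; the model repo is illustrative).
import { pipeline } from "@huggingface/transformers";

// Fetches the ONNX graph and weights, then executes via WebGPU;
// ONNX Runtime Web falls back to WebAssembly where WebGPU is missing.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { device: "webgpu", dtype: "q4" }
);

const output = await generator("What is ONNX?", { max_new_tokens: 64 });
console.log(output[0].generated_text);
```

Omitting the `device` option lets the library pick a backend automatically, which is often the safer default across heterogeneous hardware.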
Google's MediaPipe LLM Inference API loads Gemma models in the LiteRT format (formerly TFLite). It supports multimodal input (text and images), all processed on-device via WebGPU.
Used by
Gemma 3n E2B, Gemma 3n E4B
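The MediaPipe setup above can be sketched as follows, assuming the `@mediapipe/tasks-genai` npm package; the model asset path is illustrative, and the code only runs in a browser:

```javascript
// Sketch of MediaPipe LLM Inference in the browser (assumes the
// @mediapipe/tasks-genai package; the model path is illustrative).
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Resolves the WASM runtime files that back the GenAI tasks.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);

// Loads a Gemma model packaged in LiteRT (.task) format and runs it
// on-device via WebGPU.
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-3n-E2B-it.task" },
  maxTokens: 512,
});

const answer = await llm.generateResponse("Summarize WebGPU in one line.");
console.log(answer);
```

For streaming output, `generateResponse` also accepts a progress callback that receives partial results as tokens are produced.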