Run AI entirely in your browser

No servers, no API keys, no data leaves your device. Powered by WebLLM, Transformers.js, and MediaPipe — everything runs locally on your hardware.

First load downloads model weights to your browser — this is a one-time download. After that, the model loads from cache in seconds.
WebLLM and MediaPipe models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web, which can fall back to WebAssembly where WebGPU is unavailable. All three need network access to huggingface.co for the initial download.
How do these models run in your browser?

Three paths to in-browser AI

WebLLM + MLC

HuggingFace: MLC-format weights
↓
TVM / MLC Compiler: ahead-of-time compilation
↓
WebGPU Compute Shaders: pre-optimized GPU kernels
↓
Your GPU

Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.
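A minimal sketch of this path, using WebLLM's `CreateMLCEngine` entry point. The model ID must be one of WebLLM's prebuilt MLC-compiled models; the specific ID below is an assumption for illustration.

```javascript
// Sketch: load an MLC-compiled model and run one chat turn with WebLLM.
// Requires a WebGPU-capable browser. The model ID is assumed to be in
// WebLLM's prebuilt model list.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function chatOnce(prompt) {
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    // Reports weight download and shader-compilation progress
    // (shader compilation happens on the first run, then is cached)
    initProgressCallback: (report) => console.log(report.text),
  });
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  return reply.choices[0].message.content;
}
```

WebLLM exposes an OpenAI-style `chat.completions` interface, so code written against the OpenAI API shape ports over with few changes.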

Trade-offs
  • Fast inference — kernels are pre-optimized
  • First run compiles shaders for your GPU (cached after)
  • Model must be specifically compiled for MLC

Used by

SmolLM2 360M, SmolLM2 1.7B, Qwen3 4B, Phi-3.5 Mini, Llama 3.2 1B

Transformers.js + ONNX Runtime Web

HuggingFace: ONNX-format model graph + weights
↓
ONNX Runtime Web: builds execution plan at load time
↓
WebGPU (or WASM fallback)
↓
Your GPU / CPU

Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, or falls back to WebAssembly on unsupported hardware.
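A sketch of this path with the Transformers.js `pipeline` helper. The model ID is an assumption; any ONNX-exported text-generation model on the Hub works the same way.

```javascript
// Sketch: text generation via Transformers.js, which executes the ONNX
// graph through ONNX Runtime Web. Model ID is illustrative.
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { device: "webgpu" } // omit or use "wasm" to run on the WebAssembly backend
);

const out = await generator(
  [{ role: "user", content: "Explain WebGPU in one sentence." }],
  { max_new_tokens: 64 }
);
```

Because the ONNX graph is interpreted at load time rather than compiled ahead-of-time, swapping models is just a matter of changing the model ID.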

Trade-offs
  • Supports any model exportable to ONNX
  • Can fall back to WASM if WebGPU is unavailable
  • Slightly more overhead than pre-compiled kernels
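The WASM fallback decision reduces to a one-line capability check on `navigator.gpu`, the standard WebGPU entry point. `pickBackend` below is a hypothetical helper, not part of any library:

```javascript
// Choose an execution backend the way an ONNX-Runtime-Web-style runtime
// does: prefer WebGPU, otherwise fall back to WebAssembly.
// `pickBackend` is a hypothetical name for illustration.
function pickBackend(nav) {
  // `navigator.gpu` is only defined in WebGPU-capable browsers
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}

// In the browser: pickBackend(navigator)
```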

Used by

Qwen3.5 0.8B, Qwen3.5 2B, Qwen3.5 4B

MediaPipe + LiteRT

HuggingFace: LiteRT model file (.litertlm)
↓
MediaPipe GenAI: LLM Inference API
↓
WebGPU Compute: multimodal, text + images
↓
Your GPU

Google's MediaPipe LLM Inference API loads Gemma models in the LiteRT format (formerly TFLite). It supports multimodal input — text and images — all processed on-device via WebGPU.
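A sketch of this path using MediaPipe's LLM Inference API. The WASM asset URL and model path below are assumptions; the model file is the single `.litertlm` download noted in the trade-offs.

```javascript
// Sketch: Gemma inference through MediaPipe's LLM Inference API.
// Asset URL and model path are illustrative placeholders.
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

const genaiFileset = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: { modelAssetPath: "/models/gemma-3n-E2B.litertlm" },
  maxTokens: 512,
});
const answer = await llm.generateResponse("Summarize WebGPU in one line.");
```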

Trade-offs
  • Multimodal: text and image input
  • Single large file download (no split shards)
  • Requires WebGPU — no WASM fallback

Used by

Gemma 3n E2B, Gemma 3n E4B

All three methods use WebGPU for GPU acceleration. All model weights are cached in your browser after the first download — no server involved.
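Since the weights live in the browser's own storage, you can get a rough sense of how much space they occupy with the standard Storage API; the helper names below are illustrative.

```javascript
// Convert a byte count to a megabyte display string (pure helper).
function toMB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(1);
}

// Sketch: report how much browser storage is in use (cached model
// weights included). navigator.storage.estimate() is a standard web API.
async function reportModelStorage() {
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  console.log(`Using ${toMB(usage)} MB of ${toMB(quota)} MB available`);
}
```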