Skip to content

Docker one-click setup + Gradio web UI for all collaboration styles #1#23

Open
danielesalpietro wants to merge 19 commits into
RecursiveMAS:mainfrom
danielesalpietro:main
Open

Docker one-click setup + Gradio web UI for all collaboration styles #1#23
danielesalpietro wants to merge 19 commits into
RecursiveMAS:mainfrom
danielesalpietro:main

Conversation

@danielesalpietro

Copy link
Copy Markdown

🎯 Goal

Make RecursiveMAS accessible to everyone — researchers, students, and curious minds — with no prior technical knowledge required. Clone the repo, fill in two lines in a .env file, run one command, and the full multi-agent reasoning system is up and running in your browser in under 60 seconds.

Tested on: HP OMEN 16 Pro · NVIDIA RTX 5080 · 32 GB RAM · NVMe SSD
Result: all 5 collaboration styles working, GPU inference in seconds, warm model cache between requests.


✨ What's new

🐳 Docker infrastructure

  • Dockerfile — GPU-ready image for batch inference (run.py)
  • Dockerfile.serve — separate image for the Gradio web UI (serve.py)
  • docker-compose.yml — orchestrates both services with a shared hf_cache named volume; reads HF_TOKEN and TAVILY_API_KEY from .env
  • .dockerignore — keeps secrets and cache out of the build context

🖥️ Gradio web UI (serve.py)

  • Chat interface exposing all 5 collaboration styles via dropdown
  • Warm model cache — models are loaded into VRAM on first request and stay resident; no reload between questions
  • Style switching evicts the old models and loads the new set automatically
  • Sliders for recursive rounds (1–5) and latent steps (8–64)
  • Informational note below the style selector: first-use download time, subsequent runs instant from cache
  • Compatible with Gradio 6.0 (messages format updated, theme moved to launch())

🩺 Health check (healthcheck.py)

Three-level check to verify the container environment before running inference:

  • Level 1 — Python deps + all 5 styles registered (no GPU needed)
  • Level 2 — CUDA device detection + tensor allocation
  • Level 3 — HuggingFace Hub reachability

🪟 Windows / no-GPU support

  • serve-cpu.bat — one double-click to launch the web UI on CPU; reads credentials from .env automatically
  • docker-compose.override.yml pattern documented in README with runtime: runc to bypass NVIDIA hook on WSL2 systems without GPU passthrough configured
  • WSL2 GPU fix checklist included (driver ≥ 470, wsl --list --verbose, Docker Desktop WSL integration)

🔒 Security

  • .gitignore added — .env is never tracked by git
  • .env.example provided as a safe template
  • All secrets passed at runtime via environment variables, never baked into images

📖 Documentation

  • New 🐳 Docker: One-Click Setup section in README covering prerequisites, build, batch inference, web UI launch, health check, and CPU fallback
  • Repository structure updated to reflect all new files

🚀 Quickstart (for reviewers)

git clone https://github.com/danielesalpietro/RecursiveMAS.git
cd RecursiveMAS

# Create .env
echo "HF_TOKEN=hf_your_token" > .env
echo "TAVILY_API_KEY=your_key" >> .env

# Build and launch web UI
docker compose build serve
docker compose up serve
# → open http://localhost:7860

No GPU on your machine right now? Create docker-compose.override.yml:

services:
  recursivemas:
    runtime: runc
    deploy: {}
  serve:
    runtime: runc
    deploy: {}

Then docker compose up serve — the UI runs on CPU (slower, but fully functional for exploration).


🗂️ Files changed

File Change
Dockerfile New — batch inference image
Dockerfile.serve New — Gradio web UI image
docker-compose.yml New — multi-service orchestration
serve.py New — Gradio web UI with warm model cache
healthcheck.py New — 3-level container health check
serve-cpu.bat New — Windows one-click CPU launcher
requirements-serve.txt New — gradio dependency
.dockerignore New
.gitignore New
.env.example New
README.md Updated — Docker setup section added

claude and others added 19 commits May 25, 2026 08:37
Adds Dockerfile (nvidia/cuda 12.4 base), docker-compose.yml with GPU
reservation and a named volume for HF model cache, and .dockerignore.
HF_TOKEN and TAVILY_API_KEY are passed as env vars at runtime — not
baked into the image.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
healthcheck.py verifies the environment in three progressive levels
without downloading model weights: Python deps + internal imports (L1),
CUDA device availability and allocation (L2), HF Hub reachability via
lightweight metadata call (L3).

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
serve.py patches load_agent_model_and_tokenizer / release_resources in
the base inference module before any submodule imports, keeping all
agent models warm in VRAM across requests. Single questions are run
through the existing pipeline via a temp medqa-format JSON dataset;
structured output is captured with --result_jsonl.

UI exposes all five collaboration styles via a dropdown, with sliders
for recursive rounds and latent steps. Style switching evicts the VRAM
cache automatically.

Also adds Dockerfile.serve (inherits cuda base + installs gradio),
requirements-serve.txt, and a `serve` service in docker-compose.yml.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
.env must never be committed — it contains secrets (TAVILY_API_KEY).
Remove it from git tracking and add .gitignore to prevent future
accidental commits of .env and Python cache files.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
NVIDIA dropped the cuDNN major version suffix from image tags.
The correct tag format is now cudnn-runtime, not cudnn9-runtime.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
Covers image build, docker compose up, Gradio web UI launch,
3-level health check procedure, and CPU fallback workaround
for systems without GPU passthrough (including WSL2 fix steps).
Also updates the repository structure listing with new Docker files.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
…ructions

- Add assets/webui.png reference after Step 4 (Gradio launch)
- Fix CPU override to use runtime: runc (deploy: {} alone is insufficient)
- Add docker run alternative for bypassing Compose GPU reservation
- Add Linux/macOS and PowerShell variants for no-GPU docker run

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
Reads HF_TOKEN and TAVILY_API_KEY from .env and starts the
Gradio web UI via docker run (no NVIDIA runtime required).

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
Gradio 6.0 requires messages as dicts with role/content keys
instead of (user, bot) tuples.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
ENTRYPOINT in Dockerfile.serve already runs python serve.py.
The command block should only pass arguments, not repeat the executable.

https://claude.ai/code/session_01CE2uPEFeYKtN3hAXQ1m7jy
Docker one-click setup + Gradio web UI for all collaboration styles
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants