🧪 70-test, 6-model AI benchmark: Gemma 4 vs Gemini Pro vs Flash vs Qwen. 420 verified runs across 13 categories. All prompts, rubrics, runner code & raw results included. Code executed, constraints verified, prompt injection confirmed on Vertex AI Studio.
benchmark machine-learning deep-learning artificial-intelligence gemini gemma model-evaluation ai-safety google-ai multilingual-nlp ai-evaluation llm prompt-injection qwen open-source-ai gemini-pro google-gemma llm-benchmark gemma4 eval-kit
Updated Apr 3, 2026 - Python