Google’s Gemma 4 goes truly open: agentic AI on your device under Apache 2.0
Google’s Gemma 4 goes truly open with Apache 2.0, bringing agentic, multimodal AI to phones, browsers, and PCs. Here’s what’s new and why it matters.
Google makes Gemma 4 official — and truly open
Google introduced Gemma 4 on April 2, 2026, calling it its “most capable” open-weight model family and, for the first time, releasing it under the permissive Apache 2.0 license. The lineup targets everything from phones and IoT boards to developer workstations, signaling a push to put agentic AI directly on user hardware rather than behind a cloud API. (blog.google)
Why it matters
The license change is as newsworthy as the models themselves. Earlier Gemma releases carried Google-specific terms that deterred some enterprises. Apache 2.0 removes that friction, enabling broad commercial use, redistribution, and modification, and aligning Gemma with the wider open-weight ecosystem. Independent coverage underscores the strategic shift: this isn’t a boutique research drop; it’s Google positioning Gemma as a first-class open platform. (opensource.googleblog.com)
The model family at a glance
Google is shipping four variants:
- E2B (Effective 2B) and E4B (Effective 4B) for mobile and edge devices, designed for low latency and local-first experiences.
- 26B Mixture-of-Experts (MoE) prioritizing speed by activating a small subset of experts per token.
- 31B dense for maximum raw quality and fine-tuning headroom.
All models are multimodal for images and video, with E2B/E4B adding native audio input. Context windows run up to 128K on the edge models and up to 256K on the larger checkpoints, a practical boost for repository-scale code and long-document work. Google also highlights native function calling and structured JSON output for building reliable tool-using agents. (blog.google)
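To make the function-calling claim concrete, here is a minimal sketch of how an agent loop typically validates a model’s structured JSON output before dispatching a tool. The tool name, schema, and response shape are illustrative assumptions in the common JSON-Schema style, not Gemma 4’s documented wire format:

```python
import json

# Hypothetical tool schema in the JSON-Schema style most function-calling
# APIs use; the name and fields are illustrative, not Gemma 4's format.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Validate a structured JSON reply and extract the tool call."""
    msg = json.loads(raw)  # raises ValueError on malformed JSON
    call = msg["tool_call"]
    if call["name"] != WEATHER_TOOL["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    args = call["arguments"]
    for field in WEATHER_TOOL["parameters"]["required"]:
        if field not in args:
            raise ValueError(f"missing argument: {field}")
    return call["name"], args

# A canned reply standing in for the model's structured output.
reply = '{"tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}}}'
name, args = parse_tool_call(reply)
print(name, args["city"])
```

The value of structured output is exactly this: the agent can reject a malformed or off-schema reply deterministically instead of scraping free text.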
Built for the edge — and Android gets a head start
Developers can try Gemma 4 on-device through the new Android AICore Developer Preview. Google says the on-device versions support more than 140 languages, offer industry-leading multimodal understanding, and deliver substantial efficiency gains (up to 4x faster and up to 60% less battery use versus prior on-device models). Code written for Gemma 4 today is forward-compatible with Gemini Nano 4 devices arriving later in 2026, easing the path from prototype to production. (android-developers.googleblog.com)
Tooling and performance: LiteRT‑LM, WebGPU, and a CLI
Beyond the models, Google is rolling out a practical developer stack. LiteRT‑LM, an open, production-ready inference framework for edge deployments, powers a cross-platform path to run Gemma 4 on Android, iOS, desktop, and even in-browser via WebGPU. A new LiteRT‑LM CLI and SDKs (Python, Kotlin, C++) simplify local experiments and embedded integrations. Google’s own benchmarks show E2B and E4B hitting responsive token rates on flagship phones and respectable throughput on Raspberry Pi 5, with further acceleration on Qualcomm NPUs. (developers.googleblog.com)
Quick start (CLI):
litert-lm run model.litertlm --prompt="Summarize this article in one paragraph."
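Scripting the quick start from Python is a thin subprocess wrapper. The sketch below only assumes the `litert-lm` invocation shown above and that the binary is on `PATH`; if the real CLI syntax differs, only the argv builder changes:

```python
import shlex
import subprocess

def build_cmd(model_path: str, prompt: str) -> list[str]:
    """Assemble the argv for the quick-start command shown above."""
    return ["litert-lm", "run", model_path, f"--prompt={prompt}"]

def run_local(model_path: str, prompt: str) -> str:
    """Invoke the CLI and return its stdout (requires litert-lm on PATH)."""
    result = subprocess.run(
        build_cmd(model_path, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

cmd = build_cmd("model.litertlm", "Summarize this article in one paragraph.")
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

Keeping command construction separate from execution makes the wrapper easy to unit-test without the binary installed.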
Day‑one ecosystem support
Gemma 4 launches with broad framework and platform integration, including Hugging Face (Transformers and Transformers.js), vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM/NeMo, LM Studio, and more. Weights are available via Hugging Face and Kaggle, with AI Studio access for the 31B and 26B models and AI Edge Gallery for E4B/E2B. Google emphasizes that the models complement its proprietary Gemini line, giving teams both open and hosted options within one ecosystem. (blog.google)
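Of the day-one integrations, Ollama is the quickest local on-ramp: it exposes a standard REST endpoint on localhost. The sketch below targets Ollama’s existing `/api/generate` endpoint; the `"gemma4"` model tag is a placeholder assumption (check `ollama list` for the actual published name):

```python
import json
from urllib import request

# Ollama's default local endpoint; requires a running `ollama serve`.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Payload for /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# "gemma4" is a placeholder tag, not a confirmed Ollama model name.
print(build_request("gemma4", "Hello"))
```

Because the endpoint and payload shape are the same for every model Ollama serves, swapping checkpoints is a one-string change.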
Benchmarks and “intelligence‑per‑parameter”
Google reports the 31B dense checkpoint currently ranks #3 among open models on Arena AI’s text leaderboard, with the 26B MoE at #6, noting performance that outpaces models many times larger. Early media coverage echoes the positioning and adds detail on the speed/quality trade-offs, especially the MoE’s fast token generation thanks to limited expert activation. As always, independent, task-specific evaluations will matter most for production picks, but the early signals are strong. (blog.google)
Where to get it — and how to run it
- Download weights: Hugging Face (Gemma 4 E2B/E4B LiteRT‑LM packages and main model cards), Kaggle; Ollama recipes are available for quick local runs.
- Try on-device: Android AICore Developer Preview; iOS and Android showcase via the Google AI Edge Gallery app.
- Run in browser: WebGPU-backed demos and model files via LiteRT‑LM docs.
The Hugging Face model cards confirm Apache‑2.0 licensing and provide device-level performance and memory profiles for edge variants. (blog.google)
Security, safety, and sovereignty
Google says Gemma 4 follows the same infrastructure security rigor as its proprietary models, while the Apache 2.0 license and local‑first design address sovereignty and data‑residency needs. For regulated industries and public institutions, the ability to inspect, deploy, and adapt models without cloud lock‑in could be decisive. That said, teams should still layer evaluations, guardrails, and red-teaming over any foundation model before broad deployment. (blog.google)
The bottom line
Gemma 4 isn’t just a model refresh; it’s a distribution strategy. By pairing capable open weights with a straightforward license and a practical edge stack, Google is pushing agentic, multimodal AI into phones, browsers, and laptops — where latency, privacy, and cost control matter most. With AICore giving Android an immediate runway and day‑one support across popular tooling, expect rapid experimentation — and fast feedback cycles that will shape what “local AI” looks like through the rest of 2026. (android-developers.googleblog.com)