Google has officially launched the preview of Gemma 3n, its most advanced lightweight AI model optimized for mobile performance and on-device experiences.
Engineered with speed, efficiency, and privacy in mind, Gemma 3n represents a major leap forward in the evolution of mobile AI. It’s designed to run seamlessly on phones, tablets, and laptops, giving developers the tools to build real-time, multimodal applications that respect user privacy and work even without internet access.
Built for the Future of On-Device AI
Gemma 3n is the first model built on Google’s new shared architecture, co-developed alongside major mobile chipmakers like Qualcomm, MediaTek, and Samsung System LSI. This architecture powers both Gemma 3n and the upcoming evolution of Gemini Nano, streamlining AI integration across Android and Chrome platforms.
Key Capabilities That Set Gemma 3n Apart
- Blazing Fast Performance: Gemma 3n responds approximately 1.5x faster than its predecessor (Gemma 3 4B) while using less memory, thanks to innovations like Per-Layer Embeddings (PLE) and activation quantization (see the generic quantization sketch after this list).
- Dynamic Flexibility: Featuring a unique “many-in-1” design, the 4B model includes a nested 2B submodel, allowing developers to balance performance and resource usage dynamically. Mix-and-match capability enables custom submodels tailored to specific latency and quality needs.
- Multimodal Intelligence: Gemma 3n supports audio, text, image, and video processing natively. Its advanced audio understanding enables high-quality speech-to-text transcription and real-time translation.
- Enhanced Multilingual Support: The model shows strong results across multiple languages, including Japanese, Korean, German, Spanish, and French, scoring 50.1% on the WMT24++ benchmark.
- Privacy-First Design: Since it runs locally, Gemma 3n ensures user data doesn’t leave the device, enabling secure and offline-capable experiences.
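Google hasn’t published the exact quantization scheme used in this preview, but the general technique behind activation quantization is straightforward: store intermediate activations as 8-bit integers plus a scale factor instead of 32-bit floats. Here is a minimal, generic sketch of symmetric int8 activation quantization, not Gemma 3n’s actual implementation:

```python
import numpy as np

def quantize_activations(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(np.max(np.abs(x)) / 127.0, 1e-8)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float activations."""
    return q.astype(np.float32) * scale

# Each activation now occupies 1 byte instead of 4 (float32),
# cutting memory traffic roughly 4x at a small accuracy cost.
acts = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_activations(acts)
print("max reconstruction error:", np.max(np.abs(acts - dequantize(q, scale))))
```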
Real-World Applications and Use Cases
With Gemma 3n, developers can unlock new types of intelligent, interactive applications:
- Create live, context-aware apps that interpret visual and audio inputs in real time.
- Enable richer multimodal conversations by combining audio, text, image, and video inputs.
- Power voice-driven interfaces, real-time transcription tools, translation apps, and more.
These capabilities empower developers to push the boundaries of what’s possible in AI-powered mobile experiences.
Run Larger Models with Smaller Footprints
Although Gemma 3n’s variants have raw parameter counts of 5B and 8B, PLE lets them run with a memory footprint comparable to 2B and 4B models. In practice, developers can deploy robust AI functionality on devices with just 2GB–3GB of RAM. Learn more about this in the official documentation.
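To see why this matters, here is a rough back-of-envelope estimate. The split between resident weights and streamed per-layer embeddings below is hypothetical, not Google’s published breakdown; the point is only that keeping PLE parameters out of accelerator memory shrinks the resident footprint dramatically.

```python
# Back-of-envelope RAM estimate for an on-device model with PLE.
# The core-vs-PLE parameter split is hypothetical, chosen only to
# illustrate the mechanism; it is not Gemma 3n's published breakdown.

BYTES_PER_PARAM_INT4 = 0.5  # 4-bit quantized weights

def resident_weights_gb(total_params_b: float, ple_params_b: float) -> float:
    """GB of weights held in memory if PLE parameters are streamed
    from fast local storage on demand rather than kept resident."""
    resident_params = (total_params_b - ple_params_b) * 1e9
    return resident_params * BYTES_PER_PARAM_INT4 / 1e9

# A 5B-param model with ~3B params in per-layer embeddings keeps only
# ~2B params resident: about 1 GB of weights at 4-bit, before the
# KV cache and activations are added on top.
print(f"5B model: {resident_weights_gb(5.0, 3.0):.1f} GB resident weights")
print(f"8B model: {resident_weights_gb(8.0, 4.0):.1f} GB resident weights")
```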
Accessing the Preview: Get Started Today
Google is offering two entry points to explore Gemma 3n starting today:
- Google AI Studio: Use Gemma 3n right from your browser with no setup required via Google AI Studio (a minimal API sketch follows this list).
- Google AI Edge: Build locally on your device with tools provided by Google AI Edge, supporting multimodal input and output.
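For developers who would rather script against the hosted preview than use the browser UI, AI Studio also issues API keys for the Gemini API. Below is a minimal Python sketch using the google-generativeai SDK; the model identifier "gemma-3n-e4b-it" is an assumption based on the preview’s naming and may differ from what the service actually exposes.

```python
import os
import google.generativeai as genai

# API key generated in Google AI Studio, supplied via the environment.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Hypothetical model identifier -- check AI Studio's model list for
# the exact name the Gemma 3n preview exposes.
model = genai.GenerativeModel("gemma-3n-e4b-it")

response = model.generate_content(
    "Summarize the benefits of on-device AI in two sentences."
)
print(response.text)
```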
Responsible AI Development
Gemma 3n, like all of Google’s open models, went through extensive safety evaluations, data governance checks, and alignment tuning. The company remains committed to developing AI responsibly and refining its approach as the technology matures.
For deeper insight into the security and safety enhancements around this evolving AI ecosystem, you might also be interested in how Google DeepMind is reinforcing Gemini against emerging AI threats.
Start Building with Gemma 3n
This release marks a pivotal step toward democratizing powerful AI—and putting it directly into the hands of developers, right on their devices.
Explore more about this announcement and other Google I/O 2025 updates by visiting io.google.