Gemma 3, the newest version of Google's Gemma, supports vision-language inputs and text outputs, handles context windows of up to 128K tokens, and understands more than 140 languages. This update is significant because it lets developers analyze images, answer questions about their contents, identify objects, and perform other tasks that depend on understanding visual data.
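To make the vision-language capability concrete, here is a minimal sketch that asks a Gemma 3 checkpoint a question about an image via the Hugging Face pipeline API. The checkpoint name `google/gemma-3-4b-it`, the example image URL, and support for the `image-text-to-text` pipeline task in your installed transformers version are assumptions, not guarantees.

```python
# Minimal sketch: asking Gemma 3 a question about an image.
# Assumes a recent transformers release with the "image-text-to-text"
# pipeline and access to the (gated) google/gemma-3-4b-it checkpoint.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "What objects are visible in this image?"},
        ],
    }
]

result = pipe(text=messages, max_new_tokens=128)
# The pipeline returns the full chat; the last turn is the model's answer.
print(result[0]["generated_text"][-1]["content"])
```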

Improvements in Gemma 3

Gemma 3 also significantly improves math, coding, and instruction-following capabilities, which makes it far better suited to complex, interactive applications. The model comes in four sizes (1B, 4B, 12B, and 27B) and can be deployed through options such as Cloud Run and the Google GenAI API.
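For hosted deployment, the sketch below calls Gemma 3 through the Google GenAI Python SDK. The model identifier `gemma-3-27b-it` and Gemma's availability on this endpoint are assumptions based on the deployment options above; check the current model catalog before relying on either.

```python
# Hedged sketch: calling a hosted Gemma 3 model via the google-genai SDK.
# The model id "gemma-3-27b-it" is an assumption, not a confirmed identifier.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder; use your own key

response = client.models.generate_content(
    model="gemma-3-27b-it",
    contents="Write a Python function that checks whether a string is a palindrome.",
)
print(response.text)
```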

Developers will appreciate how easily Gemma 3 integrates with existing workflows, thanks to support for popular deep learning frameworks such as TensorFlow and PyTorch. The code base is designed to perform well across a range of hardware configurations, from cloud-based services to local workstations.
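One way this hardware flexibility shows up in practice is automatic device placement when loading the model in PyTorch: the same script can run on a cloud GPU or a local workstation. The checkpoint name and dtype choice below are illustrative assumptions.

```python
# Sketch: loading a small Gemma 3 checkpoint with automatic device placement.
# Requires the accelerate package for device_map="auto"; the 1B instruct
# checkpoint name is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # assumed text-only 1B instruct model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory use
    device_map="auto",           # place layers on whatever hardware is available
)

inputs = tokenizer("Explain recursion in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```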

The revamped code base ships with recipes for both inference and fine-tuning. Model weights can be downloaded from Kaggle and Hugging Face, and NVIDIA provides direct support for Gemma 3 models to maximize performance on GPUs of any size. The architecture is also highly customizable, allowing developers to adapt it to specific use cases.
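Fetching the weights locally can be as simple as a snapshot download from Hugging Face. Gemma checkpoints are gated, so this sketch assumes an access token from an account that has accepted the license; the repository name is likewise an assumption.

```python
# Sketch: downloading Gemma 3 weights from Hugging Face for local use.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="google/gemma-3-4b-it",  # assumed repo for the 4B instruct model
    token="hf_...",                  # placeholder; supply your own access token
)
print(f"Weights downloaded to: {local_dir}")
```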

Key Use Cases

The model's ability to handle large context windows makes it a strong choice for applications that require deep understanding of long or complex inputs, such as sentiment analysis and language translation across entire documents. With Gemma 3, developers can build more accurate and efficient applications that deliver real value.
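As a quick sanity check for long-context workloads, the sketch below counts the tokens in a document before sending it to the model, using the 128K figure quoted above. The tokenizer checkpoint and the file name are assumptions for illustration.

```python
# Sketch: verifying a long document fits Gemma 3's 128k-token context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")  # assumed checkpoint

with open("report.txt") as f:  # hypothetical long document
    document = f.read()

n_tokens = len(tokenizer.encode(document))
status = "fits within" if n_tokens <= 128_000 else "exceeds"
print(f"{n_tokens} tokens: {status} the 128k context window")
```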