Introducing Gemini: Google's largest and most capable AI model.
Gemini is a significant leap forward in AI's ability to improve our daily lives.
Gemini is a product of a massive collaborative effort by teams at Google. It was designed to be multimodal, which means it can easily understand, operate across, and combine various types of information such as text, code, audio, image, and video.
Gemini was built from scratch with the goal of being able to generalize and seamlessly integrate different forms of data.
Gemini is also Google's most flexible model yet, able to run efficiently on everything from data centers to mobile devices. According to Google, its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.
Google optimized Gemini 1.0, the first version, in three different sizes:
- Gemini Ultra — the largest and most capable model, for highly complex tasks.
- Gemini Pro — the best model for scaling across a wide range of tasks.
- Gemini Nano — the most efficient model, for on-device tasks.
How does it differ from ChatGPT?
TEXT

| Capability | Benchmark (higher is better) | Description | Gemini Ultra | GPT-4 (API numbers calculated where reported numbers were missing) |
| --- | --- | --- | --- | --- |
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% CoT@32* | 86.4% 5-shot* (reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% 3-shot | 83.1% 3-shot (API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 variable shots | 80.9 3-shot (reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% 10-shot* | 95.3% 10-shot* (reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. grade-school math problems) | 94.4% maj@32 | 92.0% 5-shot CoT (reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% 4-shot | 52.9% 4-shot (API) |
| Code | HumanEval | Python code generation | 74.4% 0-shot (IT)* | 67.0% 0-shot* (reported) |
| Code | Natural2Code | Python code generation; new HumanEval-like held-out dataset, not leaked on the web | 74.9% 0-shot | 73.9% 0-shot (API) |
MULTIMODAL

| Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V (previous SOTA model listed when capability is not supported in GPT-4V) |
| --- | --- | --- | --- | --- |
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% 0-shot Gemini Ultra (pixel only*) | 56.8% 0-shot GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% 0-shot Gemini Ultra (pixel only) | 77.2% 0-shot GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% 0-shot Gemini Ultra (pixel only) | 78.0% 0-shot GPT-4V |
| Image | DocVQA | Document understanding | 90.9% 0-shot Gemini Ultra (pixel only) | 88.4% 0-shot GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% 0-shot Gemini Ultra (pixel only) | 75.1% 0-shot GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53.0% 0-shot Gemini Ultra (pixel only*) | 49.9% 0-shot GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 4-shot Gemini Ultra | 56.0 4-shot DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% 0-shot Gemini Ultra | 46.3% 0-shot SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1 Gemini Pro | 29.1 Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6% Gemini Pro | 17.6% Whisper v3 |
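Several entries in the tables above, such as maj@32 and CoT@32, refer to sampling many chain-of-thought answers and keeping the consensus result rather than a single greedy answer. A minimal sketch of that majority-vote step, with an illustrative stand-in for the sampled model outputs:

```python
# Illustrative sketch of the majority-vote decoding behind scores like
# "maj@32": sample many reasoning chains, then return the most common
# final answer. The sampled answers below are a stand-in, not real output.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled chains."""
    return Counter(answers).most_common(1)[0][0]

# Pretend 32 sampled reasoning chains ended in these final answers.
sampled = ["42"] * 20 + ["41"] * 7 + ["40"] * 5
best = majority_vote(sampled)  # "42"
```

The intuition is that independent reasoning chains tend to make uncorrelated mistakes, so the correct answer usually dominates the vote.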
Next-generation capabilities
Until recently, multimodal models were typically created by training separate components for different data types, such as text or images, and then stitching them together for certain tasks. Models built this way are good at describing images and other straightforward tasks, but they often struggle with complex reasoning and conceptual understanding.
Google designed Gemini to be natively multimodal, pre-training it from the start on different modalities and then fine-tuning it with additional multimodal data to further refine its effectiveness, according to Google's blog. This allows Gemini to understand and reason about varied inputs better than existing multimodal models, with state-of-the-art capabilities across domains.
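For a concrete sense of what "natively multimodal" means for a caller, a single request can mix text and image parts. The sketch below builds such a request body following the public Gemini API's generateContent JSON shape; it is a hedged illustration using only the standard library, and the actual sending of the request (endpoint, auth) is omitted:

```python
# Hedged sketch: one request mixing a text part and an inline image part,
# following the generateContent payload shape of the public Gemini API.
# This only builds the JSON body; no network call is made here.
import base64
import json

def multimodal_payload(prompt: str, image_bytes: bytes,
                       mime: str = "image/png") -> str:
    """Serialize a text+image request body for a multimodal model."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return json.dumps(body)

payload = multimodal_payload("What is in this picture?", b"\x89PNG...")
```

Because both modalities travel in one `parts` list, the model reasons over them jointly instead of routing each to a separate component.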
So What's New in the Google Gemini AI Model?
Gemini is an AI model that has achieved better results than human experts on MMLU (Massive Multitask Language Understanding), a widely used benchmark for testing AI models' knowledge and problem-solving abilities.
Understanding text, images, audio and more
Gemini 1.0 was created to understand text, images, audio, and more. This helps it to comprehend complicated topics and answer questions with ease. It's especially good at breaking down math and physics concepts.
Advanced coding
The first version of Gemini can write high-quality code in the most popular programming languages, such as Python, Java, C++, and Go. Because it can work across languages and reason about complex information, it is one of the leading foundation models for coding in the world.
Gemini Ultra performs exceptionally well on coding benchmarks, including HumanEval and Natural2Code. Natural2Code is an internally developed, HumanEval-like benchmark that uses author-generated sources instead of web-based information, so its results are not inflated by data leaked onto the web.
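For context on how HumanEval-style benchmarks produce these scores: a generated completion counts as correct only if it passes the task's hidden unit tests when executed. A toy illustration of that functional scoring (not the actual benchmark harness, which also sandboxes execution):

```python
# Toy illustration of HumanEval-style functional scoring: a candidate
# completion passes only if the task's unit tests run without error.
# Real harnesses sandbox this exec; never run untrusted code like this.
def passes(candidate_src: str, test_src: str) -> bool:
    ns = {}
    try:
        exec(candidate_src, ns)  # define the generated function
        exec(test_src, ns)       # run assert-based checks against it
        return True
    except Exception:
        return False

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\nassert add(0, 0) == 0\n"
```

A model's benchmark score is then simply the fraction of tasks whose generated solutions pass, which is why held-out test sets like Natural2Code matter: a model that memorized leaked solutions would pass without genuine ability.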
How Does Gemini Solve Problems?
More reliable, scalable, and efficient
Google trained Gemini 1.0 at scale on its AI-optimized infrastructure using Google's in-house Tensor Processing Units (TPUs) v4 and v5e, and designed it to be the company's most reliable and scalable model to train and its most efficient to serve.
Google's AI-powered products, like Search, YouTube, Gmail, Google Maps, Google Play, and Android, serve billions of users worldwide. Training large-scale AI models can be costly, but Google's custom AI accelerators, TPUs, make it more cost-efficient. On TPUs, Gemini runs significantly faster than earlier, smaller, and less capable models, and companies around the world have used these accelerators to train large-scale AI models cost-effectively.
"Today, we're announcing the most powerful, efficient, and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models," Google said. "This next-generation TPU will accelerate Gemini's development and help developers and enterprise customers train large-scale generative AI models faster, allowing new products and capabilities to reach customers sooner."
Building with Gemini
As of December 13th, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.
Google AI Studio is a web-based tool developers can use to quickly create and launch apps with an API key. For a fully managed, customizable AI platform, Vertex AI is the option: it offers full control over your data along with the additional security, privacy, and compliance features that Google Cloud provides.
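To make the AI Studio path concrete, here is a hedged, standard-library-only sketch of calling Gemini Pro through the Gemini API's public REST endpoint with an AI Studio key. The payload shape follows the documented generateContent request; error handling and retries are omitted:

```python
# Hedged sketch: calling Gemini Pro over the public REST endpoint with an
# API key issued in Google AI Studio. build_payload() can be exercised
# offline; generate() performs the actual network call.
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-pro:generateContent")

def build_payload(prompt: str) -> dict:
    """Wrap a plain-text prompt in the request shape the API expects."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(prompt: str, api_key: str) -> str:
    """POST the prompt and return the first candidate's text."""
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

Google also ships official SDKs for this API; the raw REST form above is shown only because it makes the request and response structure explicit.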
Android developers will soon be able to build with Gemini Nano, the most efficient model for on-device tasks, via AICore, a new system capability available in Android 14, starting on Pixel 8 Pro devices. Interested developers can sign up for an early preview of AICore.
Gemini Ultra coming soon
For Gemini Ultra, Google is currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model with fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.
Google Bard - Next Level Coming
Google also announced that it will launch Bard Advanced early next year: a cutting-edge AI experience that gives access to its best models and capabilities, starting with Gemini Ultra.