Introducing Gemini: Google's largest and most capable AI model.
Gemini is a significant leap forward in AI's ability to improve our daily lives.
Gemini is a product of a massive collaborative effort by teams at Google. It was designed to be multimodal, which means it can easily understand, operate across, and combine various types of information such as text, code, audio, image, and video.
Gemini was built from scratch with the goal of being able to generalize and seamlessly integrate different forms of data.
Gemini is also Google's most flexible model yet, able to run efficiently on everything from data centers to mobile devices. According to Google, its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.
Google optimized Gemini 1.0, the first version, in three different sizes:
- Gemini Ultra — the largest and most capable model, for highly complex tasks.
- Gemini Pro — the best model for scaling across a wide range of tasks.
- Gemini Nano — the most efficient model, for on-device tasks.
How does it differ from ChatGPT?
TEXT

| Capability | Benchmark (higher is better) | Description | Gemini Ultra | GPT-4 (API numbers calculated where reported numbers were missing) |
| --- | --- | --- | --- | --- |
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% CoT@32* | 86.4% 5-shot* (reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% 3-shot | 83.1% 3-shot (API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 variable shots | 80.9 3-shot (reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% 10-shot* | 95.3% 10-shot* (reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. grade-school math problems) | 94.4% maj@32 | 92.0% 5-shot CoT (reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% 4-shot | 52.9% 4-shot (API) |
| Code | HumanEval | Python code generation | 74.4% 0-shot (IT)* | 67.0% 0-shot* (reported) |
| Code | Natural2Code | Python code generation; new HumanEval-like held-out dataset, not leaked on the web | 74.9% 0-shot | 73.9% 0-shot (API) |
MULTIMODAL

| Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V (previous SOTA model listed when capability is not supported in GPT-4V) |
| --- | --- | --- | --- | --- |
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% 0-shot Gemini Ultra (pixel only*) | 56.8% 0-shot GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% 0-shot Gemini Ultra (pixel only) | 77.2% 0-shot GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% 0-shot Gemini Ultra (pixel only) | 78.0% 0-shot GPT-4V |
| Image | DocVQA | Document understanding | 90.9% 0-shot Gemini Ultra (pixel only) | 88.4% 0-shot GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% 0-shot Gemini Ultra (pixel only) | 75.1% 0-shot GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53.0% 0-shot Gemini Ultra (pixel only*) | 49.9% 0-shot GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 4-shot Gemini Ultra | 56.0 4-shot DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% 0-shot Gemini Ultra | 46.3% 0-shot SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1 Gemini Pro | 29.1 Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6% Gemini Pro | 17.6% Whisper v3 |
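Several entries in the tables above, such as maj@32 and CoT@32, refer to sampling many chain-of-thought answers and keeping the consensus result rather than a single greedy answer. A minimal sketch of that majority-vote step, with an illustrative stand-in for the sampled model outputs:

```python
# Illustrative sketch of the majority-vote decoding behind scores like
# "maj@32": sample many reasoning chains, then return the most common
# final answer. The sampled answers below are a stand-in, not real output.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled chains."""
    return Counter(answers).most_common(1)[0][0]

# Pretend 32 sampled reasoning chains ended in these final answers.
sampled = ["42"] * 20 + ["41"] * 7 + ["40"] * 5
best = majority_vote(sampled)  # "42"
```

The intuition is that independent reasoning chains tend to make uncorrelated mistakes, so the correct answer usually dominates the vote.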
Next-generation capabilities
Until recently, multimodal models were typically created by training separate components for different data types, such as text or images, and then stitching them together for certain tasks. Models built this way are good at describing images and other straightforward tasks, but they often struggle with complex reasoning and conceptual understanding.
Google designed Gemini to be natively multimodal, pre-training it from the start on different modalities and then fine-tuning it with additional multimodal data to further refine its effectiveness, according to Google's blog. This allows Gemini to understand and reason about varied inputs better than existing multimodal models, with state-of-the-art capabilities across domains.
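For a concrete sense of what "natively multimodal" means for a caller, a single request can mix text and image parts. The sketch below builds such a request body following the public Gemini API's generateContent JSON shape; it is a hedged illustration using only the standard library, and the actual sending of the request (endpoint, auth) is omitted:

```python
# Hedged sketch: one request mixing a text part and an inline image part,
# following the generateContent payload shape of the public Gemini API.
# This only builds the JSON body; no network call is made here.
import base64
import json

def multimodal_payload(prompt: str, image_bytes: bytes,
                       mime: str = "image/png") -> str:
    """Serialize a text+image request body for a multimodal model."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return json.dumps(body)

payload = multimodal_payload("What is in this picture?", b"\x89PNG...")
```

Because both modalities travel in one `parts` list, the model reasons over them jointly instead of routing each to a separate component.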
So What's New in the Google Gemini AI Model?
Gemini is an AI model that has achieved better results than human experts on MMLU (Massive Multitask Language Understanding), a widely used benchmark for testing AI models' knowledge and problem-solving abilities.
Understanding text, images, audio and more
Gemini 1.0 was created to understand text, images, audio, and more. This helps it to comprehend complicated topics and answer questions with ease. It's especially good at breaking down math and physics concepts.
Advanced coding
The first version of Gemini can write high-quality code in the most popular programming languages, such as Python, Java, C++, and Go. Because it can work across languages and reason about complex information, it is one of the leading foundation models for coding in the world.
Gemini Ultra performs exceptionally well on coding benchmarks, including HumanEval and Natural2Code. Natural2Code is an internally developed, HumanEval-like benchmark that uses author-generated sources instead of web-based information, so its results are not inflated by data leaked onto the web.
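For context on how HumanEval-style benchmarks produce these scores: a generated completion counts as correct only if it passes the task's hidden unit tests when executed. A toy illustration of that functional scoring (not the actual benchmark harness, which also sandboxes execution):

```python
# Toy illustration of HumanEval-style functional scoring: a candidate
# completion passes only if the task's unit tests run without error.
# Real harnesses sandbox this exec; never run untrusted code like this.
def passes(candidate_src: str, test_src: str) -> bool:
    ns = {}
    try:
        exec(candidate_src, ns)  # define the generated function
        exec(test_src, ns)       # run assert-based checks against it
        return True
    except Exception:
        return False

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(2, 3) == 5\nassert add(0, 0) == 0\n"
```

A model's benchmark score is then simply the fraction of tasks whose generated solutions pass, which is why held-out test sets like Natural2Code matter: a model that memorized leaked solutions would pass without genuine ability.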
How Does Gemini Solve Problems?
More reliable, scalable, and efficient
Google trained Gemini 1.0 at scale on its AI-optimized infrastructure using Google's in-house Tensor Processing Units (TPUs) v4 and v5e, and designed it to be the company's most reliable and scalable model to train and its most efficient to serve.
Google's AI-powered products, like Search, YouTube, Gmail, Google Maps, Google Play, and Android, serve billions of users worldwide. Training large-scale AI models can be costly, but Google's custom AI accelerators, TPUs, make it more cost-efficient. On TPUs, Gemini runs significantly faster than earlier, smaller, and less capable models, and companies around the world have used these accelerators to train large-scale AI models cost-effectively.
"Today, we're announcing the most powerful, efficient, and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models," Google said. "This next-generation TPU will accelerate Gemini's development and help developers and enterprise customers train large-scale generative AI models faster, allowing new products and capabilities to reach customers sooner."
Building with Gemini
As of December 13th, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.
Google AI Studio is a web-based tool developers can use to quickly create and launch apps with an API key. For a fully managed, customizable AI platform, Vertex AI is the option: it offers full control over your data along with the additional security, privacy, and compliance features that Google Cloud provides.
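To make the AI Studio path concrete, here is a hedged, standard-library-only sketch of calling Gemini Pro through the Gemini API's public REST endpoint with an AI Studio key. The payload shape follows the documented generateContent request; error handling and retries are omitted:

```python
# Hedged sketch: calling Gemini Pro over the public REST endpoint with an
# API key issued in Google AI Studio. build_payload() can be exercised
# offline; generate() performs the actual network call.
import json
import urllib.request

API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-pro:generateContent")

def build_payload(prompt: str) -> dict:
    """Wrap a plain-text prompt in the request shape the API expects."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(prompt: str, api_key: str) -> str:
    """POST the prompt and return the first candidate's text."""
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]
```

Google also ships official SDKs for this API; the raw REST form above is shown only because it makes the request and response structure explicit.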
Android developers will soon be able to build with Gemini Nano, the most efficient model for on-device tasks, via AICore, a new system capability available in Android 14, starting on Pixel 8 Pro devices. Interested developers can sign up for an early preview of AICore.
Gemini Ultra coming soon
For Gemini Ultra, Google is currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model with fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.
Google Bard - Next Level Coming
Google also announced that it will launch Bard Advanced early next year: a cutting-edge AI experience that gives access to its best models and capabilities, starting with Gemini Ultra.