On December 7, Google announced the launch of Gemini, its highly anticipated new multi-modal AI architecture, including a Nano version optimized for hand-held devices. The announcement was greeted with mixed reviews.
Some users doubted Google’s claims, or questioned whether Gemini was significantly better than GPT-4. Quoting an AI scientist who goes simply by the name “Milind,” Marketing Interactive suggested that Google is playing catch-up at this point and that OpenAI and Microsoft might be six months to a year ahead in bringing their AI models to market.
There was also plenty of public hand-wringing over a Google promotional video featuring a blue rubber duck, because the demo had been professionally edited after it was recorded.
Despite the tempest in a teapot about the little blue rubber duck, we believe the announcement is important and deserves our full attention.
Decoding Gemini: How Parameters Shape Its Capabilities
Parameter count is, roughly speaking, an indicator of how capable an AI model might be. The figure commonly cited for GPT-4 is about 1.75 trillion parameters, although OpenAI has not confirmed it.
We don’t know how many parameters were used to build Gemini. Still, Ray Fernandez at Techopedia estimated that Google used between 30 and 65 trillion parameters to make Gemini, which, according to SemiAnalysis, would equate to an architecture that might be between 5 and 20 times more powerful than GPT-4.
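As a rough sense of scale, the parameter figures quoted above can be compared directly. This is only arithmetic on unconfirmed estimates, and a raw parameter ratio is not the same thing as the capability multiple attributed to SemiAnalysis, which presumably reflects other factors as well.

```python
# All figures are the unconfirmed estimates quoted in the text.
GPT4_PARAMS = 1.75e12       # commonly cited GPT-4 figure
GEMINI_PARAMS_LOW = 30e12   # lower bound of the Gemini estimate
GEMINI_PARAMS_HIGH = 65e12  # upper bound of the Gemini estimate

# Ratio of raw parameter counts only; it says nothing about quality.
low_ratio = GEMINI_PARAMS_LOW / GPT4_PARAMS
high_ratio = GEMINI_PARAMS_HIGH / GPT4_PARAMS

print(f"Parameter ratio: {low_ratio:.1f}x to {high_ratio:.1f}x")
```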
Beyond the model’s power, there are at least four points of differentiation for Gemini.
#1. Multi-modal Architecture: Gemini was built as a multi-modal architecture from the ground up. Competing architectures, by contrast, keep text, images, video, and code in separate silos. That forces other companies to roll out those capabilities one by one and makes it harder for the capabilities to work together optimally.
#2. Massive Multitask Language Understanding: Gemini scored higher than its competition on 30 out of 32 third-party benchmarks. On some of those it was only slightly ahead, and on others by a wider margin, but overall that’s an impressive win-loss record.
In particular, Gemini reached an important milestone by outscoring human experts on a tough test called Massive Multitask Language Understanding (MMLU). Gemini scored 90.04%, versus 89.8% for human experts, according to the benchmark authors.
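To put those benchmark numbers in perspective, here is a small calculation using only the figures quoted above. The variable names are ours, chosen for illustration.

```python
# Figures as quoted in the text.
wins, total = 30, 32
gemini_mmlu = 90.04   # Gemini's MMLU score, in percent
human_mmlu = 89.8     # human expert MMLU score, in percent

# Share of benchmarks on which Gemini came out ahead.
win_rate = wins / total

# Margin over human experts, in percentage points (rounded to
# avoid floating-point noise in the subtraction).
margin = round(gemini_mmlu - human_mmlu, 2)

print(f"Benchmark win rate: {win_rate:.1%}")
print(f"MMLU margin over human experts: {margin} percentage points")
```

The win rate is high, but the MMLU margin over human experts is under a quarter of a percentage point, so the milestone is a narrow one.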
#3. AlphaCode 2 Capabilities: Alongside Gemini, Google also launched AlphaCode 2, a new, more advanced coding system that ranks within the top 15% of entrants on the Codeforces competitive programming platform. That ranking represents a significant improvement over its predecessor, which ranked in the top 50% on that platform.
#4. Nano model: Also launched alongside Gemini was the Nano model, which is optimized to run on a handheld device, bringing many of Gemini’s capabilities to edge devices such as phones and, potentially, wearables. For now, that’s a unique advantage for Gemini.
What are the practical implications of Gemini Nano on a handheld device?
Companies like Robosoft Technologies that build apps will collaborate with clients to test the boundaries of what Nano can do for end users using edge devices like cell phones.
Edge computing emphasizes processing data closer to where it is generated, reducing latency and dependence on centralized servers. Cell phones will likely be first in line to benefit from Nano because they can already perform tasks like image recognition, voice processing, and various other computations on the device itself.
What about Wearables or other Types of Edge Devices?
Google hasn’t said whether Nano can run on wearables or other edge devices, but its design and capabilities suggest it probably can.
First, Nano is a significantly slimmed-down version of the full Gemini AI model, making it resource-efficient and potentially suitable for devices with limited computational power, like wearables.
Also, Nano is designed explicitly for on-device tasks. It doesn’t require constant Internet connectivity, making it ideal for applications where data privacy and offline functionality are crucial — both are relevant for wearables.
In particular, we noticed that Google’s December 2023 “feature drop” for Pixel 8 Pro showcased a couple of on-device features powered by Nano, including “Summarize” in the Recorder app and “Smart Reply” in Gboard. In our opinion, these capabilities could easily translate to wearables.
What about Apple Technology?
There’s no official indication that Nano is compatible with Apple technology. We think such compatibility is unlikely because Google primarily focuses on Android and its ecosystem.
However, the future of AI development is increasingly open-source and collaborative, so it’s possible that partnerships or independent efforts by members of the AI ecosystem — including companies like Robosoft Technologies — could lead to compatibility between Gemini Nano and Apple devices.
Enterprise-Level Use Cases for Gemini Pro
From what we know so far, Gemini Pro offers good potential to enable or enhance various enterprise-level applications. Here are some critical use cases that we think are most likely to be among the first wave of projects using Gemini Pro.
Customer Service and Workflows
- Dynamically updating answers to FAQs
- Helping with troubleshooting
- Routing questions to the appropriate resources
- Extracting and summarizing information from documents, forms, and datasets
- Filling in templates
- Maintaining databases
- Generating routine reports
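As one illustration of the list above, routing questions to the appropriate resources can be sketched as a simple classification prompt. This is a minimal sketch under our own assumptions: the department labels, the prompt wording, and the `generate` callable are all hypothetical. In practice `generate` would wrap a call to a hosted model such as Gemini Pro; here a stand-in function is used so the example runs offline.

```python
from typing import Callable

# Hypothetical department labels; a real deployment would define its own.
ROUTES = ("billing", "technical_support", "sales", "general")


def build_routing_prompt(question: str) -> str:
    """Ask the model to answer with exactly one label from ROUTES."""
    labels = ", ".join(ROUTES)
    return (
        "Classify the customer question into exactly one of these "
        f"categories: {labels}.\n"
        "Reply with the category name only.\n\n"
        f"Question: {question}"
    )


def route_question(question: str, generate: Callable[[str], str]) -> str:
    """Return a route label, falling back to 'general' on unexpected output."""
    reply = generate(build_routing_prompt(question)).strip().lower()
    return reply if reply in ROUTES else "general"


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption, for offline use only)."""
    return "Billing" if "invoice" in prompt.lower() else "no idea"


print(route_question("Where is my invoice?", fake_generate))
print(route_question("Hello there", fake_generate))
```

Validating the reply against a fixed set of labels, with a safe fallback, keeps an unexpected model response from sending a customer to a queue that does not exist.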
Personalization and Recommendations
- Creating personalized marketing messages and recommendations
- Optimizing pricing
- Automating risk assessments
- Streamlining loan applications
- Providing personalized health treatment plans
- Recommending preventive health measures
Business Process Optimization
- Identifying process delays
- Optimizing resource allocation
- Streamlining decision-making processes with improved information flow
- Identifying cost-saving opportunities
Security and Fraud Detection
- Identifying potential cyber-attacks
- Identifying malicious code and protecting sensitive data
- Analyzing financial data for suspicious activity to help prevent losses
Content Moderation and Safety
- Moderating user comments and posts on social media, including forum discussions
- Improving the correct identification of spam
Above all, a foundational use for Google Gemini Pro might be to enable an enterprise-level generative AI copilot.
What is an Enterprise-Level Generative AI Copilot?
A generative AI copilot is an advanced artificial intelligence system designed to assist and augment human users across a variety of tasks, using its generative capabilities to contribute actively to creative and decision-making processes. This type of technology is customized for specific enterprise applications, learning from user interactions and context to provide tailored support. It goes beyond conventional AI assistants by actively generating suggestions, solutions, or content in real time, working alongside users to enhance productivity, creativity, and problem-solving within organizational workflows.
Why might Gemini Pro be a good platform for building a generative AI copilot?
We think that Gemini Pro should be considered a possible platform for building a copilot. Its capabilities and characteristics align well with the requirements of such a system.
First, Gemini Pro can process and generate human language effectively, enabling it to understand user intent and respond coherently and informatively. Its knowledge is reportedly built on 40 trillion tokens, equivalent to having access to millions of books. It can also reason about information, allowing it to provide relevant and insightful assistance to users.
Also, like other generative AI platforms, Gemini Pro can adapt its responses and behavior based on the context of a conversation, helping to ensure that its assistance remains relevant and helpful.
So that’s a good foundation.
Upon such a foundation, Google relies on the partners in its ecosystem to build an overall solution that addresses enterprise needs. These include ensuring that enterprise data is secure, ensuring that information inside the enterprise is not used to train public models, controlling access to data based on job roles and other factors, helping with data integration, and building an excellent user interface. These are examples of areas where technology partners like Robosoft Technologies make all the difference when bringing an AI-based solution to life within an enterprise.
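To make the access-control point concrete, here is a minimal sketch of filtering enterprise documents by job role before any text is passed to a copilot as context. Everything in it is a hypothetical illustration of the idea: the `Document` type, the role names, and the sample documents are our assumptions, not part of any Google or Robosoft product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_roles: frozenset  # roles permitted to see this document


def visible_documents(docs, role):
    """Keep only the documents the given role is allowed to read."""
    return [d for d in docs if role in d.allowed_roles]


def build_context(docs, role, max_chars=2000):
    """Assemble prompt context from permitted documents only."""
    parts = []
    used = 0
    for doc in visible_documents(docs, role):
        snippet = f"[{doc.title}]\n{doc.text}"
        if used + len(snippet) > max_chars:
            break  # stay within the (assumed) context budget
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)


# Hypothetical sample data.
DOCS = [
    Document("Holiday policy", "Employees receive 25 days of leave.",
             frozenset({"employee", "hr"})),
    Document("Salary bands", "Confidential compensation ranges.",
             frozenset({"hr"})),
]

print(build_context(DOCS, "employee"))
```

Filtering before the prompt is built, rather than asking the model to withhold restricted material, means the model never sees data the user is not entitled to.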