AI & Automation

Why the Google Gemini Launch Matters

On December 6, 2023, Google announced the launch of Gemini, its highly anticipated new multi-modal AI architecture, including a Nano version optimized for handheld devices. The announcement was greeted with mixed reviews.

Some users expressed doubts about Google’s claims and about whether Gemini was significantly better than GPT-4. Quoting an AI scientist who goes simply by the name “Milind,” Marketing Interactive suggested that Google is playing catch-up at this point and that OpenAI and Microsoft might be six months to a year ahead in bringing their AI models to market.

There was also plenty of public hand-wringing about a promotional Google video featuring a blue rubber duck, because the demo had been professionally edited after it was recorded rather than captured in real time.

Despite the tempest in a teapot over the little blue rubber duck, we believe the announcement is significant and deserves our full attention.

Decoding Gemini: How Parameters Shape Its Capabilities

Parameters are, roughly speaking, an index of how capable an AI model might be. GPT-4 is reported to have been built on about 1.75 trillion parameters, a figure OpenAI has not confirmed.

We don’t know how many parameters were used to build Gemini. Still, Ray Fernandez at Techopedia estimated that Google used between 30 trillion and 65 trillion parameters to build Gemini, which, according to SemiAnalysis, would equate to an architecture between 5x and 20x more powerful than GPT-4.

Beyond the model’s power, there are at least four points of differentiation for Gemini.

#1. Multi-modal Architecture: Gemini is multi-modal from the ground up, unlike competing architectures, which keep text, images, video, and code in separate silos. That separation forces other companies to roll out those capabilities one by one and makes it harder for them to work together in an optimal way.

#2. Massive Multitask Language Understanding: Gemini scored higher than its competition on 30 of 32 third-party benchmarks. On some it was only slightly ahead, on others by a wider margin, but overall that is an imposing win-loss record.

In particular, Gemini recorded an important milestone by outscoring human experts on a demanding test called Massive Multitask Language Understanding (MMLU): Gemini scored 90.04%, versus a human-expert baseline of 89.8%, according to the benchmark authors.

#3. AlphaCode 2 Capabilities: Alongside the launch of Gemini, Google also launched AlphaCode 2, a new, more advanced coding system that now ranks within the top 15% of entrants on the Codeforces competitive programming platform. That ranking represents a significant improvement over its state-of-the-art predecessor, AlphaCode, which previously ranked around the top 50% on the same platform.

#4. Gemini Nano: Also launched alongside Gemini was Gemini Nano, an LLM optimized to run on handheld devices, bringing many of Gemini’s capabilities to edge devices such as phones and wearables. For now, that’s a unique advantage for Gemini.

[Image: Points of differentiation for Google Gemini]

What are the practical implications of Gemini Nano on a handheld device?

Companies like Robosoft Technologies that build apps will collaborate with clients to test the boundaries of what Nano can do for end users on edge devices such as cell phones.

Edge computing emphasizes processing data closer to where it is generated, reducing latency and dependence on centralized servers. Cell phones will undoubtedly be first in line to benefit from Nano because they can perform tasks such as image recognition, voice processing, and other computations on the device itself.
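
To make that edge-versus-cloud trade-off concrete, here is a minimal sketch of how an app might route a request between an on-device model and a cloud model. Every name in it is hypothetical and is not taken from any Google SDK; it only illustrates the latency and privacy reasoning described above.

```python
# Purely illustrative: choosing on-device (Nano-style) inference over a cloud call.
# None of these names come from a real Google SDK; they only sketch the idea.
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    prompt: str
    needs_live_data: bool = False  # e.g., the answer depends on fresh server-side data
    sensitive: bool = False        # e.g., personal data that should stay on the device


def run_inference(request: InferenceRequest, on_device_model, cloud_model) -> str:
    """Prefer the on-device model for latency and privacy; fall back to the cloud."""
    if request.sensitive or not request.needs_live_data:
        try:
            return on_device_model.generate(request.prompt)  # local, no network round trip
        except RuntimeError:
            pass  # device too constrained for this request; fall through to the cloud
    return cloud_model.generate(request.prompt)              # larger model, higher latency
```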

What about Wearables or other Types of Edge Devices?

Google hasn’t said whether Nano can run on wearables or other edge devices, but its design and capabilities suggest it probably can.

First, Nano is a significantly slimmed-down version of the full Gemini AI model, making it resource-efficient and potentially suitable for devices with limited computational power, like wearables.

Also, Nano is designed explicitly for on-device tasks. It doesn’t require constant Internet connectivity, making it ideal for applications where data privacy and offline functionality are crucial — both are relevant for wearables.

In particular, we noticed that Google’s December 2023 “feature drop” for Pixel 8 Pro showcased a couple of on-device features powered by Nano, including “Summarize” in the Recorder app and “Smart Reply” in Gboard. In our opinion, these capabilities could easily translate to wearables.
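
As a purely illustrative sketch of how such an offline "Summarize" feature might be structured, the fragment below chunks a long transcript and summarizes it locally. The summarize callable stands in for a hypothetical on-device Nano model; it is not a documented API.

```python
# Illustrative only: an offline "Summarize" flow in the spirit of the Recorder feature
# described above. The summarize() callable stands in for a hypothetical on-device
# Nano model; it is not a documented API.
def summarize_transcript(transcript: str, summarize, max_chunk_chars: int = 4000) -> str:
    """Split a long transcript into chunks, summarize each locally, then merge."""
    chunks = [transcript[i:i + max_chunk_chars]
              for i in range(0, len(transcript), max_chunk_chars)]
    partial = [summarize(f"Summarize this audio transcript:\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    return summarize("Combine these partial summaries into one short summary:\n"
                     + "\n".join(partial))
```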

What about Apple Technology?

There’s no official indication that Nano is compatible with Apple technology. We think such compatibility is unlikely because Google primarily focuses on Android and its ecosystem.

However, the future of AI development is increasingly open-source and collaborative, so it’s possible that partnerships or independent efforts by members of the AI ecosystem — including companies like Robosoft Technologies — could lead to compatibility between Gemini Nano and Apple devices.

Enterprise-Level Use Cases for Gemini Pro

From what we know so far, Gemini Pro offers good potential to enable or enhance various enterprise-level applications. Here are some critical use cases that we think are most likely to be among the first wave of projects using Gemini Pro.

Customer Service and Workflows

  • Dynamically updating answers to FAQs
  • Helping with troubleshooting
  • Routing questions to the appropriate resources (a sketch follows this list)
  • Extracting and summarizing information from documents, forms, and datasets
  • Filling in templates
  • Maintaining databases
  • Generating routine reports
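
As one concrete illustration of the FAQ and routing items above, the sketch below uses Google's generative AI Python SDK (google-generativeai). The model name, method calls, and placeholder API key reflect that SDK as we understand it at the time of writing and should be checked against current documentation; the department list and the prompts are our own assumptions.

```python
# Sketch: classify an incoming support question, then draft an FAQ-grounded reply.
# SDK usage (configure, GenerativeModel, generate_content) is based on the
# google-generativeai package as we understand it; verify against current docs.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # assumption: key from Google AI Studio
model = genai.GenerativeModel("gemini-pro")

DEPARTMENTS = ["billing", "shipping", "returns", "technical support"]  # hypothetical


def route_and_answer(question: str, faq_text: str) -> dict:
    """Pick a department for the question, then draft a reply grounded in the FAQ."""
    routing = model.generate_content(
        "Classify this customer question into exactly one of "
        f"{DEPARTMENTS}.\nQuestion: {question}\nAnswer with the department name only."
    )
    draft = model.generate_content(
        "Using only the FAQ below, draft a short reply to the customer.\n"
        f"FAQ:\n{faq_text}\n\nQuestion: {question}"
    )
    return {"department": routing.text.strip().lower(), "draft_reply": draft.text}
```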

Personalization and Recommendations

  • Creating personalized marketing messages and recommendations
  • Optimizing pricing
  • Automating risk assessments
  • Streamlining loan applications
  • Providing personalized health treatment plans
  • Recommending preventive health measures

Business Process Optimization

  • Identifying process delays
  • Optimizing resource allocation
  • Streamlining decision-making processes with improved information flow
  • Identifying cost-saving opportunities

Security and Fraud Detection

  • Identifying potential cyber-attacks
  • Identifying malicious code and protecting sensitive data
  • Analyzing financial data for suspicious activity to help prevent losses

Content Moderation and Safety

  • Moderating user comments and posts on social media, including forum discussions
  • Improving the accuracy of spam detection

Above all, the most foundational use for Google Gemini Pro might be enabling an enterprise-level generative AI copilot.

What is an Enterprise-Level Generative AI Copilot?

A generative AI copilot is an advanced artificial intelligence system designed to assist and augment human users across a variety of tasks, using generative capabilities to contribute actively to creative and decision-making processes. This type of technology is customized for specific enterprise applications and learns from user interactions and context to provide tailored support. It goes beyond conventional AI assistants by generating suggestions, solutions, or content in real time, fostering a symbiotic relationship with users that enhances productivity, creativity, and problem-solving within organizational workflows.

Why might Gemini Pro be a good platform for building a generative AI copilot?

We think that Gemini Pro should be considered a possible platform for building a copilot. Its capabilities and characteristics align well with the requirements of such a system.

First, Gemini Pro can process and generate human language effectively, enabling it to understand user intent and respond coherently and informatively. It is reported to have a knowledge base built on some 40 trillion tokens, the equivalent of access to millions of books. It can also reason about information, allowing it to provide relevant and insightful assistance to users.

Also, like other generative AI platforms, Gemini Pro can adapt its responses and behavior based on the context of a conversation, helping to ensure that its assistance remains relevant and helpful.

So that’s a good foundation.
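
To make that foundation concrete, here is a minimal sketch of a context-aware copilot built on Gemini Pro. The SDK calls (configure, GenerativeModel, start_chat, send_message) follow the google-generativeai Python package as we understand it; the grounding preamble and the retrieve_context hook are our own assumptions, standing in for whatever enterprise search or retrieval layer a real deployment would use.

```python
# Minimal copilot sketch on top of Gemini Pro. SDK names follow google-generativeai
# as we understand it; the retrieval hook and preamble are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")


class EnterpriseCopilot:
    def __init__(self, retrieve_context):
        # retrieve_context: hypothetical callable returning approved internal documents
        self._chat = genai.GenerativeModel("gemini-pro").start_chat(history=[])
        self._retrieve_context = retrieve_context

    def ask(self, user_message: str) -> str:
        context = self._retrieve_context(user_message)  # e.g., a search over internal docs
        prompt = (
            "You are an internal enterprise assistant. Ground your answer in the "
            "context below and say so when the context is insufficient.\n"
            f"Context:\n{context}\n\nUser: {user_message}"
        )
        return self._chat.send_message(prompt).text     # chat object keeps multi-turn history
```

Everything around this core (access control, data integration, the user interface) is where the ecosystem partners described below come in.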

On that foundation, Google relies on the partners in its ecosystem to build an overall solution that addresses enterprise needs: ensuring that data is secure, ensuring that information inside the enterprise is not used to train public models, controlling access to data based on job roles and other factors, helping with data integration, and building an excellent user interface. These are examples of areas where technology partners like Robosoft Technologies make all the difference when bringing an AI-based solution to life within an enterprise.


Conversational AI breaks through user barriers – Designing a fulfilling conversation is key

Hey Alexa, what is conversational AI? If you’ve ever interacted with a virtual assistant like Siri, Alexa or Google Assistant, then you’ve experienced conversational Artificial Intelligence (AI). These game-changing automated messaging and speech-enabled applications have permeated every walk of life, creating human-like interactions between computers and humans. From checking your appointments and carrying out bank transactions, to tracking the status of your food or delivery order and learning the names of songs, conversational AI will soon be playing a lead role in your digital interactions.

So, how does Conversational AI work?

Users interact with conversational AI through text chats or voice. Simple FAQ chatbots require specific terms to derive responses from their knowledge bank. However, applications based on conversational AI are far more advanced – they can understand intent, provide responses in context, and learn and improve over time. While conversational AI is the umbrella term, there are underlying technologies such as Machine Learning (ML), Natural Language Processing (NLP), Natural Language Understanding (NLU) and Natural Language Generation (NLG) that enable text-based interactions. In the context of voice, additional technologies such as Automatic Speech Recognition (ASR) and text-to-speech software enable the computer to “talk” like a human.

Conversational AI process

Imagine you give a command to a conversational AI application to track your order. This input can be either spoken or text. If spoken, ASR converts the spoken phrases into machine-readable language. The application then moves into the NLP stage, where it first uses NLU to understand the context and intent of the message. Based on this, a response is formed through a dialogue management system and rendered in an understandable format by NLG. The response is then delivered as text or, in the case of voice, converted to speech through text-to-speech software. All of this happens in a matter of seconds to get you the information you need about the status of your order.
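
The same flow can be written down as a short sketch. Every component here (ASR, NLU, dialogue manager, NLG, TTS) is a hypothetical stand-in rather than a specific vendor API.

```python
# Illustrative pipeline mirroring the stages described above:
# speech -> text (ASR), intent (NLU), action (dialogue manager), text (NLG), speech (TTS).
def handle_query(user_input, is_voice, asr, nlu, dialogue_manager, nlg, tts):
    text = asr.transcribe(user_input) if is_voice else user_input
    intent = nlu.parse(text)                       # e.g., {"intent": "track_order", "order_id": "123"}
    action = dialogue_manager.next_action(intent)  # decide which backend call or answer applies
    reply_text = nlg.render(action)                # turn the structured result into a sentence
    return tts.speak(reply_text) if is_voice else reply_text
```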

Conversational AI will create a real and personal relationship between humans and technology

As our world becomes more digital, conversational AI can enable seamless communication between humans and machines, with interactions that are an integral part of daily life. Besides improved user engagement, conversational assistants allow round-the-clock business accessibility and reduce manual errors in sharing information. They reduce the dependency on people for multi-lingual support and enable inclusion by removing literacy barriers. The benefits and potential of conversational AI are prompting businesses and technology providers to invest heavily in the space.

Sales, service and support have been early adopters of conversational AI, because of the structured nature of information exchange that these functions require. This has decreased query resolution times, reduced the dependence on human agents and provided the opportunity for 24/7 sales and service. The AI chatbots are even able to deliver recommendations on purchases based on personalized customer preferences. According to Gartner, chatbots and conversational agents will raise and resolve a billion service tickets by 2030.

Across sectors, conversational AI is transforming interactions between people and systems. The banking sector is banking on conversational AI to provide a superior experience through transactions such as providing balance information, paying bills, marketing offers and products and so on, all without human intervention. The insurance sector is using chatbots to help customers choose a policy, submit documents, handle customer queries, renew policies and more. The healthcare sector is using these chatbots to check patient symptoms, schedule appointments, maintain patients’ medical data, and share medication and routine check-up reminders. Automobiles, too, are becoming cockpits for personal AI assistants and in-car conversational experiences.

Businesses are also using conversational AI to manage their own workforce and improve the employee experience. Through chatbots, they make vital information available to employees 24/7, reducing the need for human resources to manage queries and processes. The possibilities and opportunities with conversational AI are endless and use cases are available in every industry.

Overcoming user frustration with Conversational AI through better engineering and design

While there are several benefits to conversational AI, you might be familiar with many instances when the conversation ends in frustration. As AI technology evolves and matures, these challenges must be addressed at the design and engineering stage.

In terms of design, the success of the platform hinges entirely on the user interface and experience. It must be easy to use, intuitive, and fit seamlessly into the overall design of the application and customer journey. While UI is important, the conversation itself is the most critical aspect. It is important to ensure that the conversational design flows smoothly, follows well-tested and widely applicable patterns, and has exception handling built into the script design.

The more human-like the conversation is, the better the user’s acceptance

  • Draw from real life – To design a fulfilling conversation, architects and UX designers must draw from real life and from UX design principles. The product has to be designed for ease of use, ease of conversation, and ease of resolution, and it has to be easily findable, accessible, and usable within the overall product ecosystem. This can be achieved by following time-tested UI and UX principles in developing visual or auditory experiences.
  • Build trust – To build trust in conversational AI, small talk or playful ways to engage with the AI can be built into the engagement.
  • Understand the target audience – Understanding the target audience and their needs is pivotal to the success of conversational AI. An in-depth study of the demographics helps in building a platform that is unbiased. Incorporating languages, accents, and cultural nuances allows users to relate better and enables smoother interactions.
  • Solve customer problems, not business problems – A deep understanding of the customer ensures that the conversation design is solving for the customer, rather than for the business problem. When the focus is on the business problem, there is a possibility of ignoring the human-like flow of interaction. Putting the customer first helps in building a valuable and desirable interface that is a win-win for both the customer and the business. It is also important to ask what the system will help resolve and to design the conversation so that the most frequent use cases for the application are handled logically and seamlessly. [Image: Example of a bad AI chatbot interaction]
  • Recover from lagging conversations – The AI bot must also be able to learn from mistakes, recover from broken conversations, and redirect to human agents when a conversation cannot be fulfilled through AI (a sketch of this fallback logic follows this list). This has to be designed seamlessly into the interface so that customers trust the system and come back to use it in the future.
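
A minimal sketch of that recover-and-hand-off pattern might look like the following; the confidence threshold, the clarification step, and the escalation hook are all assumptions rather than features of any particular product.

```python
# Illustrative fallback logic: ask for clarification on low confidence, and hand off
# to a human agent after repeated failures. Thresholds and hooks are assumptions.
LOW_CONFIDENCE = 0.5
MAX_FAILED_TURNS = 2


def respond(user_message, nlu, bot, escalate_to_agent, state):
    """state is a dict tracking consecutive turns the bot could not handle."""
    intent, confidence = nlu.parse(user_message)
    if confidence < LOW_CONFIDENCE:
        state["failed_turns"] = state.get("failed_turns", 0) + 1
        if state["failed_turns"] >= MAX_FAILED_TURNS:
            return escalate_to_agent(user_message)  # hand the thread to a human, with context
        return bot.clarify(user_message)            # ask a clarifying question instead of guessing
    state["failed_turns"] = 0
    return bot.handle(intent)
```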

Engineering can help provide human-like interaction

  • The systems have to be able to deal with noisy settings and decipher languages, dialects, accents, sarcasm, and slang that could influence intent in the conversation. Intensive data training, larger and more varied datasets, language training, and machine learning (ML) could solve these challenges as the technology matures.
  • Another concern with conversational AI is data privacy and protection. To gain user trust, security must be paramount and all regional privacy laws must be adhered to.
  • Backend integration may decide the success or failure of conversational AI platforms in the market. The platform must integrate with CRM, after-sales, ticketing, databases, analytics systems, and so on, to get the appropriate data for the user and provide the appropriate data to the business (a sketch of this integration point follows this list).
  • Finally, the AI system should be backed by analytics and data, so that data scientists have invaluable insights to continuously improve the system.
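
As a sketch of that integration point, the fragment below shows a dialogue manager resolving an intent by calling existing systems of record; the client objects, intents, and fields are hypothetical.

```python
# Illustrative backend integration: map a parsed intent to a CRM or ticketing call
# and return structured data for the NLG step. Clients and fields are hypothetical.
def fulfil_intent(intent: dict, crm_client, ticketing_client) -> dict:
    if intent["intent"] == "track_order":
        order = crm_client.get_order(intent["order_id"])             # hypothetical CRM lookup
        return {"template": "order_status", "status": order["status"], "eta": order["eta"]}
    if intent["intent"] == "raise_complaint":
        ticket = ticketing_client.create(summary=intent["summary"])  # hypothetical ticketing call
        return {"template": "ticket_created", "ticket_id": ticket["id"]}
    return {"template": "fallback"}  # nothing matched; let the fallback flow take over
```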

Conversational AI is growing at an incredible pace and at a massive scale because of its immense potential to bridge the gap between humans and technology. Demand is also driven by the efficiencies and cost savings that conversational AI can offer businesses through quick, accurate, and effortless query resolution. Businesses across industries should leverage this technology of the future to deliver a consistent and superior user experience.


Will Chatbots Replace Traditional Apps for Brands?

Last year, the marketing promotion for the movie Insidious: Chapter 3 included a chatbot where fans could talk on the Kik app with a bot version of a character from the film.
