Beyond ChatGPT: Why Gemini is the Future of Generative AI

Whether you like it or not, we are in the midst of a technological revolution and evolution.

The 4th industrial revolution is currently underway, and significant advancements in technologies, such as blockchain, IoT, augmented reality, robotics, 3D printing, and cloud computing are transforming others while they themselves are being transformed by each other. For example, blockchain enhances the security and transparency of IoT devices by providing a decentralized and tamper-proof ledger for recording data exchanges between devices, ensuring data integrity and minimizing unauthorized access. Conversely, IoT devices generate massive amounts of data that can be recorded and verified using blockchain technology, making the blockchain more robust and functional. Similarly, cloud computing provides the computational power and storage needed to process and render augmented reality (AR) experiences, allowing AR applications to be more complex and data-intensive. In turn, the increasing demand for AR applications drives the need for more advanced cloud computing services, including edge computing and low-latency data processing, thus improving the overall infrastructure and capabilities of cloud computing.

These examples illustrate how these technologies transform other fields while driving each other’s development and innovation – quite a dynamic and sustainable ecosystem.

But one particular area worthy of a special mention is the field of AI/ML (Artificial Intelligence and Machine Learning).

You may have heard about ChatGPT — everyone has, at this point. It had taken over the world by a storm when it was covered by the media during the transition to 2023 during the beginning of a major war in Ukraine as the world was just recovering from a global pandemic, CoVID-19, the most significant pandemic in history.

It’s a wonder how one thing can replace another in terms of attention and impact, when we thought nothing else can top it. But the future is here, and we are experiencing many global events that there seems to be just too much to handle.

But with a little bit of patience and determination, everything can be learned, which will enable you to gain a higher level of understanding of the global trends that are pushing and pulling the entire world to unknown territories.

In this blog post, I will go over the Gemini family of models.

If you don’t know what they are, don’t start Googling for answers yet. I will make sure to go over what they are in detail, their applications, some of the key distinguishing features, comparisons to other AI models, such as ChatGPT, and how some major players in the game are utilizing it.

Notably, a recent study by Forrester in Q2 2024 titled “The Forrester Wave™: AI Foundation Models for Language, Q2 2024” ranked Google’s Gemini model as the #1 model, surpassing ChatGPT.

After reading this blog post, you will have a comprehensive understanding of the Gemini family of models, their unique capabilities, and their impact on the AI/ML industry. You’ll discover how these models are driving innovation across industries and how you can leverage them to stay ahead in this fast-paced technological era.

So, let’s get started.

The Brief-And-Boring-Yet-Crucial Technical Overview of Gemini Models

There are officially several Gemini (Google’s answer to ChatGPT and Claude) models currently in existence as of May 2024, yet generally falls in two categories: Gemini 1.0, which handle an input of around 8,000 tokens (16 images max, videos 2 min max, text/code/pdf), and Gemini 1.5, released to GA in 2024, which handles an input of around 1,000,000 tokens (3k images max, 1 hr video max, text/code/pdf/audio/video/images).

Compare that to GPT-4’s 8k token limit and GPT-4o’s 128,000 limit, and Claude AI’s 30,000 limit.

Even more, Google is already experimenting with the future by taking in volunteers to help test out a model that can handle an astounding 2,000,000 tokens.

With that amount of tokens, you can upload an entire codebase of a complex software application or upload an entire movie for analysis.

Gemini is configured to be fully multimodal, which means that it can take in multiple forms of input in a prompt. Models that aren’t multimodal accept prompts only with text.

Modalities can include text, audio, video, pdf, image, and more.

For instance, with a fully multimodal model, you can upload an image of a car along with some text to ask ‘What is the make and model of this car?’. The model will then use both the image and the question to generate the answer.

This can also come in handy for traveling – upload a map screenshot and provide a voice recording of a travel query like, “How do I get from my current location to the nearest train station?” The model can then combine the map details and the audio query, and even perhaps make a function call to an external API to get the latest data on weather conditions to give you an up-to-date response with clear instructions laid out for you.

This will seriously make other AI companies re-think their strategies as the world continues to evolve rapidly in the 4th industrial revolution.

Speaking of which, the Gemini models integrate naturally into the Google Cloud Platform ecosystem, which itself is a major player in the cloud computing industry, which itself is a core driver of the 4th industrial revolution, which itself is causing a massive technological shift at a global scale almost never seen before.

The Google Cloud Platform has a very powerful product called the Vertex AI, which is the main hub on GCP (Google Cloud Platform) for virtually anything related to AI and machine learning (ML). With Vertex AI, you can:

  1. Train/deploy ML models, as well as work with LLMs.
  2. Take advantage of options for low/no-code ML training, as well as an option for complete control over the AI training process.
  3. Use a model from the Vertex AI Model Garden, which is a lovely garden full of all types of models, from pre-trained proprietary models to open models (such as Gemma, LLaMa, and HuggingFace).
  4. Work with Generative AI models (Gemini, PaLM, etc.)
  5. …so much more.

Generative AI work is done within the Vertex AI environment in conjunction with other GCP products, such as the Google-built, globally-connected internal network, as well as highly-available, durable, performant, and cost-effective cloud storage, and finally a strong suite of computer processing technologies to help train extremely complex machine learning/AI models, all while running on clean, carbon-free energy, massively contributing to the health of our planet’s environment in a sustainable way. Wouldn’t you want our world to be a bit greener?

So, what are some of the cool things you can do with Gemini in Vertex AI?

First off, Gemini is a type of Generative AI, in the same realm as ChatGPT and Claude, and even Midjourney. It’s a large language model that can write code for you, summarize an article in 1 sentence in the tone of an angry sounding old man, create an image of your dog just surfing along the exotic beaches of Brazil, give you a detailed recipe just by looking at a photo of the food you provide it, and infinitely more. With this capability, you can develop an application that will connect to Gemini that would allow your users to interact with your model. You can give it specific system instructions, which is like a prompt but permanently infused into the model. (I have recently worked with a client that had an online web app for pet owners. This online web app connected to a GenAI model via API. The instructions given to the model were to ensure that the model acted as a professional and caring and loving veterinarian that gave guidance and advice for concerned pet owners.).

By now, I hope that you are well aware of what can be done with Gemini. While ChatGPT is still useful, it’s not as powerful as Google’s Gemini, nor does it offer as much customizations as Gemini offers. I would argue, though, that nothing really comes close to ChatGPT when it comes to introducing beginners to the world of generative AI. However, for enterprises and for complex use cases, Gemini would fare significantly better, due to its large context size, multimodal capability, and its full-fledged integration into the GCP ecosystem.

Try Gemini Now: https://gemini.google.com/

 

 

Leave a Comment