We got a demo of Google’s Gemini Pro. Here’s what it can do

Canva is using it. Woolworths is too. So is unicorn Culture Amp. Some of Australia’s most successful companies are utilising Google Cloud’s generative AI capabilities. Google Sydney opened its doors to Forbes Australia to show us the multimodal power of Gemini Pro.
Gemini Pro is now available in the Android Studio app development environment. Image taken in Brussels, Belgium, on February 8, 2024. (Photo by Jonathan Raa/NurPhoto via Getty Images)

“When Canva is building to scale to their 100 million active users, they use this technique, this platform, to embed AI into their products,” says Matt Zwolenski, the CTO of Google Cloud Australia and New Zealand.

Zwolenski’s laptop is connected to a screen on the wall, and he is about to show me through Vertex – the developer platform that acts as a ‘Google Play store’ of sorts, providing developers with a suite of 130 tools that they can use to build AI. Some of those tools were developed by Google, and some, for example, Meta’s LLaMA and Anthropic’s Claude, were not.

One of the newest tools on the Vertex platform is Gemini Pro, announced in February by Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis. Gemini Pro was created by Google DeepMind specifically for Google Cloud’s enterprise clients. It can’t currently be accessed by the general public.

“Gemini Pro is what Canva is using,” says Zwolenski. “It is multimodal, so it’s really good at images, text, and other things. If you go into the Canva app, you can use image generation and create photographic images. But Gemini Pro is also used for text generation, to help write stories in Canva, and so on. The platform makes it easy to manage multiple models and integrate them.”

It is this multimodal feature that Zwolenski – and Pichai – say sets Gemini Pro apart from its predecessors.

“Multimodal is one of the areas that our DeepMind research team has had real breakthroughs. Gemini is this multimodal idea – video and text and images together,” says Zwolenski.

“The analogy is, imagine you’re in France watching a French film. And you’re taking in the imagery, you’re hearing the music and the language, you’re reading the subtitles, and your brain is processing the storyline. Our brains are good at all of that. But AI models – up until this point – have not been good at multiple modes, they’ve been good at one thing,” says Zwolenski.

Matt Zwolenski is the CTO of Google Cloud Australia and New Zealand. He gave Forbes Australia a demo of the enterprise capabilities of Gemini Pro. Image: Google Australia

The enterprise demo: Gemini Pro in action – video to text

Zwolenski opens Vertex and talks me through an enterprise scenario where Gemini Pro can be of use.

“Imagine you want to list your house for rent. This could be for Domain, or it could be for a listed real estate agent. This scenario is – how do we make life simpler. I have prepared a short video of this house – just the kitchen and the lounge room. Normally, you’d upload your video and then write a description of the house. Instead of doing that, we’re going to [ask Gemini Pro] to create a description of what you see in this video. To list all the features in this house that’ll be relevant for a rental. The description should be factual based on only what is seen in the video and at a tone that would appeal to guests. And I hit submit.”

“What you can see here, is it’s looked at the video. It says it has an open floor plan, it talks about the appliances in the kitchen. Now, this was real-time – that is one of the things the team at DeepMind has done a phenomenal job of, how fast that was, processing that video and coming up with a result in two seconds. So imagine this is now embedded in the [real estate] website. You’ve uploaded the video, and you say ‘create the description.’ It’s a really fast, elegant response.”

“There’s a little bit of a difference in the consumer version, versus what happens here in the enterprise version. Firstly, there’s a whole bunch of tuning and things that [enterprise] customers can do. So here the temperature was set to point four. Now this temperature is creativity, you wind it up, the model gets more creative. You wind it down, the model gets more factual. Depending on your use case.”
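
For readers curious about what the temperature dial does, here is a minimal sketch of how the setting is commonly applied in language models generally. It is not Gemini Pro's internal implementation, and the scores for the three candidate words are made up for illustration. A low temperature concentrates probability on the top-scoring word, so output is more predictable; a high temperature spreads probability across the alternatives.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature setting."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words (assumed values).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.4)   # the setting used in the demo
high = softmax_with_temperature(logits, 1.5)  # a more "creative" setting

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At 0.4 the first word takes most of the probability; at 1.5 the other two words become far more likely to be picked.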

The competitive advantage of training the data

Zwolenski suggests a further demo in which the model is tuned on a company's own data, so that its output better fits the needs of a specific enterprise.

“Now we can train it to our voice. In this scenario, the customer wants to make [the description] more creative. In their ads, it’s much more flamboyant, it’s more interesting. We often see customers say, okay, that’s the generic answer, I want to create something of my own. DeepMind has spent a long time building training into Gemini. But a customer can then take their own data or their own ideas and fine-tune it. This is what we call the ‘competitive moat.’ If you’re a bank, if you’re a consumer-facing organisation, you’ve got large amounts of data, you can use that data to create something that nobody else can.”

“What I’ve got is a file that has several hundred previous listings. So when Gemini creates a new listing, I want to write it in a similar style to what I’ve done previously. Instead of using standard Gemini Pro, I’m going to tune that version to the file, and hit submit. Now it’s using the extra data and coming up with a slightly different description. ‘The open floor plan is a testament to architectural grandeur.’ The language is very different.”
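
A file of previous listings like the one Zwolenski describes usually has to be converted into prompt-and-response pairs before it can be used for tuning. The sketch below shows one plausible way to do that. The field names (`features`, `description`, `prompt`, `response`) and the sample listings are assumptions for illustration; the actual schema a tuning service expects should be taken from its documentation.

```python
import json


def build_tuning_examples(listings):
    """Pair each past listing's features with the description written for it."""
    examples = []
    for listing in listings:
        features = ", ".join(listing["features"])
        examples.append({
            # Hypothetical keys; a real tuning service defines its own schema.
            "prompt": f"Write a rental listing for a home with: {features}",
            "response": listing["description"],
        })
    return examples


def to_jsonl(examples):
    """Serialise the examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(example) for example in examples)


# Made-up sample data standing in for several hundred previous listings.
past_listings = [
    {
        "features": ["open floor plan", "modern kitchen"],
        "description": "The open floor plan is a testament to architectural grandeur.",
    },
    {
        "features": ["sunny lounge room", "gas cooktop"],
        "description": "A sun-drenched lounge flows into a kitchen made for entertaining.",
    },
]

examples = build_tuning_examples(past_listings)
jsonl = to_jsonl(examples)
print(jsonl)
```

The more examples of the house style the file contains, the more closely a tuned model can be expected to imitate it.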

Integrating external data into Gemini Pro

External data can also be used to the benefit of enterprises when using Gemini Pro, Zwolenski says.

“At the moment, we’ve just got things we’ve uploaded. But what if we could also do something cool with external data? We can use an API to interface with something that’s on the internet. It goes out and looks at a site with listings and prices for houses. So we’re going to ask Gemini Pro, how much should we list the house for? Provide weekly recommendations for the month of December, including Christmas and New Year’s.”

“Gemini has worked out what’s in the video, looked at appliances, deciphered the photos, pulled it all together, and then gone out and compared it to similar properties, catalogued it into a timeframe and figured potential price per week. This is all running [behind the scenes] and it becomes invisible. This is what’s making our customers say hey, let’s look at Google from an AI perspective. Culture Amp is using this platform to train data and tune and improve over time.”
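
The general pattern behind this kind of external lookup can be sketched as follows: the model asks for a named function, the application runs it, and the result is handed back to the model. Everything here is a hypothetical stand-in, including the function name `get_comparable_rents`, the suburb, and the rent figures. It is not the API used in the demo.

```python
from statistics import median


def get_comparable_rents(suburb):
    """Hypothetical stand-in for an external listings API (weekly rents in dollars)."""
    sample_data = {"Example Suburb": [780, 820, 850, 900, 950]}
    return sample_data.get(suburb, [])


# Registry of functions the application is willing to run on the model's behalf.
TOOLS = {"get_comparable_rents": get_comparable_rents}


def handle_function_call(call):
    """Look up the requested function by name and run it with the given arguments."""
    func = TOOLS[call["name"]]
    return func(**call["args"])


# A simulated request of the kind a model might emit (assumed structure).
call = {"name": "get_comparable_rents", "args": {"suburb": "Example Suburb"}}

rents = handle_function_call(call)
baseline = median(rents)  # a simple starting point for a weekly price
print(rents, baseline)
```

In a real deployment the result would be passed back to the model, which would then combine it with what it saw in the video to produce the weekly recommendations.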


Safety settings built into Gemini Pro

Another advantage of Gemini Pro is the years of safety research that Google and DeepMind have carried out, says Zwolenski.

“Safety settings are really powerful. We built responsible AI filters on a decade of research looking at Google search – ensuring we don’t bring in data that is from the worst parts of the internet. Our filters scan billions of data points to ensure that we filter out the things that we don’t want to include.”

“Those filters were built for Search and Maps and YouTube. We spent a lot of years researching to get those right for billions of users. We’ve taken that same filter and built it into this platform. At Culture Amp and Canva – who are using this platform – they have access to the same filters that we use on Search. If users try to put in things that are inappropriate, they will block it. Our customers will typically set it to block most because they want to protect their business.”
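
A setting such as "block most" can be thought of as a threshold applied to a harm score. The sketch below illustrates that idea only. The setting names, the 0 to 1 score scale and the cut-off values are all assumptions made for this example and do not describe Google's actual filters.

```python
# Hypothetical cut-offs: the stricter the setting, the lower the score
# at which content is blocked.
THRESHOLDS = {
    "block_few": 0.9,
    "block_some": 0.6,
    "block_most": 0.3,
}


def is_blocked(harm_score, setting):
    """Return True if a score between 0 and 1 meets or exceeds the setting's cut-off."""
    if not 0.0 <= harm_score <= 1.0:
        raise ValueError("harm_score must be between 0 and 1")
    return harm_score >= THRESHOLDS[setting]


# The same borderline input is treated differently under each setting.
for setting in THRESHOLDS:
    print(setting, is_blocked(0.5, setting))
```

Under these assumed cut-offs, a borderline input scoring 0.5 passes the two looser settings and is blocked only by "block most", which is why a cautious business would choose it.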

Gemini Pro can now be integrated into Android Studio

In addition to large enterprises like Culture Amp, Canva and Macquarie using Gemini Pro, Android Studio announced this week that some of the capabilities outlined above will now be available for smaller developers to use. Gemini Pro is being rolled out in Android Studio in 180 countries, including Australia.

This matters for the reach of Gemini Pro because, according to Android, there were more than 2 billion monthly active Android devices around the world in 2022. Android has 43.5% of the global operating system market and iOS holds 18.2%, according to StatCounter. In Australia the picture is reversed: iOS is more dominant, with a 27.93% market share, while Android holds 18.17%.

Nonetheless, Android, and now the developers who build apps in Android Studio, reaches an enormous number of people across the globe, which gives the Gemini 1.0 Pro model greater reach.

Takeaway

In summary, the competitive advantage of Gemini Pro, according to Google, is the ability to tune and train a multimodal model on data specific to an enterprise's needs.

“If a digital native company wants to do AI at scale, Gemini Pro takes the Google thinking and how we’ve been operating for the last 10 years, packages it up, and builds a platform for them to do it. It is very powerful AI in a very simple way,” says Zwolenski.

Shivaune Field
Business Journalist