AI agents could finally make Siri and Alexa truly useful

Innovation

The ‘agentic era’ is arriving in 2025, according to Alphabet CEO Sundar Pichai. Agent technology has the chance to do something that has to date eluded the big tech platforms – make their voice assistants actually useful.

Voice assistants like Siri and Alexa could get a much-needed boost from AI agents. Image: Getty

In 2016, when newly-minted Google CEO Sundar Pichai unveiled the Google Assistant as part of his new “AI-first” agenda, he touted the fledgling voice assistant as a tool to help people complete tasks.

“The Google Assistant allows you to get things done, bringing you the information you need, when you need it, wherever you are,” he wrote in a blog post at the time.

It was a lofty goal that has, for the most part, fallen short. Too often, the software gets stumped by a request, defaulting to a web search and apologetically saying it can’t help. That led people to relegate voice assistants to simple tasks like setting cooking timers, playing music or controlling their lights. Amazon’s Alexa, released a decade ago, hasn’t fared much better. Siri, the earliest of the bunch, launched by Apple in 2011, has been panned most of all.

But as generative AI has gone mainstream over the last two years, it has paved the way for AI “agents”: software that’s specifically programmed to take action or complete tasks on behalf of a user, like booking a reservation or buying something online. And as the “agentic era,” as Pichai calls it, arrives in 2025, the technology has the chance to do something that has to date eluded the big tech platforms: make their voice assistants actually useful.

Google CEO Sundar Pichai and Apple CEO Tim Cook at the White House in June 2023. Image: Anna Moneymaker/Getty Images

That means Google Assistant, Alexa and Siri could finally fulfill their promise to act like personal assistants. Instead of just reciting your meeting schedule for the day, as Google Assistant can do now, the assistant might actually book the meetings, reaching out to contacts and finding a time that works for everyone. It might also book your flights and hotels for a big vacation like a digital travel agent, with little more information than your trip dates and destination.

Agents are the latest frenzy in the tech industry, with more than 470 platforms devoted to the technology, according to Forrester research. Those range from big tech giants to smaller startups like LangChain, CrewAI and Play.ai. Beyond consumer features, agents could also transform businesses, handling tasks like customer service or software development. Deal count for AI agent startups is up more than 81% over the last year, according to PitchBook, with more than $8 billion invested in the space.

“The race is on,” said Steve Jang, a Forbes Midas List investor and founder of the firm Kindred Ventures. “Startups will be competing with the established platforms on who can orchestrate this at much higher fidelity. And who can create much more humanistic and realistic voices and conversations, and access the data and actions that we all want.”

“I only use Siri for trivial things that I know it’s not going to screw up.”

Kanjun Qiu, Co-founder, Imbue

The big tech voice assistants are best poised for such an AI jump start. Google has its marquee model Gemini to beef up its voice searches. Apple earlier this year announced a partnership with OpenAI to use ChatGPT to power some Siri queries. And in the last year, Amazon has invested $8 billion in Anthropic, which makes the powerful Claude chatbot. Google declined to make any of its executives available for interviews. Apple and Amazon didn’t reply to interview requests.

Jang thinks the real innovations will come from voice AI models themselves. Unlike large language models, which underpin services like ChatGPT and produce text that software then reads aloud, voice models are trained on actual voice audio, so they can pick up on subtleties in speech, like cadence or emotional cues. Jang has invested in Play.ai, which specializes in voice agents; it’s competing with companies like ElevenLabs, OpenAI and Google that are all working on voice models.

Google is testing a pair of prototype glasses for its AI model Gemini. Image: Google via Forbes US

Some, however, are not so convinced that agents will make the big voice assistants exponentially better. Kanjun Qiu, co-founder of Imbue, which is building agents for coding software, thinks adding more AI to those products will only “incrementally” improve them. She said that new AI features still won’t be a big enough leap for people to trust them. “Delegation as a paradigm is actually really hard for people,” said Qiu. “I only use Siri for trivial things that I know it’s not going to screw up.”

But she thinks recent improvements in voice AI will help consumers in other ways. For example, more apps will integrate voice features, she predicts. With improved latency and natural language understanding, you’ll be able to give an app instructions and it will carry out that action, Qiu said — like telling an e-commerce app you’d like to return the pair of shoes that don’t fit quite right. (An engineer by training, she said she’s built an app for herself that turns rambling into a to-do list.)

Improvements in AI and voice technology could also unlock hardware ambitions that Silicon Valley has been attempting for years. More than a decade ago, Google infamously faceplanted when it unveiled Google Glass, a piece of smart eyewear that stoked privacy fears and wasn’t very useful. Earlier this month, the company teased a new pair of prototype glasses to be used with Project Astra, Google’s new platform for AI agents. In a demo, the glasses, which are voice-controlled, automatically pulled up a door code from the wearer’s email the moment he looked at the entry keypad. The tech could also conjure up route information about the bus in front of him or the art sculpture he walked by.

Dutch Minister of Education, Culture and Science Robbert Dijkgraaf tries out Google glasses. Image: Jeroen Jumelet/ANP/AFP via Getty Images

Meanwhile, Meta’s Orion glasses, announced earlier this year, use a combination of voice and hand gestures to control AI tools, like looking at ingredients in your pantry and asking the tech to find a recipe that uses them.

Voice-based innovations also make technology more accessible. Not everyone can read, write or type, but more people have the ability to speak, Jang said. And voice is an increasing preference among young people: 42% of 18-to-29-year-olds in the U.S. send voice messages in their chat apps at least weekly, according to a study by YouGov and Vox.

New advancements in AI could make voice tools even more widely used and change the way people interact with their technology. “It makes voice agents — and voice itself — this great new user interface that has been untapped so far in computing,” Jang said.

This story was originally published on forbes.com and all figures are in USD. 



Richard Nieva