In 2016, when newly minted Google CEO Sundar Pichai unveiled the Google Assistant as part of his new “AI-first” agenda, he touted the fledgling voice assistant as a tool to help people complete tasks.
“The Google Assistant allows you to get things done, bringing you the information you need, when you need it, wherever you are,” he wrote in a blog post at the time.
It was a lofty goal, and the product has, for the most part, fallen short of it. Too often, the software gets stumped by a request, defaulting to a web search and apologetically saying it can’t help. That led people to relegate voice assistants to simple tasks like setting cooking timers, playing music or controlling their lights. Amazon’s Alexa, released a decade ago, hasn’t fared much better. Siri, the earliest of the bunch, launched by Apple in 2011, has been panned most of all.
But as generative AI has gone mainstream over the last two years, it has paved the way for AI “agents”: software that’s specifically programmed to take action or complete tasks on behalf of a user, like booking a reservation or buying something online. And as the “agentic era,” as Pichai calls it, arrives in 2025, the technology has the chance to do something that has to date eluded the big tech platforms: make their voice assistants actually useful.
That means Google Assistant, Alexa and Siri could finally fulfill their promise to act like personal assistants. Instead of just reciting your meeting schedule for the day, as Google Assistant can do now, an assistant might actually be able to book the meetings, reaching out to contacts and finding a time that works for both people. It might also be able to book your flights and hotels for a big vacation like a digital travel agent, with little more info than trip dates and destination.
Agents are the latest frenzy in the tech industry, with more than 470 platforms devoted to the technology, according to Forrester research. Those platforms range from big tech giants to smaller startups like LangChain, CrewAI and Play.ai. Beyond consumer features, agents could also transform businesses, taking on customer service or software development. Deal count for AI agent startups is up more than 81% over the last year, according to PitchBook, with more than $8 billion invested in the space.
“The race is on,” said Steve Jang, a Forbes Midas List investor and founder of the firm Kindred Ventures. “Startups will be competing with the established platforms on who can orchestrate this at much higher fidelity. And who can create much more humanistic and realistic voices and conversations, and access the data and actions that we all want.”
The big tech voice assistants are best poised for such an AI jump start. Google has its marquee model Gemini to beef up its voice searches. Apple earlier this year announced a partnership with OpenAI to use ChatGPT to power some Siri queries. And in the last year, Amazon has invested $8 billion in Anthropic, which makes the powerful Claude chatbot. Google declined to make any of its executives available for interviews. Apple and Amazon didn’t reply to interview requests.
Jang thinks the real innovations will be made in actual voice AI models. Unlike large language models, which underpin services like ChatGPT, voice models are not trained on text, with their output then read aloud by the software. Instead, voice models are trained on actual voice audio, so they can pick up on subtleties in speech, like cadence or emotional cues. Jang has invested in Play.ai, which specializes in voice agents; it’s competing with companies like ElevenLabs, OpenAI and Google that are all working on voice models.
Some, however, are not so convinced that agents will make the big voice assistants exponentially better. Kanjun Qiu, founder of Imbue, which is building agents for coding software, thinks adding more AI to those products will only “incrementally” improve them. She said that new AI features still won’t be a big enough leap for people to trust them. “Delegation as a paradigm is actually really hard for people,” said Qiu. “I only use Siri for trivial things that I know it’s not going to screw up.”
But she thinks recent improvements in voice AI will help consumers in other ways. For example, more apps will integrate voice features, she predicts. With improved latency and natural language understanding, you’ll be able to give an app instructions and it will carry out that action, Qiu said — like telling an e-commerce app you’d like to return the pair of shoes that don’t fit quite right. (An engineer by training, she said she’s built an app for herself that turns rambling into a to-do list.)
Improvements in AI and voice technology could also unlock hardware ambitions that Silicon Valley has been pursuing for years. More than a decade ago, Google infamously faceplanted when it unveiled Google Glass, a piece of smart eyewear that stoked privacy fears and wasn’t very useful. Earlier this month, the company teased a new pair of prototype glasses to be used with Project Astra, Google’s new platform for AI agents. In a demo, the glasses, which are voice-controlled, automatically pulled up a door code from the wearer’s email the moment he looked at the entry keypad. The tech could also conjure up route information for the bus in front of him or details about the sculpture he walked by.
Meanwhile, Meta’s Orion glasses, announced earlier this year, use a combination of voice and hand gestures to control AI tools, letting a wearer do things like look at the ingredients in the pantry and ask the tech to find a recipe that uses them.
Voice-based innovations also make technology more accessible. Not everyone can read or write or type, but more people have the ability to speak, Jang said. And it’s an increasing preference for young people: 42% of 18-to-29-year-olds in the U.S. send voice messages in their chat apps at least weekly, according to a study by YouGov and Vox.
New advancements in AI could make voice tools even more widely used and change the way people interact with their technology. “It makes voice agents — and voice itself — this great new user interface that has been untapped so far in computing,” Jang said.