The nuclear winter of voice technology
I love voice and conversational technology and think it will transform how we live, but we are in a period of frustration. Can we end the nuclear winter of voice?
A rant, observations and what we can learn from the mobile app industry.
What started us on this path
When a new technology hits the market, it does so first for the innovators (we have all seen the charts) and then slowly moves forward through the stages. At each stage, the technology gets more stable and more cost-effective, increases its market share and attracts more competition. Voice technology, though, took a slightly different path in its attempt to become mainstream.
Today, Amazon has won the brand war of voice by putting its loss-leading Echo smart speakers in everyone's home; these overpowered kitchen timers and music players made the word "Alexa" synonymous with voice interaction across the broadest of audiences. From children to grandparents, if you asked them to speak with an inanimate object, they would probably start the sentence with "Alexa…".
This saturation of the Alexa brand may end up being Amazon's biggest mistake: when your product reaches this level of abundance, customers expect more, and Alexa isn't delivering (yet).
Microsoft tried with Cortana and gave up; Samsung has Bixby but is suffering from conglomerate ownership stifling much-needed creator engagement (we don't want to build apps for the fridge). Apple has Siri, a worthy opponent to Amazon, but all signs point to Siri being the UI layer to Apple's AR ambitions rather than a platform in its own right. Finally, aside from everyone playing "when will Google shut this down?", Google has managed to catch up with Amazon in hardware quality and consumer price point. Still, it is unclear what Google's mission is for voice or even for the broader Google Assistant ecosystem.
The landscape in 2021 feels like peak BlackBerry; every big tech announcement is marketed as a leap forward but lands flat. We get slightly improved speaker audio, the addition of a camera or a weird robot head that follows us around the room. When, and from whom, will we get an iPhone-level release to move the industry forward?
Moving on from slot filling
We feel that in 2021 we are in a world of voice AI, but in practice we have only swapped the 1990s IVR menu of "press 1 for X, 2 for Y" for "what can I help you with?". We are still at press 1 for sales, press 2 for accounts; the only difference is that instead of pressing 1 we say "sales".
The abundance of smart devices, speakers, phones, watches and even microwaves makes turning the lights off by voice feel like magic, but now we want more. We now get annoyed or even angry when voice fails us. For a simple task like booking a car in for a service, we now expect the assistant to know the cars we own, their service history, our calendar, the garage's opening times, predicted traffic, payment details; the list continues.
For authentic conversational experiences to exist, we need to move beyond slot filling to conversational knowledge. If delivered well, this would be the iPhone moment; we saw a glimmer of it with Google Duplex: the awe, the excitement, the concern.
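To make the limits of slot filling concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the slot names, the keyword patterns and the prompts are hypothetical, and a real assistant would use a trained language-understanding model rather than regular expressions.

```python
import re

# Hypothetical slots that a "book a service" request needs before it can proceed.
REQUIRED_SLOTS = ("vehicle", "date")

# Hypothetical keyword patterns standing in for a real language-understanding model.
SLOT_PATTERNS = {
    "vehicle": re.compile(r"\b(golf|fiesta|model 3)\b", re.IGNORECASE),
    "date": re.compile(r"\b(monday|tuesday|wednesday|thursday|friday)\b", re.IGNORECASE),
}


def fill_slots(utterance, slots=None):
    """Copy any recognised values from the utterance into a new slot dict."""
    filled = dict(slots or {})
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match and name not in filled:
            filled[name] = match.group(1).lower()
    return filled


def next_prompt(slots):
    """Return the question for the first missing slot, or None when all are filled."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return f"Which {name} would you like?"
    return None


# A two-turn exchange: the assistant only ever learns what the user spells out.
slots = fill_slots("Book my Golf in for a service")
print(next_prompt(slots))
slots = fill_slots("Friday please", slots)
print(slots)
```

Say "whenever suits my calendar" and nothing gets filled. The assistant is still a menu; it has no idea which cars I own or when I am free, which is exactly the gap conversational knowledge would need to close.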
The next evolution is not about voice but about getting technology to interact natively with us. The keyboard, the mouse, even touch, have been us adapting to technology. The move to voice and conversation is making technology adapt to us; this is the hard bit.
People need to make money
Apple created the App Store and allowed developers to monetise because Apple made its money from the hardware (no developer conflict; in fact, the opposite). Apple's inbuilt apps on the iPhone were basic (and purposely so); they offered functionality but left room for developers to innovate and iterate. It wasn't until Apple launched Apple Music that the line between Apple and developers became grey, though by this time the ecosystem was mature, with billions of dollars paid out to developers.
The biggest problem in voice today is that purchasing is arduous for both creators and consumers. I have read many reports that talk about how X% (generally over 50%) of people have purchased via voice. This has to be 100% bullshit, even by the thinnest definition of what could be classed as a voice-enabled purchase. I have asked peers in the voice space, and watched and been a part of live polls of voice technology enthusiasts, and I have never seen over 3% of people declare they have used voice to purchase.
Apple made an ecosystem that developers could make money within, so developers made money. Other developers would see this, build apps and make money too. Then tool makers would create better tools for developers, because developers would spend money on tools that made them more money. Apple's App Store made $60 billion last year in digital goods sales, not counting Apple Pay or non-digital app purchases, which are estimated at $500 billion.
For voice to accelerate, the Amazons and Googles of the world need to make monetisation easier for creators and incentivise consumers to see the value in paid-for experiences.
The enterprise is coming
During a year of pandemic lockdowns and rule changes for retail and hospitality, Google Duplex rang businesses to ask them when they were open and closed; Duplex updated over three million business listings during the pandemic. If you can't or don't want to touch a screen, then voice is an alternative option. The pandemic has given voice an enterprise moment to shine.
An example of how this could play out:
A store manager will ask an internal team to build a voice interface under an innovation budget that enables customers to get a product price by voice. The tech will get released and receive some PR because of the pandemic situation. Another department will then commission a 'calling in sick' voice app for staff, which will help with staff logistics. The 10-year-old IVR system will then get updated because Salesforce releases a new platform that empowers remote contact centres. The mobile app team will integrate with Google Assistant for key features and see daily engagement go up 25%; some developers will find the process hard and create better tooling, and some of them will turn their tool into a product that other developers will then pay for.
Silicon Valley will drop $50 million into enterprise-grade cost-cutting technology platforms; a few of these companies will exit for over a billion dollars. As this is all going on, a group of people will be trying to get a HIPAA-compliant smart speaker approved for healthcare, and after three years they will get there. Hospitals won't care if this device costs $300 compared to $50 for an Alexa device, because after a year that device will create an ROI of 5000%. The enterprise will then attract voice creators; open-source platforms like Rasa will grow and build up vertical-specific features. NLP will accelerate into more languages and regional dialects, and then the nuclear winter will be over, with a thriving ecosystem that can support the consumer voice industry too.
The biggest problem with the above is that it may not happen. With a vaccine rollout underway in the West, most of us will forget the last 18 months by the summer, and businesses will start to cut costs over fears about the economic recovery. As we have seen from its pandemic response already, the East will be more cautious, with a growth in voice and face-powered experiences, though used on mobile devices rather than through smart ambient devices in homes.
The question is, will we take this opportunity to end the nuclear winter of voice?