2020 Voice Forecast: Wake Conditions
January 21, 2020
Implicit wake conditions are the key to natural — and even intimate — voice experiences in 2020 and beyond.
Currently, wake words alert a device that it should listen to what the person is about to say next. That utterance may contain an invocation: a trigger phrase that prompts the assistant to connect the user to a specific voice app. If the person asks for a skill or action by name, that is referred to as an explicit invocation, as opposed to an implicit invocation.
The Actions on Google documentation describes implicit invocation as “an interaction flow that occurs when a user makes a request to perform some task without invoking an Action by name. The Google Assistant attempts to match the user’s request to a suitable fulfillment, such as an Action, search result, or mobile app, then presents recommendations to the user… implicit invocation provides a way for users to discover your Action via the Assistant.” The strategy behind implicit invocations emphasizes the discoverability of a potentially helpful voice app that a user may not know by name, based on their context. However, discoverability offers little value to a user who already knows the voice app well and may use it daily.
Furthermore, zoom out and consider the user’s relationship to the device-level assistant, something they may speak to several times a day. In the context of the device-level assistant, lowering barriers to invocation is not about discoverability; it’s about building a relationship and deepening intimacy.
Multi-turn exchanges are becoming slightly more prevalent in voice interactions, but since the initiation of a voice request depends on verbalizing the “wake word,” like Hey Alexa, users find themselves invoking, and re-invoking, and re-re-invoking the action or skill in order to continue the conversation they believe they’re having. Consider how awkwardly the same experience model would map onto a human-to-human interaction, which typically consists of several potentially (or seemingly) unrelated topics within one larger conversation. A wide chasm still exists between the user’s mental model and the actual functional reality of voice assistant interactions.
The use of wake words keeps our relationship with voice assistants at arm’s length, even though they are becoming increasingly omnipresent in our home and work lives.
By building more human conversational cues into device wake conditions, I hope that we can go from people saying “I’m worried about it listening to me all the time” to “when I am speaking to my assistant, I expect to be listened to”.
For years, Amazon Echo devices have used their blue light ring to indicate directional listening; the Apple iPhone X with Face ID was released in 2017; and in 2019 the Google Pixel 4 shipped with Motion Sense (Project Soli) directional interaction indication.
By the end of 2020, I anticipate voice design moving away from explicit wake words (removing the need for saying “Alexa” or “Hey Google”) to implicit wake conditions, such as recognizing when a user is looking at or turns to face a device housing the assistant.
Taking cues from human interaction, there are many ways to implement implicit wake conditions, including:
- Gaze detection
- Face recognition
- Hand tracking
- Gestures, microgestures
- Bone induction via wearables
- Body position detection
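To make the idea concrete, here is a minimal sketch of how several of the implicit cues listed above might be fused into a single wake decision. Everything here is an illustrative assumption, not any vendor’s API: the signal names, the `WakeSignals` structure, and the two-cue threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WakeSignals:
    """Hypothetical per-frame sensor readings (all names illustrative)."""
    gaze_on_device: bool = False   # gaze detection
    face_recognized: bool = False  # face recognition
    hand_raised: bool = False      # hand tracking / gesture
    facing_device: bool = False    # body position detection


def should_wake(signals: WakeSignals, threshold: int = 2) -> bool:
    """Wake only when enough independent implicit cues agree.

    Requiring two or more cues (e.g. gaze *and* body orientation)
    is one way to reduce false wakes from a single noisy sensor.
    """
    score = sum([
        signals.gaze_on_device,
        signals.face_recognized,
        signals.hand_raised,
        signals.facing_device,
    ])
    return score >= threshold
```

The threshold is a design knob: raising it makes the assistant harder to wake accidentally, at the cost of missing genuine attempts to engage.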
One challenge that will continue to plague conversation designers will be appropriately determining when a conversation is over — since conclusion cues may not necessarily be the inverse of wake conditions. For example, someone may initially make eye contact with someone — or a device — to implicitly start a conversation, but that doesn’t mean they won’t turn away over the course of the interaction, while still continuing to want to chat. Google’s Motion Sense shows the most promise to alleviate this pain point since its full feature state allegedly has the capacity to recognize body cues that typically start and end interactions, e.g. when two people make eye contact with each other. How do humans decide when a conversation is over? Even between two people, there is the distinct possibility for awkwardness at the end of conversations.
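One way a designer might handle the asymmetry described above, where looking away does not necessarily end a conversation, is to close a session only after a sustained absence of engagement cues rather than on the first lost cue. The sketch below is a hypothetical illustration of that grace-period approach; the class name, the cue model, and the five-second default are all assumptions for the example.

```python
class ConversationSession:
    """Illustrative session tracker: ending a conversation is not simply
    the inverse of starting one. A user may glance away mid-chat, so the
    session only closes after a sustained gap in engagement cues."""

    def __init__(self, grace_seconds: float = 5.0):
        self.grace_seconds = grace_seconds
        self._last_engagement = None  # timestamp of most recent cue

    def engage(self, now: float) -> None:
        """Any engagement cue (speech, gaze, gesture) refreshes the session."""
        self._last_engagement = now

    def is_over(self, now: float) -> bool:
        """The conversation is over once no cue has arrived for the grace period."""
        if self._last_engagement is None:
            return True
        return (now - self._last_engagement) > self.grace_seconds
```

A brief look away inside the grace window keeps the session alive, which is closer to how two people tolerate pauses and broken eye contact without ending the conversation.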
Implicit wake conditions will provide unique opportunities for developers and designers to imagine voice-only and voice-first experiences that feel unobtrusive, intuitive, and intimate, since they will be based on normative human actions.
Thu May 5
Connector Spotlight: Chris Kim
Not only is Chris Kim, Talent Acquisition Coordinator here at Connected, often characterized as among the best and brightest, she also has an undeniable knack for attracting the same. With a background stronger than steel but a soft spot for diversity, equity, and inclusion, Chris always brings her humble, honest, and hilarious self to everything she does. Just don't ask her how her favourite basketball team did in the playoffs.
Thu Apr 28
Connector Flashlight: Hassaan Abbasi
Connector Spotlight is a series where we shine a metaphorical light on Connectors to learn more about who they are, what they do, and what they like. While we have no shortage of star power to satisfy the series (just shy of 3 years' worth, by my count), we also have some incredible interns who work here, too. And while it may not be a spotlight, we'd like to illuminate one intern in particular with a brand new, intern-centric series entitled Connector Flashlight. Hassaan, you're easy-going and insatiably curious, and we'll never forget your face when you found out what Project Rhubarb REALLY was. We'll miss you, but have little doubt you're on your way to doing great things.