If the title of this topic has puzzled you even a little, just think of ‘Jarvis’ from the Iron Man movies. The voice user interfaces (VUIs) of today are no different; however, they are still far from achieving the intelligence that Jarvis had.
It is worth noting that today’s voice-enabled assistants were not even possible a few years ago, because we lacked the computational power to process speech at scale. Thanks to recent advances such as cloud computing, virtualization, and GPU acceleration, it is now practical to build AI-enabled voice apps.
Still, many aren’t familiar with these voice-enabled assistants. To understand the concept, think of voice-based products as personal assistants that answer your questions by speaking. You can ask them about anything, from today’s weather to your next meeting, and much more. They can even search public databases to find an answer where applicable. If they can’t answer, they will simply say they couldn’t find anything relevant and forward your query to Google or Bing (or another major search engine of your choice). And the best part? They learn from their interactions. Unlike simple voice-response machines, the bots we are talking about use machine learning algorithms to learn from their conversations with people and build a picture of each user’s traits.
Amazon and Google have already come up with a few voice-based products, including the Amazon Echo and Google Home. Physically, these devices resemble speakers that can be activated using voice commands. For example, the Amazon Echo is activated when the user says ‘Alexa.’ Similarly, Google Home is activated when the user says ‘Ok Google’ or ‘Hey Google.’
These devices launched only a year or so ago, but they are already spreading fast in the market. This has opened up new possibilities for developers, who are now coming up with innovative ideas for apps built on them.
But a major challenge is the user interface design for these voice-first systems. Most designers and developers are familiar only with desktop and mobile interfaces built around click and touch interactions. Those are easier to reason about because the user engages with the app through a screen, which puts the focus on visual aesthetics: all the information is on display, and the designer’s main job is to present it in a visually intuitive way. With voice-enabled interfaces, the story is different. Voice products have no display screens; output is delivered through voice alone. And unlike a screen, which can present a ton of data at once, a voice-based system can only speak a few words per minute, which sharply limits how much output it can deliver.
That’s why it is crucial to pay attention to the design of voice UIs so that users don’t run into trouble while using these products. Here are a few tips to improve the user experience of voice-based products.
Express intentions explicitly
Focus on this interaction between two users:
A: Get two Big Macs and a Happy Meal
B: Anything to drink?
By reading this exchange, what context comes to mind? McDonald’s, right? Now say the same thing to a voice-based system. Will it answer the same way? Nope; it won’t even understand the context. Designing for voice UI requires explicit instructions that help the system understand the user’s intent.
- Missing context can be a problem: Humans interact with each other all the time, and they usually know the nature and context of the conversation taking place. Even when they don’t, it is easy for them to pick it up, because they come pre-equipped with sensory organs: eyes, ears, a nose, and hands. When they interact with a device, they expect the same ability from it. If they don’t get the desired response, it is off-putting, and they won’t use the device again.
But devices don’t work like that. Even the cleverest bot built to date won’t understand the exchange above without more context; the sketch after this list shows how designers make that context explicit.
- Interaction is the key: Amazon has built some clever responses into its assistant for the moments when a user asks something Alexa doesn’t know about. Ask, for example, ‘Alexa, what am I thinking?’ and Alexa replies, ‘You are thinking that if Alexa guesses what you are thinking, you would freak out!’
The answer is conversational and it gives the users of the product a feeling that they are not just talking to a cold-hearted battery-powered machine, but a lively bot with personality.
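A voice platform can’t infer that kind of context on its own; the designer has to declare it up front. Here is a minimal sketch in Python of what ‘explicit intent’ means in practice. The `OrderFoodIntent` schema and the crude matcher are hypothetical stand-ins, but real platforms such as Alexa’s interaction model and Dialogflow rely on the same idea: named intents with sample utterances and slots.

```python
import re

# Hypothetical intent schema: sample utterances and slots stand in for
# the context a human would infer from standing inside a McDonald's.
ORDER_INTENT = {
    "name": "OrderFoodIntent",
    "samples": [
        "get two {item} and a {item_two}",
        "get two {item}",
        "i'd like a {item}",
    ],
    "slots": {"item": "MENU_ITEM", "item_two": "MENU_ITEM"},
}

def matches(utterance: str, intent: dict) -> bool:
    """Crude matcher: does the utterance fit any declared sample?"""
    for sample in intent["samples"]:
        # Turn each "{slot}" placeholder into a wildcard capture group.
        pattern = re.sub(r"\{\w+\}", "(.+)", sample)
        if re.fullmatch(pattern, utterance, flags=re.IGNORECASE):
            return True
    return False

print(matches("Get two Big Macs and a Happy Meal", ORDER_INTENT))  # True
print(matches("Anything to drink?", ORDER_INTENT))                 # False
```

Without a declared intent, ‘Anything to drink?’ is just noise to the system, which is exactly why the McDonald’s exchange above falls flat.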
Provide Visual Feedback
Another thing to note while designing user interfaces for voice-based products is that humans need feedback. If a user asks the voice bot something and it stays silent, the user is left wondering whether it heard anything at all. Adding some kind of cue to let the user know the product is listening is a much better idea.
- Is the VUI all ears? Amazon and Google have both added light-based cues to their devices. When a user addresses the Echo by saying ‘Alexa,’ the blue light ring on top illuminates. This is the device’s way of saying, ‘Yes, I am listening to you.’
- Variations in cues: Most voice-enabled products display a wave visualization while they are speaking. This lets the user know the bot is saying, or is about to say, something, even if the volume is turned off. It also builds trust. Mycroft’s Mark 1, for instance, displays a pulse animation while listening and mimics lip movement with LEDs while speaking.
A good way to improve feedback is to add a short cue to the system, whether audible or not. For example, when the user says the bot’s name, such as ‘Alexa,’ it could respond with a quick ‘Yes?’ or ‘Hmm?’ to sound more human.
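As a rough sketch of that acknowledgement pattern, consider the Python below. The `light_ring`, `speak`, and `listen` objects are hypothetical device APIs, not part of any real SDK; the point is the ordering: cue first, capture second.

```python
import random

ACKNOWLEDGEMENTS = ["Yes?", "Hmm?", "Mm-hmm?"]

def on_wake_word(light_ring, speak, listen):
    # Visual cue first: the light says "I'm listening" even on mute.
    light_ring.illuminate("blue")
    # A short, varied verbal cue sounds more human than dead silence.
    speak(random.choice(ACKNOWLEDGEMENTS))
    # Only now capture the actual request.
    query = listen(timeout_seconds=8)
    if query is None:
        # Time out quietly instead of throwing an error at the user.
        light_ring.off()
    return query
```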
Limit information display
When you call a customer support center, you are greeted by a robotic voice that lists a few numbers leading to further actions: press 1 for this, 2 for that, and so on. Few callers can hold more than a handful of those options in memory, and that is exactly the constraint a VUI has to design around.
- Design with empathy: While designing the interface of a voice-based product, ask who your users are. Are you designing a language-specific app? What if the assistant has to read out a lot of numbers at once? Will the user be able to retain that kind of information? Designing with empathy is a crucial aspect of the VUI’s flow.
For example: “How much will bitcoin rise this year and should I invest in it?”
Instead of answering with ‘the bitcoin price is $8,000 and it will likely increase to $9,000,’ tell the user a bit about why it is better for them to invest this week. The system could say, ‘It is a good time to invest in bitcoin because, according to Coin Lab, the bitcoin value will surge past $9,000 this week before it dives back to $7,000 in the next few months.’
- Interface friction: Another thing to note is the limitations of VUIs. If you are driving and want to send a message to your mom, Siri is your best choice. But it would take far too long if you asked it to read out every email in your inbox (a sketch after this list shows one way to cap spoken output).
Similarly, if the user asks for a map of a location and the voice interface can’t show one, that doesn’t mean it should return an error message. It can say instead, ‘Oops, I am still working on maps integration,’ so the user knows the bot understands its limitations but is still engaging.
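Here is a short sketch of the output-capping idea from the friction point above. The `emails` list and the `speak` callback are hypothetical stand-ins for whatever a real assistant SDK would provide.

```python
MAX_ITEMS_SPOKEN = 3  # a screen can show dozens of items; a voice can't

def read_inbox(emails, speak):
    if not emails:
        speak("Your inbox is empty.")
        return
    # Summarize first, then read only a handful of items aloud.
    shown = emails[:MAX_ITEMS_SPOKEN]
    speak(f"You have {len(emails)} emails. Here are the latest {len(shown)}.")
    for mail in shown:
        speak(f"From {mail['sender']}: {mail['subject']}")
    if len(emails) > MAX_ITEMS_SPOKEN:
        # Hand the long tail back to a screen instead of droning on.
        speak("Want me to keep going, or send the rest to your phone?")
```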
Set Realistic User Expectations
The good news is that users needn’t start from a blank slate when working with VUIs, because you can provide user guides with proper examples to help them understand what to expect from voice products.
Guides with examples
For example, the Google Home Mini and Max come with a few conversation-style questions you can ask just to get started. These are close-ended questions or commands. For example:
- Where is the nearest flower shop?
- Remind me to buy cake at 5 PM today
Adding an intro video or a product tour is also a good way to show users the kinds of conversations they can have with a voice assistant.
Using prompts wisely
One concern for VUI engineers is how to confirm that a task succeeded. Let’s say you want to set a reminder for today at 5 PM. The conversation will flow this way:
“Ok Google, remind me to start homework at 5 PM today”
Google Home will reply, ‘Your reminder to start homework is ready.’ And when it’s 5 PM, it will say, ‘I have one reminder for you.’
Alexa, by contrast, doesn’t announce success explicitly; it only says, ‘Setting reminder for 5 PM.’
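A sketch of the explicit-confirmation pattern looks something like this; `schedule` and `speak` are hypothetical helpers, and the wording simply mirrors the Google Home behavior described above.

```python
def set_reminder(task: str, when: str, schedule, speak):
    schedule(task, when)
    # Echo back both the task and the time, so the user can catch a
    # misheard word now rather than discovering it at 5 PM.
    speak(f"Okay, your reminder to {task} at {when} is ready.")

def on_reminder_fired(task: str, speak):
    # The prompt at fire time closes the loop on the earlier promise.
    speak(f"I have one reminder for you: {task}.")
```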
Keep Users Informed About All Possible Interactions
While videos and user guides are a good way to get users started, there will always be instances where a user’s query can’t be interpreted by the VUI. In those cases, spelling out the available options is a good idea.
[Image: a diagram of branching user–system voice interactions. Source: O’Reilly]
Using a prototyping tool to map out the branches of interaction between the user and the system for open-ended cues makes a good starting point for a VUI product. For example, ‘Play a song for me’ can be answered with an easier, more personal choice rather than ‘You have 128 songs; which one would you like to play?’
Alexa handles this nicely. When asked to sing a song, it says, ‘Me? I can’t sing!’ and then starts singing anyway.
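One way to prototype that kind of branch is sketched below. The `library` object and `speak` helper are hypothetical, and the most-played ranking is just one plausible way to personalize the choice.

```python
def play_song(library, speak):
    # Don't enumerate all 128 songs; offer a small, personal choice
    # drawn from the user's listening history instead.
    favorites = library.most_played(limit=3)
    if not favorites:
        speak("I don't know your taste yet. Shall I surprise you?")
        return
    titles = ", ".join(song.title for song in favorites)
    speak(f"Want one of your usuals? I can play {titles}.")
```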
Final words:
To sum it all up, the UX design of voice assistants is still in its infancy. But that doesn’t mean developers can’t experiment and come up with things users haven’t thought of. While innovating on design for voice products, developers need to take care of a few things:
- Make the intent of each message explicit, and design personable error messages so users don’t get confused.
- Design products so they send cues through their physical design as well as their voice, such as lights that dim or brighten with the volume.
- Favor close-ended questions, and offer the user a choice from a few options when the VUI doesn’t understand a command.
What other ways can we make VUIs engaging enough to keep users coming back? Share your ideas in the comments below.
Also, check out how IoT is making an impact in the education sector.