Dialogue designers replace graphic designers when creating voice interfaces

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

Over the last 30 years we’ve gotten used to the Graphical User Interfaces in software that appears on a computer screen. And most of us (who work in the tech industry) have had the experience of working with a person who, depending on their skills, will be given a title such as “graphic designer” or “user experience designer” or sometimes “product designer”.

Working with the Amazon Echo, my partner and I have come to realize that we still need a person who shapes the user experience. Obviously, with a voice interface, there is no graphic design. But we’ve found the need to focus great attention on what the experience of talking is like, which trigger words feel the most natural, and which non-trigger words need to be allowed because people tend to say them anyway.
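To make the idea of trigger words and allowed non-trigger words concrete, here is a minimal sketch in Python. It is not the Echo's actual API; the filler list, the trigger phrases, and the function names are all hypothetical, chosen only to illustrate stripping out the words people say anyway before looking for a trigger.

```python
import re

# Hypothetical filler words that users tend to say anyway. A real list
# would come from listening to actual users.
FILLER_WORDS = {"please", "um", "uh", "hey", "just", "can", "could", "you", "the", "a"}

# Hypothetical trigger phrases mapped to the action each one should start.
TRIGGERS = {
    "play music": "PLAY_MUSIC",
    "next song": "NEXT_SONG",
    "stop": "STOP",
}

def normalize(utterance):
    """Lowercase the utterance, drop punctuation, and remove filler words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(w for w in words if w not in FILLER_WORDS)

def match_trigger(utterance):
    """Return the action for the utterance, or None if no trigger matches."""
    return TRIGGERS.get(normalize(utterance))
```

With these assumed lists, "Um, could you please play the music?" and "play music" both resolve to the same action, while "What's the weather?" resolves to nothing. Deciding what belongs in each list is exactly the kind of judgment the designer has to make.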

Thus, we see a role taking shape on the tech team: the script designer (or dialogue designer), who takes responsibility for designing the user experience when that experience is vocal. Thankfully, my partner seems to have a good feel for this.

Some people in the video game industry might retort “This is nothing new, we’ve had people writing dialogue for 30 years.” But what I’m talking about here is a different thing: not the dialogue spoken by characters in a game, but the dialogue spoken by the user to make your software do something. And understanding what people tend to say, and want to say, when they talk to a computer is an important element of the user experience, and it’s a fairly new kind of experience.

Game designers can claim to have pioneered this interface where text-based adventures are concerned. “Walk north” and “Open door” and “Pick up sword” can be thought of as the embryonic beginning of this kind of interface. But when dealing with the voice, for better and for worse, there is more ambiguity about what the user just said, and there are fewer ways to give them clues about what they should say. Working around those two limits is the crucial part of creating voice interfaces.

The Jack Principles, which arose from the game You Don’t Know Jack, are worth studying for insights into how to proceed with this kind of interface:

An iCi program has a particular defining quality that separates it from all other forms of communication: it feels like someone is talking with you. Indeed, an iCi program appears to create a continuous conversation between the character in the program and the human sitting in front of the screen.

This doesn’t mean the conversation can be about anything the way a real human-to-human conversation can. The topic is constrained by the goals and design of the program’s creators. If the host of a program asks you:

“So, do you like books about politics?”

As the user of the program, you can’t respond by saying, “Speaking of politics, what do you think about this silly confirmation hearing going on in the Senate?” You can only respond with one of the choices the program recognizes (which in this case might be “Yes”, “No” or “Sort of”).

It appears, however, that in a well-designed iCi program, the audience will accept those inherent limitations without question. They accept those limitations so that they can buy into the illusion that the character is really talking to them. At a movie, we find ourselves watching a sweet, little orange alien with a penchant for Reese’s Pieces and allow ourselves to forget that such a creature doesn’t actually exist. We do this quite naturally. It allows us to be entertained by the film. This phenomenon is commonly known as “the suspension of disbelief.”

iCi programs can work because a user will suspend her disbelief and buy into the illusion that the character is really talking to her.

“So, do you like books about politics?”

You select “No.”

“No? Not even historical politics? Lincoln-Douglas debates…that sort of thing? Any interest in that?”

You select “No.”

“O.K., no problem. I won’t recommend any books on politics….
….How about sci-fi? Are you interested in science fiction at all?”

In this case, you are responding by just saying “no.”

If it were a real conversation with a human, you’d probably say “no” and start explaining why you don’t like politics. With an iCi program, however, it appears people will naturally accept limitations on their responses. Thereby, the suspension of disbelief becomes possible.

This is the big advantage of the game:

You can only respond with one of the choices the program recognizes (which in this case might be “Yes”, “No” or “Sort of”).

The game has an easy time expressing its limits. With a voice interface, this is much more difficult, and a good dialogue designer works hard to communicate the options using the fewest possible words (so as to not bore the user).
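As a rough illustration of communicating the options in few words, here is a small Python sketch. The synonym table and function names are hypothetical, and a real voice platform would supply its own speech recognition; this only shows a prompt that names its choices briefly and a lookup that accepts a few natural variants of each.

```python
# Hypothetical spoken variants mapped to the choices the program recognizes.
CHOICES = {
    "yes": "yes", "yeah": "yes", "yep": "yes",
    "no": "no", "nope": "no",
    "sort of": "sort of", "kind of": "sort of",
}

def build_prompt(question, options):
    """Append the recognized options to the question in as few words as possible."""
    if len(options) == 1:
        listed = options[0]
    else:
        listed = ", ".join(options[:-1]) + " or " + options[-1]
    return f"{question} Say {listed}."

def interpret(reply):
    """Return the recognized choice, or None so the caller can re-prompt."""
    return CHOICES.get(reply.strip().lower().rstrip(".!?"))
```

Here `build_prompt("Do you like books about politics?", ["yes", "no", "sort of"])` produces a single short sentence of guidance, and `interpret` returns `None` for anything off-script, which is the cue to repeat the options rather than guess.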
