The advantages of a Natural Language Processing interface

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

“admin” of Daily Hacker News is critical of my ideas regarding a Natural Language Processing interface for software.
Some of their points conflate Natural Language Processing with its text-based version. Right now I’m mostly working on Natural Language Processing via voice interfaces, so I’m going to reply with voice examples, as a reminder that Natural Language Processing is a broad topic.

“admin” starts with this point:

Let’s go back to Lawrence’s example. Posit, for a moment, that the UI works as intended: you can type

I want to edit the info about my contact Jenny Hei.

and the app gets ready to do just that. Awesome, right?

But then they make a bizarre swing and they start talking about writing computer programs this way:

If you’re a programmer… is this how you want to program? Do you normally write in COBOL?

SUBTRACT A B C FROM D GIVING E

Admit it, you thought it was pretty neat when C let you say

a++

That is silly. Natural Language Processing has a lot of important applications, but computer programming is not one of them. Many things in life have some great uses, and other uses for which they are less efficient. Wheat is good at feeding people, but it’s less efficient when used as a building material. Money is good for buying things, but it’s less efficient when used as fuel for a fire.

“admin” then almost accidentally hits upon a standard tactic of streamlining, but they don’t seem to realize this:

If you actually had to use Lawrence’s interface, you’d breathe an enormous sigh of relief if someone installed a mod that let you type

EDIT CONTACT JENNY HEI

Indeed, shortened versions of longer sentences are standard, and longer versions are supported only because some people feel they are natural. Amazon’s guidelines for Voice Interfaces recommend this.

I am working on a voice integration for Salesforce, via an Amazon Echo. Here are 3 intents, each with 2 variations:

GetCompany about {Company}

GetCompany about {Company} restaurants

GetWhatIsTheValueOfOpenOpportunities about open deals for {Company}

GetWhatIsTheValueOfOpenOpportunities about open deals for {Company} restaurants

GetWhatChanceDoWeHave what chance we have with {Company}

GetWhatChanceDoWeHave what chance we have with {Company} restaurants

Why do I allow the word “restaurants” at the end? Because some people find it natural. But I agree with “admin” that people often want shorter interfaces, especially as they become comfortable with the experience. So we make it possible for them to say both:

Alexa, ask Cricket about Big Steak Grill restaurants

and:

Alexa, ask Cricket about Big Steak Grill
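To make the idea concrete, here is a minimal sketch of how both the long and the short phrasing could resolve to the same intent. It assumes a simple regex matcher, which is not how Alexa resolves intents internally; `SAMPLE_UTTERANCES` and `match_intent` are hypothetical names used only for illustration.

```python
import re

# Hypothetical table: intent name -> sample utterance ending in a {Company} slot.
SAMPLE_UTTERANCES = {
    "GetCompany": "about {Company}",
    "GetWhatIsTheValueOfOpenOpportunities": "about open deals for {Company}",
    "GetWhatChanceDoWeHave": "what chance we have with {Company}",
}


def compile_template(template):
    """Build a regex that captures the company and tolerates a trailing 'restaurants'."""
    prefix = template.replace("{Company}", "").strip()
    pattern = r"^" + re.escape(prefix) + r"\s+(?P<Company>.+?)(?:\s+restaurants)?$"
    return re.compile(pattern, re.IGNORECASE)


# Try the longest templates first, so "about open deals for ..." is not
# swallowed by the shorter "about ..." template.
PATTERNS = [
    (intent, compile_template(template))
    for intent, template in sorted(
        SAMPLE_UTTERANCES.items(), key=lambda item: len(item[1]), reverse=True
    )
]


def match_intent(utterance):
    """Return (intent, company) for the first matching template, or None."""
    text = utterance.strip()
    for intent, pattern in PATTERNS:
        found = pattern.match(text)
        if found:
            return intent, found.group("Company")
    return None
```

With this sketch, `match_intent("about Big Steak Grill restaurants")` and `match_intent("about Big Steak Grill")` both resolve to the `GetCompany` intent with the same company name, which is the point of supporting the optional trailing word.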

This is incorrect:

And you’d be even happier if the mod allowed you to hit the Contacts button, type J in the search box which is automatically enabled, and hit enter.

I have the impression that they have not talked to many sales people. Many people hate GUI software. They do not want to type a J in a search box.

This is true:

It’s not that interfaces can’t get too arcane! You can get a lot done if you know EMACS really well… but for most people it’s about as easy to master as quantum mechanics.

I love Emacs and I’ve been a devoted fan of Emacs for 10 years, but I would not recommend it for non-programmers.

This is silly:

A WYSIWYG word processor is much nicer. But notice that we don’t edit by saying

HIGHLIGHT THE WORD “CAN”

THE NEXT ONE IN THE FILE PLEASE

REPLACE IT WITH THE WORD “CAN’T”

BOLDFACE IT

MOVE THE CURSOR TO THE END OF THE DOCUMENT

Who has time to type all that? Or say it, for that matter?

Gosh, wasn’t there once a great and mighty civilization in which powerful business people hired secretaries and then spent most of the day in an activity known as dictation? Oh! That’s right! That was our civilization! This was considered normal behavior for about 100 years:

Betty, take a memo. To the purchasing team in Brazil. We must cut cost by 5%. Push suppliers on prices or find new suppliers. Underline this: Volume must not be cut, but prices must be cut. End underline. Deadline is the next summer season. Sincerely, Roger Morton.

Automating this has been a goal of computer programmers since around the time computers were first invented. Secretaries have disappeared. Typing will soon disappear for a wide range of tasks.

This is completely wrong:

And this is all assuming that you can program a computer to understand spoken commands. Lawrence’s team evidently didn’t realize that what they were asked to do was implement an AI, at a level that has never been done.

Obviously, you can program a computer to understand spoken commands. There are dozens of good software packages out there, and they get better all the time. The difficulty we faced was building the Finite State Machine that could keep track of the accumulated meaning of a conversation, including the mistakes that we needed to guide the user away from. And that is why I’m excited to build on a platform such as Amazon’s, which offers a lot of tools to help programmers along.

This bit finally gets things correct:

That is, you create a toy world, not unlike Terry Winograd’s SHRDLU, and work out an English-like code to manage it which is rigid in its own way, but happens to recognize a lot of keywords in different orders. For instance, maybe it can handle all of

I want to edit the info about my contact Jenny Hei

Edit the record under Contacts for Jenny Hei

Search for Jenny Hei in Contacts.

Find Jenny Hei using the Contacts file and let me edit.

But “admin” seems to think they’ve made some kind of revolutionary discovery when they write:

Good work! Now are you quite sure you also allowed these?

I should like to modify the particulars about Jenny Hei, a contact.

Get me Contacts; I’m’a edit Jenny Hei’s record.

Change Jenny Hei’s name to Mei. She’s under Contacts.

That record I added yesterday. Let me change it.

And this is very silly:

And that’s before we even get into incomplete or ambiguous queries! The user leaves off the key word “Contacts”, or isn’t clear if they’re adding or editing, or gives the name wrong, or gives the name right only it’s recorded wrong in the database,

What “admin” is trying to describe is what Amazon refers to as “partial intent.” How do you handle it when you can only partially infer what the user wanted? There are reasonable ways to work with partial intent, and one often has to do so in both GUI and non-GUI interfaces. Again, many companies have already written about this issue, and Amazon offers a useful guide for designing Voice Interfaces.

As Amazon says:

Users are unpredictable, so you should also expect them to express just a subset of what is required for you to take action on their request. These are “partial intents.” Here’s an example:

User: “Alexa, ask Astrology Daily for my horoscope.”

Astrology Daily: Horoscope for what sign?

In this example, the user expressed an intent (for my horoscope), but did not give a required slot (a zodiac sign). If this occurs, you should recognize what is missing and follow up with the user to “fill in the blanks”:

User: “Alexa, ask Astrology Daily for my horoscope.”

And Amazon suggests this handling to “fill in the blanks” of the partial intent:

Astrology Daily: Horoscope for what sign?

User: Leo.

Astrology Daily: Today’s outlook for Leo is…
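The "fill in the blanks" pattern can be sketched in a few lines. This is an illustration under my own assumptions, not Amazon's API: the `REQUIRED_SLOTS` and `PROMPTS` tables, the `GetHoroscope` intent name, and the `next_response` function are all hypothetical.

```python
# Hypothetical configuration: which slots each intent needs before it can act,
# and what to ask the user when one of them is missing.
REQUIRED_SLOTS = {"GetHoroscope": ["Sign"]}
PROMPTS = {"Sign": "Horoscope for what sign?"}


def next_response(intent, slots):
    """Ask for the first missing required slot; otherwise fulfill the request."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if not slots.get(slot):
            # Partial intent: we know what the user wants, but not enough to act.
            return {"type": "elicit", "slot": slot, "speech": PROMPTS[slot]}
    # All required slots are present, so the request can be answered.
    sign = slots["Sign"]
    return {"type": "fulfill", "speech": f"Today's outlook for {sign} is..."}
```

Calling `next_response("GetHoroscope", {})` returns the follow-up question, and calling it again with `{"Sign": "Leo"}` returns the answer, which mirrors the two turns of the dialogue quoted above.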

This is good and true:

The more you produce the illusion that your app is intelligent, the more users will assume it’s way more intelligent than it is.

This is silly:

And when that fails, they will be just as annoyed and frustrated as if they had to learn to push the Contacts button in the first place.

The correct fallback behavior for the app is to list the options that the user has. Done well, this offers more “discoverability” than we’ve managed to get out of GUI apps, despite 31 years of effort (starting with the 1984 Mac).

This does not make sense if it was meant as a comment on what I wrote:

One of programmers’ oldest dreams, or snares, is to write an interface that’s so simple to use that the business analyst can write most of the app. I’ve fallen for this one myself, more than once! The sad truth is even if you do this task pretty well, non-programmers aren’t going to be able to use it.

Again, I am puzzled why “admin” would raise the issue of computer programming. I never raised the issue. If “admin” is merely offering the opinion that a Voice Interface won’t work for computer programming, then I agree. But if “admin” thinks this somehow invalidates the use of Voice Interfaces for various tasks unrelated to computer programming, then they are wrong.

About this:

it doesn’t come easily to non-programmers to think in small steps, to remember all the exceptions and hard cases before they come up, or to understand the data structure implied by a process.

This is an argument for a Finite State Machine that can manage the states of the conversation and offer prompts whenever the user goes outside of the allowed bounds of the conversation. And if you read the guidelines now being published about voice interfaces, this is regarded as a “best practice”.
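Here is a rough sketch of what such a Finite State Machine might look like. The states, the allowed phrases, and the `Conversation` class are all hypothetical, chosen only to show the shape of the idea: each state has a fixed set of allowed inputs, and anything outside those bounds produces a prompt listing the options instead of a failure.

```python
# Hypothetical conversation graph: state -> {allowed phrase -> next state}.
TRANSITIONS = {
    "idle": {"edit contact": "choose_field", "add contact": "choose_field"},
    "choose_field": {"name": "confirm", "phone": "confirm"},
    "confirm": {"yes": "idle", "no": "choose_field"},
}


class Conversation:
    def __init__(self):
        self.state = "idle"
        self.history = []  # accepted (state, phrase) pairs: the accumulated meaning
        self.misses = []   # rejected (state, phrase) pairs: mistakes to steer away from

    def handle(self, utterance):
        """Advance the conversation, or re-prompt with the allowed options."""
        phrase = utterance.strip().lower()
        allowed = TRANSITIONS[self.state]
        if phrase not in allowed:
            # Out of bounds: stay in the same state and list what is possible.
            self.misses.append((self.state, phrase))
            options = ", ".join(sorted(allowed))
            return f"Sorry, I didn't catch that. You can say: {options}."
        self.history.append((self.state, phrase))
        self.state = allowed[phrase]
        return f"OK, {phrase}."
```

A real skill would map recognized intents, rather than raw phrases, onto transitions, but the structure is the same: the machine always knows which inputs are valid at this point in the conversation, so it can always tell the user what their options are.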

And this final bit is especially silly:

The irony is that Lawrence, later in the article, runs into exactly the same situation with other developers, but doesn’t make the connection. What devs hate doing is documentation. Lawrence wants his fellow devs to keep a couple pages in the wiki up to date with their APIs, and they just won’t do it, unless he nags them to death. Is this a UI problem, as Lawrence thinks SAP has? No, it’s a motivation problem, or a mental skillset problem, or something… and whatever it is, it’s even harder than natural language programming.

As if motivation and UI are entirely unrelated! The assumption here seems to be that the difficulties of the interface don’t have any impact on people’s motivation. That is a bit crazy as an assumption.

The UI of documentation is fundamental to how much documentation takes place. That is why most IDEs try to automate the process of creating documentation. And if you can get every developer on the team to use the same IDE, then this can simplify the generation of default documentation, and increase the total amount of documentation. Of course, with documentation, quality is typically more important than quantity. Merely repeating the signature of a function doesn’t actually help anyone develop. There is a trade-off, and there is also the cost of forcing developers to give up their favorite IDE and switch to whatever the team is using. In our case, it seemed useless, since the iPhone developer wanted to work in Xcode and no one else wanted to use Xcode for their Clojure or Java development. So in our case, a heterogeneous mix of development styles was a given. Presumably, we gained a productivity boost by having each developer work in the IDE they were most comfortable with, though that meant that we then had to make an effort to generate documentation via a poor UI, and the poor UI cost us some of the productivity. Was the gain worth the loss? I would guess yes, as the loss was minor: it amounted to me having to occasionally harass Shinzo and Pranab, so they would update their documentation.

Over the next 2 to 3 years we will see an explosion of apps that rely on a voice interface. People like “admin” can perform a useful service when they remind us of the difficult aspects of this task. However, they do not perform a useful service when they simply suggest that the task is impossible.

Post external references

  1. http://dailyhackersnews.com/2015/12/04/no-you-dont-need-a-natural-language-interface/
  2. https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/defining-the-voice-interface
  3. https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-voice-design-handbook