Trying to explicitly enumerate everything that is true is hopeless

(Written by Lawrence Krubner; however, indented passages are often quotes.) You can contact Lawrence at:, or follow me on Twitter.

This is good:

Let’s take a look at some existing representations. The most famous representation is WordNet. In WordNet, the symbols are groups of words that have the same meaning, called synsets. One synset could be the set consisting of “car” and “automobile.” Each word can be in multiple synsets. For example, “bank” could be in the synset that means river bank and also in the synset that means a place where money is deposited. There are a few kinds of relationships between synsets, such as has-part, superordinate (more general class), and subordinate (more specific class). For example, the synset containing “automobile” is subordinate to the one containing “motor vehicle” and superordinate to the one containing “ambulance.”
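To make the synset idea concrete, here is a minimal sketch of a WordNet-style lexicon as plain Python dictionaries. It is a toy, not WordNet itself or NLTK's API. The synset ids imitate the "lemma.pos.nn" naming that NLTK uses, but every id, the `SYNSETS` and `SUPERORDINATE` tables, and the helper functions are hypothetical. It shows the two ideas from the passage: one word ("bank") can belong to several synsets, and synsets are linked by superordinate/subordinate relations.

```python
# Hypothetical toy lexicon in the style of WordNet (not real WordNet data access).
# Each synset id maps to the set of words that share that meaning.
SYNSETS = {
    "car.n.01": {"car", "automobile"},
    "motor_vehicle.n.01": {"motor vehicle"},
    "ambulance.n.01": {"ambulance"},
    "bank.n.01": {"bank"},  # sloping land beside a river
    "bank.n.02": {"bank"},  # place where money is deposited
}

# Superordinate (more general class) link for each synset that has one.
SUPERORDINATE = {
    "car.n.01": "motor_vehicle.n.01",
    "ambulance.n.01": "car.n.01",
}


def synsets_for(word):
    """Return every synset id whose word set contains `word`."""
    return sorted(sid for sid, words in SYNSETS.items() if word in words)


def subordinates(synset_id):
    """Return the synsets (more specific classes) whose superordinate is `synset_id`."""
    return sorted(child for child, parent in SUPERORDINATE.items() if parent == synset_id)


if __name__ == "__main__":
    print(synsets_for("bank"))        # the ambiguous word appears in two synsets
    print(subordinates("car.n.01"))   # "ambulance" is more specific than "automobile"
```

Storing only the superordinate link and computing subordinates on demand keeps a single source of truth for each relation, which is roughly how a has-part or superordinate table could be kept consistent.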

ConceptNet is a representation that provides commonsense linkages between words. For example, it states that bread is commonly found near toasters. These everyday facts could be useful if you wanted to make a boring chatbot; “Speaking of toasters, you know what you typically find near them? Bread.” But, unfortunately, ConceptNet isn’t organized very well. For instance, it explicitly states that a toaster is related to an automobile. This is true, since they are both machines, but trying to explicitly enumerate everything that is true is hopeless.
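Why is explicit enumeration hopeless? A rough sketch: store commonsense facts as (start, relation, end) triples, the way ConceptNet's edges are organized. The relation names `AtLocation` and `RelatedTo` mirror ConceptNet's vocabulary, but the data and the helpers below are made up for illustration. If every pair of machines needs its own "related to" edge, the number of stored facts is n(n-1)/2, which grows quadratically with the number of concepts.

```python
from itertools import combinations

# Toy triples in the spirit of ConceptNet; the facts are illustrative, not real ConceptNet data.
EDGES = [
    ("bread", "AtLocation", "toaster"),
    ("toaster", "RelatedTo", "automobile"),
]


def neighbors(concept, relation):
    """Concepts linked to `concept` by `relation`, following edges in either direction."""
    out = set()
    for start, rel, end in EDGES:
        if rel != relation:
            continue
        if start == concept:
            out.add(end)
        elif end == concept:
            out.add(start)
    return sorted(out)


def pairwise_facts_needed(concepts):
    """How many explicit RelatedTo edges it takes to link every pair of concepts."""
    return sum(1 for _ in combinations(concepts, 2))


if __name__ == "__main__":
    print(neighbors("toaster", "AtLocation"))   # what is found near toasters
    print(pairwise_facts_needed(range(1000)))   # 1000 machines already need ~half a million edges
```

With 1,000 machines the explicit approach already needs 499,500 "related to" edges, and each new machine adds one edge per existing machine. That is the practical sense of "hopeless" here.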

Of course, it seems like Wikipedia already has all the information that computers need. We even have a machine-readable form of Wikipedia called DBpedia, and DBpedia and WordNet have been combined into a representation called YAGO (Yet Another Great Ontology). YAGO has good coverage of named entities, such as entertainers, and it was used by Watson to play Jeopardy!, along with other sources. YAGO and DBpedia contain a lot of facts, but they aren’t the basic facts that we learn as young children, and their representations are shallow.

We need deep representations because classification and hierarchy are efficient ways of specifying information. The better your organization, the more power your statements about the world have, and many statements aren’t even necessary, like saying that a toaster is related to an automobile. One representation that organizes concepts down to the lowest level is SUMO (Suggested Upper Merged Ontology). For example, in SUMO, “cooking” is a type of “making” that is a type of “intentional process” that is a type of “process” that is a “physical” thing that is an “entity.”
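Here is a sketch of why a deep hierarchy makes the toaster/automobile fact unnecessary. Each concept stores a single is-a link, and relatedness falls out of the nearest shared ancestor. The "cooking" chain follows the SUMO example in the passage, but the labels are simplified strings, not SUMO's actual term identifiers. The "machine" entries and the helper names are hypothetical, and the direct "machine" to "physical" link is a shortcut, not SUMO's real path.

```python
# One is-a link per concept. The cooking chain follows the SUMO example above;
# the machine entries are a hypothetical simplification, not SUMO's real taxonomy.
IS_A = {
    "cooking": "making",
    "making": "intentional process",
    "intentional process": "process",
    "process": "physical",
    "physical": "entity",
    "toaster": "machine",
    "automobile": "machine",
    "machine": "physical",  # shortcut for illustration only
}


def lineage(concept):
    """The concept followed by its ancestors, most specific first (assumes no cycles)."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def nearest_shared_ancestor(a, b):
    """Most specific class that both concepts belong to, or None if they share none."""
    b_classes = set(lineage(b))
    for c in lineage(a):
        if c in b_classes:
            return c
    return None


if __name__ == "__main__":
    print(lineage("cooking"))
    print(nearest_shared_ancestor("toaster", "automobile"))  # related because both are machines
```

Here the toaster and the automobile are never linked directly. Their relationship is derived from two is-a edges, so adding a new machine costs one stored fact instead of one per existing machine.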
