More NLP for Clojure

(Written by Lawrence Krubner; indented passages are often quotes.)

This looks fantastic:

To do its magic, postagga extracts the phrase structure of your input and compares that structure against its semantic rules. If it finds a match, the matching rule tells it where in the structure to extract meaningful information.

Let’s study a simple example. Look at the next sentence:

“Rafik loves apples”
That is our “Natural language input”

The first step in understanding this sentence is to extract some structure from it so that it is easier to interpret. One common way to do this is to extract its grammatical phrase structure, which is close enough to what “function” words are actually meant to provide:

Noun Verb Noun
That was the phrase structure analysis, or as we call it, POS (Part Of Speech) tagging. These “tags” qualify parts of the sentence, as the name implies, and will be used as a high-fidelity mechanism to write rules for parsers of such phrases.
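
To make the tagging step concrete, here is a minimal sketch in plain Clojure. It is not postagga's API: the `lexicon` table and the `tag-sentence` function are hypothetical names made up for illustration, and a hand-written lookup stands in for the tagger that postagga would train from annotated samples.

```clojure
(require '[clojure.string :as str])

;; Hypothetical hand-written lexicon. postagga learns its tags from
;; annotated samples instead of using a fixed table like this one.
(def lexicon
  {"rafik"  :noun
   "loves"  :verb
   "apples" :noun})

(defn tag-sentence
  "Pairs each whitespace-separated word with its tag from the lexicon
  (:unknown when the word is missing)."
  [sentence]
  (mapv (fn [word] [word (get lexicon (str/lower-case word) :unknown)])
        (str/split (str/trim sentence) #"\s+")))

(def tagged (tag-sentence "Rafik loves apples"))
;; tagged => [["Rafik" :noun] ["loves" :verb] ["apples" :noun]]

;; The phrase structure is just the sequence of tags.
(def phrase-structure (mapv second tagged))
;; phrase-structure => [:noun :verb :noun]
```

Keeping each word next to its tag matters for the later step, because a rule that matches on the tags still needs the original words to build its result.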

postagga has tools that enable you to train POS taggers for any language you want, without relying on external libs. In fact, it does not care about the meaning of the tags at all. However, you should be consistent and clear when annotating your input data samples with tags: your parser will be more reliable and, of course, you’ll do yourself a great favour when it comes to maintaining it.

Now comes the parser part. postagga offers a parser that needs semantic rules to map a particular phrase structure into data. In our example, we know that the first Noun denotes a subject carrying out some action. This action is represented by the Verb following it. Finally, the Noun coming after the Verb undergoes this action.

postagga parsers let you express such rules so they can extract the data for you. You literally tell them to take the first Noun and call it Subject, take the Verb and label it Action, take the last Noun as the Object, and package all of it into the following data structure:

{:Subject "Rafik" :Action "Loves" :Object "Apples"}
Naturally, postagga can handle much more complex sentences!
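
The rule described above (first Noun is the Subject, the Verb is the Action, last Noun is the Object) can be sketched in a few lines of plain Clojure. This is only an illustration under my own assumptions: the rule format, `svo-rule` and `apply-rule` are hypothetical and are not postagga's actual rule syntax, and the tagged input is written out by hand rather than produced by a trained tagger.

```clojure
;; Hypothetical rule format: one [tag label] pair per position in the
;; phrase structure. This is not postagga's actual rule syntax.
(def svo-rule
  [[:noun :Subject]
   [:verb :Action]
   [:noun :Object]])

(defn apply-rule
  "Returns a map of label -> word when the tags of `tagged` match the
  rule position by position, otherwise nil."
  [rule tagged]
  (when (= (mapv first rule) (mapv second tagged))
    (zipmap (map second rule) (map first tagged))))

;; Hand-tagged input: each entry is a [word tag] pair.
(def tagged [["Rafik" :noun] ["loves" :verb] ["apples" :noun]])

(apply-rule svo-rule tagged)
;; => {:Subject "Rafik", :Action "loves", :Object "apples"}

;; A sentence with a different phrase structure does not match.
(apply-rule svo-rule [["Rafik" :noun] ["sleeps" :verb]])
;; => nil
```

The sketch keeps the words exactly as they appear in the input, so the values come out as "loves" and "apples". A real parser has to cope with optional and repeated parts of speech, which this position-by-position match does not attempt.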

postagga parsers are eventually compiled into self-contained packages, without a single third-party dependency, and can easily run on servers (Clojure version) and in the browser (ClojureScript), so now your bots can really get what you’re trying to tell them!
