NLP with Clojure and OpenNLP

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at:

This is a fantastic introduction:

Natural Language Processing (NLP) opens the door to the possibility of turning otherwise inert text into meaningful or, more interestingly, actionable information. It is the latter that I am interested in and what this installment will focus on. I will explore the basics of NLP using the OpenNLP library and Clojure to convert a sentence into a useful structure to store or act on. More specifically, my goal is to take simple sentences that indicate the desire to create a meeting request or an appointment and extract the date, duration and participants.

The applications of this are obvious. You can easily imagine turning an email message containing the sentence “Please schedule a meeting with Adam Smith and Sally Keynes, on November 22 2015, at 1:30pm, for 1 hour, to discuss the perils of economic forecasting.” into an appointment in which [Adam Smith, Sally Keynes] are identified as the participants, [Start => 2015-11-22T13:30:00, End => 2015-11-22T14:30:00] becomes the appointment start and end time, and [discuss the perils of economic forecasting] is identified as the appointment subject.
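
To make the target concrete, here is a minimal sketch of the structure the example sentence might be turned into. The function name `make-appointment` and the map keys are hypothetical and chosen only for illustration; the original article does not prescribe them. The end time is derived from the start time plus the stated duration.

```clojure
(import '(java.time LocalDateTime Duration))

;; Hypothetical target structure: key names are illustrative only,
;; not taken from the quoted article.
(defn make-appointment
  "Builds an appointment map; :end is computed as start + duration."
  [participants ^LocalDateTime start ^Duration duration subject]
  {:participants participants
   :start        start
   :end          (.plus start duration)
   :subject      subject})

;; Values hand-extracted from the example sentence above.
(def example
  (make-appointment
    ["Adam Smith" "Sally Keynes"]
    (LocalDateTime/of 2015 11 22 13 30)
    (Duration/ofHours 1)
    "discuss the perils of economic forecasting"))
```

Here the values are filled in by hand; the point of the NLP step is to produce them from the raw sentence.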

The syntactical variations possible to express the same intent can be quite large. For example, a more terse variation could be “Need meeting with Adam Smith and Sally Keynes on Nov 22 at 1:30pm to discuss the perils of economic forecasting”. The first obvious difference is that this is a fragment. Unlike the first example, with its liberal use of commas, this sentence omits all commas, and it omits the year and the duration as well. This is just one possible variation.

The large number of possible variations makes it very hard to leverage regular expressions or other types of pattern-based parsing to extract the relevant data from natural text. In what I plan to be the first of several installments about NLP, I will explore the basic concepts of applied NLP with the modest goal of creating calendar entries from plain text such as the one illustrated above. To this end I will leverage the OpenNLP library via the Clojure clojure-opennlp API.
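
The brittleness of regular expressions can be seen with a small sketch. The pattern below is a deliberately naive one written against the first example sentence; it is an illustration only, not something from the original article. It captures the participants, date, time and duration from the verbose sentence, but finds nothing in the terse variation, because that one drops the commas, the year and the duration.

```clojure
;; A naive pattern tailored to the punctuation and word order
;; of the first example sentence (illustrative assumption).
(def meeting-pattern
  #"with (.+?), on (\w+ \d{1,2} \d{4}), at (\d{1,2}:\d{2}[ap]m), for (\d+) hour")

(def verbose
  "Please schedule a meeting with Adam Smith and Sally Keynes, on November 22 2015, at 1:30pm, for 1 hour, to discuss the perils of economic forecasting.")

(def terse
  "Need meeting with Adam Smith and Sally Keynes on Nov 22 at 1:30pm to discuss the perils of economic forecasting")

;; re-find returns a vector of [whole-match group1 group2 ...] or nil.
(def verbose-match (re-find meeting-pattern verbose))
(def terse-match   (re-find meeting-pattern terse))

;; The capture groups of the verbose match, without the whole-match entry.
(def verbose-groups (rest verbose-match))
```

Each new phrasing would need its own pattern or another layer of optional groups, which is the maintenance burden that motivates a statistical approach such as OpenNLP.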