Implementing Language Processing

Intro

When writing crawlers, often I found myself trying to figure out what time was a post written when the String that inserted into the Database was “2 days ago at 16:20”. Thus I implemented its-a-date,  a Natural Language Date Description Processor, or simply – when given a Date description, it returns a Date object.

In the process of figuring out how to approach the implementation of a Date Description Parser, I realized that I have to use the Interpreter Design Pattern, in other words, I had to write a Programming Language. Mostly I learned how to do so from this guy. But this article is a shorter and much simpler version of how to implement a Programming Language as a solution for my problem.

First of all, what is the connection between Date Description and a Programming Language? Well, it seems that date descriptions tend to have pretty accurate syntax. Here are some examples.

3 months after 11/01/1990

Type: Relative
Value: +3 Months

Type: Absolute
(Point of Reference)
Day: 11, Month: 1,
Year: 1990

 

An other example would be:

 

2 days ago at 16:20

Type: Relative
Value: -2 Days

Point of Reference: Now (Default)
Type: Absolute
Hour: 16, Minute: 20

 

2 Different Affect Types: The Relative & The Absolute

It seems that date descriptions can contain two affect types, Relative & Absolute. First one, the Absolute, determines different time parts (e.g ‘Today’ – sets the date, or ‘4th’ sets the day of the month). Relatives indicate a movement in timeline according to a point of reference. Default point of reference would be This Very Moment, but if an Absolute part has been set, usually it turns to be the point of reference. For example ‘in 2 minutes’ refers to 2 minutes from now, but ‘2 minutes after midnight’ refers to the hour 00:00.

Note that the scope of an affect type, relative or absolute, is a specific time part. For instance if we take a look at ‘Yesterday, 2 minutes after midnight’  we can see that ‘Yesterday’ is a relative refers to Now, but ‘2 minutes after’ is a relative refers to the Absolute ‘midnight’. 

Some regulations must be applied. We can easily see that 2 Absolutes of the same time part are not allowed (e.g. ‘midnight at 2 pm’, two absolute hour affects). On the other hand there is no problem of chaining relatives of the same time part (e.g. ‘2 minutes before 2 minutes before now’, although it’s somewhat silly).

 

Tokens

After discussing the grammar, or the fundamental rules of Date Descriptions, it is time to talk about tokens. Tokens are the tiniest part of a syntax, it’s a Symbol, Word or a Phrase that has a determined action and a specific role in the language. In JavaScript, for instance, some of the tokens are: {, if, new, ) & case

In the program, tokens will carry the logic of the Affect Types discussed above. Every token must have a predefined meaning that the interpreter can resolve. Lets review some of its-a-date tokens:

 

febsnip

 

The regex determines whether or not a token is invoked. The Affects is an array of consequences this token is going to carry, in this example, the month time part will be absolutely 2. this may also be the point of reference in case of a description like ‘3 months after February’.

 
 

agotoken

Notice that here we got a token of some other kind. It doesn’t have an Affects array, instead it has an affectGenerator function. The purpose of this function is to return an array of affects, given the values from the regex capturing groups. Notice that not only the value of the relative affect is determined by the matches, also the time type. When the description is “5 days ago”, the day part is altered, but if the description is “5 minutes ago”, the minutes are altered.

 
 

hourtoken

In this token example, we can see the affect generator that we recognize, but the special case here is that a single token have 2 affects. One over the hour, and the second over the minutes.

 

Interpreter

It is time to leave you with the riddle of how to implement the code that translates these tokens, and others, into a date object. If you are eager to fully understand how it actually works, I feel it is best understood when debugging. I welcome you to download the small project from Github, or just hit:
npm install its-a-date --save
and then…

 

itsadate

 

If you feel like contributing to the project, pull requests will make me happy. Implementing tokens for further more foreign languages can be a good practice of this article’s ideas, and as well very appreciated.