Lexing III: Nested power!

String interpolation via recursive descent

The beauty of a recursive-descent parser is that certain constructions are very natural and easy to implement. Take string interpolation, for instance:

int := interpolation

def ingify(String s)  String:
    return $(s)ing

print String $int is the easiest way
       of $ingify print a formatted string.


# Should output:
  String interpolation is the easiest way
  of printing a formatted string

This is parsed by changing the tokenizer for the macro sequence. When it finds a dollar sign, it consumes either a single constituent token (as in $int above) or a single opening-macro sequence (as in $( ... ) above). A constituent token is emitted directly, whereas an opening-macro sequence is dispatched to the corresponding tokenizer (specified in the readtable).

This means that any extension of the language via the readtable-macro mechanism will immediately be valid inside a string interpolation. For instance, if one were to extend the language via some weird bignum notation bn/123, one could immediately write $bn/11.022 (or whatever) inside a formatted string, and expect it to work. Admittedly, this is not very different from writing $(bn/11.022) instead, which would work by calling the tokenizer for the ( macro character, but the elegance of the mechanism still shines through.
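To make the dispatch concrete, here is a minimal Python sketch of the idea, not the actual Lyc tokenizer. Every name in it (READTABLE, read_token, read_interpolated, the token tuples) is an assumption made for illustration, and the rule for what counts as a constituent character is deliberately simplified.

```python
# All names below (READTABLE, read_token, ...) are hypothetical, chosen for
# illustration; this is not Lyc's actual tokenizer.

def is_constituent(ch):
    """Assumed rule: letters, digits, '-' and '_' make up a constituent token."""
    return ch.isalnum() or ch in "-_"


def read_constituent(src, pos):
    """Consume a run of constituent characters; return (token, new_pos)."""
    start = pos
    while pos < len(src) and is_constituent(src[pos]):
        pos += 1
    if pos == start:
        raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
    return ("ident", src[start:pos]), pos


def read_paren(src, pos):
    """Tokenizer for the '(' macro character: read tokens up to the matching ')'."""
    pos += 1  # skip the opening '('
    items = []
    while pos < len(src) and src[pos] != ")":
        if src[pos].isspace():
            pos += 1
            continue
        token, pos = read_token(src, pos)  # recursion: nesting comes for free
        items.append(token)
    if pos >= len(src):
        raise SyntaxError("unclosed '('")
    return ("group", items), pos + 1  # skip the closing ')'


# The readtable maps an opening macro character to its tokenizer.
READTABLE = {"(": read_paren}


def read_token(src, pos):
    """Dispatch through the readtable, falling back to a constituent token."""
    handler = READTABLE.get(src[pos])
    if handler is not None:
        return handler(src, pos)
    return read_constituent(src, pos)


def read_interpolated(text):
    """Split a string body into literal text and interpolated tokens."""
    parts, literal, pos = [], [], 0
    while pos < len(text):
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if text[pos] == "$" and nxt and (nxt in READTABLE or is_constituent(nxt)):
            if literal:
                parts.append(("text", "".join(literal)))
                literal = []
            token, pos = read_token(text, pos + 1)
            parts.append(token)
        else:
            literal.append(text[pos])
            pos += 1
    if literal:
        parts.append(("text", "".join(literal)))
    return parts


parts = read_interpolated("String $int is $(s)ing")
```

Note that read_interpolated knows nothing about parentheses: it only asks the readtable. In this sketch, registering another entry in READTABLE would make that notation usable after a dollar sign without touching the string tokenizer at all.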

A parenthesis: multiple ways of denoting forms

If you are wondering what the double parentheses ⦅⦆ mean, I plan to use them to allow for mixing Lisp-like notation with C-like notation for forms (function calling, macro application, etc.). In fact, there will be multiple ways of denoting forms:

# All of the following are parsed into the same code,
  a form with three identifiers 'head', 'arg1' and 'arg2'
  whose meaning is *apply 'head' to 'arg1' and 'arg2'*

head( arg1, arg2 )

head arg1 arg2
head arg1, arg2
head: arg1, arg2

head arg1 arg2
head arg1, arg2
head: arg1, arg2

(head: arg1, arg2)

head
    arg1
    arg2

head arg1
    arg2

head:
    arg1
    arg2

head arg1:
    arg2

(head:
    arg1
    arg2)

head arg1
     arg2

More combinations will be possible, although not every mix of the notations above will be allowed. For instance, (head arg1 arg2) simply means a tuple of the three identifiers, rather than a form (where the first element would be applied to the other two). Lisp has no such distinction between tuple and form, but it will make a lot of nice notation possible. This will be explained in the forthcoming posts dealing with parsing.

Comments

The same can be done for comments:

# The current position and velocity are $p and $v
  and the time elapsed since the last frame is $Δt

p := p + Δt * v

# The position at the next frame is $p

## The need to quickly comment code which may contain
   $-characters (which should be ignored) is solved
   via a *raw comment*, which is marked with a double hash ##


The expected behavior is that the escaped code inside comments is evaluated whenever the program is compiled in debug mode; while debugging, the development environment will fill in the values in place. Comments will then have runtime functionality as well!
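As a rough illustration of what a development environment might do with such comments, here is a small Python sketch. The function fill_comment, its debug flag, and the dictionary of live values are all hypothetical; the sketch also handles only simple $name escapes, not full $( ... ) expressions.

```python
import re

# Hypothetical helper, for illustration only: matches '$' followed by a name
# (a letter or underscore, then any word characters, Unicode included).
_ESCAPE = re.compile(r"\$([^\W\d]\w*)")


def fill_comment(comment, env, debug=True):
    """Return the comment with $name escapes replaced by values from env.

    Raw comments (starting with '##') and non-debug builds are left untouched.
    Names that are missing from env are also left as written.
    """
    if not debug or comment.startswith("##"):
        return comment

    def substitute(match):
        name = match.group(1)
        return str(env[name]) if name in env else match.group(0)

    return _ESCAPE.sub(substitute, comment)


env = {"p": 3.5, "v": 2, "Δt": 0.25}
filled = fill_comment("# The time elapsed since the last frame is $Δt", env)
raw = fill_comment("## This costs $p and is never filled in", env)
```

With debug set to False the comment comes back unchanged, which mirrors the idea that the escapes only cost anything in debug builds.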

Quotation

Like Lisp, Lyc will support macros. Macros are usually defined using quotation: quote ... or simply ‘...’ will mean the code that results from parsing .... A macro is then a function which outputs code when given code.

Within quotation, the escape characters ~ and ~@ will serve the same function as , and ,@ in Lisp, namely to evaluate the expression within the escape and insert the result at that position in the code (in the second case the expression is interpreted as a sequence, and each element is inserted in turn). For instance, one could define a macromap macro, which applies a given piece of code to each of its arguments:

defmacro macromap( head, ):
    items := ~head ~itm for itm in 
    return block ~@items

# For instance, we may now do
macromap assert
    my-var = 1
    test-something()

# and this will be macro-expanded into
block:
    assert my-var = 1
    assert test-something()

The beauty of having a recursive descent design is that there is really nothing special going on inside the code quotation. It is implemented via a recursive call to the standard parser.
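The behavior of the two escapes can be sketched in Python by treating code as nested lists. This is only a model of the semantics, not of how Lyc implements quotation: the Unquote and Splice classes, the expand function, and the list representation of code are assumptions for illustration, and the escapes here look up a name in an environment instead of evaluating an arbitrary expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unquote:
    """Stands for ~name inside a quoted template (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Splice:
    """Stands for ~@name inside a quoted template (hypothetical model)."""
    name: str


def expand(template, env):
    """Recursively fill the escapes of a quoted template from env."""
    if isinstance(template, Unquote):
        return env[template.name]
    if isinstance(template, Splice):
        raise SyntaxError("~@ is only meaningful inside a sequence")
    if isinstance(template, list):
        out = []
        for item in template:
            if isinstance(item, Splice):
                out.extend(env[item.name])  # insert each element in turn
            else:
                out.append(expand(item, env))
        return out
    return template  # plain identifiers and literals are kept as they are


def macromap(head, *args):
    """Model of the macromap macro: apply head to each argument."""
    items = [
        expand([Unquote("head"), Unquote("itm")], {"head": head, "itm": itm})
        for itm in args
    ]
    return expand(["block", Splice("items")], {"items": items})


expanded = macromap("assert", ["=", "my-var", 1], ["test-something"])
```

Here expanded is a block containing one assert form per argument, matching the macro expansion shown above. As with the quotation itself, expand has no special cases beyond the two escapes: everything else is a plain recursive walk.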

To start getting a feel for the convenience of having multiple ways of denoting forms, here is an alternative way of defining the macro above:

defmacro
    macromap head
        
    set items
        for itm in 
           ~head ~itm
    return quote
        block
            ~@items