Lexing III: Nested power!
04 Oct 2015
String interpolation via recursive descent
The beauty of a recursive-descent parser is that certain constructions are very natural and easy to implement. Take string interpolation, for instance:
int := “interpolation”
def ingify(String s) String:
    return “$(s)ing”
print “String $int is the easiest way
of $⦅ingify “print”⦆ a formatted string.”
# Should output:
  String interpolation is the easiest way
  of printing a formatted string.
This is parsed by changing the tokenizer for the “ macro sequence. When it finds a dollar sign, it consumes either a single constituent token (as in $int above) or a single opening-macro sequence (as in $( ... ) above). A constituent token is emitted, and an opening-macro sequence is dispatched to the corresponding tokenizer (specified in the readtable).
This means that any extension of the language via the readtable-macro mechanism will immediately be valid inside a string interpolation. For instance, if one were to extend the language with some weird bignum notation bn/123, one could immediately write $bn/11.022 (or whatever) inside a formatted string and expect it to work. Admittedly, this is not very different from writing $(bn/11.022) instead, which would work by calling the tokenizer for the ( macro character, but the elegance of the mechanism still shines through.
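To make the dispatch concrete, here is a minimal sketch in Python (Lyc itself is not runnable yet). Every name in it (read_token, read_string, read_paren, read_bignum, the tuple-shaped tokens) is invented for illustration, and it uses a straight " for the string delimiter. It only assumes what is described above: after a $, the string tokenizer reads one constituent token or hands over to whichever tokenizer the readtable lists for the macro sequence it finds.

```python
# Illustrative sketch only: all names here are hypothetical, not Lyc's API.

def read_constituent(src, i, readtable):
    """Read one constituent token (here: letters, digits, '_') at src[i]."""
    j = i
    while j < len(src) and (src[j].isalnum() or src[j] == "_"):
        j += 1
    if j == i:
        raise SyntaxError(f"unexpected character {src[i:i+1]!r} at {i}")
    return ("ident", src[i:j]), j

def read_token(src, i, readtable):
    """Dispatch on the longest matching macro sequence, else read a constituent."""
    for seq in sorted(readtable, key=len, reverse=True):
        if src.startswith(seq, i):
            return readtable[seq](src, i, readtable)
    return read_constituent(src, i, readtable)

def read_paren(src, i, readtable):
    """Tokenizer for the '(' macro character: read tokens up to ')'."""
    i += 1
    items = []
    while i < len(src) and src[i] != ")":
        if src[i].isspace():
            i += 1
            continue
        tok, i = read_token(src, i, readtable)
        items.append(tok)
    if i >= len(src):
        raise SyntaxError("unclosed parenthesis")
    return ("group", items), i + 1

def read_string(src, i, readtable):
    """Tokenizer for the '"' macro character, with $-interpolation."""
    i += 1
    parts, text = [], []
    while i < len(src) and src[i] != '"':
        if src[i] == "$":
            if text:
                parts.append(("text", "".join(text)))
                text = []
            # One constituent token or one macro sequence, via the readtable.
            tok, i = read_token(src, i + 1, readtable)
            parts.append(tok)
        else:
            text.append(src[i])
            i += 1
    if i >= len(src):
        raise SyntaxError("unclosed string")
    if text:
        parts.append(("text", "".join(text)))
    return ("string", parts), i + 1

def read_bignum(src, i, readtable):
    """Hypothetical extension: tokenizer for a 'bn/' macro sequence."""
    i += len("bn/")
    j = i
    while j < len(src) and (src[j].isdigit() or src[j] == "."):
        j += 1
    return ("bignum", src[i:j]), j

readtable = {'"': read_string, "(": read_paren}
readtable["bn/"] = read_bignum  # extending the language is one new entry
print(read_token('"of $(ingify "print") and $bn/11.022 more"', 0, readtable)[0])
```

Note that read_string never mentions bignums or parentheses: registering bn/ in the table is all it takes for $bn/11.022 to work inside a string, and the nested "print" inside $( ... ) is handled by a plain recursive call.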
A parenthesis: multiple ways of denoting forms
If you are wondering what the double parentheses ⦅⦆ mean, I plan to use them to allow mixing Lisp-like notation with C-like notation for forms (function calls, macro applications, etc.). In fact, there will be multiple ways of denoting forms:
# All of the following are parsed into the same code,
  a form with three identifiers 'head', 'arg1' and 'arg2'
  whose meaning is *apply 'head' to 'arg1' and 'arg2'*
head( arg1, arg2 )
head arg1 arg2
head arg1, arg2
head: arg1, arg2
⦅head arg1 arg2⦆
⦅head arg1, arg2⦆
⦅head: arg1, arg2⦆
(head: arg1, arg2)
head
    arg1
    arg2
head arg1
    arg2
head:
    arg1
    arg2
head arg1:
    arg2
(head:
    arg1
    arg2)
⦅head arg1
    arg2⦆
And more combinations will be possible, although it won’t be possible to freely mix and match all of the above. For instance, (head arg1 arg2) simply means a tuple with the three identifiers, rather than a form (where the first element would be applied to the other two). Such a difference between tuple and form does not exist in Lisp, but it will make a lot of nice notation possible. This will be explained in the forthcoming posts dealing with parsing.
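As a rough illustration of "many notations, one form", here is a small Python sketch that normalizes the single-line notations listed above. It is not the planned parser: Form, Tup and parse_line are made-up names, it handles only flat identifiers on one line, and the rule it uses for plain parentheses (a colon after the head makes a form, otherwise a tuple) is my guess from the two examples given, not a stated rule.

```python
import re
from dataclasses import dataclass

# Illustrative sketch only: Form, Tup and parse_line are hypothetical names.

@dataclass(frozen=True)
class Form:
    head: str
    args: tuple

@dataclass(frozen=True)
class Tup:
    items: tuple

# Punctuation tokens, or any run of characters that is neither space nor punctuation.
TOKEN = re.compile(r"[(),:⦅⦆]|[^\s(),:⦅⦆]+")

def _form(tokens):
    """Build a form from 'head [:] arg [,] arg ...'."""
    if not tokens:
        raise ValueError("empty form")
    head, rest = tokens[0], tokens[1:]
    if rest and rest[0] == ":":
        rest = rest[1:]
    return Form(head, tuple(t for t in rest if t != ","))

def parse_line(text):
    """Normalize one single-line notation into a Form or a Tup."""
    tokens = TOKEN.findall(text)
    if not tokens:
        raise ValueError("nothing to parse")
    if tokens[0] == "⦅" and tokens[-1] == "⦆":
        return _form(tokens[1:-1])
    if tokens[0] == "(" and tokens[-1] == ")":
        inner = tokens[1:-1]
        if ":" in inner:  # assumption: the colon marks a form
            return _form(inner)
        return Tup(tuple(t for t in inner if t != ","))
    if len(tokens) >= 3 and tokens[1] == "(" and tokens[-1] == ")":
        # C-like call: head( arg1, arg2 )
        return Form(tokens[0], tuple(t for t in tokens[2:-1] if t != ","))
    return _form(tokens)

for line in ["head( arg1, arg2 )", "head: arg1, arg2", "⦅head arg1 arg2⦆", "(head arg1 arg2)"]:
    print(line, "->", parse_line(line))
```

The indentation-based notations are left out on purpose, since they need the block structure that the parsing posts will cover.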
Comments
The same can be done for comments:
# The current position and velocity are $p and $v
  and the time elapsed since the last frame is $Δt
p := p + Δt * v
# The position at the next frame is $p
## The need to quickly comment code which may contain
   $-characters (which should be ignored) is solved
   via a *raw comment*, which is marked with a double hash ##
The expected behavior is that the escaped code inside the comments is evaluated whenever the program is compiled in debug mode; while debugging, the development environment will fill in the values in place. The comments will then have runtime functionality as well!
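A toy version of that "fill in place" step, sketched in Python under heavy assumptions: render_comment and its env dictionary are hypothetical, only plain $name escapes are handled (a real implementation would reuse the string tokenizer so that $( ... ) works too), and raw ## comments are passed through untouched.

```python
import re

# Illustrative sketch only: render_comment and env are hypothetical names.

# \w matches Unicode letters in Python 3, so names such as Δt are covered.
PLACEHOLDER = re.compile(r"\$(\w+)")

def render_comment(line, env, debug=True):
    """Fill $name escapes in a '#' comment with values from env.

    Code lines, raw '##' comments and non-debug builds are returned unchanged.
    """
    stripped = line.lstrip()
    if not debug or not stripped.startswith("#") or stripped.startswith("##"):
        return line

    def fill(match):
        name = match.group(1)
        # Unknown names are left as written rather than raising.
        return str(env[name]) if name in env else match.group(0)

    return PLACEHOLDER.sub(fill, line)

env = {"p": 3.5, "v": 2.0, "Δt": 0.016}
print(render_comment("# The position at the next frame is $p", env))
print(render_comment("## costs $p to run", env))
```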
Quotation
Like Lisp, Lyc will support macros. Macros are usually defined using quotation: quote ... or simply ‘...’ will mean the code which is the result of parsing the ... part. A macro is then a function which outputs code when given code.
Within quotation, the escape characters ~ and ~@ will serve the same function as , and ,@ in Lisp: evaluate the expression within the escape and insert the result at that position in the code (in the second case the expression is interpreted as a sequence, and each element is inserted in turn). For instance, one could define a macromap macro, which applies a given piece of code to each of its arguments:
defmacro macromap( head, ):
    items := ‘~head ~itm’ for itm in
    return ‘block ~@items’
# For instance, we may now do
macromap assert
    my-var = 1
    test-something()
# and this will be macro-expanded into
block:
    assert my-var = 1
    assert test-something()
The beauty of having a recursive descent design is that there is really nothing special going on inside the code quotation. It is implemented via a recursive call to the standard parser.
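For readers who have not met Lisp's , and ,@ before, here is what the two escapes do, sketched in Python with code represented as nested lists. Unquote, Splice and fill are names I made up for the sketch, and the templates are built by hand, whereas in Lyc they would come out of the recursive call to the parser.

```python
from dataclasses import dataclass

# Illustrative sketch only: Unquote, Splice and fill are hypothetical names.

@dataclass(frozen=True)
class Unquote:
    """Plays the role of ~name: insert one value."""
    name: str

@dataclass(frozen=True)
class Splice:
    """Plays the role of ~@name: insert each element of a sequence in turn."""
    name: str

def fill(template, env):
    """Return a copy of template with the escapes replaced by values from env."""
    if isinstance(template, Unquote):
        return env[template.name]
    if isinstance(template, list):
        out = []
        for item in template:
            if isinstance(item, Splice):
                out.extend(env[item.name])  # spliced into the enclosing list
            else:
                out.append(fill(item, env))
        return out
    return template  # a plain symbol or literal

def macromap(head, items):
    """Mirror of the macromap macro: apply head to each item, wrap in a block."""
    expanded = [
        fill([Unquote("head"), Unquote("itm")], {"head": head, "itm": itm})
        for itm in items
    ]
    return fill(["block", Splice("items")], {"items": expanded})

print(macromap("assert", [["=", "my-var", 1], ["test-something"]]))
```

The only difference between the two escapes is append versus extend: ~ inserts one piece of code, ~@ flattens a sequence into the surrounding form.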
To start getting a feel for the pleasures of having multiple ways of denoting forms, here is an alternative way of defining the macro above:
defmacro
    macromap head
    set items
        for itm in
            ‘~head ~itm’
    return quote
        block
            ~@items