Readtable-Macro Transducer-Chain Parsing

This repository contains an implementation of a framework for using the readtable-macro transducer-chain (RMTS) parsing method.

This is described in the following three blog posts:

Note

You may jump to the full Table of contents below. Following is an overview of the source code.

Structure of the source code

Common

Several utilities used throughout the project.

  • Context - A python class to simulate variables with dynamic scope (Lisp’s special variables).
  • Errors - A collection of exceptions.
  • Record - A python dict accessible via getattr/setattr (so one can write record.item instead of record["item"]).
  • StringStuff - Some functions to manipulate strings, and information about characters with non-single-space width.
  • SysArgsParser - A utility to parse command line arguments.
  • Util - General utility functions.
  • Unused by this project, needs to be refactored/moved:
    • Hierarchy
    • Instance
    • Memoize

Streams

Self-contained implementation of an object from which characters can be read, while keeping track of offset, and line and column numbers. This probably should be replaced with Python’s own TextIOBase class. But it was much simpler for me to just have a class with exactly what I needed, instead of worrying about implementing whatever Python needs. Will refactor this at some point.

Syntax

Syntax types.

  • Code - The base class for all syntax types.
  • Identifier - An identifier.
  • Literal - A Literal.
  • Node and Element - A Node is a doubly-linked list, where each Element in the list holds (a reference to) an instance of Code. Currently we define 6 types of nodes:
    • Form - A Form denotes a Node where the first element is singled out, and called the head of the form. It is usually understood that the first element is applied to the remaining elements, called arguments of the form. One may denote a Form by head( arg1, arg2, ...) or ⦅head arg1 arg2 ...⦆.
    • Tuple - A Tuple is a Node where no element is singled out. It usually represents a sequence of elements, taken together. One may denote a Tuple by (elm1, elm2, ...).
    • PreForm and PreTuple - A PreForm or a PreTuple denotes a Form or a Tuple, respectively, if it holds two or more elements appear, or, if there is a single element, denotes that very same single element. One may denote a PreForm by ₐ⟅elm1 ...⟆ₐ, and a PreTuple by ⟅elm1 ...⟆.
    • Punctuator - A Node whose single element should be parsed for punctuation.
  • NodeIterator - An iterator for iterating over the elements of a Node.
  • Token - An Element representing a token. The outcome of tokenization should be a single Node instance whose elements are all Tokens.
  • LispPrinter - A collection of functions for printing Code using S-Expressions (Lisp prefix notation).

Tokenization

Readtable-macro parsing.

Transducers

Tree transducers.

Parsers

Parsers for specific languages (currently only lyc).