Quick Overview

Algodal™ Parser Generator Tool is a parser generator that generates UTF-8 parsers as C99 code. It can generate complete parsers for most languages. A complete parser is one that parses the text of a language into all of the chunks defined by that language’s official specification. Currently, Algodal™ Parser Generator Tool does not create complete parsers for Python-like grammars, that is, grammars that expect some tokens to be generated from logic rather than from characters, such as the DEDENT token in Python. You can still use Algodal™ Parser Generator Tool to generate parsers for Python-like grammars, but you will need to do post-processing: additional processing of the generated tokens or nodes outside the parser. Algodal™ Parser Generator Tool provides code to help with this, but you can also write your own.
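To make the idea of post-processing concrete, here is a minimal sketch in C of how INDENT and DEDENT tokens could be derived from logic rather than from characters. It is not the helper code that ships with the tool. The names layout_tokens, LAYOUT_INDENT, LAYOUT_DEDENT and MAX_DEPTH are hypothetical, and the sketch assumes that the indentation width of every logical line has already been measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration; not part of the tool's API. */
enum layout_token { LAYOUT_INDENT = 1, LAYOUT_DEDENT = 2 };

#define MAX_DEPTH 64 /* assumed limit on nested blocks in this sketch */

/*
 * Given the indentation width of each logical line, write the INDENT and
 * DEDENT tokens that a Python-like grammar expects into `out`.
 * Returns the number of tokens written, or -1 if the indentation is
 * inconsistent or a buffer limit is exceeded.
 */
int layout_tokens(const int *widths, size_t n_lines,
                  enum layout_token *out, size_t out_cap)
{
    int stack[MAX_DEPTH]; /* indentation widths of the open blocks */
    size_t depth = 1;     /* the stack always holds the base level 0 */
    size_t n_out = 0;

    stack[0] = 0;
    for (size_t i = 0; i < n_lines; i++) {
        int w = widths[i];
        if (w < 0)
            return -1;
        if (w > stack[depth - 1]) {
            /* deeper than the current block: open a new one */
            if (depth == MAX_DEPTH || n_out == out_cap)
                return -1;
            stack[depth++] = w;
            out[n_out++] = LAYOUT_INDENT;
        } else {
            /* close every block that is deeper than this line */
            while (w < stack[depth - 1]) {
                if (n_out == out_cap)
                    return -1;
                depth--;
                out[n_out++] = LAYOUT_DEDENT;
            }
            if (w != stack[depth - 1])
                return -1; /* dedent to a level that was never opened */
        }
    }
    /* close the blocks that are still open at the end of the input */
    while (depth > 1) {
        if (n_out == out_cap)
            return -1;
        depth--;
        out[n_out++] = LAYOUT_DEDENT;
    }
    return (int)n_out;
}
```

For line widths 0, 4, 4, 8, 0 the sketch produces INDENT, INDENT, DEDENT, DEDENT. In a real post-processing pass these tokens would be merged into the token stream produced by the generated parser at the positions of the corresponding lines.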

Algodal™ Parser Generator Tool parsers are well suited to C-like grammars. Examples include programming languages such as C, C++, Java, and JavaScript, and data languages such as JSON.

Algodal™ Parser Generator Tool interprets what a parser is differently from most other parser generators. It considers a parser to be a virtual machine and therefore divides a parser into two parts: the program, which contains the actual code of the parser, and the machine, which the program runs on. The machine is reusable code, so once you have generated it you don’t need to regenerate it unless you update to a newer Algodal™ Parser Generator Tool version. The program is specific to the parser source you write. Each parser source generates different program code, and whenever you update your parser source the generated program code changes as well. Instead of generating the program as C code, you can generate it as JSON data. Algodal™ Parser Generator Tool provides a function that converts your parser program JSON into in-memory data that your parser machine can then run. This feature makes it very easy to use Algodal™ Parser Generator Tool parsers in text editors or as pluggable parsers.

This design lets a single machine run as many programs as you want. Multiple parsers in a single application are therefore supported out of the box (unlike most other parser generators), and the generated parser code is much smaller than that of other parser generators.
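The split between a reusable machine and parser-specific programs can be illustrated with a toy example in C. This is only a sketch of the general idea, not the code that Algodal™ Parser Generator Tool generates. The instruction set (OP_LITERAL, OP_DIGITS, OP_ACCEPT), the struct layouts and the function machine_run are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical instruction set, invented for this illustration. */
enum opcode { OP_LITERAL, OP_DIGITS, OP_ACCEPT };

struct instruction {
    enum opcode op;
    char arg; /* character to match; used by OP_LITERAL only */
};

/* A "program" is plain data: it could equally be loaded at run time. */
struct program {
    const struct instruction *code;
    size_t length;
};

/*
 * The "machine": reusable code that runs any program over the text.
 * Returns 1 if the program accepts the whole text, 0 otherwise.
 */
int machine_run(const struct program *prog, const char *text)
{
    size_t pos = 0;

    assert(prog != NULL && text != NULL);
    for (size_t pc = 0; pc < prog->length; pc++) {
        const struct instruction *ins = &prog->code[pc];
        switch (ins->op) {
        case OP_LITERAL: /* match exactly one given character */
            if (ins->arg == '\0' || text[pos] != ins->arg)
                return 0;
            pos++;
            break;
        case OP_DIGITS: /* match one or more ASCII digits */
            if (!isdigit((unsigned char)text[pos]))
                return 0;
            while (isdigit((unsigned char)text[pos]))
                pos++;
            break;
        case OP_ACCEPT: /* succeed only at the end of the text */
            return text[pos] == '\0';
        }
    }
    return 0; /* the program ended without reaching OP_ACCEPT */
}

/* Two different programs that share the one machine above. */
const struct instruction integer_code[] = {
    { OP_DIGITS, 0 }, { OP_ACCEPT, 0 }
};
const struct instruction version_code[] = {
    { OP_DIGITS, 0 }, { OP_LITERAL, '.' }, { OP_DIGITS, 0 }, { OP_ACCEPT, 0 }
};
const struct program integer_program = {
    integer_code, sizeof integer_code / sizeof integer_code[0]
};
const struct program version_program = {
    version_code, sizeof version_code / sizeof version_code[0]
};
```

Calling machine_run(&integer_program, "2024") and machine_run(&version_program, "1.5") both return 1, while swapping the two inputs returns 0. Adding a third parser in this sketch means adding another instruction array; the machine code stays unchanged, which is why the per-parser code stays small.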

The generated parsers handle recursion well (it is the user’s responsibility to write the parser so that it avoids infinite loops), and there is no fixed limit on recursion depth: your parser will recurse as long as memory allows.

Some additional features of the generated parsers include:

  • read UTF-8 text without any issues (unlike most other parser generators)

  • parse quickly and accurately

  • compile cross-platform on Windows and Linux (compiling on Apple platforms may work but is not officially supported)

  • provide bindings that allow the parsers to be used from other programming languages

The license is friendly:

  • The parser source (.parser or .algodal-parser file) you write is yours.

  • You can freely add the generated parser code to your project (public, private, or commercial).

Algodal™ Parser Generator Language (also called the parser language or source) is the language in which parsers are written. The smallest building block of a parser is called an ACTION. An ACTION can be defined in either the TOKENPHASE or the SYNTAXPHASE. The TOKENPHASE is where tokenization occurs. Tokenization is the grouping of individual characters into tokens. The SYNTAXPHASE is where the abstract syntax tree (AST) is generated. The items in the tree are called nodes. Unlike most parser generators, Algodal™ Parser Generator Tool generates the AST for you. You can control which nodes are generated in the AST using OBJECTPOINT and SCOPE, which are covered in detail later. You can generate a parser that only tokenizes the text, or one that both tokenizes the text and analyzes the tokens. Analyzation is the process of grouping tokens into nodes; it takes place during the SYNTAXPHASE.
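As a rough picture of what tokenization means, the following hand-written C sketch groups the characters of a text into tokens. In an Algodal™ parser this work is described by ACTIONs in the TOKENPHASE, not written by hand; the sketch only illustrates the concept. The names token_kind, token and tokenize are hypothetical, and the sketch handles ASCII input only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, chosen for this illustration. */
enum token_kind { TOKEN_NAME, TOKEN_NUMBER, TOKEN_SYMBOL };

struct token {
    enum token_kind kind;
    size_t start;  /* byte offset of the first character */
    size_t length; /* number of bytes in the token */
};

/*
 * Groups the characters of `text` into tokens, skipping whitespace.
 * Writes at most `cap` tokens into `out` and returns how many it wrote.
 */
size_t tokenize(const char *text, struct token *out, size_t cap)
{
    size_t pos = 0, count = 0;

    assert(text != NULL && out != NULL);
    while (text[pos] != '\0' && count < cap) {
        unsigned char c = (unsigned char)text[pos];
        size_t start;
        enum token_kind kind;

        if (isspace(c)) {
            pos++;
            continue;
        }
        start = pos;
        if (isalpha(c) || c == '_') {
            /* a run of letters, digits and underscores is one name */
            kind = TOKEN_NAME;
            while (isalnum((unsigned char)text[pos]) || text[pos] == '_')
                pos++;
        } else if (isdigit(c)) {
            /* a run of digits is one number */
            kind = TOKEN_NUMBER;
            while (isdigit((unsigned char)text[pos]))
                pos++;
        } else {
            /* any other character is a one-character symbol */
            kind = TOKEN_SYMBOL;
            pos++;
        }
        out[count].kind = kind;
        out[count].start = start;
        out[count].length = pos - start;
        count++;
    }
    return count;
}
```

For the text "total = 42;" the sketch yields four tokens: a name, a symbol, a number and a symbol. The SYNTAXPHASE would then group tokens like these into nodes of the AST, for example a single assignment node.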

An ACTION is defined using commands. Each command is either a built-in function that does something specific or an instruction that tells the parser how to handle parsing.

Note

In this document, commands are referred to by their English names in bold all caps. In the source you will see their literal form, which in most cases is the English name preceded by the @ symbol.