SOURCE

This section describes the parser source language and its terminology.

OPERATION AND RESULT

When a parse occurs, the result has two features that describe the success of the parse: size and flag. The size is how many items (characters in tokenization or tokens in analysis) were parsed. The flag is a boolean that tells whether the operation was successful: true for yes, false for no. For a parse to be considered successful, the size must be greater than 0 or the flag must be true. When there are multiple parses for the same location in the text, the parser chooses the most successful one: flag true and largest size.

[1] @char

The above will produce a result of size:1, flag:true.

[0] @char

The above will produce a result of size:0, flag:true.

# Imagine parsing the text ABC with the instruction below
[1] "CAT"

The above scenario will produce a result of size:0, flag:false.
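
The result model described in this section can be sketched in Python. This is a minimal illustration, not the parser's actual implementation: the names Result, is_successful and most_successful are hypothetical, and the ordering (flag first, then size, earliest candidate on a tie) is an assumption drawn from the phrase "true and largest size".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    size: int   # items consumed: characters or tokens
    flag: bool  # True when the operation succeeded


def is_successful(result: Result) -> bool:
    # Rule as stated in the text: size greater than 0, or flag true.
    return result.size > 0 or result.flag


def most_successful(results: list[Result]) -> Result:
    # Assumption: flag is compared first, then size; on a tie the
    # earliest candidate wins (max returns the first maximal element).
    return max(results, key=lambda r: (r.flag, r.size))
```

Note that most_successful raises ValueError on an empty list; a real parser would presumably report an unsuccessful result instead.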

INSTRUCTIONS

Parser source instructions are grouped into options, series and items.

OPTION: ITEM | ITEM | ITEM
SERIES: ITEM ITEM ITEM
ITEM: COMMANDS, SERIES, OPTIONS

Options are separated by |, series are separated by spaces, and a single item is an item. Each item in a series must be successful for the entire series to be successful, while only one of the options needs to be successful.

CMD1 CMD2 | CMD3 CMD4

The above instruction reads as one of the following:

1. process CMD1, then CMD2, or
2. process CMD3, then CMD4

CMD1 (CMD2 | CMD3) CMD4

We can use parentheses to control which set of commands the | operator affects. In the case above: process CMD1, then CMD2 or CMD3, then CMD4.

The parser machine tests all options to find the best (most successful) one, unless the user applies the FIRST command, which makes the machine choose the first option that produces a successful result.
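
The series and option rules, including the effect of FIRST, can be sketched with small Python combinators. Everything here is hypothetical scaffolding for illustration: literal stands in for a command such as CMD1, success is judged by the flag alone, and "most successful" is taken to mean the largest size among successful options.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    size: int
    flag: bool


Item = Callable[[str, int], Result]  # (text, position) -> Result


def literal(s: str) -> Item:
    """Hypothetical stand-in for a command: match s exactly."""
    def run(text: str, pos: int) -> Result:
        ok = text.startswith(s, pos)
        return Result(len(s) if ok else 0, ok)
    return run


def series(*items: Item) -> Item:
    """ITEM ITEM ITEM: every item must succeed, one after another."""
    def run(text: str, pos: int) -> Result:
        total = 0
        for item in items:
            r = item(text, pos + total)
            if not r.flag:
                return Result(0, False)
            total += r.size
        return Result(total, True)
    return run


def option(*items: Item, first: bool = False) -> Item:
    """ITEM | ITEM | ITEM: one successful item is enough."""
    def run(text: str, pos: int) -> Result:
        if first:
            # FIRST: stop at the first option that succeeds.
            for item in items:
                r = item(text, pos)
                if r.flag:
                    return r
            return Result(0, False)
        # Default: test every option and keep the largest success.
        winners = [r for r in (item(text, pos) for item in items) if r.flag]
        return max(winners, key=lambda r: r.size) if winners else Result(0, False)
    return run
```

With this sketch, option(series(literal("A"), literal("B")), literal("ABC")) consumes three characters of "ABC" by default, but only two when first=True is passed.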

SPECIFICATIONS

The language is case insensitive.

TOKENPHASE

@token:

Identifies tokenphase actions.

SYNTAXPHASE

@syntax:

Identifies syntaxphase actions.

CONFIG

@config {}

Contains all config information.

TOKENIZER

@tokenizer []

Assigns the tokenphase actions or strings that will act as tokenizers.

ENTRYPOINT

@entrypoint []

Assigns the syntaxphase actions that begin analysis.

OBJECTPOINT

@objectpoint []

Lists the tokenizers or syntaxphase actions to save in the abstract syntax tree.

ACTION

@action A <>

This defines an action. The content of an action is a set of parser source instructions.

CHAR

@char

Parse any single character from the text.

COMPARE-CHAR-RANGE

@char { 0x20 }
@char { 0x20 : 0x32}
@char {'A'}
@char {'A' : 'Z'}
@char {'😀' : '😹'}

Parse a single character that matches the given range. The range can be a single value or a smallest and a largest value.
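
A character-range check of this kind can be sketched in Python as follows. The function name char_in_range is hypothetical, and the sketch assumes that the range is inclusive at both ends and compared by Unicode code point; the source does not state either detail.

```python
def char_in_range(text, pos, low, high=None):
    """Return (size, flag) for the character at pos against a range.

    low and high may be integers (0x20) or one-character strings ('A').
    """
    def code(value):
        return ord(value) if isinstance(value, str) else value

    lo = code(low)
    hi = lo if high is None else code(high)  # single value: low == high
    if pos < len(text) and lo <= ord(text[pos]) <= hi:
        return (1, True)
    return (0, False)
```

For example, char_in_range("Q", 0, 'A', 'Z') corresponds to running @char {'A' : 'Z'} against the text Q.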

COMPARE-POINT-RANGE

Ta { 0x20 }
Tb { 0x20 : 0x32}
Sa {'A'}
Sb {'A' : 'Z'}
A {'😀' : '😹'}

Compares the result parsed by the action (identified by the label) to the given character range.

Important

If the parsed text is longer than a single character, this operation is unsuccessful.

COMPARE-POINT-STRING

Compares the result parsed by the action (identified by the label) to the given string.

name {"Fred"}

You can also compare against a sub-string by using the sub-string identifier % followed by the length of the string to compare.

name {"Fr" % 2}
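
One possible reading of the two comparison forms is sketched below in Python. This is an assumption rather than documented behaviour: the sketch treats the % length as "compare only the first N characters of the parsed text", and compare_point_string is a hypothetical name.

```python
def compare_point_string(parsed, expected, length=None):
    """Compare text parsed by an action with a string.

    length=None models name {"Fred"} (whole-string comparison).
    length=N models name {"Fr" % 2} under the assumption that only
    the first N characters of each side are compared.
    """
    if length is None:
        return parsed == expected
    return parsed[:length] == expected[:length]
```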

STRING

Parse a series of characters from the text that matches the string exactly.

"Hello"

Prefix the string with $ to make the match case-insensitive.

$"Hello"

Important

In the tokenphase, STRING is a command to parse the text that matches itself exactly. In the syntaxphase, STRING is a parsed token and must have been used as a tokenizer in the tokenphase.

ORDER

@order "Hello"
@order $"Hello"

Parse a series of characters from the text that matches the characters of the string in any order.

Important

Can be used only in the tokenphase.
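
A rough Python sketch of ORDER follows. It assumes that "in any order" means the next characters of the text form a permutation of the string, so each character must appear exactly as often as it does in the string; the function name order and the ignore_case flag (standing in for the $ prefix) are hypothetical.

```python
from collections import Counter


def order(text, pos, s, ignore_case=False):
    """Return (size, flag): succeed when the next len(s) characters
    are a rearrangement of s."""
    chunk = text[pos:pos + len(s)]
    if ignore_case:  # models the $ prefix
        chunk, s = chunk.lower(), s.lower()
    if Counter(chunk) == Counter(s):
        return (len(s), True)
    return (0, False)
```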

ONEOF

@oneof "Hello"
@oneof $"Hello"

Parse any single character from the text that matches one of the characters in the string exactly.

Important

Can be used only in the tokenphase.
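
ONEOF can be sketched in Python in a few lines. The name oneof and the ignore_case flag (standing in for the $ prefix) are hypothetical; the behaviour follows the description above.

```python
def oneof(text, pos, chars, ignore_case=False):
    """Return (size, flag): succeed when the character at pos is one
    of the characters in chars."""
    if pos >= len(text):
        return (0, False)
    ch = text[pos]
    if ignore_case:  # models the $ prefix
        ch, chars = ch.lower(), chars.lower()
    return (1, True) if ch in chars else (0, False)
```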

NOT

@not "Hello"
@not @oneof $"Hello"
@not ("A" | "B")

Negates a result (unsuccessful becomes successful and successful becomes unsuccessful). If it produces a success of its own, the size is always 1.

AND

@not "Hello" @and @not Ta
@not @oneof $"Hello" @and @not ("A" | "B") @and @not Sa

Chains NOT commands together. If it produces a success of its own, the size is always 1.
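
The NOT and AND behaviour described above can be sketched in Python. The helper names (literal, not_, and_) are hypothetical, results are modelled as (size, flag) tuples, and the sketch assumes that an AND chain succeeds only when every NOT in it succeeds at the same position. End-of-input handling is not modelled.

```python
def literal(s):
    """Hypothetical stand-in for a STRING command."""
    def run(text, pos):
        ok = text.startswith(s, pos)
        return (len(s) if ok else 0, ok)
    return run


def not_(item):
    """Negate an item: its failure becomes a success of size 1."""
    def run(text, pos):
        _, flag = item(text, pos)
        return (0, False) if flag else (1, True)
    return run


def and_(*negated):
    """Chain NOT items: all must succeed; the size is still 1."""
    def run(text, pos):
        if all(item(text, pos)[1] for item in negated):
            return (1, True)
        return (0, False)
    return run
```

For instance, and_(not_(literal("A")), not_(literal("B"))) accepts one character that starts neither "A" nor "B".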

COUNT

[5]   COMMAND # run the command 5 times
[2:7] COMMAND # run the command 2 to 7 times
[?]   COMMAND # run the command 0 or 1 time
[*]   COMMAND # run the command 0 or MORE times
[+]   COMMAND # run the command 1 or MORE times
[3+]  COMMAND # run the command 3 or MORE times

COUNT is the iterator command and is written with square brackets. The content of the brackets is a number, a number range (number : number), a symbol (+, *, ?), or an increment (a number followed by +).
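
The COUNT forms can all be expressed as a minimum and an optional maximum, as in the Python sketch below. The names count, literal and any_char are hypothetical, results are (size, flag) tuples, and the guard for zero-size successes is an assumption added so that the sketch always terminates.

```python
def literal(s):
    """Hypothetical stand-in for a STRING command."""
    def run(text, pos):
        ok = text.startswith(s, pos)
        return (len(s) if ok else 0, ok)
    return run


def any_char(text, pos):
    """Hypothetical stand-in for @char."""
    return (1, True) if pos < len(text) else (0, False)


def count(item, low, high=None):
    """Repeat item between low and high times (high=None: no limit).

    [5] -> (5, 5)   [2:7] -> (2, 7)   [?] -> (0, 1)
    [*] -> (0, None)   [+] -> (1, None)   [3+] -> (3, None)
    """
    def run(text, pos):
        total = 0
        runs = 0
        while high is None or runs < high:
            size, flag = item(text, pos + total)
            if not flag:
                break
            runs += 1
            total += size
            if size == 0:
                # A zero-size success would repeat forever; treat the
                # minimum as met and stop (an assumption of this sketch).
                runs = max(runs, low)
                break
        return (total, True) if runs >= low else (0, False)
    return run
```

Under these assumptions the sketch reproduces the three results listed under OPERATION AND RESULT: count(any_char, 1, 1) gives size 1 and flag true, count(any_char, 0, 0) gives size 0 and flag true, and count(literal("CAT"), 1, 1) on the text ABC gives size 0 and flag false.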

DISCARD

@discard Ta

Discards a tokenizer’s token so that it does not exist in the syntaxphase.

REDUCE

@reduce Ta

Reduces a tokenizer’s tokens so that consecutive repeats of the same token do not exist in a sequence.
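
The effect of DISCARD and REDUCE on a token stream can be sketched in Python. The token shape, a (tokenizer name, text) tuple, and the function names are hypothetical, and keeping the first token of each run in reduce_ is an assumption.

```python
def discard(tokens, name):
    """Drop every token produced by the named tokenizer."""
    return [tok for tok in tokens if tok[0] != name]


def reduce_(tokens, name):
    """Collapse each run of consecutive tokens from the named
    tokenizer into its first token."""
    out = []
    for tok in tokens:
        if tok[0] == name and out and out[-1][0] == name:
            continue  # skip a repeat inside a run
        out.append(tok)
    return out
```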

FIRST

@first item | item | item

Unlike other commands, the parameter of this command is the entire option, not a single item. It selects the first option that is successful instead of the default, the most successful option.

PRIORITY

@tokenizer [name 10, otherName, space -10]

Priority is a feature that allows assigning a priority to competing actions. This avoids duplication. Smaller numbers have lower priority and larger numbers have higher priority.

Important

Only tokenizers support this feature.

SCOPE

<B> # core syntax
@action A < <B> > # used in action

Scope prevents an item from being added to the abstract syntax tree. It gives extra control over where an objectpoint is not added to the abstract syntax tree.

NULLIFY

@nullify @not A

NULLIFY reduces a result to a size of 0 but keeps the flag as true. This lets us check whether an item would parse without actually parsing it and moving the index.
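
NULLIFY behaves like a lookahead, which the following Python sketch illustrates. The helper names are hypothetical and results are (size, flag) tuples; the sketch assumes an unsuccessful item stays unsuccessful after NULLIFY.

```python
def literal(s):
    """Hypothetical stand-in for a STRING command."""
    def run(text, pos):
        ok = text.startswith(s, pos)
        return (len(s) if ok else 0, ok)
    return run


def not_(item):
    """Negate an item: its failure becomes a success of size 1."""
    def run(text, pos):
        _, flag = item(text, pos)
        return (0, False) if flag else (1, True)
    return run


def nullify(item):
    """Keep the flag of item but report a size of 0, so the index
    does not move."""
    def run(text, pos):
        _, flag = item(text, pos)
        return (0, flag)
    return run
```

Here nullify(not_(literal("A"))) mirrors @nullify @not A: it succeeds without consuming anything when the text at the current index does not start with A.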