SOURCE
This section describes the parser source language and its terminology.
OPERATION AND RESULT
When a parse occurs, the result has two features that describe the success of the parse: size and flag. The size is the number of items (characters in tokenization, tokens in analyzation) that were parsed. The flag is a boolean that tells whether the operation was successful: true for yes, false for no. For a parse to be considered successful, the size must be greater than 0 or the flag must be true. When there are multiple parses for the same location in the text, the parser chooses the most successful one: flag true and the largest size.
[1] @char
The above will produce a result of size:1, flag:true.
[0] @char
The above will produce a result of size:0, flag:true.
# Imagine parsing the text ABC with the below
[1] "CAT"
The above scenario will produce a result of size:0, flag:false.
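The success rule and the "most successful" selection described above can be sketched in Python. This is only an illustrative model, not the parser's actual implementation; the names Result, is_successful and best are hypothetical.

```python
from typing import NamedTuple

class Result(NamedTuple):
    size: int   # items consumed (characters or tokens)
    flag: bool  # whether the operation reported success

def is_successful(r: Result) -> bool:
    # Rule from the text: size greater than 0, or flag true.
    return r.size > 0 or r.flag

def best(results: list[Result]) -> Result:
    # Prefer flag:true first, then the largest size.
    return max(results, key=lambda r: (r.flag, r.size))

candidates = [Result(0, False), Result(1, True), Result(3, True)]
print(best(candidates))  # Result(size=3, flag=True)
```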
INSTRUCTIONS
Parser source instructions are grouped into options, series and items.
OPTION: ITEM | ITEM | ITEM
SERIES: ITEM ITEM ITEM
ITEM: COMMANDS, SERIES, OPTIONS
Options are separated by |, items in a series are separated by spaces, and a single item stands on its own. Each item in a series must be successful for the entire series to be successful, while only one of the options needs to be successful.
CMD1 CMD2 | CMD3 CMD4
So the above instruction would read: (1) process CMD1 then CMD2, OR (2) process CMD3 then CMD4.
CMD1 (CMD2 | CMD3) CMD4
We can use parentheses to control which set of commands the OR operator affects. So in the case above: process CMD1, then (CMD2 or CMD3), then CMD4.
The parser machine tests all OR options to find the best (most successful) one, unless the user uses the FIRST command, which makes the machine choose the first option that produces a successful result.
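As a rough model of how series and options combine, the Python sketch below treats each command as a function from (text, position) to (size, flag). The helpers lit, series and option are hypothetical names for illustration, and first=True stands in for the FIRST command; the real machine may differ in detail.

```python
def lit(s):
    # Hypothetical command: match the string s exactly at pos.
    def run(text, pos):
        return (len(s), True) if text.startswith(s, pos) else (0, False)
    return run

def series(*items):
    # Every item must succeed; sizes accumulate.
    def run(text, pos):
        total = 0
        for item in items:
            size, flag = item(text, pos + total)
            if not (size > 0 or flag):
                return (0, False)
            total += size
        return (total, True)
    return run

def option(*items, first=False):
    # Default: try every alternative and keep the most successful.
    # first=True mimics FIRST: stop at the first successful alternative.
    def run(text, pos):
        results = []
        for item in items:
            r = item(text, pos)
            if first and (r[0] > 0 or r[1]):
                return r
            results.append(r)
        return max(results, key=lambda r: (r[1], r[0]))
    return run

# CMD1 (CMD2 | CMD3) CMD4
rule = series(lit("a"), option(lit("b"), lit("bc")), lit("d"))
print(rule("abcd", 0))                                   # (4, True)
print(option(lit("b"), lit("bc"), first=True)("bc", 0))  # (1, True)
```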
SPECIFICATIONS
The language is case insensitive.
TOKENPHASE
@token:
Identify tokenphase actions.
SYNTAXPHASE
@syntax:
Identify syntaxphase actions.
CONFIG
@config {}
Contains all config information.
TOKENIZER
@tokenizer []
Assign the tokenphase actions or strings that will act as tokenizers.
ENTRYPOINT
@entrypoint []
Assign syntaxphase actions that will begin analyzation.
OBJECTPOINT
@objectpoint []
List of tokenizers or syntaxphase actions to save in the abstract syntax tree.
ACTION
@action A <>
This defines an action. The content of an action is a parser source instruction.
CHAR
@char
Parse any single character from the text.
COMPARE-CHAR-RANGE
@char { 0x20 }
@char { 0x20 : 0x32}
@char {'A'}
@char {'A' : 'Z'}
@char {'😀' : '😹'}
Parse a single character that matches the given range. The range can be a single value or a smallest and largest value.
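A range check like this can be modeled in Python as a comparison of code points. The sketch assumes both bounds are inclusive, which the text does not state explicitly; char_range is a hypothetical helper name.

```python
def char_range(low, high=None):
    # Each bound may be a code point (int) or a one-character string.
    lo = low if isinstance(low, int) else ord(low)
    if high is None:
        hi = lo  # single value: the range collapses to one character
    else:
        hi = high if isinstance(high, int) else ord(high)

    def run(text, pos):
        # Assumption: both ends of the range are inclusive.
        if pos < len(text) and lo <= ord(text[pos]) <= hi:
            return (1, True)
        return (0, False)
    return run

print(char_range('A', 'Z')("Q", 0))  # (1, True)
print(char_range(0x20)("x", 0))      # (0, False)
```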
COMPARE-POINT-RANGE
Ta { 0x20 }
Tb { 0x20 : 0x32}
Sa {'A'}
Sb {'A' : 'Z'}
A {'😀' : '😹'}
Compares the result parsed by the action (identified by the label) to the given character range.
Important
If the parsed text is longer than a single character, this operation will be unsuccessful.
COMPARE-POINT-STRING
Compares the result parsed by the action (identified by the label) to the given string.
name {"Fred"}
You can also compare against a sub-string using the sub-string identifier % and the length of the string to compare.
name {"Fr" % 2}
STRING
Parse a series of characters from the text that matches the string exactly.
"Hello"
Prepend with $ to make case-insensitive.
$"Hello"
Important
In the tokenphase, STRING is a command to parse the text that matches itself exactly. In the syntaxphase, STRING is a parsed token and must have been used as a tokenizer in the tokenphase.
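The tokenphase behaviour of STRING, including the $ prefix, might be modeled as follows. The function name string and the ignore_case parameter are illustrative only, and case folding via lower() is an assumption about how case-insensitivity is applied.

```python
def string(s, ignore_case=False):
    # ignore_case=True stands in for the $ prefix.
    def run(text, pos):
        chunk = text[pos:pos + len(s)]
        if ignore_case:
            same = chunk.lower() == s.lower()
        else:
            same = chunk == s
        return (len(s), True) if same else (0, False)
    return run

print(string("Hello")("Hello world", 0))               # (5, True)
print(string("Hello")("hello world", 0))               # (0, False)
print(string("Hello", ignore_case=True)("hELLO", 0))   # (5, True)
```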
ORDER
@order "Hello"
@order $"Hello"
Parse a series of characters from the text that matches the string in any order.
Important
Can be used only in the tokenphase.
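One plausible reading of ORDER is that the next characters of the text must be a permutation of the string. The Python sketch below models that reading; it is an assumption rather than confirmed behaviour, and order is a hypothetical helper name.

```python
from collections import Counter

def order(s, ignore_case=False):
    # ignore_case=True stands in for the $ prefix.
    def run(text, pos):
        chunk = text[pos:pos + len(s)]
        a, b = (chunk.lower(), s.lower()) if ignore_case else (chunk, s)
        # Assumption: same characters, same multiplicities, any order.
        if len(chunk) == len(s) and Counter(a) == Counter(b):
            return (len(s), True)
        return (0, False)
    return run

print(order("Hello")("olleH!", 0))  # (5, True)
print(order("Hello")("Help!", 0))   # (0, False)
```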
ONEOF
@oneof "Hello"
@oneof $"Hello"
Parse any single character from the text that matches one of the characters in the string exactly.
Important
Can be used only in the tokenphase.
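ONEOF amounts to a membership test on a single character. A minimal Python model, with oneof as a hypothetical name and ignore_case standing in for the $ prefix:

```python
def oneof(chars, ignore_case=False):
    def run(text, pos):
        if pos >= len(text):
            return (0, False)  # nothing left to parse
        c = text[pos]
        if ignore_case:
            hit = c.lower() in chars.lower()
        else:
            hit = c in chars
        return (1, True) if hit else (0, False)
    return run

print(oneof("Hello")("lab", 0))                    # (1, True)
print(oneof("Hello")("hab", 0))                    # (0, False)
print(oneof("Hello", ignore_case=True)("hab", 0))  # (1, True)
```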
NOT
@not "Hello"
@not @oneof $"Hello"
@not ("A" | "B")
This negates a result (unsuccessful becomes successful and successful becomes unsuccessful). If it produces a success of its own, the size will always be 1.
AND
@not "Hello" @and @not Ta
@not @oneof $"Hello" @and @not ("A" | "B") @and @not Sa
This chains NOT commands together. If it produces a success of its own, the size will always be 1.
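The NOT and AND behaviour can be sketched in Python as below. The sketch assumes that every chained NOT is tested at the same position and that the chain succeeds only if all of them do; the text does not spell this out. It also does not treat the end of the text specially. The names lit, not_ and and_ are hypothetical.

```python
def succeeded(result):
    size, flag = result
    return size > 0 or flag

def lit(s):
    # Hypothetical command: match the string s exactly at pos.
    def run(text, pos):
        return (len(s), True) if text.startswith(s, pos) else (0, False)
    return run

def not_(command):
    def run(text, pos):
        if succeeded(command(text, pos)):
            return (0, False)  # successful becomes unsuccessful
        return (1, True)       # a success of its own always has size 1
    return run

def and_(*negations):
    # Assumption: every chained NOT must succeed at the same position.
    def run(text, pos):
        if all(succeeded(n(text, pos)) for n in negations):
            return (1, True)
        return (0, False)
    return run

not_hello_or_a = and_(not_(lit("Hello")), not_(lit("A")))
print(not_hello_or_a("Bye", 0))    # (1, True)
print(not_hello_or_a("Apple", 0))  # (0, False)
```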
COUNT
[5] COMMAND # run the command 5 times
[2:7] COMMAND # run the command 2 to 7 times
[?] COMMAND # run the command 0 or 1 time
[*] COMMAND # run the command 0 or MORE times
[+] COMMAND # run the command 1 or MORE times
[3+] COMMAND # run the command 3 or MORE times
The iterator command. The COUNT command is defined using square brackets. The content of the brackets is a number, a number range (number : number), a symbol (+, *, ?), or an increment (a number followed by +).
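To make the bracket forms concrete, the Python sketch below converts a COUNT specification into a (minimum, maximum) pair and repeats a command accordingly. It is an illustrative model with hypothetical names (parse_count, count, any_char), and it assumes the command is repeated greedily up to the maximum.

```python
import re

def parse_count(spec):
    # Returns (minimum, maximum); maximum None means unbounded.
    spec = spec.strip()
    if spec == "?":
        return (0, 1)
    if spec == "*":
        return (0, None)
    if spec == "+":
        return (1, None)
    m = re.fullmatch(r"(\d+)\s*:\s*(\d+)", spec)
    if m:
        return (int(m[1]), int(m[2]))
    m = re.fullmatch(r"(\d+)\+", spec)
    if m:
        return (int(m[1]), None)
    return (int(spec), int(spec))

def any_char(text, pos):
    # Stand-in for @char: parse any single character.
    return (1, True) if pos < len(text) else (0, False)

def count(spec, command):
    lo, hi = parse_count(spec)
    def run(text, pos):
        total = times = 0
        while hi is None or times < hi:
            size, flag = command(text, pos + total)
            if not (size > 0 or flag):
                break
            times += 1
            total += size
            if size == 0:
                # An empty success would repeat forever; count it as
                # satisfying the remaining repetitions and stop.
                times = max(times, lo)
                break
        return (total, True) if times >= lo else (0, False)
    return run

print(count("1", any_char)("ABC", 0))    # (1, True)
print(count("0", any_char)("ABC", 0))    # (0, True)
print(count("2:7", any_char)("ABC", 0))  # (3, True)
```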
DISCARD
@discard Ta
Discard a tokenizer’s token so that it does not exist in the syntax phase.
REDUCE
@reduce Ta
Reduce a tokenizer’s token so that multiple copies of it do not exist in a sequence.
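The effect of DISCARD and REDUCE on a token stream might look like the Python sketch below. Tokens are modeled as (name, text) pairs, and REDUCE is assumed to collapse a run of consecutive tokens from the same tokenizer into the first one; both the representation and that interpretation are assumptions.

```python
def discard(tokens, name):
    # Drop every token produced by the named tokenizer.
    return [t for t in tokens if t[0] != name]

def reduce_(tokens, name):
    # Assumption: keep only the first token of each consecutive run.
    out = []
    for t in tokens:
        if t[0] == name and out and out[-1][0] == name:
            continue
        out.append(t)
    return out

tokens = [("word", "a"), ("space", " "), ("space", " "), ("word", "b")]
print(discard(tokens, "space"))  # [('word', 'a'), ('word', 'b')]
print(reduce_(tokens, "space"))  # [('word', 'a'), ('space', ' '), ('word', 'b')]
```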
FIRST
@first item | item | item
Unlike other commands, the parameter of this command is the entire option (not a single item). This command takes the first option that is successful rather than the default of taking the most successful one.
PRIORITY
@tokenizer [name 10, otherName, space -10]
Priority is a feature that allows assigning priority to competing actions. This negates duplication. The smallest numbers have the lowest priority and the largest numbers have the highest priority.
Important
Only tokenizers support this feature.
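How priority interacts with the "most successful" rule is not specified here. The Python sketch below assumes priority acts as a tie-breaker between equally successful tokenizers and that a tokenizer without a number has priority 0; both are assumptions, and pick is a hypothetical helper.

```python
def pick(candidates):
    # Each candidate is (name, priority, size, flag).
    ok = [c for c in candidates if c[2] > 0 or c[3]]
    if not ok:
        return None
    # Assumption: flag, then size, then priority as the tie-breaker.
    return max(ok, key=lambda c: (c[3], c[2], c[1]))

# @tokenizer [name 10, otherName, space -10]
competing = [
    ("name", 10, 2, True),
    ("otherName", 0, 2, True),  # no number given: assumed priority 0
    ("space", -10, 0, False),
]
print(pick(competing)[0])  # name
```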
SCOPE
<B> # core syntax
@action A < <B> > # used in action
Scope prevents an item from being added to the abstract syntax tree. It is a feature that gives extra control over where not to add an objectpoint in the abstract syntax tree.
NULLIFY
@nullify @not A
Nullify reduces a result to a size of 0 but keeps the flag as true. This allows us to check whether an item would parse without actually parsing it and moving the index.
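Used this way, NULLIFY behaves like a lookahead. A minimal Python model of the example above follows; it assumes an unsuccessful inner result stays unsuccessful, and the helper names lit, not_ and nullify are hypothetical.

```python
def lit(s):
    # Hypothetical command: match the string s exactly at pos.
    def run(text, pos):
        return (len(s), True) if text.startswith(s, pos) else (0, False)
    return run

def not_(command):
    def run(text, pos):
        size, flag = command(text, pos)
        return (0, False) if (size > 0 or flag) else (1, True)
    return run

def nullify(command):
    def run(text, pos):
        size, flag = command(text, pos)
        if size > 0 or flag:
            return (0, True)  # success kept, but nothing is consumed
        return (0, False)     # assumption: failure is passed through
    return run

# @nullify @not A
check = nullify(not_(lit("A")))
print(check("Bye", 0))  # (0, True)
print(check("Ape", 0))  # (0, False)
```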