SOURCE
=======

Here we describe the source language and its lingo.

OPERATION AND RESULT
---------------------

When a parse occurs, the **result** has two features that describe the success of the parse: **size** and **flag**.

The **size** is the number of items (characters during tokenization or tokens during analysis) that were parsed. The **flag** is a **boolean** that tells whether the operation was successful: **true** for yes or **false** for no.

For a parse to be considered **successful**, the **size** must be greater than 0 or the **flag** must be **true**.

When there are **multiple** parses for the same **location** in the text, the parser chooses the **most successful** parse: **true** and **largest size**.

.. code-block::

    [1] @char

The above will produce a result of `size:1, flag:true`.

.. code-block::

    [0] @char

The above will produce a result of `size:0, flag:true`.

.. code-block::

    # Imagine parsing the text ABC with the below
    [1] "CAT"

The above scenario will produce a result of `size:0, flag:false`.

INSTRUCTIONS
---------------

*Parser source instructions* are grouped into **options**, **series** and **items**.

.. code-block:: text

    OPTION: ITEM | ITEM | ITEM
    SERIES: ITEM ITEM ITEM
    ITEM: COMMANDS, SERIES, OPTIONS

The items of an option are separated by `|`, the items of a series are separated by spaces, and an item is a command, a series or an option. Each item in a series must be **successful** for the entire series to be **successful**, while only one item of an option needs to be successful.

.. code-block:: text

    CMD1 CMD2 | CMD3 CMD4

So the above instruction would read:

#. process CMD1 then CMD2 *OR*
#. process CMD3 then CMD4

.. code-block:: text

    CMD1 (CMD2 | CMD3) CMD4

We can use round brackets to control which set of commands the *or operator* affects. So in the case above:

#. process CMD1 then (CMD2 or CMD3) then CMD4

The parser's machine will test all *or options* to find the best (**most successful**) one, unless the user uses the **FIRST** command, which makes the machine choose the first option that produces a **successful** result.

SPECIFICATIONS
---------------

The language is case insensitive.

**TOKENPHASE**

.. code-block::

    @token:

Identifies tokenphase actions.

**SYNTAXPHASE**

.. code-block::

    @syntax:

Identifies syntaxphase actions.

**CONFIG**

.. code-block::

    @config {}

Contains all config information.

**TOKENIZER**

.. code-block::

    @tokenizer []

Assigns the tokenphase actions or strings that will act as tokenizers.

**ENTRYPOINT**

.. code-block::

    @entrypoint []

Assigns the syntaxphase actions that will begin analysis.

**OBJECTPOINT**

.. code-block::

    @objectpoint []

Lists the tokenizers or syntaxphase actions to save in the abstract syntax tree.

**ACTION**

.. code-block::

    @action A <>

Defines an action. The content of an action is a *parser source instruction*.

**CHAR**

.. code-block::

    @char

Parses any single character from the text.

**COMPARE-CHAR-RANGE**

.. code-block::

    @char { 0x20 }
    @char { 0x20 : 0x32}
    @char {'A'}
    @char {'A' : 'Z'}
    @char {'😀' : '😹'}

Parses a single **character** that matches the given range. The range can be a single value or a smallest and a largest value.

**COMPARE-POINT-RANGE**

.. code-block::

    Ta { 0x20 }
    Tb { 0x20 : 0x32}
    Sa {'A'}
    Sb {'A' : 'Z'}
    A {'😀' : '😹'}

Compares the result parsed by the action *(identified by the label)* to the given character range.

.. important::
    If the parsed text is longer than a single character, this operation will be unsuccessful.

**COMPARE-POINT-STRING**

Compares the result parsed by the action *(identified by the label)* to the given string.

.. code-block::

    name {"Fred"}

You can also compare against a sub-string by using the sub-string identifier `%` followed by the length of the string to compare.

.. code-block::

    name {"Fr" % 2}

**STRING**

Parses a series of characters from the text that matches the string exactly.

.. code-block::

    "Hello"

Prepend with `$` to make the match case-insensitive.

.. code-block::

    $"Hello"

.. important::
    In the tokenphase, STRING is a command to parse the text that matches itself exactly. In the syntaxphase, STRING is a parsed token and must have been used as a tokenizer in the tokenphase.

**ORDER**

.. code-block::

    @order "Hello"
    @order $"Hello"

Parses a series of characters from the text that matches the characters of the string in any order.

.. important::
    Can be used only in the tokenphase.

**ONEOF**

.. code-block::

    @oneof "Hello"
    @oneof $"Hello"

Parses any single character from the text that matches one of the characters in the string exactly.

.. important::
    Can be used only in the tokenphase.

**NOT**

.. code-block::

    @not "Hello"
    @not @oneof $"Hello"
    @not ("A" | "B")

Negates a result (unsuccessful becomes successful and successful becomes unsuccessful). If it produces a successful result of its own, the size will always be 1.

**AND**

.. code-block::

    @not "Hello" @and @not Ta
    @not @oneof $"Hello" @and @not ("A" | "B") @and @not Sa

Chains **NOT** commands together. If it produces a successful result of its own, the size will always be 1.

**COUNT**

.. code-block::

    [5] COMMAND     # run the command 5 times
    [2:7] COMMAND   # run the command 2 to 7 times
    [?] COMMAND     # run the command 0 or 1 time
    [*] COMMAND     # run the command 0 or MORE times
    [+] COMMAND     # run the command 1 or MORE times
    [3+] COMMAND    # run the command 3 or MORE times

The iterator command. The **COUNT** command is defined using square brackets. The content of the brackets is a number, a number range (*number : number*), a symbol (*+, \*, ?*) or an increment (*number followed by +*).

**DISCARD**

.. code-block::

    @discard Ta

Discards a tokenizer's token so that it does not exist in the syntaxphase.

**REDUCE**

.. code-block::

    @reduce Ta

Reduces a tokenizer's token so that it does not appear multiple times in a row.

**FIRST**

.. code-block::

    @first item | item | item

Unlike other commands, the parameter of this command is the entire option and not a single item. This command takes the first option that is successful rather than the default of taking the most successful one.

**PRIORITY**

.. code-block::

    @tokenizer [name 10, otherName, space -10]

Priority is a feature that allows assigning a priority to competing actions. This negates duplication. The smallest numbers have the lowest priority and the largest numbers have the highest priority.

.. important::
    Only tokenizers support this feature.

**SCOPE**

.. code-block::

    # core syntax
    @action A < > # used in action

Scope negates adding an item to the abstract syntax tree. It gives extra control over where an objectpoint is not added to the abstract syntax tree.

**NULLIFY**

.. code-block::

    @nullify @not A

Nullify reduces a result to a size of 0 but keeps the flag as true. This allows us to **check** whether an item would parse without actually parsing it and moving the index.
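
As a recap of the result model used throughout this document, the following is a minimal Python sketch of how **size** and **flag** interact. It is not the parser's actual implementation: the names ``Result``, ``is_successful``, ``most_successful``, ``first_successful`` and ``nullify`` are hypothetical, and the tie-breaking rule and the ``None`` return value are assumptions made for illustration.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterable, Optional


    @dataclass(frozen=True)
    class Result:
        size: int   # number of items (characters or tokens) parsed
        flag: bool  # whether the operation reported success


    def is_successful(result: Result) -> bool:
        # A parse counts as successful when size > 0 or the flag is true.
        return result.size > 0 or result.flag


    def most_successful(results: Iterable[Result]) -> Optional[Result]:
        # Default option behaviour: prefer flag true, then the largest size.
        # Assumptions: ties keep the earliest option, and None is returned
        # when no option is successful.
        candidates = [r for r in results if is_successful(r)]
        if not candidates:
            return None
        return max(candidates, key=lambda r: (r.flag, r.size))


    def first_successful(results: Iterable[Result]) -> Optional[Result]:
        # FIRST behaviour: stop at the first option that is successful.
        for r in results:
            if is_successful(r):
                return r
        return None


    def nullify(result: Result) -> Result:
        # NULLIFY behaviour: size drops to 0 while the flag is kept,
        # so the index would not move.
        return Result(size=0, flag=result.flag)


    options = [Result(1, True), Result(3, True), Result(0, False)]
    print(most_successful(options))   # Result(size=3, flag=True)
    print(first_successful(options))  # Result(size=1, flag=True)
    print(nullify(Result(3, True)))   # Result(size=0, flag=True)

The sketch ranks options by the pair *(flag, size)*, which is one way to read "**true** and **largest size**"; the real parser may order competing results differently.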