plxml/doc/language.md
2022-06-05 16:33:39 +02:00


PL/XML: The Handbook

Introduction

Like many ideas, PL/XML sprang from the need to do something extremely useless just for fun, to see how it would turn out. The original premise was a programming language built on XML syntax. Its name comes from [PL/SQL](https://en.wikipedia.org/wiki/PL/SQL), a torture device of sorts for IT students.

In order not to make this project completely useless, I wrote the original interpreter in Rust (WIIR!) to get better acquainted with that language.

The result is a dynamically typed procedural language, which is provably Turing-complete since I was able to write a blazingly slow [Brainfuck interpreter](../sample/bf.pl.xml) with it. All of its convenient aspects are overshadowed by the utter agony of writing XML by hand.

Quick guide

<program name="primer">
    <function name="my-print">
        <arguments>
            <argument name="the-text" />
        </arguments>
        <body>
            <call function="print-line">
                <arguments>
                    <value variable="the-text" />
                </arguments>
            </call>
        </body>
    </function>
    <main>
        <assign variable="text">
            <string value="Hello, world!" />
        </assign>
        <call function="my-print">
            <arguments>
                <value variable="text" />
            </arguments>
        </call>
    </main>
</program>

This slightly over-engineered hello world program demonstrates the basics of PL/XML: program structure, variable assignment and retrieval, value instantiation, and function definition and calls.

Program structure

Every PL/XML program should be wrapped in a program node whose name attribute gives the program its name. The program must contain a main node, which is executed first, and can define any number of functions using function nodes. The main node and the body node of each function contain the actual code, which is executed sequentially.
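The following sketch shows this structure at its smallest. The program defines no functions and has only a main node, and its two calls to the standard print-line function run in order. The program name "minimal" is arbitrary.

```xml
<program name="minimal">
    <main>
        <!-- main is the entry point; its children run top to bottom -->
        <call function="print-line">
            <arguments>
                <string value="first" />
            </arguments>
        </call>
        <call function="print-line">
            <arguments>
                <string value="second" />
            </arguments>
        </call>
    </main>
</program>
```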

Values

PL/XML has a few value types. The first two are the signed numeric integer and real types, which have no precision guarantee. Another type is the usual character string, which may or may not support Unicode. The array type is a generic iterable collection of any value, including arrays. Functions are values as well, and as such can be (and technically are) stored in variables.

Integer, real, and string values are instantiated using the eponymous node with a value attribute. For instance:

<integer value="1" />
<real value="2.5" />
<string value="hello!" />

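The integer, real, and string nodes can also cast a value to another type. Instead of a value attribute, you give them the value to convert as a child node. For instance, a string can be parsed into a real, and a real can be rounded down by casting it to an integer. The following example rounds 2.5 down to the integer 2, then casts it back to a real: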

<real>
    <integer>
        <real value="2.5" />
    </integer>
</real>
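Casting also works from strings and back to them. The sketch below parses the string "2.5" into a real and adds an integer to it, which is promoted to a real. It then casts the sum back to a string so that add can concatenate it with a label. The program and variable names are arbitrary.

```xml
<program name="casting">
    <main>
        <!-- parse "2.5" into a real, then add an integer to it -->
        <assign variable="sum">
            <add>
                <real>
                    <string value="2.5" />
                </real>
                <integer value="1" />
            </add>
        </assign>
        <!-- cast the real back to a string so add concatenates instead of summing -->
        <call function="print-line">
            <arguments>
                <add>
                    <string value="sum = " />
                    <string>
                        <value variable="sum" />
                    </string>
                </add>
            </arguments>
        </call>
    </main>
</program>
```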

Arrays can be created empty or with initial elements. They are manipulated through standard library functions.

<array />

<array>
    <integer value="0" />
    <integer value="1" />
    <integer value="2" />
    <integer value="3" />
    <real value="3.14" />
</array>
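Because an array can hold any value, arrays can also be nested. Here is a short sketch storing a two-row array in a variable named grid (an arbitrary name):

```xml
<assign variable="grid">
    <!-- an array of two arrays, each mixing value types -->
    <array>
        <array>
            <integer value="1" />
            <integer value="2" />
        </array>
        <array>
            <string value="three" />
            <real value="4.0" />
        </array>
    </array>
</assign>
```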

Wherever a boolean is expected, such as in conditions and logic operators, every value is considered truthy except the integer 0.
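The sketch below illustrates this rule using the if structure described under Control structures. The string "0" is not the integer 0, so it counts as truthy. The integer 0 is falsy, so the second if takes its else branch.

```xml
<program name="truthiness">
    <main>
        <!-- a string is never the integer 0, so this condition is truthy -->
        <if>
            <string value="0" />
            <then>
                <call function="print-line">
                    <arguments>
                        <string value="the string 0 is truthy" />
                    </arguments>
                </call>
            </then>
        </if>
        <!-- the integer 0 is the only falsy value, so else runs -->
        <if>
            <integer value="0" />
            <then>
                <call function="print-line">
                    <arguments>
                        <string value="unreachable" />
                    </arguments>
                </call>
            </then>
            <else>
                <call function="print-line">
                    <arguments>
                        <string value="the integer 0 is falsy" />
                    </arguments>
                </call>
            </else>
        </if>
    </main>
</program>
```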

Variable manipulation

A value can be assigned to a variable and retrieved from it later. Variables are dynamically typed: any variable can be assigned a value of any type, regardless of the type it held before.

To assign a value to a variable, use an assign node whose variable attribute specifies the variable name. Give it a child node that produces a value, such as string.

<assign variable="my-variable">
    <string value="hello!" />
</assign>

To retrieve a value, use a value node with the same variable attribute.

<value variable="my-variable" />
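The following sketch shows dynamic typing in action: the same variable first receives an integer, then a string, with no declaration in between. Under the rules above, the program should print forty-two.

```xml
<program name="dynamic-typing">
    <main>
        <!-- my-variable first holds an integer... -->
        <assign variable="my-variable">
            <integer value="42" />
        </assign>
        <!-- ...and is then reassigned a string, which is allowed -->
        <assign variable="my-variable">
            <string value="forty-two" />
        </assign>
        <call function="print-line">
            <arguments>
                <value variable="my-variable" />
            </arguments>
        </call>
    </main>
</program>
```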

Variable scoping is bound to function bodies: a global scope holds the standard library and user-defined functions, and the local scope of each function body inherits from it.
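The sketch below illustrates this scoping and reuses the my-print function from the quick guide. The greeting variable only exists in greet's local scope. my-print, being user-defined, lives in the global scope and is therefore visible from inside greet. The greet function is hypothetical and only serves as an illustration.

```xml
<program name="scoping">
    <function name="my-print">
        <arguments>
            <argument name="the-text" />
        </arguments>
        <body>
            <call function="print-line">
                <arguments>
                    <value variable="the-text" />
                </arguments>
            </call>
        </body>
    </function>
    <function name="greet">
        <arguments>
            <argument name="name" />
        </arguments>
        <body>
            <!-- "greeting" is local to this function body -->
            <assign variable="greeting">
                <add>
                    <string value="Hello, " />
                    <value variable="name" />
                </add>
            </assign>
            <!-- "my-print" is user-defined, hence global and visible here -->
            <call function="my-print">
                <arguments>
                    <value variable="greeting" />
                </arguments>
            </call>
        </body>
    </function>
    <main>
        <call function="greet">
            <arguments>
                <string value="world" />
            </arguments>
        </call>
    </main>
</program>
```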

Function calls

A call node is used to call functions. Arguments are passed as child nodes of an arguments node. The short syntax uses the function attribute to name the function to call.

<call function="my-print">
    <arguments>
        <string value="text" />
    </arguments>
</call>

Since functions are values, they can also be stored in and retrieved from variables. This enables a longer syntax for dynamic calls: omit the function attribute and instead put the function value as a child of the call node, before the arguments node.

<call>
    <value variable="my-print" />
    <arguments>
        <string value="text" />
    </arguments>
</call>
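Because functions are ordinary values, they can be passed as arguments and called dynamically. In the sketch below, the hypothetical call-twice function receives a function and a value, then calls the function twice with that value. Passing print-line through a value node assumes that standard library functions can be retrieved by name like any other variable, as the Values section suggests.

```xml
<program name="higher-order">
    <function name="call-twice">
        <arguments>
            <argument name="callback" />
            <argument name="input" />
        </arguments>
        <body>
            <!-- dynamic call: the function comes from the "callback" variable -->
            <call>
                <value variable="callback" />
                <arguments>
                    <value variable="input" />
                </arguments>
            </call>
            <call>
                <value variable="callback" />
                <arguments>
                    <value variable="input" />
                </arguments>
            </call>
        </body>
    </function>
    <main>
        <!-- pass the print-line function itself as an argument -->
        <call function="call-twice">
            <arguments>
                <value variable="print-line" />
                <string value="echo" />
            </arguments>
        </call>
    </main>
</program>
```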

You can find a variety of utility functions in the standard library.

Function definition

Functions are defined at the top level, as child nodes of the program node. The name attribute specifies the name used to call the function. A function has two children. The first is an arguments node, whose argument nodes name, in order, the local variables that receive the call's arguments. The second is a body node containing the code executed when the function is called.

<function name="my-print">
    <arguments>
        <argument name="the-text" />
    </arguments>
    <body>
        <call function="print-line">
            <arguments>
                <value variable="the-text" />
            </arguments>
        </call>
    </body>
</function>

This function, named "my-print", takes one argument called "the-text" and passes it to the standard library "print-line" function.

Functions can return values: wrapping a value in a return node makes it the function's return value. Code after the return node is not executed, and the caller can use the call node like any other value.

<function name="sum-plus-two">
    <arguments>
        <argument name="number1" />
        <argument name="number2" />
    </arguments>
    <body>
        <return>
            <add>
                <value variable="number1" />
                <value variable="number2" />
                <integer value="2" />
            </add>
        </return>
    </body>
</function>

This function adds its two arguments together, adds two to the sum, and returns the result.
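The caller can use the call node wherever a value is expected. The sketch below stores the result of sum-plus-two in a variable. With 3 and 4 as arguments, result should hold the integer 9, which is then cast to a string for printing.

```xml
<program name="return-values">
    <function name="sum-plus-two">
        <arguments>
            <argument name="number1" />
            <argument name="number2" />
        </arguments>
        <body>
            <return>
                <add>
                    <value variable="number1" />
                    <value variable="number2" />
                    <integer value="2" />
                </add>
            </return>
        </body>
    </function>
    <main>
        <!-- the call node evaluates to the returned value: 3 + 4 + 2 -->
        <assign variable="result">
            <call function="sum-plus-two">
                <arguments>
                    <integer value="3" />
                    <integer value="4" />
                </arguments>
            </call>
        </assign>
        <call function="print-line">
            <arguments>
                <string>
                    <value variable="result" />
                </string>
            </arguments>
        </call>
    </main>
</program>
```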

Built-in operations

The previous example uses an add node to sum integer values. PL/XML provides several of the usual arithmetic and logic operators as nodes, whose child nodes are the operands.

Operands must have compatible types. Integers are automatically promoted to reals when needed, for instance when multiplied by a real.

add and multiply both take any number of numeric arguments and compute their sum or product, respectively. add can also concatenate string values.

<add>
    <integer value="9" />
    <integer value="33" />
</add>

<add>
    <string value="hello, " />
    <string value="world!" />
</add>

<multiply>
    <integer value="6" />
    <real value="7" />
</multiply>

subtract and divide take at least one numeric argument. The first argument is the starting value. Each subsequent argument is then subtracted from it (subtract) or divides it (divide), in order.

<subtract>
    <integer value="51" />
    <integer value="9" />
</subtract>

<divide>
    <integer value="126" />
    <integer value="3" />
</divide>
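Since subtract accepts more than two arguments, operations can be chained. Following the rule above, this sketch should evaluate to (100 - 50) - 8, that is 42:

```xml
<subtract>
    <!-- evaluated left to right: (100 - 50) - 8 -->
    <integer value="100" />
    <integer value="50" />
    <integer value="8" />
</subtract>
```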

and and or also take at least one argument and apply their logical operation across all arguments in turn.

<and>
    <integer value="1" />
    <string value="yes" />
</and>

<or>
    <integer value="0" />
    <string value="no" />
</or>

not takes exactly one argument, and will give a truthy value (the integer 1) if the argument is falsy (the integer 0), and a falsy value otherwise.

<not>
    <string value="make me falsy" />
</not>

equal, greater, and lower each take exactly two arguments. They give a truthy value if the first is respectively equal to, greater than, or less than the second, and a falsy value otherwise.

<equal>
    <integer value="5" />
    <integer value="5" />
</equal>

<greater>
    <integer value="4" />
    <integer value="2" />
</greater>

<lower>
    <integer value="11" />
    <integer value="16" />
</lower>
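Comparison and logic operators can be combined. The sketch below assumes a hypothetical variable named x holding a number, and is truthy only when 0 <= x < 10. The operators listed above include no greater-or-equal node, so x >= 0 is expressed as the negation of x < 0.

```xml
<and>
    <!-- x >= 0, written as "not (x < 0)" -->
    <not>
        <lower>
            <value variable="x" />
            <integer value="0" />
        </lower>
    </not>
    <!-- x < 10 -->
    <lower>
        <value variable="x" />
        <integer value="10" />
    </lower>
</and>
```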

Control structures

As in many imperative languages, control structures direct the flow of code execution. The first one is the if structure. Its first child is the value checked for truthiness. It is followed by a then block containing the code to execute if that value is truthy, and an optional else block containing the code to execute otherwise.

<if>
    <value variable="my-condition" />
    <then>
        <call function="print-line">
            <arguments>
                <string value="truthy" />
            </arguments>
        </call>
    </then>
    <else>
        <call function="print-line">
            <arguments>
                <string value="falsy" />
            </arguments>
        </call>
    </else>
</if>
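Since the else block is optional, an if without one combines well with early returns. The sketch below defines a hypothetical absolute function. It assumes that a return node inside then exits the whole function, as described under Function definition, so the final return is only reached for non-negative values.

```xml
<function name="absolute">
    <arguments>
        <argument name="n" />
    </arguments>
    <body>
        <if>
            <lower>
                <value variable="n" />
                <integer value="0" />
            </lower>
            <then>
                <!-- negative input: return 0 - n and skip the rest of the body -->
                <return>
                    <subtract>
                        <integer value="0" />
                        <value variable="n" />
                    </subtract>
                </return>
            </then>
        </if>
        <return>
            <value variable="n" />
        </return>
    </body>
</function>
```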

Three other structures provide loops. A while node contains the condition, which is evaluated at the beginning of each iteration. It is followed by a do node containing the code to execute while the condition is truthy.

<while>
    <integer value="1" />
    <do>
        <call function="print-line">
            <arguments>
                <string value="forever!" />
            </arguments>
        </call>
    </do>
</while>
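A loop that terminates needs a condition that eventually becomes falsy. The sketch below counts down from 3. Because the condition is rechecked before each iteration, it should print 3, 2, and 1, then stop once count reaches 0.

```xml
<program name="countdown">
    <main>
        <assign variable="count">
            <integer value="3" />
        </assign>
        <!-- loop while count > 0 -->
        <while>
            <greater>
                <value variable="count" />
                <integer value="0" />
            </greater>
            <do>
                <call function="print-line">
                    <arguments>
                        <string>
                            <value variable="count" />
                        </string>
                    </arguments>
                </call>
                <!-- decrement so the condition eventually becomes falsy -->
                <assign variable="count">
                    <subtract>
                        <value variable="count" />
                        <integer value="1" />
                    </subtract>
                </assign>
            </do>
        </while>
    </main>
</program>
```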

The for loop takes from, to, and step child nodes, which should evaluate to integer values. The code in its do child node is executed on each iteration, with the current iteration value stored in the variable named by the for node's variable attribute.

<for variable="i">
    <from><integer value="0" /></from>
    <to><integer value="10" /></to>
    <step><integer value="1" /></step>
    <do>
        <call function="print-line">
            <arguments>
                <add>
                    <string value="iteration #" />
                    <string>
                        <value variable="i" />
                    </string>
                </add>
            </arguments>
        </call>
    </do>
</for>
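As a further sketch, the loop below uses a step of 2 to add every second value starting from 0 into a variable named total (an arbitrary name). This handbook does not state whether the to bound (10 here) is itself included, so the printed total depends on the interpreter.

```xml
<program name="for-sum">
    <main>
        <assign variable="total">
            <integer value="0" />
        </assign>
        <!-- i takes the values 0, 2, 4, ... toward the to bound -->
        <for variable="i">
            <from><integer value="0" /></from>
            <to><integer value="10" /></to>
            <step><integer value="2" /></step>
            <do>
                <assign variable="total">
                    <add>
                        <value variable="total" />
                        <value variable="i" />
                    </add>
                </assign>
            </do>
        </for>
        <call function="print-line">
            <arguments>
                <string>
                    <value variable="total" />
                </string>
            </arguments>
        </call>
    </main>
</program>
```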

Finally, the each loop iterates over an array. Its first child provides the array, and each element is assigned in order to the variable named by the variable attribute before the do block is executed.

<each variable="v">
    <value variable="my-array" />
    <do>
        <call function="print-line">
            <arguments>
                <add>
                    <string value="value = " />
                    <string>
                        <value variable="v" />
                    </string>
                </add>
            </arguments>
        </call>
    </do>
</each>
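The array can also be written inline. This sketch sums the elements of an array literal into total, which should end up as 42 (10 + 20 + 12):

```xml
<program name="each-sum">
    <main>
        <assign variable="total">
            <integer value="0" />
        </assign>
        <!-- n takes each element of the inline array in order -->
        <each variable="n">
            <array>
                <integer value="10" />
                <integer value="20" />
                <integer value="12" />
            </array>
            <do>
                <assign variable="total">
                    <add>
                        <value variable="total" />
                        <value variable="n" />
                    </add>
                </assign>
            </do>
        </each>
        <call function="print-line">
            <arguments>
                <string>
                    <value variable="total" />
                </string>
            </arguments>
        </call>
    </main>
</program>
```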

Error handling

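Some standard library functions or language nodes may raise errors during execution. A handle node lets you catch them. If an error is raised while its try node executes, execution continues in its catch node, which stores the error in the variable named by its variable attribute. The catch node is only executed if an error was raised.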

<handle>
    <try>
        <divide>
            <integer value="1" />
            <integer value="0" />
        </divide>
    </try>
    <catch variable="error">
        <call function="print-line">
            <arguments>
                <add>
                    <string value="error caught = " />
                    <value variable="error" />
                </add>
            </arguments>
        </call>
    </catch>
</handle>
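A common pattern is to turn an error into a fallback value. The hypothetical safe-divide function below returns 0 instead of raising when the division fails. It assumes that a variable assigned inside try or catch remains visible after the handle node, which is consistent with scoping being bound to function bodies.

```xml
<function name="safe-divide">
    <arguments>
        <argument name="numerator" />
        <argument name="denominator" />
    </arguments>
    <body>
        <handle>
            <try>
                <assign variable="result">
                    <divide>
                        <value variable="numerator" />
                        <value variable="denominator" />
                    </divide>
                </assign>
            </try>
            <catch variable="error">
                <!-- fall back to 0 when the division raises an error -->
                <assign variable="result">
                    <integer value="0" />
                </assign>
            </catch>
        </handle>
        <return>
            <value variable="result" />
        </return>
    </body>
</function>
```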