Onyx Version 2: Manual and Semantic Specification
January 7, 2007

Basic semantics | Printable Representation | Scope rules
Types and Type Conversion rules | Built-in functions and operators
Function Calls | Error reporting requirements | Appendices: Reserved words and punctuation marks


Changelog

Date Description
11-Mar-2007 Added a discussion about the decimal() conversion function.
7-Mar-2007 Fixed error message concerning non-Boolean value in where condition, to be consistent with the reference
2-Mar-2007 Added discussions concerning the role of type substitutions in function prototype matching.
28-Feb-2007 Fixed an incorrect error case 3+"3"
28-Feb-2007 Corrected the discussion about the order of function definitions. The order of definitions is not significant, and forward references are allowed.
28-Feb-2007 Corrected the discussion about global variables; in particular, the order of evaluation is significant, and forward definitions are not allowed.
27-Feb-2007 Expanded discussion about global variables
18-Feb-2007 Removed div as a legal MultiplicativeOp
5-Feb-2007 Changed typos where ODOM objects were called OXML objects. This is an imprecise use of the terminology.
5-Feb-2007 The document() function no longer supports path names containing // and \\, which will result in an Error loading OXML data.
31-Jan-2007 The document() function supports only relative filename paths. Absolute paths are not considered to be well-formed file names, and will result in an Error loading OXML data. A new relative path @ has been added to the document() function.
12-Jan-2007 Added further discussion and deleted references to explicit type conversions in comparison operators. [SBB]
11-Jan-2007 Added discussion about side-effects using addAttribute() and addChildNode(), and added an example. Also discuss implicit conversions involving Decimal [SBB]
09-Jan-2007 Clarified discussion of implicit type conversions [SBB]
04-Jan-2007 Initial release of the spec. [SBB]


Basic semantics

This document describes the basic semantics, scope rules, types and type conversion rules, built-in functions and operators (with a list of built-in prototypes), error reporting requirements, and the reserved words and punctuation marks for Onyx, an instructional variant of the W3C standard XML query language. Onyx uses a subset of the XML standard, called OXML, which is described in a separate document. Onyx provides a simplified view of the XML Document Object Model (DOM), called ODOM.

An Onyx program consists of an optional prolog containing function definitions, followed by a sequence of expressions. The value of an Onyx program is the sequence of values of those expressions, computed in accordance with the semantic rules for Onyx. Note that an empty file is a valid Onyx program. It produces an empty result. Incorrect Onyx programs may raise syntax, static semantic, or runtime semantic errors.

A Sequence is an ordered collection of zero or more items. An item is either an atomic value or an Onyx-supported ODOM construct. An atomic value is a value of an atomic type; an Onyx-supported ODOM construct may be either a Node conforming to one of the possible Node subtypes or an attribute environment. These types are defined in the Types and Type Conversion section. A Sequence containing exactly one item is a singleton sequence. In Onyx, an item is identical to a singleton sequence containing that item in any context where a sequence is required. That is, an item can be promoted to a sequence containing that item.

A Sequence may contain duplicate values or nodes, but a Sequence is never an item in another Sequence. That is, unlike conventional lists, sequences are "flat", and may not contain other sequences. When a new Sequence is created by concatenating two or more input sequences, the new Sequence contains all the items of the input sequences and its length is the sum of the lengths of the input sequences. As a result, sequences are never nested--for example, combining the values 1, (2, 3), and ( ) into a single Sequence results in the Sequence (1, 2, 3). A Sequence containing zero items is an empty sequence.
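The flattening rule can be illustrated with a short Python sketch. It models a Sequence as a Python list and a parenthesized operand as a tuple or list; the function name concat_sequences is hypothetical and is not part of Onyx.

```python
def concat_sequences(*operands):
    """Concatenate operands into one flat sequence (modelled as a list)."""
    result = []
    for operand in operands:
        if isinstance(operand, (list, tuple)):
            # A sequence operand contributes its items, never itself.
            result.extend(concat_sequences(*operand))
        else:
            # A single item behaves like a singleton sequence.
            result.append(operand)
    return result


print(concat_sequences(1, (2, 3), ()))  # [1, 2, 3]
```

The length of the result is the sum of the lengths of the flattened inputs, and an empty operand contributes nothing.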

As mentioned previously, Onyx provides a simplified view of the XML Document Object Model (DOM), called ODOM. There are only two Node types: element nodes and text nodes, referred to in Onyx as ENode and TNode, respectively. Onyx's data model for XML documents, OXML, restricts XML node content to be either a Sequence of XML element nodes or strictly text-only content, but not a mixture of the two. A Node of either type consists of a tag and an optional attribute environment (a list of attributes), together with the content. Constructors and de-constructors for attribute environments and Nodes are described below.

Printable Representation

The printable representation of the value of a correct Onyx program is a legal OXML document whose root node is an <onyx-result> node containing the result of the query. If the result Sequence contains at least one Node, then all the items in the Sequence must also be nodes, which are set as the children of the <onyx-result> Node. If the result Sequence contains only non-node items, then the output must consist of a text Node with tagname "Result" whose content is the printable representation of the items in that Sequence, in order, separated from each other by a single space. The onyx.xml.OnyxNode.toString() method correctly formats the result according to this specification. If the result contains mixed content, then a dynamic error is raised.

Input:

let $e1 := enode("a", attrenv())
let $a2 := attrenv()
let $a2 := addAttribute($a2, "name2", "value2")
let $e2 := tnode("b", $a2, "2")
return ($e1, $e2)

Result: <?xml version="1.0" encoding="UTF-8"?>
<onyx-result>
   <a/>
   <b name2="value2">2</b>
</onyx-result>

This example demonstrates the case when the result is a non-node, the constant 7.

Input: 7
Result: <?xml version="1.0" encoding="UTF-8"?>
<onyx-result>
   <Result>7</Result>
</onyx-result>
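
The three cases above (all nodes, no nodes, mixed content) can be sketched in Python. This is only an illustration under stated assumptions: nodes are modelled with xml.etree.ElementTree elements rather than ODOM objects, str() stands in for the printable representation of atomic values, and the names render_result and OnyxDynamicError are hypothetical. The sketch emits no XML declaration and no indentation, and the output for an empty result is not taken from this specification.

```python
import xml.etree.ElementTree as ET


class OnyxDynamicError(Exception):
    """Stand-in for an Onyx dynamic error (hypothetical name)."""


def render_result(items):
    """Serialize a flat result sequence under an <onyx-result> root."""
    root = ET.Element("onyx-result")
    node_flags = [isinstance(item, ET.Element) for item in items]
    if any(node_flags):
        if not all(node_flags):
            raise OnyxDynamicError("result mixes nodes and non-node values")
        root.extend(items)  # nodes become children of the root
    else:
        # Non-node items go into a single <Result> text node,
        # separated from each other by one space.
        result = ET.SubElement(root, "Result")
        result.text = " ".join(str(item) for item in items)
    return ET.tostring(root, encoding="unicode")


print(render_result([7]))  # <onyx-result><Result>7</Result></onyx-result>
```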

Comments

Comments in Onyx begin with "{--", end with "--}" and are defined to continue until the first occurrence of the comment termination string "--}". There are no nested comments.
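
As an illustration of the "first occurrence" rule, the Python sketch below removes comments with a non-greedy regular expression. It is a simplification and not the Onyx lexer: the name strip_comments is hypothetical, and the sketch does not protect comment delimiters that appear inside string literals.

```python
import re

# Non-greedy match: a comment ends at the FIRST "--}" after its "{--".
COMMENT_RE = re.compile(r"\{--.*?--\}", re.DOTALL)


def strip_comments(source):
    """Remove every {-- ... --} comment from the source text."""
    return COMMENT_RE.sub("", source)


# Comments do not nest: the inner "{--" is just comment text, so the
# comment closes at the first "--}" and the tail " c --}" is left behind.
print(repr(strip_comments("{-- a {-- b --} c --}")))  # ' c --}'
```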

QNames

Onyx inherits the notion of a QName, which resembles an identifier name (namespace specifiers are not supported):

QName ::= ( Letter | "_" ) ( Letter | Digit | "." | "-" | "_" )*

Literal constants

The values of literal constants are items of type Integer, Decimal, Boolean, or String. The semantics of these datatypes are equivalent to java.math.BigInteger, java.math.BigDecimal, java.lang.Boolean, and java.lang.String, respectively. There is no limit to the number of digits that can appear in an Integer literal or a Decimal literal. Note that the following type definitions correspond to equivalent definitions in the Onyx Lexical Specification.

Integer ::= (Digit)+
Decimal ::= ( ( Integer ( "." (Digit)* )? ) | ( "." Integer ) )
Boolean ::= "true" | "false"
Quote ::= """
String ::= (Quote ( Quote Quote | [^Quote] )* Quote )

A string is delimited by quotation marks. Onyx strips off the enclosing quotation marks, but two adjacent quotation marks within a string are interpreted as a single quotation mark.

Input: "a string",  "This is a 'string'.", "This is also a ""string""."
Result: <?xml version="1.0" encoding="UTF-8"?>
<onyx-result>
   <Result>a string This is a 'string'. This is also a "string".</Result>
</onyx-result>
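
The quoting rule can be sketched in Python as follows. The function name decode_string_literal is hypothetical, and the sketch assumes its input has already been recognized as a String token, so it does not reject a lone quotation mark inside the body.

```python
def decode_string_literal(literal):
    """Return the value of a quoted Onyx string literal."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a quoted Onyx string literal")
    body = literal[1:-1]            # strip the enclosing quotation marks
    return body.replace('""', '"')  # two adjacent quotes mean one quote


print(decode_string_literal('"This is also a ""string""."'))
# This is also a "string".
```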

Variables

A variable evaluates to the value to which its name is bound in the current scope. An expression containing an unbound variable raises a dynamic semantic error. Variables can be bound by variable declarations, for-clauses, let-clauses,  and also in function calls, which bind values to the formal parameters of functions before evaluating the function body. Variables do not define storage, and hence cannot be modified.

A variable consists of a dollar sign ($) followed by a QName with no intervening whitespace, as in $a. Variables are bound to values using the ":=" operator, as in the following let clause:

let $a := 7

Since a QName may itself include the symbol "-", be careful to use spaces around minus signs. With $a bound to the value 7, the expression $a - 2 > 6 has the value false, while the expression $a-2 > 6 contains an unbound variable called $a-2.
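
The effect of the QName rule on minus signs can be checked with a small Python sketch. It assumes that Letter and Digit are the ASCII letters and digits, which this document does not state, and the helper name variable_names is hypothetical.

```python
import re

# QName ::= ( Letter | "_" ) ( Letter | Digit | "." | "-" | "_" )*
# Assumption: Letter and Digit are restricted to ASCII in this sketch.
VARIABLE_RE = re.compile(r"\$[A-Za-z_][A-Za-z0-9._-]*")


def variable_names(expression):
    """List the variable references that a lexer would see."""
    return VARIABLE_RE.findall(expression)


print(variable_names("$a - 2 > 6"))  # ['$a']
print(variable_names("$a-2 > 6"))    # ['$a-2']
```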

Global variable declarations are of the following form.

declare variable $varname as onyx.types.someType { expression }; or     [2/27/07]
declare variable $varname { expression };
    [2/27/07]

The following example declares a global variable $a with an initial value of 7 and a global variable $b with type Decimal and initial value of 4.4.

declare variable $a { 7 };
declare variable $b as onyx.types.Decimal { 4.4 };

Note that the declaration of the type is optional. That is, a variable name can be used to represent values of any of the valid Onyx types; however, the type of the value will be known only at run time. The use of the semi-colon differentiates the declarations from the remainder of the program, in which successive values are delimited by commas.

The order of global variable definitions is significant. Forward references are not allowed, and will result in a variable not bound semantic error:
      declare variable $b { $a + 4 };
      declare variable $a { 7 };

Global variables may not be redefined (any attempt to do this will raise a static parsing semantic error), but a global variable may be rebound locally, i.e. via a FLWR expression or as a formal parameter in a function call. In any context, a local binding obscures a global definition of the same name. Thus, the following program returns the value 8:
      declare variable $a { 7};
      let $a := 8
          return $a
[2/28/07]

The comma operator, range expressions, and parenthesized expressions

One way to construct a Sequence is by using the comma operator, which evaluates each of its operands and concatenates the resulting values, in order, into a single result Sequence.  (The comma operator was used above in the example under the Literals heading.) Parentheses can be used for grouping expressions or expression sequences. The value of a parenthesized expression or expression Sequence is the value of the contained expression or expression Sequence.  The example below demonstrates that sequences never contain other sequences; the result is always one "flat" sequence.

Input: ((1,2), 3, (4), (), (5,(6,7)))
Result: <?xml version="1.0" encoding="UTF-8"?>
<onyx-result>
   <Result>1 2 3 4 5 6 7</Result>
</onyx-result>

The to operator may be used as a shorthand form for constructing Sequences of consecutive Integers. (The to operator defines an invocation of the built-in op:to() function.) This infix operator takes two operands, both of which must be Integer (otherwise a dynamic error is raised). The result is a Sequence containing the two Integer operands and every integer between them. If the first operand is equal to the second, the Sequence is a singleton.

Input: 7 to 9, 6 to 4, 2 to 2
Result: <?xml version="1.0" encoding="UTF-8"?>
<onyx-result>
   <Result>7 8 9 6 5 4 2</Result>
</onyx-result>
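
A minimal Python sketch of this behavior follows, including the descending case shown by 6 to 4. The name op_to is borrowed from the built-in mentioned above, but the code is only an illustration; Python's TypeError stands in for the Onyx dynamic error.

```python
def op_to(first, second):
    """Return the integers from first to second, inclusive, in either direction."""
    for operand in (first, second):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(operand, bool) or not isinstance(operand, int):
            raise TypeError("both operands of 'to' must be Integer")
    step = 1 if first <= second else -1
    return list(range(first, second + step, step))


print(op_to(7, 9) + op_to(6, 4) + op_to(2, 2))  # [7, 8, 9, 6, 5, 4, 2]
```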


Function calls

A function call consists of a QName followed by a parenthesized list of zero or more expressions. These expressions are the function's arguments. The QName and the types of the arguments must match the function prototype definition that is in scope. If an exact match is not found, type promotion or type substitution is attempted; if neither yields a match, a static error is raised. Type promotion is used in function calls and built-in operators, for example, to permit Integer values to be compared with strings representing numerical values (see discussion below). Type promotion is different from subtype substitution, which refers to the use of a value whose actual type is derived from the expected type. (For further information, see the corresponding discussions in the XQuery specification.) [3/2/07]

For example [3/2/07]

  • A function expecting a parameter $x of type Decimal can be called with an actual parameter of type Integer. This is an example of type promotion. The actual parameter is converted to the expected type across the call and within the body of the function, $x is of type Decimal.
  • A function expecting a parameter of type AnyType can be called with an actual parameter of type Integer. This is an example of subtype substitution. The passed value retains its original type within the body of the function.

A function call expression is evaluated as follows:

    1. Each argument expression is evaluated left to right, producing an argument value.
    2. If an exact function prototype match is found, it is selected, otherwise type substitution is applied based on available prototypes with the same function name. [3/2/07]
    3. If there are no available type substitutions, then type promotion is applied based on available prototypes with the same function name. [3/2/07]
      1. If there are multiple matches, then the prototype requiring the least number of type promotions is used. (For counting purposes, type substitutions are ignored. [3/2/07]) If more than one prototype has the least number of promotions, then an error is raised and all potential matches for the function are displayed in the error message.
      2. If no function table entry is a match for the called function, then a dynamic error is raised and all candidate prototypes are reported as potential matches; these candidates must have the same name and the same number of arguments as the function call. Each prototype is treated as a string and the matches are output in ascending order.
    4. Using the selected prototype, a new scope is started, the argument values are bound to the formal parameters of the function, and the function body is evaluated. The value returned by the function body is the value of the function call. Once the function call is complete, the scope of the function call is removed.
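
    The selection steps above can be sketched in Python. This is a simplified reading, not the reference implementation: the PARENT and PROMOTIONS tables are assumed fragments of the type system, the names select_prototype and OnyxError are hypothetical, and the substitution and promotion steps are folded into a single promotion count (a match that needs only substitutions counts as zero promotions). Listing every matching prototype in the ambiguity message is also an interpretation of "all potential matches".

```python
# Assumed fragment of the type hierarchy (child -> parent).
PARENT = {
    "Integer": "AnySimpleType",
    "Decimal": "AnySimpleType",
    "String": "AnySimpleType",
    "AnySimpleType": "AnyType",
}
# Assumed promotion table; the text above names Integer -> Decimal.
PROMOTIONS = {("Integer", "Decimal")}


class OnyxError(Exception):
    """Stand-in for the error raised when prototype matching fails."""


def is_subtype(actual, expected):
    """True if `actual` equals `expected` or derives from it."""
    while actual is not None:
        if actual == expected:
            return True
        actual = PARENT.get(actual)
    return False


def promotions_needed(arg_types, param_types):
    """Count promotions for one prototype, or return None if it cannot match."""
    count = 0
    for actual, expected in zip(arg_types, param_types):
        if is_subtype(actual, expected):
            continue  # exact type or substitution: not counted
        if (actual, expected) in PROMOTIONS:
            count += 1
        else:
            return None
    return count


def show(prototypes):
    """Render prototypes as strings in ascending order."""
    return sorted(f"{name}({', '.join(params)})" for name, params in prototypes)


def select_prototype(name, arg_types, prototypes):
    """Pick a prototype from a list of (name, param_types) pairs."""
    arg_types = tuple(arg_types)
    # Candidates share the call's name and number of arguments.
    candidates = [p for p in prototypes
                  if p[0] == name and len(p[1]) == len(arg_types)]
    if (name, arg_types) in candidates:
        return (name, arg_types)  # exact match
    scored = [(promotions_needed(arg_types, p[1]), p) for p in candidates]
    matches = [(n, p) for n, p in scored if n is not None]
    if not matches:
        raise OnyxError(f"no match; candidates: {show(candidates)}")
    fewest = min(n for n, _ in matches)
    winners = [p for n, p in matches if n == fewest]
    if len(winners) > 1:
        raise OnyxError(f"ambiguous call; matches: {show(p for _, p in matches)}")
    return winners[0]
```

    With prototypes f(Decimal) and f(AnyType), a call f(3) selects f(AnyType) in this sketch, because substitution needs no promotion while f(Decimal) needs one.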

    Note that Decimal arguments are of limited use; as discussed below, they may appear in comparison expressions only.

    Built-in functions and operators

    Onyx provides a useful set of built-in functions and operators, which are listed in the Appendix. The built-in Onyx numeric and Boolean operators all have typical programming-language semantics, subject to the precedence and associativity rules given in the Onyx grammar, and to the typing rules described in the Types and Type Conversion section. Arithmetic operators take Integer arguments only; expressions involving Decimal, or a combination of Integer and Decimal, are permitted only with comparison operators. (See Types and Type Conversions.)

    The comparison operators return a result of type Boolean. They take singleton operands only, and sequences may not be compared. The equality and non-equality operators compare two values of AnySimpleType of the same type. The inequality operators greater-than, etc. apply to all AnySimpleType arguments except for Boolean, which may not be compared.

    The literal true represents the Boolean value true. The literal false represents the Boolean value false.

    The built-in string() function converts a single ODOM TNode, an Integer, a Decimal, a Boolean, or Sequence value, to a String. In the case of an ODOM TNode the string content of the node is returned. In the case of an Integer, Decimal value, or Boolean, the string representation is returned. In the case of a Sequence, the string representation of the Sequence is returned, where each element is delimited by a single space and the sequence is enclosed in square brackets. No other types are supported. The + operator is overloaded to handle string concatenation, and is semantically equivalent to the built-in concat function, i.e. "AB" + "CD" = concat("AB","CD")

    The built-in integer() function converts a String to an Integer value. If the value cannot be converted, a runtime exception is raised. The integer() function will truncate Decimal fractions according to the behavior of the BigInteger class, and there will be a loss of precision, i.e. integer(2.4) = 2

    The built-in decimal() function converts a String or Integer to a Decimal value. If the value cannot be converted, a runtime exception is raised. [3/11/07]

    In Onyx, Decimal provides only comparison operators and type conversions. No arithmetic operations are supported. Thus, 3 + "2.7" is not a legal expression.
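
    The behavior of integer() and decimal() can be approximated in Python, with int standing in for BigInteger and decimal.Decimal for BigDecimal. This is a sketch under assumptions: the names onyx_integer, onyx_decimal and OnyxRuntimeError are hypothetical, and accepting only strings that match the unsigned literal grammars given earlier is a guess about which strings "can be converted".

```python
import re
from decimal import Decimal

# Literal grammars from this document, used here as the accepted string forms.
INTEGER_RE = re.compile(r"[0-9]+")
DECIMAL_RE = re.compile(r"[0-9]+(\.[0-9]*)?|\.[0-9]+")


class OnyxRuntimeError(Exception):
    """Stand-in for the runtime exception raised on a failed conversion."""


def onyx_integer(value):
    """integer(): String or Decimal to Integer, dropping any fraction."""
    if isinstance(value, Decimal):
        return int(value)  # truncates toward zero, so 2.4 becomes 2
    if isinstance(value, str) and INTEGER_RE.fullmatch(value):
        return int(value)
    raise OnyxRuntimeError(f"cannot convert {value!r} to Integer")


def onyx_decimal(value):
    """decimal(): String or Integer to Decimal."""
    if isinstance(value, int) and not isinstance(value, bool):
        return Decimal(value)
    if isinstance(value, str) and DECIMAL_RE.fullmatch(value):
        return Decimal(value)
    raise OnyxRuntimeError(f"cannot convert {value!r} to Decimal")


print(onyx_integer(Decimal("2.4")))  # 2
```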

    FLWR Expressions

    FLWR expressions are used to handle iteration and to bind variables to intermediate results. A FLWR expression evaluates as follows:

    1. A for clause associates ("binds") one or more variables to values of expressions, creating variable bindings as for nested loops. The bindings can be viewed as a list of tuples drawn from the Cartesian product of the sequences of values to which the expressions evaluate (whether or not the bindings are actually computed that way). If the sequence for a particular for clause does not contain any values, then the remaining body of the FLWR is not evaluated since there are no bindings. The result for this particular for clause would be an empty sequence. For example, the for clause
      for $i in (3,4), $j in (8,9,10)

      binds the variables ($i,$j) in order to corresponding elements of the list of 2-tuples ((3,8), (3,9), (3,10), (4,8), (4,9), (4,10)) constructed from the Cartesian product of the sequences (3,4) and (8,9,10). This is equivalent to a (nested) pair of for-clauses:

      for $i in (3,4)
            for $j in (8,9,10)

      The first binds $i to successive elements of the Sequence (3,4); for each of those bindings, the second binds $j to successive elements of the Sequence (8,9,10).
       

    2. A let clause contains one or more variables, and binds each variable directly to the result of evaluating an expression. The let clauses generate one tuple with the variable bindings. For example, the let clause

      let $k := 15, $x := "a", $y := $k + 3

      binds the variables ($k,$x,$y) to corresponding values in the 3-tuple (15,"a",18). An equivalent way to view this is as three let-clauses in sequence:

              let $k := 15
              let $x := "a"
              let $y := $k + 3

      The first binds $k to 15, the second binds $x to "a", the third binds $y to the value of $k + 3.

      If for clauses are present, the variable bindings created by let clauses can be viewed as being added to the tuples generated by the for clauses. In the context of the for-let clause

      for $i in (3,4), $j in (8,9,10) let $k := 7 + $j

      the variables ($i,$j,$k) are bound in order to corresponding elements of the list of 3-tuples ((3,8,15), (3,9,16), (3,10,17), (4,8,15), (4,9,16), (4,10,17)). An equivalent way to view this is as the Sequence of clauses:

              for $i in (3,4)
                  for $j in (8,9,10)
                      let $k := 7 + $j

      The first binds $i to successive elements of the Sequence (3,4); for each of those bindings, the second binds $j to successive elements of the Sequence (8,9,10); for each of those bindings, the third binds $k to the value of 7 + $j.

      There is also a shortened form of the let expression that uses the comma (,) to delimit successive bindings:

                  let $k := 15, $x := "a", $y := $k + 3

    3. A where clause can be used as a filter for the tuples of variable bindings generated by the for and let clauses. The where-expression in the where clause is evaluated once for each of these tuples. It is a dynamic error if any of the tuples does not produce a Boolean result. If the value of the where-expression is true, the tuple is retained and its variable bindings are used in an execution of the return clause. If it is false, the tuple is discarded.

    4. The return clause contains an expression that is used to construct the value of the FLWR expression itself. The return clause is invoked once for every tuple generated by the for and let clauses, after filtering out any tuples that do not satisfy the conditions of a where clause. The expression in the return clause is evaluated once for every invocation, and the result of the FLWR expression is an ordered Sequence containing the results of these invocations. The ordering of the resulting Sequence must be the order in which items were added to it by the return clause.

    Input: for $i in (3,4), $j in (8,9,10)
    return ($i*$j)
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <Result>24 27 30 32 36 40</Result>
    </onyx-result>
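
    The tuple-stream view of FLWR evaluation can be sketched in Python. This is an illustration only: the function flwr and its calling convention are hypothetical, clause expressions are modelled as Python callables over a dictionary of bindings, and the for sequences are fixed up front, so a for clause whose sequence depends on an earlier variable is not covered.

```python
from itertools import product


def flwr(for_clauses, let_clauses, where, ret):
    """Evaluate a FLWR expression over its stream of binding tuples.

    for_clauses: list of (name, sequence) pairs
    let_clauses: list of (name, function of the current bindings) pairs
    where, ret:  functions of the current bindings
    """
    names = [name for name, _ in for_clauses]
    result = []
    # Cartesian product of the for sequences; the rightmost varies fastest.
    for values in product(*(seq for _, seq in for_clauses)):
        env = dict(zip(names, values))
        for name, expr in let_clauses:
            env[name] = expr(env)  # let bindings extend the current tuple
        keep = where(env)
        if not isinstance(keep, bool):
            raise TypeError("where condition must produce a Boolean")
        if keep:
            item = ret(env)
            # Results are concatenated, so the output stays flat.
            result.extend(item if isinstance(item, (list, tuple)) else [item])
    return result


print(flwr([("i", [3, 4]), ("j", [8, 9, 10])], [],
           lambda env: True,
           lambda env: env["i"] * env["j"]))
# [24, 27, 30, 32, 36, 40]
```

    If any for sequence is empty, the product is empty and the result is the empty sequence; with no for clauses at all, the let clauses produce exactly one tuple.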

    Conditional expressions

    A conditional if-then-else expression in Onyx contains a test-expression, a then-expression, and an else-expression. The test-expression is evaluated first. It is a dynamic error if the test-expression does not produce a Boolean result. If the test-expression is true, then the value of the conditional is the value of the then-expression; if it is false, the value of the conditional is the value of the else-expression.
     

    Core Onyx

    In production XML query languages, powerful constructs are provided to make expressions simpler to write and use, but which are also redundant. Onyx inherits some of these, which in effect act as "syntactic sugar." For example, the following complex for expression

        for $i in (1, 2),
            $j in (3, 4)
                return
                    ($i, $j)

    may be rewritten as a composition of several simple for expressions.

        for $i in (1, 2) return
            for $j in (3, 4) return
                ($i, $j)

    The language composed of these simpler expressions is called the Onyx Core language and may be described by a special grammar. Since the Core subset of Onyx covers only FLWR expressions, we won’t go to the trouble of specifying a separate grammar; the interested reader is referred to the XQuery semantic specification for the details. There are two other constructs covered by the Core language: first, the comma "," construct for the let construct; second, the where clause.

    The where clause is provided as syntactic sugar, and may be replaced by an appropriate conditional expression. For example the following for loops are equivalent:

    for $i in (3,4,5,7)
        where ($i > 4)
        return $i

    for $i in (3,4,5,7)
        return if ($i > 4) then $i else ( )
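
    The equivalence rests on sequence flattening: the empty sequence returned by the else branch contributes nothing to the result. A small Python sketch of the two forms follows; the function names are hypothetical and Sequences are modelled as lists.

```python
def with_where(seq, pred):
    """for $i in seq where pred($i) return $i"""
    return [i for i in seq if pred(i)]


def with_conditional(seq, pred):
    """for $i in seq return if (pred($i)) then $i else ( )"""
    result = []
    for i in seq:
        value = [i] if pred(i) else []  # the else branch is the empty sequence
        result.extend(value)            # concatenation keeps the result flat
    return result


print(with_where((3, 4, 5, 7), lambda i: i > 4))        # [5, 7]
print(with_conditional((3, 4, 5, 7), lambda i: i > 4))  # [5, 7]
```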

    ODOM Constructors

    Onyx provides "computed" attribute environment and Node constructors.  These are specified using built in constructors attrenv(), enode(), and tnode(). These ODOM constructors, as well as addAttribute(), deal with tagnames and attribute names. The parameters to these functions take onyx.types.String values as arguments. The XML (and thus OXML) specification restricts the values of tagnames and attribute names. Since onyx.types.String values can have values which are invalid XML names, the ODOM library may raise an exception, namely an OnyxXMLException, if used with a value that does not meet the specification.

    In general, the definition of a QName provides a good guideline as to what values these XML names may have, however the exact definition is shown below as quoted from the XML 1.1 specification:

    NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
    NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
    Name ::= NameStartChar (NameChar)*

    In the grammar above, a valid XML tagname or attribute name must match the Name rule.
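
    For illustration, the Name rule can be transcribed into a Python regular expression as follows. The helper name is_xml_name is hypothetical, and this sketch is not the check performed by the ODOM library.

```python
import re

# NameStartChar ranges, transcribed from the grammar quoted above.
_START = (
    "A-Z_:a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# Additional characters allowed by NameChar after the first position.
_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{_START}][{_START}{_EXTRA}]*")


def is_xml_name(text):
    """True if the whole string matches Name ::= NameStartChar (NameChar)*."""
    return NAME_RE.fullmatch(text) is not None


print(is_xml_name("eltNode"), is_xml_name("1abc"))  # True False
```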

    Attribute Environment Constructor: The attribute environment constructor takes no arguments. Onyx attributes are specified by adding key-value pairs to an attribute environment. The addAttribute() function takes an attribute environment followed by two further arguments, an attribute name and a value. (This operation produces a side effect. [1/11/07]) The attribute name and attribute value are evaluated. Both the name and the value must be of type String. If more than one value is added to the environment with the same key, the earlier value for that attribute is overwritten. The value of an attribute can be retrieved from an attribute environment by using the getAttributeValue() function, which takes the attribute environment and one further argument specifying the key to look up. The value of an attribute is the string representation of the value associated with the specified key.

    Input: let $a := attrenv()
    let $a := addAttribute( $a, "key1", string(4+3) )
    let $a := addAttribute( $a, "key2", "anotherValue" )
    return (getAttributeValue( $a, "key1" ), getAttributeValue( $a, "key2" ))
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <Result>7 anotherValue</Result>
    </onyx-result>

    Element Node Constructor: The element constructor takes the tag name and an attribute environment. The name must be of type String. The content of an element Node is a Sequence of text nodes or element nodes. Child nodes are added to an element node using the addChildNode() mutator. This operation produces a side effect. [1/11/07] A Sequence of the children can be retrieved using the children() accessor function.

    Input: let $attr := attrenv()
    let $elt := enode( "element", $attr )
    let $elt2 := enode( "element2", $attr )
    let $elt := addChildNode( $elt, $elt2 )
    return $elt
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <element>
          <element2/>
       </element>
    </onyx-result>

    Text Node Constructor: The text Node constructor takes the tag name, an attribute environment, and the text content of the node. The name must be of type String. The content of the text node must be a String. The contents of the text node can be retrieved using the string() function, which is defined to return the text content of the text node when the argument is of type onyx.types.TNode.

    Input: let $attr := attrenv()
    let $attr := addAttribute( $attr, "key1", "value1" )
    let $en := enode( "eltNode", $attr )
    let $attr := attrenv()
    let $attr := addAttribute( $attr, "key2", "value2" )
    let $tn := tnode("txtNode", $attr, "this is a spiffy text node")
    return addChildNode( $en, $tn )
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <eltNode key1="value1">
          <txtNode key2="value2">this is a spiffy text node</txtNode>
       </eltNode>
    </onyx-result>

    A worked example: [1/11/07]

    In this example, we show how to construct the document given in the Result below.



    Input: let $attr := attrenv(),
       $elt := enode( "D", $attr )
       for $i in 1 to 4
          let $tag := "E" + string($i),
              $child := tnode( $tag,$attr, string($i)),
              $elt := addChildNode( $elt, $child )
             return $elt
     
    Result:  <onyx-result>
        <D>
          <E1>1</E1>
          <E2>2</E2>
          <E3>3</E3>
          <E4>4</E4>
        </D>
     </onyx-result>

    Two points are worth noting. First, because addChildNode() is a side-effect producing operation, the for loop returns 4 references to the same object. This is true even though a new binding of $elt is made on each iteration of the loop. Second, we nevertheless see only one instance of the element with tag D. The reason is that the underlying W3C DOM library org.w3c.dom has removed duplicate child nodes (see appendChild()).
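
    The same effect can be reproduced with Python's xml.dom.minidom, used here only as an analogy to the Java org.w3c.dom library that Onyx relies on. The function name build_result is hypothetical. In minidom, appendChild() first detaches a node that already has a parent, so appending the same element four times leaves a single child.

```python
from xml.dom import minidom


def build_result():
    """Mimic the worked example: four references to one <D> element."""
    doc = minidom.Document()
    root = doc.createElement("onyx-result")
    doc.appendChild(root)

    d = doc.createElement("D")
    returned = []
    for i in range(1, 5):
        child = doc.createElement(f"E{i}")
        child.appendChild(doc.createTextNode(str(i)))
        d.appendChild(child)  # side effect on the one shared element
        returned.append(d)    # each iteration returns the same object

    for node in returned:
        # Re-appending an element that already has a parent moves it
        # rather than duplicating it, so <D> ends up under the root once.
        root.appendChild(node)
    return root


root = build_result()
print(len(root.childNodes))  # 1
```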

    User-defined functions

    Functions in Onyx are defined by specifying the function name and parameter names. Parameters and the return value may be given a definite type, but this information is optional. By default, an untyped parameter or return result is given the type onyx.types.AnyType. However, Onyx does not support the use of explicit generic types such as onyx.types.AnyType within Onyx programs. These are used only for internal purposes (similar rules apply to onyx.types.Number, which will be discussed below). See the Scope Rules for details related to functions. A function is defined in the following manner.

    declare function funcName( $varname1 as onyx.types.someType, $varname2 ) as onyx.types.Decimal { function body };

    The use of the semi-colon differentiates the declarations from the remainder of the program in which successive values are delimited by commas. Below is a concrete example of how to define a function that returns a Sequence of the first 10 Fibonacci numbers.

    Input:

    declare function fibR( $numFib as onyx.types.Integer, $prev2, $prev1 ) as onyx.types.Sequence {
      if( $numFib > 0 )
      then
        ( integer($prev2 + $prev1), fibR( integer($numFib - 1), $prev1, integer( $prev2 + $prev1 ) ) )
      else
        ()
    };

    declare function fib( $numFib as onyx.types.Integer ) as onyx.types.Sequence {
      if( $numFib < 1 )
      then
        ()
      else if( $numFib = 1 )
      then
        ( 1 )
      else if( $numFib = 2 )
      then
        ( 1, 1 )
      else
        ( 1, 1, fibR( integer($numFib - 2), 1, 1 ) )
    };

    fib( 10 )

    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <Result>1 1 2 3 5 8 13 21 34 55</Result>
    </onyx-result>

    Scope rules

    Names in Onyx are introduced in three ways: as names of defined functions; as names of variables in variable declarations, for-clauses, and let-clauses; and as formal parameters in function definitions. Scope rules determine where in an Onyx program a name can be used, and what the name refers to. Onyx uses static scoping; more specifically, at any point in the program there are two scopes to consider, a local scope and a global scope. Global variables are read-only, and are visible in all scopes. They are introduced using a declaration, rather than defined with a let.

    Function names

    Built-in Onyx function names have scope over every Onyx program. A function name declared in a function definition in an Onyx program prolog has scope in the entire program, and the order of function definitions is not significant. Thus, forward references are allowed. (However, as specified in the grammar, all function definitions must appear in the prolog, before the QueryBody.) [2/28/07]

          declare function f() { g(1) };
          declare function g($x) { $x +1 };
              f()

    Formal parameters to functions

    A variable appearing in a formal parameter list of a function definition has scope over the body of the function. The effect of multiple formal parameters with the same name is undefined, though it is not an error.

    Variables bound in for and let clauses of FLWR expressions

    A variable name may not be used except in its scope. Any variable bound in a for or let clause is in scope until the end of the FLWR expression in which it is bound (except within the very expression to which it is bound). If the variable name used in the binding was already bound in the current scope, a new nested scope for the name is created by the binding, and the variable name refers to the newly bound variable until that variable goes out of scope. At this point, the variable name again refers to the variable of the prior binding, if it is still in scope.

    Lexical scoping of variables

    Variables are lexically scoped, except for variables declared in the query prolog, which have global scope.  Variables may appear lexically unbound in function definitions; this is reported as a dynamic error when the function is called.

    Input: declare function g($w) { $w + $y };
    declare variable $y {7};
    let $v := 3 return (g($v),$y+3)
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <Result>10 10</Result>
    </onyx-result>
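    The example above can be mirrored in a short Python sketch of static scoping. It assumes a function body sees only its own formal parameters plus the global scope, never the caller's local variables. The names OnyxDynamicError, lookup, and call_g are hypothetical and are not part of Onyx.

```python
class OnyxDynamicError(Exception):
    """Hypothetical stand-in for a dynamic semantic error."""


def lookup(name, local_scope, global_scope):
    """Resolve a name in the local scope first, then in the global scope."""
    if name in local_scope:
        return local_scope[name]
    if name in global_scope:
        return global_scope[name]
    raise OnyxDynamicError(f"Variable {name} not bound")


global_scope = {"$y": 7}               # declare variable $y {7};


def call_g(w):
    """Body of g: $w + $y. The local scope holds only the formal parameter."""
    local_scope = {"$w": w}
    return lookup("$w", local_scope, global_scope) + lookup("$y", local_scope, global_scope)


caller_scope = {"$v": 3}               # let $v := 3
result = (call_g(caller_scope["$v"]), lookup("$y", caller_scope, global_scope) + 3)
```

    In this sketch the caller's $v is not visible inside g; looking it up from the function's local scope raises the "Variable $v not bound" error.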

    Types and Type conversions

    Onyx is a strongly typed language; the value of an Onyx expression that does not raise any errors is always of a definite type. Thus, the value of an expression can be thought of as a pair consisting of the value and the type of the value. In some cases the type can be completely determined statically, from the syntactic type of the expression; in other cases the precise type can only be determined dynamically, at runtime. For example, reading in an OXML document with the document() function creates a set of nodes. Associated with each Node is a type, and these types cannot be known until run time.

    Onyx does not have user-defined types, so the only types available are built-in. Onyx is based on a simple built-in type hierarchy, and the root of the hierarchy is AnyType. (Onyx does not support XML document schema validation.) We can think of AnyType as an abstract class, i.e. one with no direct instances. This is somewhat like Java's Object, except that the type is abstract. There are three subtypes of AnyType: Sequence, AnySimpleType, and XmlType. The subtype AnySimpleType handles atomic values such as String, Number, Decimal, and so on. The subtype XmlType handles XML document entities such as ENode, TNode, and the attribute environment.

    Variables, parameters, and functions cannot be declared to be typed as abstract types (AnyType, AnySimpleType, XmlType, Number, and Node). As mentioned previously, when types are left unspecified, they default to AnyType, but this type cannot be used explicitly and is for internal accounting only.

    The Onyx types are listed in the following table.


    Typename      | Parent type   | Comments
    --------------|---------------|---------
    AnyType       | (none)        | The "root" type.
    AnySimpleType | AnyType       | The parent type of all items, i.e. values that can be contained in a Sequence.
    Sequence      | AnyType       | An ordered Sequence of 0 or more items. Note however that there is no distinction between an item (i.e., a Node or an atomic value) and a singleton Sequence containing that item, i.e., an item is equivalent to a singleton Sequence containing that item and vice versa.
    XmlType       | AnyType       | The parent type for all Onyx-supported ODOM constructs; an abstract type.
    String        | AnySimpleType | A string of Unicode characters. An atomic type.
    Number        | AnySimpleType | The parent type of all numeric types; an abstract type.
    Decimal       | Number        | A numeric data type whose instances consist of an unscaled value of arbitrary length, and a scale. Only comparison operators are supported. (See java.math.BigDecimal.)
    Integer       | Number        | A subtype of Decimal in which scale is 0. (See java.math.BigInteger.)
    Boolean       | AnySimpleType | A Boolean type, with values true or false. An atomic type.
    Node          | XmlType       | An ODOM node, parent of all Node types. (An abstract type.)
    TNode         | Node          | A text node.
    ENode         | Node          | An element node.
    AttrEnv       | XmlType       | An attribute environment. Consists of a Sequence of attribute values, with the restriction that no two attribute values in the Sequence have the same name.

    Type names in an Onyx program must be written using the fully qualified type name. That is, a Decimal value would have the type onyx.types.Decimal and an element node would have the type onyx.types.ENode.
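    The hierarchy listed in the type table can be sketched as a parent map. This is an illustrative Python model only; the names PARENT, ABSTRACT, qualified, is_subtype, and can_declare are assumptions introduced here and are not part of Onyx.

```python
# Parent of each built-in type, as listed in the type table.
PARENT = {
    "AnyType": None,
    "AnySimpleType": "AnyType",
    "Sequence": "AnyType",
    "XmlType": "AnyType",
    "String": "AnySimpleType",
    "Number": "AnySimpleType",
    "Boolean": "AnySimpleType",
    "Decimal": "Number",
    "Integer": "Number",
    "Node": "XmlType",
    "AttrEnv": "XmlType",
    "TNode": "Node",
    "ENode": "Node",
}

# Abstract types cannot be used in declarations.
ABSTRACT = {"AnyType", "AnySimpleType", "XmlType", "Number", "Node"}


def qualified(name):
    """Return the fully qualified form required in Onyx source."""
    return "onyx.types." + name


def is_subtype(child, ancestor):
    """Walk up the parent chain looking for the ancestor."""
    while child is not None:
        if child == ancestor:
            return True
        child = PARENT[child]
    return False


def can_declare(name):
    """A declaration may name only a known, non-abstract type."""
    return name in PARENT and name not in ABSTRACT
```

    For instance, is_subtype("ENode", "XmlType") holds, while can_declare("Node") does not, since Node is abstract.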

    Type conversions

    Onyx performs automatic type conversion only in the following cases: promotion of a single value to a Sequence (e.g. 1 to (1); this applies to any XmlType or AnySimpleType value); comparison expressions involving a mixture of Number and String values, or of Integer and Decimal values (with Integer implicitly promoted to Decimal, as described below); passing an argument of type AnySimpleType or XmlType to a function that requires a Sequence; and passing an argument of type Integer to a function that expects a Decimal. [1/11/07] All other type casts must be done explicitly, and may involve only the built-in types Decimal, Integer, String, and Sequence.

    Type conversions for comparison operators

    Arguments to comparison operators must be atomic values of type Number (Integer or Decimal, with Integer implicitly promoted to Decimal) or String (in a comparison with a Number, the String is converted to either Integer or Decimal, ensuring that the comparison 3.0 = "3.00" is true). The result is of type Boolean. All other comparisons cause a dynamic error to be raised; explicit type-casts must be used to ensure that such a comparison can take place. If, after these conversions, the arguments to the comparison operator are not of the same type, or are not singletons, then a dynamic type error is raised. [1/12/07]
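    The conversions for comparisons can be illustrated with Python's decimal module. This is a sketch under stated assumptions, not the Onyx implementation: to_number and onyx_equals are hypothetical names, Python int stands in for Integer, and comparing two Strings as plain strings is an assumption the text does not spell out.

```python
from decimal import Decimal, InvalidOperation


def to_number(value):
    """Convert an Integer, Decimal, or numeric String operand to Decimal."""
    if isinstance(value, bool):
        raise TypeError("Boolean is not a numeric operand")
    if isinstance(value, (int, Decimal)):
        return Decimal(value)          # Integer is promoted to Decimal
    if isinstance(value, str):
        try:
            return Decimal(value)      # "3.00" becomes Decimal("3.00")
        except InvalidOperation:
            raise ValueError(f'"{value}" is not a valid number') from None
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


def onyx_equals(left, right):
    """Value equality after the implicit conversions sketched above."""
    if isinstance(left, str) and isinstance(right, str):
        return left == right           # assumption: two Strings compare as strings
    return to_number(left) == to_number(right)
```

    With these definitions, onyx_equals(Decimal("3.0"), "3.00") is true because both operands are compared as Decimal values, which ignore trailing zeros.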

    Type conversions for logical operators

    The logical operators and and or require arguments of type Boolean; otherwise a dynamic argument type error is raised. The value returned by a logical operator is of type Boolean.


    Error reporting requirements

    The Onyx interpreter must report an error to standard error output, in exactly the format described below.

    The following kinds of errors must be detected. In each case, only the first-occurring error in the program needs to be detected. There are three categories of errors: static parser errors, static semantic errors, and dynamic semantic errors. Static errors are considered to come "before" any dynamic errors. Parser errors are those that deal with the code's compliance with the language. Semantic errors are those that occur when interpreting/evaluating the individual constructs of the code. Since parser errors deal with the grammar and type names, all parser errors can be determined prior to evaluation and thus are all static errors. Semantic errors can be either dynamic (during evaluation) or static (prior to evaluation). The report of each type of error has the following forms:

    <?xml version="1.0" encoding="UTF-8"?>
    <onyx.error.ParserError>
       <StaticError column="X" line="Y">error message</StaticError>
    </onyx.error.ParserError>

    <?xml version="1.0" encoding="UTF-8"?>
    <onyx.error.SemanticError>
       <StaticError column="X" line="Y">error message</StaticError>
    </onyx.error.SemanticError>

    <?xml version="1.0" encoding="UTF-8"?>
    <onyx.error.SemanticError>
       <DynamicError column="X" line="Y">error message</DynamicError>
    </onyx.error.SemanticError>

    where X and Y are the column and line location of the error in the input program. The format of the error message (the detail string) depends on the specific kind of error, as described in the following table.
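    As an illustration of the report layout, the following Python sketch assembles one error report with the standard xml.etree.ElementTree module. The function name format_error and its parameters are hypothetical, and the sketch emits the report on a single line rather than reproducing the indentation shown above.

```python
import xml.etree.ElementTree as ET


def format_error(category, kind, line, column, message):
    """Build a report such as <onyx.error.SemanticError><DynamicError ...>."""
    root = ET.Element(f"onyx.error.{category}")
    detail = ET.SubElement(root, kind, {"column": str(column), "line": str(line)})
    detail.text = message
    body = ET.tostring(root, encoding="unicode")   # no XML declaration is added here
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


report = format_error("SemanticError", "DynamicError", 1, 4, "Variable $x not bound")
```

    The category argument selects between ParserError and SemanticError, and kind selects between StaticError and DynamicError.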

    Error | Category | Location of error | Detail String | Details
    ------|----------|-------------------|---------------|--------
    Lexical error | Parser (Static) | The location of the lexical error token | Lexical Error: X | X is the token that caused the error
    Syntax error | Parser (Static) | The location of the token that caused the syntax error | Syntax Error | None
    Use of an invalid type name | Parser (Static) | The location of the QName token of the invalid type name | T is not a valid Onyx type | T is the name of the invalid Onyx type
    Attempt to define an already defined function | Semantic (Static) | The location of the QName token of the function name in the attempted function definition | Function F with prototype P already defined | F is the name of the function; P is the prototype of the function being defined
    Attempt to define an already defined global variable | Semantic (Static) | The location of the variable name token of the variable being redeclared | Global variable $X already defined | $X is the name of the variable
    Argument type incompatible with function or operator before and after applying valid type promotions | Semantic (Dynamic) | The location of the token identifying the function or operator | Function with prototype P not found | P is the prototype of the called function or operator, e.g. op:numeric-add(onyx.types.Decimal,onyx.types.String). All possible matches are reported.
    Top level Sequence contains mixed content (i.e. Node and non-Node types) | Semantic (Dynamic) | None | Top level Sequence cannot contain mixed content: S | S is the string representation of the top level Sequence. OXML content will be displayed using entity references.
    Attempt to evaluate an unbound variable | Semantic (Dynamic) | The location of the token identifying the variable | Variable $X not bound | $X is the variable
    Attempt to call an undefined function | Semantic (Dynamic) | The location of the QName token naming the function in the attempted call | Function with prototype P not found | P is the prototype of the called function or operator, e.g. foo(onyx.types.Decimal,onyx.types.String)
    Error reading OXML file (not found, or not well-formed) | Semantic (Dynamic) | The location of the QName token naming the document function | Error loading OXML data from file named F | F is the name of the file
    OXML Attribute Name is invalid | Semantic (Dynamic) | The location of the function call attempting to set the attribute (addAttribute) | Attribute name N is an invalid OXML attribute name | N is the invalid attribute name
    OXML Tagname is invalid | Semantic (Dynamic) | The location of the constructor attempting to set the tagname (tnode() or enode()) | Tagname N is an invalid OXML tagname | N is the invalid tagname
    If-Condition contains a non-Boolean value | Semantic (Dynamic) | The location of the expression with incorrect type | If condition is not of type onyx.types.Boolean | None
    Where Condition is not of type onyx.types.Boolean [3/7/07] | Semantic (Dynamic) | The location of the expression with incorrect type | Where condition is not of type onyx.types.Boolean | None
    Function return value type does not match declared return type | Semantic (Dynamic) | The location of the function call that caused the error | Function N declared to return type E, but found F | N is the name of the function. E is the fully qualified type name of the expected return type. F is the fully qualified type name that was found.
    Attempt to assign value of incorrect type when declaring a variable | Semantic (Dynamic) | The location of the variable name token | Type not assignable to $X | $X is the variable
    Attempt to convert non-numeric string to number | Semantic (Dynamic) | The location of the type-cast function call that caused the error | "S" is not a valid number | S is the invalid number
    Divide by zero | Semantic (Dynamic) | The location of the div or idiv operator for which the error occurred [2/18/07] | Division by zero is invalid | None
    Modulus is not greater than zero | Semantic (Dynamic) | The location of the mod operator for which the error occurred | Modulus value must be greater than zero | None

    Onyx program examples:

    Input: 3 + "3"
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx.error.SemanticError>
       <DynamicError column="5" line="1">
           <ErrorMessage>Function with prototype op:numeric-add(onyx.types.Integer,onyx.types.String) not found</ErrorMessage>
          <PossibleMatch>onyx.types.Integer
              op:numeric-add(onyx.types.Integer,onyx.types.Integer)</PossibleMatch>
          <PossibleMatch>onyx.types.String
              op:numeric-add(onyx.types.String,onyx.types.String)</PossibleMatch>
       </DynamicError>
    </onyx.error.SemanticError>     [2/28/07]

    Input: for $i in 3 to 8
    for $j in 9 to 5
       where $i <= ($j - 3)
       return
          $i * 2 + $j
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx-result>
       <Result>15 14 13 12 17 16 15 19 18 21</Result>
    </onyx-result>

    Input: declare function foo( $x as onyx.types.Integer ) { () };
    declare function foo( $j as onyx.types.Integer ) { () };
    Result: <?xml version="1.0" encoding="UTF-8"?>
    <onyx.error.SemanticError>
       <StaticError column="18" line="2">Function foo with prototype foo(onyx.types.Integer) already defined</StaticError>
    </onyx.error.SemanticError>
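    The duplicate-definition check in the last example can be sketched in Python as follows. The sketch assumes that functions are identified by their prototype (name plus parameter types), which is what the wording of the error message suggests; the names OnyxStaticError, prototype, and define_all are hypothetical.

```python
class OnyxStaticError(Exception):
    """Hypothetical stand-in for a static semantic error."""


def prototype(name, param_types):
    """Render a prototype such as foo(onyx.types.Integer)."""
    return f"{name}({','.join(param_types)})"


def define_all(definitions):
    """Register (name, parameter types) pairs, rejecting a repeated prototype."""
    seen = set()
    for name, param_types in definitions:
        proto = prototype(name, param_types)
        if proto in seen:
            raise OnyxStaticError(f"Function {name} with prototype {proto} already defined")
        seen.add(proto)
    return seen
```

    Parameter names play no role in the check, which is why the two definitions of foo in the example collide even though one uses $x and the other $j.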


    Undefined error formats

    Any errors not defined here are undefined, and thus an implementer may choose any reasonable method for handling these cases.




    Built-in functions and operators

    The operator symbols are shown in parentheses. The name of each built-in (predefined) operator function begins with op:. Note that only one instance of a unary + or - operator may be used, and thus repeated expressions such as ++3 and +--3 are illegal. A few of the built-in functions are infix: and, or, mod, idiv and to. As mentioned previously, numeric operations apply to Integer values only. A list of the prototypes for built-ins can be found HERE

    Operator | Arity | Meaning | Comments
    ---------|-------|---------|---------
    op:numeric-add (+) | 2 | Numeric addition; String concatenation | Integer numeric type conversions apply; String concatenation is semantically equivalent to concat
    op:numeric-subtract (-) | 2 | Numeric subtraction | Integer numeric type conversions apply
    op:numeric-multiply (*) | 2 | Numeric multiplication | Integer numeric type conversions apply
    op:numeric-integer-divide (idiv) | 2 | Numeric integer division | Integer numeric type conversions apply
    op:numeric-mod (mod) | 2 | Numeric modulus | Integer numeric type conversions apply
    op:numeric-unary-plus (+) | 1 | Numeric unary plus | Integer numeric type conversions apply
    op:numeric-unary-minus (-) | 1 | Numeric negation | Integer numeric type conversions apply
    op:to (to) | 2 | Range operator | Creates a Sequence of numbers. Both parameters must be of type Integer
    op:and (and) | 2 | Boolean conjunction | Standard Boolean type conversions apply
    op:or (or) | 2 | Boolean disjunction | Standard Boolean type conversions apply
    op:equals (=) | 2 | Value equality test | See notes on value comparisons
    op:not-equals (!=) | 2 | Value nonequality test | See notes on value comparisons
    op:less-than (<) | 2 | Value less-than test | See notes on value comparisons
    op:greater-than (>) | 2 | Value greater-than test | See notes on value comparisons
    op:less-than-equals (<=) | 2 | Value less-than-or-equals test | See notes on value comparisons
    op:greater-than-equals (>=) | 2 | Value greater-than-or-equals test | See notes on value comparisons
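    The range operator can be modelled in Python as below. The sketch assumes that a range whose first bound exceeds its second counts downward, which is inferred from the for/where program example earlier in this document (for $j in 9 to 5) rather than stated in the table; onyx_range is a hypothetical name.

```python
def onyx_range(start, end):
    """Model of op:to; both bounds are included and must be integers."""
    if not (isinstance(start, int) and isinstance(end, int)):
        raise TypeError("both parameters must be of type Integer")
    step = 1 if start <= end else -1   # assumption: ranges may count downward
    return list(range(start, end + step, step))


# Python equivalent of the nested for/where/return example.
results = [
    i * 2 + j
    for i in onyx_range(3, 8)
    for j in onyx_range(9, 5)
    if i <= j - 3
]
```

    Under that assumption the list comprehension yields the same ten values as the Result shown for the example program.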

    Value Comparison Operators

    The value comparison operators op:equals and op:not-equals test for equality or non-equality of two AnySimpleType values of the same type: Boolean, Integer, Decimal, or String. The inequality operators op:less-than, op:greater-than, op:less-than-equals, and op:greater-than-equals compare two values of the same type, except for Boolean.

    Built-in functions

    Built-in functions produce no side effects except for addChildNode() and addAttribute().
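    What a side effect means for addAttribute() can be shown with a small Python model. It is a sketch only: the AttrEnv class, the add_attribute function, and the ASCII-only name pattern are assumptions made for illustration, and the real rules for valid XML names are broader than the pattern used here.

```python
import re

# Simplified, ASCII-only approximation of a valid XML name (an assumption).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


class AttrEnv:
    """Hypothetical attribute environment: no two attributes share a name."""

    def __init__(self):
        self.attrs = {}


def add_attribute(env, key, value):
    """Mutate env in place and return the very same object."""
    if not _NAME.fullmatch(key):
        raise ValueError(f"Attribute name {key} is an invalid OXML attribute name")
    env.attrs[key] = value             # an existing key has its value replaced
    return env


env = AttrEnv()
same = add_attribute(env, "id", "1")
add_attribute(env, "id", "2")
```

    Because the function returns a reference to the environment it was given, every holder of that reference observes the change; here env and same are one object whose id attribute now has the value "2".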

    Function Name | Parameters | Parameter Description | Return Type* | Description
    --------------|------------|-----------------------|--------------|------------
    String functions
    concat | String | First string | String | Returns the String of the concatenation of the two strings passed as parameter.
     | String | Second String | |
    ODOM Constructors/Accessors/Mutators
    document | String | URI to a valid OXML document | Node | Returns the root node of the XML document whose path is specified by the parameter. The resulting XML tree is OXML compliant. Argument must be a string specifying a URI, possibly a filename. If the OXML document does not exist or is not well formed, a dynamic exception is raised. Only relative paths are supported for filenames. Absolute paths are not considered to be well-formed file names, and will result in an Error loading OXML data. The document() function can specify a special relative path using the @ specifier. This path is currently set to ~/../public/Examples/xmlFiles. Thus, to read from the file ~/../public/Examples/xmlFiles/asdf.xml we specify: document("@/asdf.xml"). [1/31/07] The document() function does not support path names containing // and \\, which will result in an Error loading OXML data. [2/5/07]
    enode | String | Tagname | ENode | Returns an Onyx ENode representing an OXML element with the specified tagname and attribute environment. May raise an OnyxXMLException if the tagname is not a valid XML name.
     | AttrEnv | The attribute environment for the element | |
    tnode | String | Tagname | TNode | Returns an Onyx TNode representing an OXML element with the specified tagname, attribute environment, and text content. May raise an OnyxXMLException if the tagname is not a valid XML name.
     | AttrEnv | The attribute environment for the element | |
     | String | Text content | |
    addChildNode | ENode | Element to add child to | ENode | Returns a reference to the ENode to which the child was added. The reference is the same as the parameter passed in. This is one of two side-effect producing built-in operations. [1/11/07]
     | Node | The Node to add as a child | |
    children | ENode | An ENode | Sequence | Returns an Onyx Sequence containing an ordered set of Onyx nodes representing the children of the specified element node.
    tagname | Node | An Onyx node | String | Returns a string representing the tagname of the specified node.
    setAttrEnv | Node | An Onyx Node | Node | Sets the attribute environment of the specified Node. Returns a reference to the Node.
     | AttrEnv | An attribute environment | |
    getAttrEnv | Node | An Onyx Node | AttrEnv | Returns the attribute environment of the specified Node.
    attrenv | n/a | | AttrEnv | Returns a new empty attribute environment object.
    addAttribute | AttrEnv | An attribute environment | AttrEnv | Adds the specified key value pair to the attribute environment. If the key already exists, the value is replaced. Returns a reference to the specified attribute environment. May raise an exception if the attribute key is not a valid XML name. This is one of two side-effect producing built-in operations. [1/11/07]
     | String | An attribute key | |
     | String | An attribute value | |
    getAttributeKeys | AttrEnv | An attribute environment | Sequence | Returns a Sequence of Strings representing the keys of the specified attribute environment.
    getAttributeValue | AttrEnv | An attribute environment | AnyType | Returns the value for the specified key in the specified attribute environment. The return value is of type onyx.types.String if the value exists. If the key does not exist, then an empty sequence is returned.
     | String | The key | |
    Node Tests
    isNode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx Node type, false otherwise.
    isENode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx ENode type, false otherwise.
    isTNode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx TNode type, false otherwise.
    Type Casts
    string | AnyType | Any value | String | Returns the string representation of any simple type value, Sequence, or TNode. If the value is a TNode, the text content of the node is returned.
    integer | Number or String | Any value | Integer | Returns an Integer value. A String containing a legal numeric value is converted to an Integer (an error is raised if the string is not a valid Number). Both String and Decimal values are converted to an Integer value with the decimal part truncated. If an Integer is passed in, a copy is returned.
    Sequence Operators
    length | Sequence | A Sequence | Integer | Returns the number of elements in the Sequence.
    first | Sequence | A Sequence | AnyType | Returns the first element in the Sequence. If the sequence is empty, an empty sequence is returned.
    tail | Sequence | A Sequence | Sequence | Returns all elements of the Sequence except the first. If the sequence is empty or contains only one element, then an empty sequence is returned.
    * Note: All type names are shown as their simple form. When writing Onyx code, the fully qualified type name must be used to designate the type.
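    The path rules given for document() in the table above can be sketched as a validation step in Python. This is an illustration under assumptions, not the reference behavior: resolve_document_path and AT_ROOT are hypothetical names, \\ is read as two consecutive backslashes, and treating a leading slash, backslash, or drive letter as the mark of an absolute path is a guess at what the specification intends.

```python
import re

# Directory the specification currently associates with the @ specifier.
AT_ROOT = "~/../public/Examples/xmlFiles"


def resolve_document_path(path):
    """Return the path to open, or raise the error used for bad file names."""
    error = ValueError(f"Error loading OXML data from file named {path}")
    if "//" in path or "\\\\" in path:
        raise error                    # doubled separators are not supported
    if path.startswith(("/", "\\")) or re.match(r"[A-Za-z]:", path):
        raise error                    # assumption: these forms are absolute paths
    if path == "@" or path.startswith("@/"):
        return AT_ROOT + path[1:]      # expand the special relative path
    return path
```

    For example, resolve_document_path("@/asdf.xml") expands to ~/../public/Examples/xmlFiles/asdf.xml, matching the case described for the @ specifier.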