| Date | Description |
| 11-Mar-2007 | Added a discussion about the decimal() conversion function. |
| 7-Mar-2007 | Fixed error message concerning non-Boolean value in where condition, to be consistent with the reference |
| 2-Mar-2007 | Added discussions concerning the role of type substitutions in function prototype matching. |
| 28-Feb-2007 | Fixed an incorrect error case 3+"3" |
| 28-Feb-2007 | Corrected the discussion about the order of function definitions. The order of definitions is not significant, and forward references are allowed. |
| 28-Feb-2007 | Corrected the discussion about global variables; in particular, the order of evaluation is significant, and forward definitions are not allowed. |
| 27-Feb-2007 | Expanded discussion about global variables |
| 18-Feb-2007 | Removed div as a legal MultiplicativeOp |
| 5-Feb-2007 | Changed typos where ODOM objects were called OXML objects. This is an imprecise use of the terminology. |
| 5-Feb-2007 | The document() function no longer supports path names containing // and \\, which will result in an Error loading OXML data. |
| 31-Jan-2007 | The document() function supports only relative filename paths. Absolute paths are not considered to be well-formed file names, and will result in an Error loading OXML data. A new relative path @ has been added to the document() function. |
| 12-Jan-2007 | Added further discussion and deleted references to explicit type conversions in comparison operators. [SBB] |
| 11-Jan-2007 | Added discussion about side-effects using addAttribute() and addChildNode(), and added an example. Also discussed implicit conversions involving Decimal [SBB] |
| 09-Jan-2007 | Clarified discussion of implicit type conversions [SBB] |
| 04-Jan-2007 | Initial release of the spec. [SBB] |
This document describes the basic semantics, scope rules, types and type conversion rules, built-in functions and operators (with a list of built-in prototypes), and error reporting requirements, as well as the reserved words and punctuation marks, for Onyx, an instructional variant of the W3C standard XML query language. Onyx uses a subset of the XML standard, called OXML, which is described in a separate document. Onyx provides a simplified view of the XML Document Object Model (DOM), called ODOM.
An Onyx program consists of an optional prolog containing function definitions, followed by a sequence of expressions. The value of an Onyx program is the sequence of values of those expressions, computed in accordance with the semantic rules for Onyx. Note that an empty file is a valid Onyx program. It produces an empty result. Incorrect Onyx programs may raise syntax, static semantic, or runtime semantic errors.
A Sequence is an ordered collection of zero or more items. An item is either an atomic value or an Onyx-supported ODOM construct. An atomic value is a value of an atomic type; an Onyx-supported ODOM construct may either be a Node conforming to one of the possible Node subtypes or an attribute environment. These types are defined in the Types and Type Conversion section. A Sequence containing exactly one item is a singleton sequence. In Onyx, an item is identical to a singleton sequence containing that item in any context where a sequence is required. That is, an item can be promoted to a sequence containing that item.
A Sequence may contain duplicate values or nodes, but a Sequence is never an item in another Sequence. That is, unlike conventional lists, sequences are "flat", and may not contain other sequences. When a new Sequence is created by concatenating two or more input sequences, the new Sequence contains all the items of the input sequences and its length is the sum of the lengths of the input sequences. As a result, sequences are never nested--for example, combining the values 1, (2, 3), and ( ) into a single Sequence results in the Sequence (1, 2, 3). A Sequence containing zero items is an empty sequence.
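The flattening rule can be modeled in a few lines of Python. This is only an illustrative sketch, not part of Onyx: it assumes Python tuples stand in for Sequences, and the helper names to_sequence and concat are hypothetical.

```python
def to_sequence(value):
    """Promote an item to a singleton sequence and flatten nested tuples."""
    if isinstance(value, tuple):
        flat = []
        for part in value:
            # Nested sequences contribute their items, never themselves
            flat.extend(to_sequence(part))
        return tuple(flat)
    # A single item is treated as a singleton sequence
    return (value,)


def concat(*operands):
    """Concatenate operands into one flat sequence, in order."""
    return to_sequence(operands)


result = concat(1, (2, 3), ())
print(result)  # (1, 2, 3)
```

The length of the result is the sum of the lengths of the flattened inputs, and the empty sequence contributes nothing.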
As mentioned previously, Onyx provides a simplified view of the XML Document Object Model (DOM), called ODOM. There are only two Node types: element nodes and text nodes, referred to in Onyx as ENode and TNode, respectively. Onyx's data model for XML documents, OXML, restricts XML node content to be either a Sequence of XML element nodes or strictly text-only content, but not a mixture of the two. A Node of either type consists of a tag, an optional attribute environment (a list of attributes), together with the content. Constructors and de-constructors for attribute environment and Node are described below.
The printable representation of the value of a correct Onyx program is a legal OXML document whose root node is an <onyx-result> node containing the result of the query. If the result Sequence contains at least one Node, then all the elements in the Sequence must also be nodes, and they become the children of the <onyx-result> Node. If the result Sequence contains only non-node elements, then the output must consist of a text Node with tag name "Result" whose content is the printable representation of the items in that Sequence, in order, separated from each other by a single space. The onyx.xml.OnyxNode.toString() method correctly formats the result according to this specification. If the result contains mixed content, then a dynamic error is raised.
| Input: | let $e1 := enode("a", attrenv()) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <a/> <b name2="value2">2</b> </onyx-result> |
This example demonstrates the case when the result is a non-node, the constant 7.
| Input: | 7 |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>7</Result> </onyx-result> |
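The three cases of the printable representation (all nodes, no nodes, mixed content) can be sketched in Python as follows. This is a rough model under stated assumptions, not the behavior of onyx.xml.OnyxNode.toString(): the Node class, the DynamicError exception, and render_result are hypothetical names, nodes are assumed to carry already-serialized markup, XML escaping and whitespace layout are ignored, and the treatment of an empty result is an arbitrary choice of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ODOM node holding serialized markup."""
    xml: str


class DynamicError(Exception):
    """Hypothetical stand-in for an Onyx dynamic semantic error."""


def render_result(items):
    nodes = [item for item in items if isinstance(item, Node)]
    if nodes and len(nodes) != len(items):
        # At least one Node mixed with non-Node items
        raise DynamicError("Top level Sequence cannot contain mixed content")
    if nodes:
        # All items are nodes: they become children of <onyx-result>
        body = "".join(node.xml for node in nodes)
    else:
        # Only non-node items: one Result node, items separated by one space
        body = "<Result>" + " ".join(str(item) for item in items) + "</Result>"
    return "<onyx-result>" + body + "</onyx-result>"


print(render_result([7]))
print(render_result([Node("<a/>"), Node('<b name2="value2">2</b>')]))
```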
Comments
Comments in Onyx begin with "{--" and continue until the first occurrence of the comment termination string "--}". Comments cannot be nested.
Onyx inherits the notion of a QName, which resembles an identifier name (namespace specifiers are not supported):
The values of literal constants are items of type Integer, Decimal, Boolean, or String. The semantics of these datatypes (respectively) are equivalent to java.math.BigInteger, java.math.BigDecimal, java.lang.Boolean, and java.lang.String. There is no limit to the number of digits that can appear in an Integer literal or a Decimal literal. Note that the following type definitions correspond to equivalent definitions in the Onyx Lexical Specification.
A string is delimited by quotation marks. Onyx strips off the enclosing quotation marks, but two adjacent quotation marks within a string are interpreted as a single quotation mark.
| Input: | "a string", "This is a 'string'.", "This is also a ""string""." |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>a string This is a 'string'. This is also a "string".</Result> </onyx-result> |
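The quoting rule for string literals can be illustrated with a short Python sketch. It assumes the lexer hands over the raw token including its enclosing quotation marks; the function name decode_string_literal is hypothetical.

```python
def decode_string_literal(token):
    """Strip the enclosing quotation marks and collapse doubled ones."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted string literal: " + token)
    # Two adjacent quotation marks inside the literal denote one
    return token[1:-1].replace('""', '"')


print(decode_string_literal('"This is also a ""string""."'))
# This is also a "string".
```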
A variable evaluates to the value to which its name is bound in the current scope. An expression containing an unbound variable raises a dynamic semantic error. Variables can be bound by variable declarations, for-clauses, let-clauses, and also in function calls, which bind values to the formal parameters of functions before evaluating the function body. Variables do not define storage, and hence cannot be modified.
A variable consists of a dollar sign ($) followed by a QName with no intervening whitespace, as in $a. Variables are bound to values using the ":=" operator, as in a let clause.
Global variable declarations are of the following form.
declare variable $varname as onyx.types.someType { expression }; or
[2/27/07]
declare variable $varname { expression };
[2/27/07]
The following example declares a global variable $a with an initial value of 7 and a global variable $b with type Decimal and initial value of 4.4.
declare variable $a { 7 };
declare variable $b as onyx.types.Decimal { 4.4 };
Note that the declaration of the type is optional. That is, a variable name can be used to represent values of any of the valid Onyx types; however, the type of the value is known only at run time. The use of the semi-colon differentiates the declarations from the remainder of the program, in which successive values are delimited by commas.
The order of global variable definitions is significant. Forward references are not allowed, and will result in a variable not bound semantic error:
declare variable $b { $a + 4 };
declare variable $a { 7 };
Global variables may not be redefined (any attempt to do so raises a static semantic error), but a global variable may be rebound locally, i.e. via a FLWR expression or as a formal parameter in a function call. In any context, a local binding obscures a global definition of the same name. Thus, the following program returns the value 8:
declare variable $a { 7};
let $a := 8
return $a
[2/28/07]
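The shadowing behavior can be pictured as a chain of scopes in which lookup starts at the innermost one. The Python sketch below models this with collections.ChainMap; it is an illustration of the rule only, and the variable names global_scope and local_scope are hypothetical.

```python
from collections import ChainMap

# Global scope: declare variable $a { 7 };
global_scope = {"a": 7}

# Local scope created by: let $a := 8
# The new binding sits in front of the global one and does not modify it.
local_scope = ChainMap({"a": 8}, global_scope)

print(local_scope["a"])   # 8, the local binding obscures the global one
print(global_scope["a"])  # 7, the global definition is unchanged
```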
One way to construct a Sequence is by using the comma operator, which evaluates each of its operands and concatenates the resulting values, in order, into a single result Sequence. (The comma operator was used above in the example under the Literals heading.) Parentheses can be used for grouping expressions or expression sequences. The value of a parenthesized expression or expression Sequence is the value of the contained expression or expression Sequence. The example below demonstrates that sequences never contain other sequences; the result is always one "flat" sequence.
| Input: | ((1,2), 3, (4), (), (5,(6,7))) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>1 2 3 4 5 6 7</Result> </onyx-result> |
The to operator may be used as a shorthand form for constructing Sequences of consecutive Integers. (The to operator defines an invocation of the built-in op:to() function.) This infix operator takes two operands, both of which must be Integer (otherwise a dynamic error is raised). The result is a Sequence containing the two Integer operands and every integer between them, running from the first operand to the second. If the first operand is equal to the second, the Sequence is a singleton.
| Input: | 7 to 9, 6 to 4, 2 to 2 |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>7 8 9 6 5 4 2</Result> </onyx-result> |
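As the example shows, the range runs downward when the first operand is larger than the second. A minimal Python sketch of that behavior follows; op_to is a hypothetical name for a model of op:to(), Python int stands in for Integer, and TypeError stands in for the dynamic error.

```python
def op_to(first, last):
    """Model of the to operator: all integers from first to last, inclusive."""
    for operand in (first, last):
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(operand, int) or isinstance(operand, bool):
            raise TypeError("both operands of 'to' must be Integer")
    step = 1 if first <= last else -1
    return tuple(range(first, last + step, step))


# 7 to 9, 6 to 4, 2 to 2
print(op_to(7, 9) + op_to(6, 4) + op_to(2, 2))  # (7, 8, 9, 6, 5, 4, 2)
```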
A function call consists of a QName followed by a parenthesized list of zero or more expressions. These expressions are the function's arguments. The QName and the types of the arguments must match the function prototype definition that is in scope. If an exact match is not found, type promotion or type substitution may be applied; if no match is found even then, a static error is raised. Type promotion is used in function calls and built-in operators, for example to permit Integer values to be compared with strings representing numerical values (see discussion below). Type promotion is different from subtype substitution, which refers to the use of a value whose actual type is derived from the expected type. (For further information, see the discussions of type promotion and subtype substitution in the XQuery specification.) [3/2/07]
For example [3/2/07]
A function call expression is evaluated as follows:
Note that Decimal arguments are of limited use; as previously mentioned, they may appear in comparison expressions only.
Onyx provides a useful set of built-in functions and operators, which are listed in the Appendix. The built-in Onyx numeric and Boolean operators all have typical programming-language semantics, subject to the precedence and associativity rules given in the Onyx grammar, and to the typing rules described in the Types and Type Conversion section. Arithmetic operators take Integer arguments only; Decimal values, alone or in combination with Integer values, may appear only as operands of comparison operators. (See types and type conversions.)
The comparison operators return a result of type Boolean. They take singleton operands only, and sequences may not be compared. The equality and non-equality operators compare two values of AnySimpleType of the same type. The ordering operators (greater-than, less-than, and so on) apply to all AnySimpleType arguments except for Boolean, which may not be compared in this way.
The literal true represents the Boolean value true. The literal false represents the Boolean value false.
The built-in string() function converts a single ODOM TNode, an Integer, a Decimal, a Boolean, or Sequence value, to a String. In the case of an ODOM TNode the string content of the node is returned. In the case of an Integer, Decimal value, or Boolean, the string representation is returned. In the case of a Sequence, the string representation of the Sequence is returned, where each element is delimited by a single space and the sequence is enclosed in square brackets. No other types are supported.
The + operator is overloaded to handle string concatenation, and is semantically equivalent to the built-in concat function,
i.e. "AB" + "CD" = concat("AB","CD")
The built-in integer() function converts a String to an Integer value. If the value cannot be converted, a runtime exception is raised. The integer() function truncates Decimal fractions according to the behavior of the BigInteger class, with a loss of precision; for example, integer(2.4) = 2.
The built-in decimal() function converts a String or Integer to a Decimal value. If the value cannot be converted, a runtime exception is raised. [3/11/07]
In Onyx, Decimal provides only comparison operators and type conversions. No arithmetic operations are supported. Thus, 3 + "2.7" is not a legal expression.
FLWR expressions are used to handle iteration and to bind variables to intermediate results. A FLWR expression evaluates as follows:
A for clause associates ("binds") one or more variables to values of expressions, creating variable bindings as for nested loops. The bindings can be viewed as a list of tuples drawn from the Cartesian product of the sequences of values to which the expressions evaluate (whether or not the bindings are actually computed that way). If the sequence for a particular for clause does not contain any values, then the remaining body of the FLWR is not evaluated, since there are no bindings. The result for this particular for clause would be an empty sequence. For example,
the for clause binds the variables ($i,$j) in order to corresponding elements of the list of 2-tuples ((3,8), (3,9), (3,10), (4,8), (4,9), (4,10)) constructed from the Cartesian product of the sequences (3,4) and (8,9,10). This is equivalent to a (nested) pair of for-clauses:
for $i in (3,4)
The first binds $i to successive elements of the Sequence (3,4); for each of those bindings, the second binds $j to successive elements of the Sequence (8,9,10).
A let clause contains one or more variables, and binds each variable directly to the result of evaluating an expression. The let clauses generate one tuple with the variable bindings. For example, the let clause
let $k := 15, $x := "a", $y := $k + 3
is equivalent to the sequence of let clauses
let $k := 15
let $x := "a"
let $y := $k + 3
The first binds $k to 15, the second binds $x to "a", the third binds $y to the value of $k + 3.
If for clauses are present, the variable bindings created by let clauses can be viewed as being added to the tuples generated by the for clauses. In the context of the for-let clause
the variables ($i,$j,$k) are bound in order to corresponding elements of the list of 3-tuples ((3,8,15), (3,9,16), (3,10,17), (4,8,15), (4,9,16), (4,10,17)). An equivalent way to view this is as the Sequence of clauses:
for $i in (3,4)
The first binds $i to successive elements of the Sequence (3,4); for each of those bindings, the second binds $j to successive elements of the Sequence (8,9,10); for each of those bindings, the third binds $k to the value of 7 + $j.
There is also a shortened form of the let expression that uses the comma (,) to delimit successive bindings:
let $k := 15, $x := "a", $y := $k + 3
A where clause can be used as a filter for the tuples of variable bindings generated by the for and let clauses. The where-expression in the where clause is evaluated once for each of these tuples. It is a dynamic error if the where-expression does not produce a Boolean result for any of the tuples. If the value of the where-expression is true, the tuple is retained and its variable bindings are used in an execution of the return clause. If it is false, the tuple is discarded.
A return clause contains an expression that is used to construct the value of the FLWR expression itself. The return clause is invoked once for every tuple generated by the for and let clauses, after filtering out any tuples that do not satisfy the conditions of a where clause. The expression in the return clause is evaluated once for every invocation, and the result of the FLWR expression is an ordered Sequence containing the results of these invocations. The ordering of the resulting Sequence must be the order in which items were added to it by the return clause.
| Input: |
for $i in (3,4), $j in (8,9,10) return ($i*$j) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>24 27 30 32 36 40</Result> </onyx-result> |
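The tuple-stream view of a FLWR expression can be sketched in Python. The sketch below is a simplified model, not the Onyx evaluator: flwr is a hypothetical helper, the for sources are assumed to be independent of one another (so a plain Cartesian product suffices), let clauses are left out, tuples stand in for Sequences, and TypeError stands in for the dynamic error raised by a non-Boolean where condition.

```python
from itertools import product


def flwr(for_sources, return_fn, where_fn=None):
    """Evaluate for/where/return over the Cartesian product of the sources."""
    names = list(for_sources)
    results = []
    # product() varies the last source fastest, like nested for loops
    for values in product(*for_sources.values()):
        bindings = dict(zip(names, values))
        if where_fn is not None:
            keep = where_fn(bindings)
            if not isinstance(keep, bool):
                raise TypeError("where condition must be Boolean")
            if not keep:
                continue  # the tuple is discarded
        value = return_fn(bindings)
        # Results are concatenated into one flat sequence, in order
        results.extend(value if isinstance(value, tuple) else (value,))
    return tuple(results)


# for $i in (3,4), $j in (8,9,10) return ($i*$j)
print(flwr({"i": (3, 4), "j": (8, 9, 10)}, lambda b: b["i"] * b["j"]))
# (24, 27, 30, 32, 36, 40)
```

If any source is empty, the product is empty and the result is the empty sequence, which matches the rule that the FLWR body is not evaluated when there are no bindings.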
A conditional if-then-else expression in Onyx contains a test-expression, a then-expression, and an else-expression. The test-expression is evaluated first. It is a dynamic error if the test-expression does not produce a Boolean result. If the test-expression is true, then the value of the conditional is the value of the then-expression; if it is false, the value of the conditional is the value of the else-expression.
for $i in (1, 2),
$j in (3, 4)
return
($i, $j)
may be rewritten as a composition of several simple for expressions.
for $i in (1, 2) return
for $j in (3, 4) return
($i, $j)
The language composed of these simpler expressions is called the Onyx Core language and may be described by a special grammar. Since the Core subset of Onyx covers only FLWR expressions, we won't go to the trouble of specifying a separate grammar; the interested reader is referred to the XQuery semantic specification for the details. There are two other constructs covered by the Core language: first, the comma "," construct for the let construct; second, the where clause.
The where clause is provided as syntactic sugar, and may be replaced by an appropriate conditional expression. For example, the following two for loops are equivalent:
|
for $i in (3,4,5,7) where ($i > 4) return $i |
for $i in (3,4,5,7) return if ($i > 4) then $i else ( ) |
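The equivalence relies on the empty sequence vanishing when results are concatenated. A small Python sketch of the two forms, using tuples as stand-ins for Sequences (the variable names are illustrative only):

```python
source = (3, 4, 5, 7)

# for $i in (3,4,5,7) where ($i > 4) return $i
with_where = tuple(i for i in source if i > 4)

# for $i in (3,4,5,7) return if ($i > 4) then $i else ( )
# Each iteration yields a singleton or the empty sequence; the results
# are then concatenated into one flat sequence.
with_if = tuple(
    item
    for i in source
    for item in ((i,) if i > 4 else ())
)

print(with_where, with_if)  # (5, 7) (5, 7)
```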
Onyx provides "computed" attribute environment and Node constructors. These are specified using the built-in constructors attrenv(), enode(), and tnode(). These ODOM constructors, as well as addAttribute(), deal with tag names and attribute names, which are passed as onyx.types.String values. The XML (and thus OXML) specification restricts the values of tag names and attribute names. Since onyx.types.String values can be invalid XML names, the ODOM library may raise an exception, namely an OnyxXMLException, if used with a value that does not meet the specification.
In general, the definition of a QName provides a good guideline as to what values these XML names may have; the exact definition, however, is shown below as quoted from the XML 1.1 specification:
| NameStartChar | ::= | ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF] |
| NameChar | ::= | NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040] |
| Name | ::= | NameStartChar (NameChar)* |
In the table above, a valid XML tagname or attribute name must match the Name rule.
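The Name rule translates fairly directly into a regular expression. The following Python sketch transcribes the character ranges from the table above; it is an illustration of the check, not the validation code used by the ODOM library, and is_valid_xml_name is a hypothetical name.

```python
import re

# Character ranges copied from the NameStartChar production
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, and a few extra ranges
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_valid_xml_name(text):
    """Return True if text matches Name ::= NameStartChar (NameChar)*."""
    return NAME_RE.fullmatch(text) is not None


print(is_valid_xml_name("eltNode"))  # True
print(is_valid_xml_name("1E"))       # False, a digit cannot start a name
```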
Attribute Environment Constructor: The attribute environment constructor takes no arguments. Onyx attributes are specified by adding key-value pairs to an attribute environment. The addAttribute() function takes an attribute environment, an attribute name, and a value. (This operation produces a side effect. [1/11/07]) The attribute name and attribute value are evaluated. The name must be of type String. The value must be of type String. If a value is added under a key that already exists in the environment, the earlier value for that attribute is overwritten. The value of an attribute can be retrieved from an attribute environment by using the getAttributeValue() function, which takes the environment and the key to look up. The value of an attribute is the string representation of the value associated with the specified key.
| Input: |
let $a := attrenv() let $a := addAttribute( $a, "key1", string(4+3) ) let $a := addAttribute( $a, "key2", "anotherValue" ) return (getAttributeValue( $a, "key1" ), getAttributeValue( $a, "key2" )) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>7 anotherValue</Result> </onyx-result> |
Element Node Constructor: The element constructor takes the tag name and an attribute environment. The name must be of type String. The content of an element Node is a Sequence of text nodes or element nodes. Child nodes are added to an element node using the addChildNode() mutator. Thus, this operation produces a side effect. [1/11/07] A Sequence of the children can be retrieved using the children() accessor function.
| Input: |
let $attr := attrenv() let $elt := enode( "element", $attr ) let $elt2 := enode( "element2", $attr ) let $elt := addChildNode( $elt, $elt2 ) return $elt |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <element> <element2/> </element> </onyx-result> |
Text Node Constructor: The text Node constructor takes the tag name, an attribute environment, and the text content of the node. The name must be of type String. The content of the text node must be a String. The contents of the text node can be retrieved using the string() function, which is defined to return the text content of the text node when the argument is of type onyx.types.TNode.
| Input: |
let $attr := attrenv() let $attr := addAttribute( $attr, "key1", "value1" ) let $en := enode( "eltNode", $attr ) let $attr := attrenv() let $attr := addAttribute( $attr, "key2", "value2" ) let $tn := tnode("txtNode", $attr, "this is a spiffy text node") return addChildNode( $en, $tn ) |
| Result: | <?xml version="1.0" encoding="UTF-8"?> <onyx-result> <eltNode key1="value1"> <txtNode key2="value2">this is a spiffy text node</txtNode> </eltNode> </onyx-result> |
A worked example: [1/11/07]
In this example, we show how to construct the following document.
| Input: |
let $attr := attrenv(), $elt := enode( "D", $attr ) for $i in 1 to 4 let $tag := "E" + string($i), $child := tnode( $tag,$attr, string($i)), $elt := addChildNode( $elt, $child ) return $elt |
| Result: |
<onyx-result> <D> <E1>1</E1> <E2>2</E2> <E3>3</E3> <E4>4</E4> </D> </onyx-result> |
Two points are worth noting. Because addChildNode() is a side-effect producing operation, the for loop returns 4 references to the same object. This is true even though a new binding of $elt is made on each iteration of the loop. However, we see only one instance of the element with tag D. The reason is that the underlying W3C DOM library org.w3c.dom has removed duplicate child nodes (see appendChild()).
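The aliasing described above can be reproduced in a small Python model. This is a sketch under assumptions, not the ODOM implementation: the ENode class and add_child_node function are hypothetical stand-ins, and text children are represented as plain (tag, content) pairs.

```python
class ENode:
    """Hypothetical stand-in for an ODOM element node."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


def add_child_node(parent, child):
    # Mutates the parent in place and returns that same object
    parent.children.append(child)
    return parent


elt = ENode("D")
returned = []
for i in range(1, 5):
    child = ("E" + str(i), str(i))
    # Rebinding the name does not create a new node
    elt = add_child_node(elt, child)
    returned.append(elt)

distinct = {id(node) for node in returned}
print(len(returned), len(distinct))  # 4 1
```

The loop returns four references, but all of them point to one node that ends up with four children.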
User-defined functions
Functions in Onyx are defined by specifying the function name and parameter names. Parameters and the return value may be given a definite type, but this information is optional. By default, an untyped parameter or return result is given the type onyx.types.AnyType. However, Onyx does not support the use of explicit generic types such as onyx.types.AnyType within Onyx programs. These are used only for internal purposes (similar rules apply to onyx.types.Number, which will be discussed below). See the Scope Rules for details related to functions. A function is defined in the following manner.
declare function funcName( $varname1 as onyx.types.someType, $varname2 ) as onyx.types.Decimal { function body };
The use of the semi-colon differentiates the declarations from the remainder of the program in which successive values are delimited by commas. Below is a concrete example of how to define a function that returns a Sequence of the first 10 Fibonacci numbers.
| Input: |
declare function fibR( $numFib as onyx.types.Integer, $prev2, $prev1 ) as onyx.types.Sequence {
declare function fib( $numFib as onyx.types.Integer ) as onyx.types.Sequence { fib( 10 ) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>1 1 2 3 5 8 13 21 34 55</Result> </onyx-result> |
Names in Onyx are introduced in three ways: as names of defined functions, as names of variables (in variable declarations, for-clauses, and let-clauses), and as formal parameters in function definitions. Scope rules determine where in an Onyx program a name can be used, and what the name refers to. Onyx uses static scoping; more specifically, at any point in the program there are two scopes to consider, a local scope and a global scope. Global variables are read-only, and are visible in all scopes. They are introduced using a declaration, rather than defined with a let.
Built-in Onyx function names have scope over every Onyx program. A function name declared in a function definition in an Onyx program prolog has scope in the entire program, and the order of function definitions is not significant. Thus, forward references are allowed. (However, as specified in the grammar, all function definitions must appear in the prolog, before the QueryBody.) [2/28/07]
declare function f() { g(1) };
declare function g($x) { $x +1 };
f()
A variable appearing in a formal parameter list of a function definition has scope over the body of the function. The effect of multiple formal parameters with the same name is undefined, though it is not an error.
Variables are lexically scoped, except for variables declared in the query prolog, which have global scope. Variables may appear lexically unbound in function definitions; this is reported as a dynamic error when the function is called.
| Input: |
declare function g($w) { $w + $y }; declare variable $y {7}; let $v := 3 return (g($v),$y+3) |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>10 10</Result> </onyx-result> |
Onyx is a strongly typed language; the value of an Onyx expression that does not raise any errors is always of a definite type. Thus, the value of an expression can be thought of as a pair consisting of the value and the type of the value. In some cases the type can be completely determined statically, from the syntactic type of the expression; in other cases the precise type can only be determined dynamically, at runtime. For example, reading in an OXML document with the document() function creates a set of nodes. Associated with each Node is a type, and the types cannot be known until run time.
Onyx does not have user-defined types, so the only types available are built-in. Onyx is based on a simple built-in type hierarchy, and the root of the hierarchy is AnyType. (Onyx does not support XML document schema validation.) We can think of AnyType as an abstract class, i.e. one with no direct instances. This is somewhat like Java Object, except that the type is abstract. There are three subtypes of AnyType: Sequence, AnySimpleType, and XmlType. The subtype AnySimpleType handles the atomic values like String, Number, Decimal, and so on. The subtype XmlType handles the XML document entities such as ENode, TNode, and attribute environment.
Variables, parameters, and functions cannot be declared to be typed as abstract types (AnyType, AnySimpleType, XmlType, Number, and Node). As mentioned previously, when types are left unspecified, they default to AnyType, but this type cannot be used explicitly and is for internal accounting only.
The Onyx types are listed in the following table.
| Typename | Parent type | Comments |
| AnyType | ----- | The "root" type. |
| AnySimpleType | AnyType | The parent type of all items, i.e. values that can be contained in a Sequence. |
| Sequence | AnyType | An ordered Sequence of 0 or more items. Note however that there is no distinction between an item (i.e., a Node or an atomic value) and a singleton Sequence containing that item, i.e., an item is equivalent to a singleton Sequence containing that item and vice versa. |
| XmlType | AnyType | The parent type for all Onyx-supported ODOM constructs; an abstract type. |
| String | AnySimpleType | A string of Unicode characters. An atomic type. |
| Number | AnySimpleType | The parent type of all numeric types, an abstract type. |
| Decimal | Number | A numeric data type whose instances consist of an unscaled value of arbitrary length, and a scale. Only comparison operators are supported. (See java.math.BigDecimal.) |
| Integer | Number | A subtype of Decimal in which scale is 0. (See java.math.BigInteger.) |
| Boolean | AnySimpleType | A Boolean type, with values true or false. An atomic type. |
| Node | XmlType | An ODOM node, parent of all Node types. (An abstract type) |
| TNode | Node | A text node. |
| ENode | Node | An element node. |
| AttrEnv | XmlType | An attribute environment. Consists of a Sequence of attribute values, with the restriction that no two attribute values in the Sequence have the same name. |
Type names in an Onyx program must be written using the fully qualified type name. That is, a Decimal value would have the type onyx.types.Decimal and an element node would have the type onyx.types.ENode.
The only forms of automatic type conversion supported by Onyx are the following: promotion of a single value to a Sequence (e.g. 1 to (1); this applies to any XmlType or AnySimpleType value); promotion in comparison expressions involving a mixture of the types Number and String (Integer and Decimal, with Integer implicitly promoted to Decimal, as described below); promotion when passing an argument of type AnySimpleType or XmlType to a function that requires a Sequence; and promotion when passing an argument of type Integer to a function that expects a Decimal. [1/11/07] All other type casts must be done explicitly, and may involve only the built-in types Decimal, Integer, String, and Sequence.
The Onyx interpreter must report an error to standard error output, in exactly the format described below.
The following kinds of errors must be detected. In each case, only the first-occurring error in the program needs to be detected. There are three categories of errors: static parser errors, static semantic errors, and dynamic semantic errors. Static errors are considered to come "before" any dynamic errors. Parser errors are those that deal with the code's compliance with the language. Semantic errors are those that occur when interpreting/evaluating the individual constructs of the code. Since parser errors deal with the grammar and type names, all parser errors can be determined prior to evaluation and thus are all static errors. Semantic errors can be either dynamic (during evaluation) or static (prior to evaluation). The report of each type of error has the following forms:
<?xml version="1.0" encoding="UTF-8"?>
<onyx.error.ParserError>
<StaticError column="X" line="Y">error message</StaticError>
</onyx.error.ParserError>
<?xml version="1.0" encoding="UTF-8"?>
<onyx.error.SemanticError>
<StaticError column="X" line="Y">error message</StaticError>
</onyx.error.SemanticError>
<?xml version="1.0" encoding="UTF-8"?>
<onyx.error.SemanticError>
<DynamicError column="X" line="Y">error message</DynamicError>
</onyx.error.SemanticError>
where X and Y are the column and line location, respectively, of the error in the input program. The format of the error message (the detail string) depends on the specific kind of error, as described in the following table.
| Error | Category | Location of error | Detail String | Details |
| Lexical error | Parser (Static) | The location of the lexical error token | Lexical Error: X | X is the token that caused the error |
| Syntax error | Parser (Static) | The location of the token that caused the syntax error | Syntax Error | None |
| Use of an invalid type name | Parser (Static) | The location of the QName token of the invalid type name | T is not a valid Onyx type | T is the name of the invalid Onyx type |
| Attempt to define an already defined function | Semantic (Static) | The location of the QName token of the function name in the attempted function definition | Function F with prototype P already defined | F is the name of the function; P is the prototype of the function being defined |
| Attempt to define an already defined global variable | Semantic (Static) | The location of the variable name token of the variable being redeclared | Global variable $X already defined | $X is the name of the variable |
| Argument type incompatible with function or operator before and after applying valid type promotions | Semantic (Dynamic) | The location of the token identifying the function or operator | Function with prototype P not found | P is the prototype of the called function or operator, e.g. op:numeric-add(onyx.types.Decimal,onyx.types.String). All possible matches are reported. |
| Top level Sequence contains mixed content (i.e. Node and non-Node types) | Semantic (Dynamic) | None | Top level Sequence cannot contain mixed content: S | S is the string representation of the top level Sequence. OXML content will be displayed using entity references. |
| Attempt to evaluate an unbound variable | Semantic (Dynamic) | The location of the token identifying the variable | Variable $X not bound | $X is the variable |
| Attempt to call an undefined function | Semantic (Dynamic) | The location of the QName token naming the function in the attempted call | Function with prototype P not found | P is the prototype of the called function or operator, e.g. foo(onyx.types.Decimal,onyx.types.String) |
| Error reading OXML file (not found, or not
well-formed) |
Semantic (Dynamic) |
The location of the QName token naming the document
function |
Error loading OXML data from file named F |
F is the name of the file |
| OXML Attribute Name is invalid | Semantic (Dynamic) |
The location of the function call attempting to set the attribute (addAttribute) | Attribute name N is an invalid OXML attribute name | N is the invalid attribute name |
| OXML Tagname is invalid | Semantic (Dynamic) |
The location of the constructor attempting to set the tagname (tnode() or enode()) | Tagname N is an invalid OXML tagname | N is the invalid tagname |
| If-Condition contains a non-Boolean value | Semantic (Dynamic) |
The location of the expression with incorrect type | If condition is not of type onyx.types.Boolean | None |
| Where Condition is not of type onyx.types.Boolean[3/7/07] | Semantic (Dynamic) |
The location of the expression with incorrect type | Where condition is not of type onyx.types.Boolean | None |
| Function return value type does not match declared return type | Semantic (Dynamic) |
The location of the function call that caused the error | Function N declared to return type E, but found F | N is the name of the function. E is the fully qualified type name of the expected return type. F is the fully qualified type name that was found. |
| Attempt to assign value of incorrect type when declaring a variable |
Semantic (Dynamic) |
The location of the variable name token |
Type not assignable to $X |
$X is the variable |
| Attempt to convert non-numeric string to number | Semantic (Dynamic) |
The location of the type-cast function call that caused the error | "S" is not a valid number | S is the invalid number |
| Divide by zero | Semantic (Dynamic) |
The location of the
|
Division by zero is invalid | None |
| Modulus is not greater than zero | Semantic (Dynamic) |
The location of the mod operator for which the error occurred | Modulus value must be greater than zero | None |
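As an illustration of how an implementer might assemble these reports, the sketch below builds the XML wrapper from a category, an error kind, a location, and a detail string. The helper names format_error and report_unbound_variable are hypothetical, and two details are assumptions rather than requirements stated here: the placement of line breaks, and the escaping of XML special characters in the message.

```python
import sys
from xml.sax.saxutils import escape

def format_error(category, kind, line, column, message):
    """Build an error report in the layout shown earlier.

    category: "ParserError" or "SemanticError"
    kind:     "StaticError" or "DynamicError"
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<onyx.error.{category}>\n"
        # escape() replaces &, < and > so the report stays well-formed
        # (an assumption; the spec only shows plain messages).
        f'<{kind} column="{column}" line="{line}">{escape(message)}</{kind}>\n'
        f"</onyx.error.{category}>\n"
    )

def report_unbound_variable(name, line, column):
    # Detail string taken from the "Variable $X not bound" row of the table.
    report = format_error("SemanticError", "DynamicError", line, column,
                          f"Variable ${name} not bound")
    sys.stderr.write(report)  # errors go to standard error output
    return report
```

Because only the first-occurring error needs to be reported, an interpreter built this way would typically emit one such report and stop.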
Onyx program examples:
| Input: | 3 + "3" |
| Result: |
<?xml version="1.0" encoding="UTF-8"?> <onyx.error.SemanticError> <DynamicError column="5" line="1"> <ErrorMessage>Function with prototype op:numeric-add(onyx.types.Integer,onyx.types.String) not found</ErrorMessage> <PossibleMatch>onyx.types.Integer op:numeric-add(onyx.types.Integer,onyx.types.Integer)</PossibleMatch> <PossibleMatch>onyx.types.String op:numeric-add(onyx.types.String,onyx.types.String)</PossibleMatch> </DynamicError> </onyx.error.SemanticError> [2/28/07] |
| Input: | for $i in 3 to 8 for $j in 9 to 5 where $i <= ($j - 3) return $i * 2 + $j |
| Result: | <?xml version="1.0" encoding="UTF-8"?> <onyx-result> <Result>15 14 13 12 17 16 15 19 18 21</Result> </onyx-result> |
| Input: | declare function foo( $x as onyx.types.Integer ) { () }; declare function foo( $j as onyx.types.Integer ) { () }; |
| Result: | <?xml version="1.0" encoding="UTF-8"?> <onyx.error.SemanticError> <StaticError column="18" line="2">Function foo with prototype foo(onyx.types.Integer) already defined</StaticError> </onyx.error.SemanticError> |
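The second example above can be traced with a short Python sketch. It assumes, as the listed result implies, that the to operator counts downward when its first operand is greater than its second; the helper name onyx_to is hypothetical.

```python
def onyx_to(start, end):
    """Range operator sketch: inclusive, counting down when start > end."""
    step = 1 if start <= end else -1
    return list(range(start, end + step, step))

# for $i in 3 to 8 for $j in 9 to 5 where $i <= ($j - 3) return $i * 2 + $j
result = [
    i * 2 + j
    for i in onyx_to(3, 8)
    for j in onyx_to(9, 5)
    if i <= (j - 3)
]
print(" ".join(str(n) for n in result))  # 15 14 13 12 17 16 15 19 18 21
```

The outer variable $i advances only after the inner range for $j is exhausted, which is why the values for $i = 3 (15 14 13 12) appear before those for $i = 4 (17 16 15).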
The handling of any error not listed here is undefined; an implementer may choose any reasonable method for handling such cases.
| Operator | Arity | Meaning | Comments |
|---|---|---|---|
| op:numeric-add (+) | 2 | Numeric addition; String concatenation | Numeric addition: Integer numeric type conversions apply. String concatenation: semantically equivalent to concat |
| op:numeric-subtract (-) | 2 | Numeric subtraction | Integer numeric type conversions apply |
| op:numeric-multiply (*) | 2 | Numeric multiplication | Integer numeric type conversions apply |
| op:numeric-integer-divide (idiv) | 2 | Numeric integer division | Integer numeric type conversions apply |
| op:numeric-mod (mod) | 2 | Numeric modulus | Integer numeric type conversions apply |
| op:numeric-unary-plus (+) | 1 | Numeric unary plus | Integer numeric type conversions apply |
| op:numeric-unary-minus (-) | 1 | Numeric negation | Integer numeric type conversions apply |
| op:to (to) | 2 | Range operator | Creates a Sequence of numbers. Both parameters must be of type Integer |
| op:and (and) | 2 | Boolean conjunction | Standard Boolean type conversions apply |
| op:or (or) | 2 | Boolean disjunction | Standard Boolean type conversions apply |
| op:equals (=) | 2 | Value equality test | See notes on value comparisons |
| op:not-equals (!=) | 2 | Value nonequality test | See notes on value comparisons |
| op:less-than (<) | 2 | Value less-than test | See notes on value comparisons |
| op:greater-than (>) | 2 | Value greater-than test | See notes on value comparisons |
| op:less-than-equals (<=) | 2 | Value less-than-or-equals test | See notes on value comparisons |
| op:greater-than-equals (>=) | 2 | Value greater-than-or-equals test | See notes on value comparisons |
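Since op:numeric-add covers both numeric addition and string concatenation, an interpreter has to choose a behavior from the operand types. The sketch below is one possible approach, not the reference algorithm: type_name and numeric_add are hypothetical names, Python values stand in for Onyx values, and the handling of a mixed Integer and Decimal pair (promoting the Integer) is an assumption drawn from the Integer to Decimal promotion rule. The error text follows the "Function with prototype P not found" detail string.

```python
from decimal import Decimal

def type_name(value):
    """Map a Python stand-in value to a hypothetical Onyx type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, Decimal):
        return "Decimal"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported value: {value!r}")

def numeric_add(left, right):
    kinds = (type_name(left), type_name(right))
    if set(kinds) == {"Integer", "Decimal"}:
        # Assumed behavior: promote the Integer operand to Decimal.
        return Decimal(left) + Decimal(right)
    if kinds[0] == kinds[1] and kinds[0] in ("Integer", "Decimal", "String"):
        return left + right  # addition for numbers, concatenation for strings
    args = ",".join(f"onyx.types.{k}" for k in kinds)
    raise LookupError(f"Function with prototype op:numeric-add({args}) not found")
```

With these definitions, numeric_add(3, "3") raises an error whose message names the prototype op:numeric-add(onyx.types.Integer,onyx.types.String), matching the first program example.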
op:equals and op:not-equals test for equality or non-equality of two AnySimpleType values of the same type: Boolean, Integer, Decimal, or String.
The inequality operators op:less-than, op:greater-than, op:less-than-equals, and op:greater-than-equals compare two values of the same type, except for Boolean.
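These comparison rules might be modeled as shown below. The sketch is illustrative only: the names kind, align, op_equals, and op_less_than are hypothetical, Python values stand in for Onyx values, and a mixed Integer and Decimal pair is aligned by promoting the Integer, following the implicit promotion described for comparison expressions.

```python
from decimal import Decimal

def kind(value):
    if isinstance(value, bool):  # bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, Decimal):
        return "Decimal"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported value: {value!r}")

def align(left, right):
    """Return both operands with the same type, or raise TypeError."""
    kinds = {kind(left), kind(right)}
    if kinds == {"Integer", "Decimal"}:
        return Decimal(left), Decimal(right)  # Integer promoted to Decimal
    if len(kinds) != 1:
        raise TypeError("operands must have the same type")
    return left, right

def op_equals(left, right):
    left, right = align(left, right)
    return left == right

def op_less_than(left, right):
    left, right = align(left, right)
    if isinstance(left, bool):
        raise TypeError("ordering is not defined for Boolean")
    return left < right
```

The remaining operators follow the same pattern: op:not-equals negates op_equals, and the other three ordering tests reuse the alignment and Boolean check of op_less_than.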
Built-in functions produce no side effects except for addChildNode() and addAttribute().
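The practical consequence of a side effect is that every reference to the same object observes the change. The sketch below models this for an attribute environment; the AttrEnv class and the Python function names are hypothetical stand-ins, and attribute name validation is omitted.

```python
class AttrEnv:
    """Hypothetical stand-in for an Onyx attribute environment."""
    def __init__(self):
        self.attributes = {}

def attrenv():
    return AttrEnv()  # a new, empty attribute environment

def add_attribute(env, key, value):
    # Mutates env in place; an existing key has its value replaced.
    env.attributes[key] = value
    return env  # the same reference that was passed in

shared = attrenv()
alias = shared                       # a second reference to the same object
returned = add_attribute(shared, "id", "1")
add_attribute(alias, "id", "2")      # visible through every reference
```

After these calls, shared, alias, and returned all refer to one environment whose id attribute has the value "2".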
| Function Name | Parameters | Parameter Description | Return Type* | Description |
|---|---|---|---|---|
| String functions | | | | |
| concat | String | First string | String | Returns the String formed by concatenating the two strings passed as parameters. |
| | String | Second string | | |
| document | String | URI to a valid OXML document | Node | Returns the root node of the XML document whose path is specified by the parameter. The resulting XML tree is OXML compliant. Argument must be a string specifying a URI, possibly a filename. If the OXML document does not exist or is not well formed, a dynamic exception is raised. Only relative paths are supported for filenames. Absolute paths are not considered to be well-formed file names, and will result in an Error loading OXML data. The document() function can specify a special relative path using the @ specifier. This path is currently set to ~/../public/Examples/xmlFiles. Thus, to read from the file ~/../public/Examples/xmlFiles/asdf.xml we specify: document("@/asdf.xml"). [1/31/07] |
| enode | String | Tagname | ENode | Returns an Onyx ENode representing an OXML element with the specified tagname and attribute environment. May raise an OnyxXMLException if the tagname is not a valid XML name. |
| | AttrEnv | The attribute environment for the element | | |
| tnode | String | Tagname | TNode | Returns an Onyx TNode representing an OXML element with the specified tagname, attribute environment, and text content. May raise an OnyxXMLException if the tagname is not a valid XML name. |
| | AttrEnv | The attribute environment for the element | | |
| | String | Text content | | |
| addChildNode | ENode | Element to add child to | ENode | Returns a reference to the ENode to which the child was added. The reference is the same as the parameter passed in. This is one of two side-effect producing built-in operations. [1/11/07] |
| | Node | The Node to add as a child | | |
| children | ENode | An ENode | Sequence | Returns an Onyx Sequence containing an ordered set of Onyx nodes representing the children of the specified element node. |
| tagname | Node | An Onyx node | String | Returns a string representing the tagname of the specified node. |
| setAttrEnv | Node | An Onyx Node | Node | Sets the attribute environment of the specified Node. Returns a reference to the Node. |
| | AttrEnv | An attribute environment | | |
| getAttrEnv | Node | An Onyx Node | AttrEnv | Returns the attribute environment of the specified Node. |
| attrenv | n/a | n/a | AttrEnv | Returns a new empty attribute environment object. |
| addAttribute | AttrEnv | An attribute environment | AttrEnv | Adds the specified key-value pair to the attribute environment. If the key already exists, the value is replaced. Returns a reference to the specified attribute environment. May raise an exception if the attribute key is not a valid XML name. This is one of two side-effect producing built-in operations. [1/11/07] |
| | String | An attribute key | | |
| | String | An attribute value | | |
| getAttributeKeys | AttrEnv | An attribute environment | Sequence | Returns a Sequence of Strings representing the keys of the specified attribute environment. |
| getAttributeValue | AttrEnv | An attribute environment | AnyType | Returns the value for the specified key in the specified attribute environment. The return value is of type onyx.types.String if the value exists. If the key does not exist, then an empty sequence is returned. |
| | String | The key | | |
| Node Tests | | | | |
| isNode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx Node type, false otherwise. |
| isENode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx ENode type, false otherwise. |
| isTNode | AnyType | Any value | Boolean | Returns true if the specified value is an Onyx TNode type, false otherwise. |
| Type Casts | | | | |
| string | AnyType | Any value | String | Returns the string representation of any simple type value, Sequence, or TNode. If the value is a TNode, the text content of the node is returned. |
| integer | Number or String | Any value | Integer | Returns an Integer value. A String containing a legal numeric value is converted to an Integer (an error is raised if the string is not a valid Number). Both String and Decimal values are converted to an Integer value with the decimal part truncated. If an Integer is passed in, a copy is returned. |
| Sequence Operators | | | | |
| length | Sequence | A Sequence | Integer | Returns the number of elements in the Sequence. |
| first | Sequence | A Sequence | AnyType | Returns the first element in the Sequence. If the sequence is empty, an empty sequence is returned. |
| tail | Sequence | A Sequence | Sequence | Returns all elements of the Sequence except the first. If the sequence is empty or contains only one element, then an empty sequence is returned. |
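The integer cast and the Sequence operators are simple enough to sketch directly. In the illustration below, Python lists stand in for Sequences and the empty list for the empty sequence. What counts as a "legal numeric value" is an assumption: the sketch accepts whatever Python's Decimal parser accepts, excluding NaN and infinities. The error text follows the "S" is not a valid number detail string.

```python
from decimal import Decimal, InvalidOperation

def integer(value):
    """Cast an Integer, Decimal, or numeric String to an Integer."""
    if isinstance(value, str):
        try:
            number = Decimal(value)
        except InvalidOperation:
            number = None
        if number is None or not number.is_finite():
            raise ValueError(f'"{value}" is not a valid number')
        return int(number)  # int() drops the fractional part
    return int(value)       # Decimal is truncated; an Integer is returned as is

def length(seq):
    return len(seq)

def first(seq):
    # An empty Sequence yields an empty Sequence rather than an error.
    return seq[0] if seq else []

def tail(seq):
    # Slicing already yields [] for empty and one-element Sequences.
    return seq[1:]
```

Truncation here means discarding the fractional part, so integer("3.9") gives 3 and a Decimal of -3.9 gives -3 in this sketch; whether Onyx truncates negative values toward zero is not stated above.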