Assignment #2 - Onyx Syntax Analysis
Due: 9:00 pm Friday,
February 9, 2007
Description | Tasks | Deliverables | Resources
| Date | Description |
| 02-05-07 | Added statement about normalization of return statement out of the AST. |
| 01-26-07 | Release of the document. |
Your second assignment is to write a parser for the Onyx language. Your Onyx parser will output an OXML representation of the Abstract Syntax Tree (AST). The parser must recognize the language described in the Onyx Grammar Specification. The generated output must conform to the Onyx AST Specification.
The Onyx language is a simplified variant of the W3C standard XQuery language. At the syntactic level, there are several notable changes. An obvious difference is the absence of "database" operations such as XPath. Instead, Onyx exposes a set of datatypes for manipulating the OXML documents in memory, which can be used to implement advanced queries. Onyx also eliminates several operators as well as type expressions. These features are not necessary for many simple problems.
The Onyx Semantic Specification specifies the precise meaning of Onyx constructs. However, you should need only refer to that document casually for the purposes of this assignment. You will implement the Onyx semantics in the interpreter you build in the next assignment.We recommend that a parser generator tool such as CUP be used to create the parser. If you used JFlex to generate your scanner in the previous assignment, it will integrate well with a CUP-generated parser. A parser driver with particular functionality is in any case required for this assignment, as described below. Integration with the CUP classes will require access to the java_cup.Main class and the java_cup.runtime package, which is easily accomplished if your CUP installation point is included on your CLASSSPATH (Running prep cs131w will take of this for you).
This assignment's task is to write a program that will perform syntax analysis on an input file and output the result as an OXML file. The syntax analysis must follow the definition in the Onyx Grammar Specification. The output is either an OXML document that conforms to the Onyx AST Specification (if the parse succeeds), or a representation of the syntax error contained in the input file. Report only the first error encountered, and use the following format.
<syntaxError column="1" line="5"/>
You will turn in your parser project as a UNIX tar archive. When untarred into an empty working directory, a Makefile must appear at the top level. Other files and directory structure may also appear, as appropriate, but must contain no ".class", ".o", ".jar" or executable files.
The Makefile must have a target named "parse". The command
make parse
issued in the top level working directory should build (i.e., compile) your parser and parser driver, resulting in an executable file named parse in the working directory.
The parse program will be run with the command line of the form
./parse filename
issued in that working directory. It should parse the input file named on the command line (which it may assume is a Unicode file using UTF-8 encoding). The results of the parse must appear in an output file which has the same name as the input file, but with an added ".xml" extension.
The format of the output file is described below. As in the previous assignment, you will use the onyx_xml library to generate and output your OXML documents.
If the input file was a syntactically correct Onyx program, then the top-level element of the output OXML document will be one query element, and the format of the file will follow the Onyx AST Specification.
Be sure and read over the discussion on normalization for dealing with FLWR expressions, as there is no AST node for FLWR. Instead, there are separate node types for for and let. The return clause is normalized out of the AST, and there is no return node. [02/05/07]. In addition, the where clause is normalized out of the AST, and a IfThenElseExpr node is used instead.If the input file contains a syntax error, the top-level element of the output OXML document will be one errors element, containing one syntaxError element with line and column attributes indicating the line number, and column in that line, of the input file of the beginning of the token being read when the first syntax error was detected. Line and column numbers should start at 1.
Automated tools will check the contents of the output file for correctness with respect to the required AST format and the syntax specification. No additional checking of functionality of the parser will be performed, and you will not be tested on any undefined behavior. (You may treat such behavior as you wish. Feel free to ask questions if you are curious about your options.)
In addition to your assignment, you must include three other files:
Refer to the A2 Phases document for additional instructions about these files.Assignments that do not include a properly completed DECLARATION, MEMBERS, and TeamEval files will not receive a grade and will be treated as late assignments.
From the directory containing your Makefile, MEMBERS, DECLARATION, and TeamEval files and the root of your A2 development directory, run the command TURNIN.A2 and follow the dialog. You can run this command as many times as you want up until the assignment deadline; any earlier turnin of yours will just be overwritten. In fact it is recommended that you run the command early, as soon as you have a partial solution to the assignment, to make sure the turnin mechanism is working for you.
DON'T WAIT UNTIL THE LAST MINUTE TO RUN TURNIN FOR THE FIRST TIME!!!
Often problems are uncovered when running TURNIN. You'll need to resolve these problems before your submission can be accepted. Waiting until the last minute to run TURNIN for the first time per assignment is done at your own risk!