CSE 131A - Parser Assignment

Assignment #2 - Onyx Syntax Analysis
Due: 9:00 pm Friday, February 9, 2007

Phases

See the A2 Phases document for guidance.

Description | Tasks | Deliverables | Resources

Change log

Date Description
02-05-07 Added statement about normalization of return statement out of the AST.
01-26-07 Release of the document.

Description

Your second assignment is to write a parser for the Onyx language. Your Onyx parser will output an OXML representation of the Abstract Syntax Tree (AST). The parser must recognize the language described in the Onyx Grammar Specification. The generated output must conform to the Onyx AST Specification.

The Onyx language is a simplified variant of the W3C standard XQuery language. At the syntactic level, there are several notable changes. An obvious difference is the absence of "database" operations such as XPath. Instead, Onyx exposes a set of datatypes for manipulating the OXML documents in memory, which can be used to implement advanced queries. Onyx also eliminates several operators as well as type expressions. These features are not necessary for many simple problems.

The Onyx Semantic Specification specifies the precise meaning of Onyx constructs. However, you should need only refer to that document casually for the purposes of this assignment. You will implement the Onyx semantics in the interpreter you build in the next assignment.

We recommend that a parser generator tool such as CUP be used to create the parser. If you used JFlex to generate your scanner in the previous assignment, it will integrate well with a CUP-generated parser. A parser driver with particular functionality is in any case required for this assignment, as described below. Integration with the CUP classes will require access to the java_cup.Main class and the java_cup.runtime package, which is easily accomplished if your CUP installation point is included on your CLASSSPATH (Running prep cs131w will take of this for you).

Tasks

This assignment's task is to write a program that will perform syntax analysis on an input file and output the result as an OXML file. The syntax analysis must follow the definition in the Onyx Grammar Specification. The output is either an OXML document that conforms to the Onyx AST Specification (if the parse succeeds), or a representation of the syntax error contained in the input file. Report only the first error encountered, and use the following format.


      <syntaxError column="1" line="5"/>

You must use an empty element tag with tag name syntaxError. Locate the position of the error using the two attributes as shown--these are the same as those reported in in the lexical scanner assignment.

Deliverables

You will turn in your parser project as a UNIX tar archive. When untarred into an empty working directory, a Makefile must appear at the top level. Other files and directory structure may also appear, as appropriate, but must contain no ".class", ".o", ".jar" or executable files.

The Makefile must have a target named "parse". The command

make parse

issued in the top level working directory should build (i.e., compile) your parser and parser driver, resulting in an executable file named parse in the working directory.

The parse program will be run with the command line of the form

   ./parse filename

issued in that working directory. It should parse the input file named on the command line (which it may assume is a Unicode file using UTF-8 encoding).  The results of the parse must appear in an output file which has the same name as the input file, but with an added ".xml" extension.

The format of the output file is described below. As in the previous assignment, you will use the onyx_xml library to generate and output your OXML documents.

If the input file was a syntactically correct Onyx program, then the top-level element of the output OXML document will be one query element, and the format of the file will follow the Onyx AST Specification.

Be sure and read over the discussion on normalization for dealing with FLWR expressions, as there is no AST node for FLWR. Instead, there are separate node types for for and let. The return clause is normalized out of the AST, and there is no return node. [02/05/07]. In addition, the where clause is normalized out of the AST, and a IfThenElseExpr node is used instead.

If the input file contains a syntax error, the top-level element of the output OXML document will be one errors element, containing one syntaxError element with line and column attributes indicating the line number, and column in that line, of the input file of the beginning of the token being read when the first syntax error was detected. Line and column numbers should start at 1.

Automated tools will check the contents of the output file for correctness with respect to the required AST format and the syntax specification. No additional checking of functionality of the parser will be performed, and you will not be tested on any undefined behavior. (You may treat such behavior as you wish. Feel free to ask questions if you are curious about your options.)

Turnin

In addition to your assignment, you must include three other files:

  • a file named MEMBERS which contains the names and login names of the members of your programming team (even if there is only one of you).
  • a file named DECLARATION stating that the work was your own, and that you did not violate the Academic Integrity policies in effect for the course.
  • a file called TeamEval describing your experiences.
  • Refer to the A2 Phases document for additional instructions about these files.

    Assignments that do not include a properly completed DECLARATION, MEMBERS, and TeamEval files will not receive a grade and will be treated as late assignments.

    From the directory containing your Makefile, MEMBERS, DECLARATION, and TeamEval files and the root of your A2 development directory, run the command TURNIN.A2 and follow the dialog. You can run this command as many times as you want up until the assignment deadline; any earlier turnin of yours will just be overwritten. In fact it is recommended that you run the command early, as soon as you have a partial solution to the assignment, to make sure the turnin mechanism is working for you.

    DON'T WAIT UNTIL THE LAST MINUTE TO RUN TURNIN FOR THE FIRST TIME!!!

    Often problems are uncovered when running TURNIN. You'll need to resolve these problems before your submission can be accepted. Waiting until the last minute to run TURNIN for the first time per assignment is done at your own risk!

    Advice

  • Check the document Designing an AST Implementation for suggestions on defining your own AST structures.
  • Look at mk_xml.java for examples of how to use the onyx xml library to create ODOM and OXML objects.
  • There is a sample parser in ~/../public/Examples/DC that implements a revised version of the desk calculator example appearing in the text. See the README file for instructions.

  • Resources

    onyx_xml library
    The onyx_xml library may be used to create OXML documents in memory, to output them to a file, and also to read a compliant OXML document into memory. From ACS machines, it is pre-installed in the CSE 131A Tools directory: ~cs131w/public/Tools/onyx_xml.jar. Documentation is found HERE.

    CUP
    For this assignment, the CUP tool will be useful. From ACS machines, it is pre-installed as a jar file in the CSE 131A Tools directory: ~cs131w/public/Tools/java-cup-v11a.jar.  CUP is free software, released under the GPL, and available for free download directly from the support site located at http://www2.cs.tum.edu/projects/cup/.

    Test Cases
    We have published test cases with corresponding output. The actual test case files and the results reside in ~/../public/Materials/A2/Tests.
    (The filename for each test result adds a .xml suffix to the filename for its corresponding test case.)
    [As of Friday 1/26, we have posted 2 test cases. A more complete set will appear by Monday morning.]

    xmlpp & xmldiff
    Xmlpp is "pretty-printer" that formats OXML files in a readable way; xmldiff is a tool that shows you the differences between two OXML files.  These tools are available in the CSE 131A environment.  Run  xmlpp -h   and   xmldiff -h  for documentation. Use the -ns option for xmldiff to suppress output that is identical. (If the files are identical, there won't be any output)