MCS-388 Lab 2: Scanning and Parsing


In this lab, you will turn your AST classes from the previous lab into a true compiler by equipping them with a scanner and parser. That way, your compiler will be able to read in any program (within its limited language) and generate the translation for that program, rather than always generating the translation for one fixed program as in the prior lab.

Starting point

Download and unzip the file This will produce a directory c2 with subdirectories structured similarly to the previous lab, for use with ant. The two new subdirectories, flex and cup, contain the source code for the scanner and parser, ready for JFlex and Java CUP to translate into Java.

Copy your Java source files from the previous lab from your c1/src into c2/src, because your new compiler will build on your work from the prior lab.

The new source file cup/Parser.cup is a simplified grammar with Java actions embedded in it for constructing AST nodes. Those actions assume that some of your classes from the previous lab have specific names: SumExpr, DifferenceExpr, ProductExpr, QuotientExpr, and RemainderExpr. If you chose different names for your classes, you will need to modify the names in cup/Parser.cup correspondingly, so that it creates instances of your classes.

One you have copied all the files in and adjusted the constructed class names if necessary, you can try building and testing the initial version of the compiler. In a shell with your current directory being c2, give the command ant. With any luck, the output you see should end with a successful execution of assembly code in the spim simulator.

If you look at the input-file used in the testing, you will see that the program it contains consists of a single arithmetic expression. A look at the grammar in cup/Parser.cup should confirm that as the compiler currently stands, the syntax of a program contains nothing but a single expression. Moreover, if you look at flex/scanner.jflex, you'll see another limitation of the current compiler: although it skips whitespace, it has no provision for skipping comments. (You could try putting a comment into input-file and see that it trips up the compiler.) These observations set the stage for the work you need to do in order to expand the compiler to a somewhat more realistic source language. In subsequent labs, you will further expand the language.

Before getting to work on your extensions to the compiler, it would be worth noting that the build.xml file has additional features beyond the ability to run the whole compiler-creation and test compilation process. For example, you can test just the scanner by running ant test_scanner to see what tokens it identifies. I would be happy to explain any of the other testing features that might interest you.

Changes you need to make

First, modify the scanner so that it will skip comments and test that this works. At a minimum you should skip comments that start with // and extend up to the end of the line. If you feel like it, it would be nice to also skip those that start with /* and extend through the next instance of */.

Second, modify the syntax of statements so that a statement is no longer just an unadorned expression, with the command to print out its value being implicit. Instead, a statement should now look like the following example:


This will require changes in both the scanner and the parser. Again, you should stop and test that your compiler works with examples of the modified language.

Third, modify the syntax of programs so that a program is no longer just an unadorned statement, but rather has some extra symbols at the beginning and end, as in the following example:


Once more, this will require changes in both the scanner and the parser. The initial lexeme main can be a special keyword of your language if you like, or you can allow any identifier to appear here and output a procedure with that name. (However, spim will only be able to automatically start the procedure running if its name happens to be main.) As the language currently stands, the keyword approach may make more sense. The identifier approach, however, will be more of a start toward a future lab where we compile programs with multiple procedures. Whichever way you choose to modify the language here, be sure to stop and test it.

Fourth, modify the syntax of programs further so that between the curly braces the programmer can write any number of statements, rather than only one. This will require changing your parser and may also require you to add one more more new AST classes, depending on how thorough a job you did of lab 1. At this point, the language is as complete as it will be for this lab, and you should give your compiler one last testing.

What you should turn in

Please electronically submit your files. The root directory should include a plain text file called README which identifies what changes you made (i.e., what I should grade) and any testing or bug reports. To submit the electronic copy, in the main directory, please type:

  ant clean
  ~wolfe/public/388/submit .