MCS 178: Introduction to Computer Science II (Spring 2011)

Project 5: Global Sequence Alignment


Started: Tuesday 4/5; Due: Friday 4/15

Overview

This assignment is also taken from the projects on the book's web site. In this case, the assignment is the Global Sequence Alignment. Be sure to read my comments below for some clarification and suggestions.

You should work on this project individually.

Getting started

You should get started by reading through Global Sequence Alignment and work through one or two examples by hand. In other words, come up with some short DNA sequences and compute their edit distance using the table method illustrated in the write-up for the sequences AACAGTTACC and TAAGGTCA. You do not have to hand this in, and you may work with a partner on this part.

Required structure

Instead of giving you a template Java file to work from, we are going to have you create the code more or less from scratch. However, in order to enforce some uniformity (which will make the grading easier), we request that you do certain things:

  • You should (just like in Project 2) have a copy of a directory that contains the contents of the sequence directory described in the write-up at the book's web site. I'm assuming you'll put all new files you create in this same directory.
  • The name of the main class should be "EditDistance".
  • You should fill in the readme.txt file.
  • Your final program should function as described at the bottom of the book site's (you choose which of the two output styles), with the following slight difference: you would get the described output by typing java EditDistance < example10.txt.

Submission

You should submit two files: EditDistance.java and readme.txt as described in our instructions on submitting code document.

Gradesheet

  • Implementation (70 pts)
    • Dynamic programming to compute Edit Distance correctly (30 pts)
    • Prints the alignment of two strings correctly (20 pts)
    • Measures the run-time correctly (10 pts)
    • Documentation/Comments (5 pts)
    • Choice of variable names, indentation, other style issues (5 pts)
  • Questions in Readme.txt (30 pts)
    • Alignment explanation (10 pts)
    • Additional test data (5 pts)
    • edit distance and run-time (5 pts)
    • Answer for a, b and largest N (10 pts)