DNA Sequence Alignment Checklist

Frequently Asked Questions

What are the main goals of this assignment? You will (i) solve a fundamental problem in computational biology, (ii) learn about the analysis of algorithms, and (iii) learn about a powerful programming paradigm known as dynamic programming.

How do I read in the two input strings from the file? Use readLine() and redirection as usual.
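
A minimal sketch of that reading step, assuming each of the two strings sits on its own line of the input file. The helper name readTwoStrings is hypothetical; it takes a line supplier (defaulting to readLine()) only so that it can be tried without redirection.

```kotlin
// Hypothetical helper: reads the two input strings, one per line.
// Assumption: each string occupies exactly one line of the input.
fun readTwoStrings(nextLine: () -> String? = { readLine() }): Pair<String, String> {
    val x = nextLine() ?: error("missing first string")
    val y = nextLine() ?: error("missing second string")
    return Pair(x, y)
}
```

From main, val (x, y) = readTwoStrings() would then pick up both strings from the redirected file.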

How do I access the length of a string s? The ith character? Use s.length and s[i], respectively. As with arrays, indices start at 0.

Can I assume that the input characters will always be A, C, G or T? NO! Your program should work equally well for any letter, upper case or lower case. In fact, it should work for all characters.

What's a StringIndexOutOfBoundsException? It's just like an ArrayIndexOutOfBoundsException. It results from invoking s[i] with an illegal value of i.

How do I know if my opt array contains the correct values? You may want to define a function that prints a two-dimensional array. For debugging purposes, call this function whenever you want to print the values of your opt array. NOTE - you probably only want to do this on small examples (e.g., input strings of length less than 10).
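
One possible shape for such a debugging function, assuming the opt array is stored as an Array<IntArray>. The names formatTable and printTable and the column width are illustrative choices, not part of the assignment.

```kotlin
// Hypothetical debugging helper: formats a two-dimensional Int array as text,
// one row per line, with each value right-aligned in a fixed-width column.
fun formatTable(opt: Array<IntArray>, width: Int = 4): String =
    opt.joinToString("\n") { row ->
        row.joinToString("") { value -> value.toString().padStart(width) }
    }

// Prints the formatted table to standard output.
fun printTable(opt: Array<IntArray>) = println(formatTable(opt))
```

Right-aligning the values keeps the columns lined up when some entries have one digit and others have two.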

Which alignment should I print out if there are two or more optimal ones? Output any one you like.

Where can I learn more about dynamic programming? The Longest Common Subsequence (LCS) problem is another example of a dynamic programming problem on strings. However, it is different from the current problem in many ways, so do not simply mimic the code without understanding what it does.
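
For comparison, here is a sketch of a bottom-up solution for the length of a longest common subsequence. The function and variable names are illustrative. The recurrence differs from the one needed for edit distance, so treat it as an example of the paradigm rather than as a template.

```kotlin
// Length of the longest common subsequence of x and y, computed bottom-up.
// lcs[i][j] holds the LCS length of the suffixes x[i..] and y[j..].
fun lcsLength(x: String, y: String): Int {
    val m = x.length
    val n = y.length
    // The last row and column stay 0: an empty suffix has no common subsequence.
    val lcs = Array(m + 1) { IntArray(n + 1) }
    for (i in m - 1 downTo 0) {
        for (j in n - 1 downTo 0) {
            lcs[i][j] = if (x[i] == y[j]) {
                lcs[i + 1][j + 1] + 1                 // characters match: extend the subsequence
            } else {
                maxOf(lcs[i + 1][j], lcs[i][j + 1])   // skip one character of x or of y
            }
        }
    }
    return lcs[0][0]
}
```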

Memory, Timing, and Operating System Issues

What does OutOfMemoryError mean? When java (the Java Virtual Machine, or JVM) runs, it requests a certain amount of memory from the operating system. The exact amount depends on the version of Java and your computer but can vary from 64MB to 1024MB (1GB). After java has started, the total size of all variables in use cannot be larger than what it originally requested. Trying to exceed that amount causes an OutOfMemoryError.
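
If you are unsure how much memory your run was actually given, one way to check is to ask the JVM from inside the program. The sketch below uses the standard Runtime.maxMemory() call; the function name maxHeapMegabytes is hypothetical.

```kotlin
// Maximum heap size, in megabytes, that this JVM will attempt to use.
fun maxHeapMegabytes(): Long = Runtime.getRuntime().maxMemory() / (1024 * 1024)
```

Printing this value at the start of main shows the limit that applies to the current run.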

For this assignment, the largest test cases use huge arrays, and Java needs to ask for enough memory from the operating system. To explicitly ask for more (or less) memory, use the -Xmx flag. For example, to request 500 megabytes (500 MB) of memory for a run, use

kotlin -J-Xmx500m DpEditDistanceKt < input.txt

Here 500m means 500 MB. You should adjust this number depending on the amount of memory your computer has and the size of the arrays you will need for the data set you are running. The amount 500MB should get you through ecoli10000.txt. To run ecoli20000.txt you will need to request more memory.
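
To choose a value, it can help to estimate the size of the table first. The sketch below assumes an (m + 1)-by-(n + 1) table of Int values at 4 bytes each and ignores per-row object overhead. It also assumes, from the file name only, that ecoli10000.txt holds two strings of roughly 10,000 characters each. The function names are hypothetical.

```kotlin
// Rough size in bytes of an (m + 1) x (n + 1) table of Int values.
// Assumption: 4 bytes per Int; per-row object overhead is ignored.
fun tableBytes(m: Int, n: Int): Long = (m + 1L) * (n + 1L) * 4L

// The same estimate, rounded down to whole megabytes.
fun tableMegabytes(m: Int, n: Int): Long = tableBytes(m, n) / (1024 * 1024)
```

Under those assumptions, tableMegabytes(10000, 10000) gives 381, which fits within 500 MB, while tableMegabytes(20000, 20000) gives 1526, roughly four times as much.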

What does "Could not reserve enough space for object heap" mean? This occurs if you use -Xmx with a value that is larger than the amount of available physical memory. Additionally, due to address space limitations, some 32-bit versions of Windows will also give this error if you try to request more than approximately 1.5GB, no matter how much physical memory is installed.

How do I determine how much physical memory is installed on my computer? On Mac, select About This Mac from the Apple menu. On Windows, press Windows-R (or choose Run on the Start menu), enter msinfo32, and look for total physical memory.

I'm getting a stack overflow error. What should I do? Ask java for more stack space. For example, to ask for 5MB of stack space, type

kotlin -J-Xss5m MemEditDistanceKt < input.txt

Adjust the amount of stack space requested as needed.

How can I measure how long my program takes on each file? To measure the running time of your program, there are a few techniques.
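
One possible technique, not necessarily the one the assignment intends, is to time the computation from inside the program with System.nanoTime(). The helper name elapsedSeconds is hypothetical.

```kotlin
// Runs the given block once and returns the elapsed wall-clock time in seconds.
fun elapsedSeconds(block: () -> Unit): Double {
    val start = System.nanoTime()
    block()
    return (System.nanoTime() - start) / 1e9
}
```

Wrapping only the alignment computation in the block leaves JVM startup and input reading out of the measurement.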

My timing data do not fit a polynomial hypothesis. What could I be doing wrong?

Testing and Debugging

Testing. To help you check the part of your program that generates the alignment, there are many test files in the data directory.

  1. Many of the small files are designed so that it is easy for you to determine what the correct answer should be by hand. Test your program on these cases to see that it gets these easy cases right.
  2. Here are the optimal edit distances of several of the supplied files.

    ecoli2500.txt   118
    ecoli5000.txt   160
    fli8.txt          6
    fli9.txt          4
    fli10.txt         2
    ftsa1272.txt    758
    gene57.txt        8
    stx1230.txt     521
    stx19.txt        10
    stx26.txt        17
    stx27.txt        19
  3. The test case worked through as an example in the assignment description, which is the same as the example10.txt file, has a unique optimal alignment. (Some test inputs like "xx y" have more than one optimal alignment.) So your code should give the exact same output on example10.txt as in the assignment page.
  4. Here are two more test cases with unique optimal alignments:

    $ kotlin DpEditDistanceKt < data/endgaps7.txt   $ kotlin MemEditDistanceKt < data/fli10.txt
    Edit distance = 4                               Edit distance = 2
    a - 2                                           T T 0
    t t 0                                           G G 0
    a a 0                                           G G 0
    t t 0                                           C T 1
    t t 0                                           G G 0
    a a 0                                           G G 0
    t t 0                                           A T 1
    - a 2                                           A A 0
                                                    C C 0
                                                    T T 0
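
A quick sanity check on output like the above is to confirm that the per-column costs are right and that they add up to the reported edit distance. The sketch below infers the penalties from the sample output (gap 2, mismatch 1, match 0). The function names are hypothetical, and the parser assumes the aligned characters are never spaces or '-'. It checks consistency only, not optimality.

```kotlin
// Penalties inferred from the sample output: a gap costs 2,
// a mismatch costs 1, and a match costs 0.
fun columnCost(a: Char, b: Char): Int = when {
    a == '-' || b == '-' -> 2
    a != b -> 1
    else -> 0
}

// Hypothetical checker: returns true if every printed column cost is right
// and the costs add up to the reported edit distance.
// Each alignment line is expected to look like "a - 2".
// Assumption: the aligned characters themselves are never ' ' or '-'.
fun alignmentIsConsistent(reportedDistance: Int, lines: List<String>): Boolean {
    var total = 0
    for (line in lines) {
        val parts = line.trim().split(" ")
        if (parts.size != 3 || parts[0].length != 1 || parts[1].length != 1) return false
        val cost = parts[2].toIntOrNull() ?: return false
        if (cost != columnCost(parts[0][0], parts[1][0])) return false
        total += cost
    }
    return total == reportedDistance
}
```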

Enrichment