A Java string is essentially a sequence of characters. For example, the Java string constant "AFRICA" is a character sequence of length 6. Like array elements, the characters in a Java string are indexed starting from 0. For a string X, we write X[i..j) for the substring of X consisting of the characters at positions i through j − 1 inclusive. For example, if X is "AFRICA", then X[1..4) is "FRI" and X[0..6) is "AFRICA" itself.
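The half-open X[i..j) notation lines up with Java's `String.substring(beginIndex, endIndex)`, which likewise includes the begin index and excludes the end index. A minimal sketch, where the class name `SubstringDemo` and the helper `slice` are illustrative names only:

```java
final class SubstringDemo {
    // X[i..j) in the notation above: the characters at positions i, i+1, ..., j-1.
    static String slice(String x, int i, int j) {
        return x.substring(i, j);
    }

    public static void main(String[] args) {
        String x = "AFRICA";
        System.out.println(x.length());      // 6
        System.out.println(slice(x, 1, 4));  // FRI
        System.out.println(slice(x, 0, 6));  // AFRICA
    }
}
```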
If one deletes the characters at certain positions of a given string X, what remains is called a subsequence of X. For example, deleting the characters at positions 1 and 4 from "AFRICA" leaves the string "ARIA", so "ARIA" is a subsequence of "AFRICA". Deleting no characters is also permitted, so "AFRICA" is a subsequence of itself.
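One way to test the definition mechanically is a single left-to-right scan that greedily matches the characters of the candidate against X. The sketch below assumes this greedy approach; the class and method names (`SubsequenceCheck`, `isSubsequence`) are hypothetical.

```java
final class SubsequenceCheck {
    // Returns true if z can be obtained from x by deleting zero or more characters.
    static boolean isSubsequence(String z, String x) {
        int k = 0; // number of characters of z matched so far
        for (int i = 0; i < x.length() && k < z.length(); i++) {
            if (x.charAt(i) == z.charAt(k)) {
                k++; // x[i] is kept; every unmatched x[i] counts as deleted
            }
        }
        return k == z.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubsequence("ARIA", "AFRICA"));   // true
        System.out.println(isSubsequence("AFRICA", "AFRICA")); // true
        System.out.println(isSubsequence("ACIR", "AFRICA"));   // false
    }
}
```

Matching each character of the candidate at its earliest possible position never hurts, because it leaves the longest possible remainder of X for the characters that follow.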
A sequence Z is a common subsequence of sequences X and Y if Z is a subsequence of both X and Y. For instance, "DIN" is a common subsequence of "DYNAMICPROGRAMMING" and "DIVIDEANDCONQUER".
A longest common subsequence (LCS) of sequences X and Y is a common subsequence of X and Y of maximum possible length. For instance, "DYNAMICPROGRAMMING" and "DIVIDEANDCONQUER" have "DICOR" as an LCS, because "DICOR" is a common subsequence of the two and no common subsequence of length 6 exists. "DICON" is another LCS, so an LCS need not be unique.
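For strings this short, the claim about maximum length can be checked by exhaustive search straight from the definition: enumerate every subsequence of Y and keep the longest one that is also a subsequence of X. The following is only a brute-force sketch for small inputs (it assumes Y has fewer than 31 characters so that a bitmask fits in an `int`); the class and method names are hypothetical, and it is not the dynamic-programming method developed below.

```java
final class BruteForceLcs {
    // Greedy left-to-right test: is z a subsequence of x?
    static boolean isSubsequence(String z, String x) {
        int k = 0;
        for (int i = 0; i < x.length() && k < z.length(); i++) {
            if (x.charAt(i) == z.charAt(k)) {
                k++;
            }
        }
        return k == z.length();
    }

    // Length of an LCS of x and y, found by trying all 2^|y| subsequences of y.
    static int lcsLength(String x, String y) {
        int n = y.length(); // assumption: n < 31
        int best = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) <= best) {
                continue; // this candidate is too short to improve on best
            }
            StringBuilder z = new StringBuilder();
            for (int j = 0; j < n; j++) {
                if ((mask & (1 << j)) != 0) {
                    z.append(y.charAt(j)); // bit j set: keep character j of y
                }
            }
            if (isSubsequence(z.toString(), x)) {
                best = z.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("DYNAMICPROGRAMMING", "DIVIDEANDCONQUER")); // 5
    }
}
```

The running time grows exponentially with the length of Y, which is the motivation for the recurrence-based approach.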
Let two sequences X = X_0 X_1 … X_{m−1} and Y = Y_0 Y_1 … Y_{n−1} be given. We want to find an LCS of X and Y.
For 0 ≤ i ≤ m and 0 ≤ j ≤ n, let opt(i, j) be the length of an LCS of X[i..m) and Y[j..n). When i = m or j = n, the corresponding suffix is empty.
We seek opt(0, 0).
Suppose Z = Z[0..k) is an LCS of X[i..m) and Y[j..n), where i < m and j < n.
If X_i = Y_j, then necessarily X_i = Z_0, since otherwise prepending X_i to Z would give a longer common subsequence. We can then show that Z[1..k) is an LCS of X[i + 1..m) and Y[j + 1..n).
If X_i ≠ Y_j, then X_i ≠ Z_0 or Y_j ≠ Z_0. If X_i ≠ Z_0, we can show that Z is an LCS of X[i + 1..m) and Y[j..n). If Y_j ≠ Z_0, we can show that Z is an LCS of X[i..m) and Y[j + 1..n).
We know that one of the above cases must occur. This gives us the following recurrence.
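$$
\mathrm{opt}(i, j) =
\begin{cases}
0 & \text{if } i = m \text{ or } j = n, \\
\mathrm{opt}(i+1, j+1) + 1 & \text{if } 0 \le i < m,\ 0 \le j < n, \text{ and } X_i = Y_j, \\
\max\{\mathrm{opt}(i, j+1),\ \mathrm{opt}(i+1, j)\} & \text{if } 0 \le i < m,\ 0 \le j < n, \text{ and } X_i \ne Y_j.
\end{cases}
$$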
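The recurrence can be transcribed almost line for line into a recursive method. The sketch below adds memoization (caching each opt(i, j) after its first evaluation) so that every pair (i, j) is computed only once. The memoization, the class name `LcsMemo`, and the use of −1 as a "not yet computed" marker are choices made for this illustration.

```java
final class LcsMemo {
    // opt(i, j): length of an LCS of x[i..m) and y[j..n), evaluated top-down.
    static int opt(String x, String y, int i, int j, int[][] memo) {
        if (i == x.length() || j == y.length()) {
            return 0; // base case: one of the suffixes is empty
        }
        if (memo[i][j] >= 0) {
            return memo[i][j]; // already computed
        }
        int value;
        if (x.charAt(i) == y.charAt(j)) {
            value = opt(x, y, i + 1, j + 1, memo) + 1;
        } else {
            value = Math.max(opt(x, y, i, j + 1, memo), opt(x, y, i + 1, j, memo));
        }
        memo[i][j] = value;
        return value;
    }

    static int lcsLength(String x, String y) {
        int[][] memo = new int[x.length()][y.length()];
        for (int[] row : memo) {
            java.util.Arrays.fill(row, -1); // -1 marks "not yet computed"
        }
        return opt(x, y, 0, 0, memo);
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("satan", "saint")); // 3
    }
}
```

With the cache there are at most m·n distinct subproblems, each handled in constant time apart from its recursive calls.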
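For example, for X = "satan" (rows, index i) and Y = "saint" (columns, index j), the values opt(i, j) are as follows. The row and column labelled "." correspond to i = m and j = n, where the suffix is empty.

|   | s | a | i | n | t | . |
|---|---|---|---|---|---|---|
| s | 3 | 2 | 1 | 1 | 1 | 0 |
| a | 2 | 2 | 1 | 1 | 1 | 0 |
| t | 2 | 2 | 1 | 1 | 1 | 0 |
| a | 2 | 2 | 1 | 1 | 0 | 0 |
| n | 1 | 1 | 1 | 1 | 0 | 0 |
| . | 0 | 0 | 0 | 0 | 0 | 0 |
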
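Step 1. Fill in a table of opt( ⋅ , ⋅ ) values, plus a companion table of maximizers recording which case of the recurrence produced each value. Because opt(i, j) depends only on opt(i + 1, j + 1), opt(i, j + 1), and opt(i + 1, j), any order that computes those three entries first will do: row by row from the bottom row up, column by column from the rightmost column leftward, or anti-diagonal by anti-diagonal starting from the bottom-right corner.
Step 2. Recover an LCS by following the maximizer pointers, starting from entry (0, 0) and stopping at the first entry with i = m or j = n.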
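The two steps can be sketched as follows. Note one deliberate simplification: instead of storing a separate companion table of maximizers, this sketch re-derives each pointer during Step 2 by comparing neighbouring opt values, which yields the same walk. The class and method names (`LcsTable`, `fillTable`, `lcs`) are hypothetical, and ties between the two neighbours are broken in favour of advancing i, an arbitrary choice.

```java
final class LcsTable {
    // Step 1: opt[i][j] = length of an LCS of x[i..m) and y[j..n).
    static int[][] fillTable(String x, String y) {
        int m = x.length();
        int n = y.length();
        int[][] opt = new int[m + 1][n + 1]; // row m and column n stay 0 (base case)
        for (int i = m - 1; i >= 0; i--) {       // bottom row up
            for (int j = n - 1; j >= 0; j--) {   // right to left
                if (x.charAt(i) == y.charAt(j)) {
                    opt[i][j] = opt[i + 1][j + 1] + 1;
                } else {
                    opt[i][j] = Math.max(opt[i][j + 1], opt[i + 1][j]);
                }
            }
        }
        return opt;
    }

    // Step 2: walk from (0, 0), following the case that produced each entry.
    static String lcs(String x, String y) {
        int[][] opt = fillTable(x, y);
        StringBuilder z = new StringBuilder();
        int i = 0;
        int j = 0;
        while (i < x.length() && j < y.length()) {
            if (x.charAt(i) == y.charAt(j)) {
                z.append(x.charAt(i)); // matching characters belong to the LCS
                i++;
                j++;
            } else if (opt[i + 1][j] >= opt[i][j + 1]) {
                i++; // the maximum came from opt(i + 1, j)
            } else {
                j++; // the maximum came from opt(i, j + 1)
            }
        }
        return z.toString();
    }

    public static void main(String[] args) {
        System.out.println(fillTable("satan", "saint")[0][0]); // 3
        System.out.println(lcs("satan", "saint"));             // san
    }
}
```

A different tie-breaking rule can return a different LCS of the same length, which is consistent with an LCS not being unique.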