Implementation of the classic Dynamic Programming problem using the Needleman–Wunsch algorithm which requires quadratic space & time complexity.
Given 2 sequences, find the minimum cost of aligning the 2 sequences (case insensitive).
Gaps can be inserted to 1 sequence or the other, but incur a penalty.
2 = Gap Penalty (δ)
If 2 characters are aligned with each other, there may be a mismatch penalty (αi j)
0= aligning identical letters1= aligning a vowel with a vowel, or a consonant with a consonant3= aligning a vowel with a consonant
Minimum cost = sum of mismatch & gap penalties (the optimal alignment)
There may be multiple optimal paths, but this only finds 1 of them

O(M*N)
Storage: also O(M*N)
- Alphanumeric characters only (all others including spaces are sanitized out)
- Case insensitive
_(Underscore character) represents gaps but only when displaying results (it will be removed & ignored if it's present in a sequence)- View results in a fixed-width font for the 2 sequences to be lined up
predecessorIndexesis calculated when creatingmemoTablebut not used to find the actual alignment, only to show where the values inmemoTablecame from- For example: if
predecessorIndexes[4][4]contains the array[4, 3], it means the value ofmemoTable[4][4](which is6) came frommemoTable[4][3](i.e.case3, a character inseq2was aligned with a gap so it came from the left) predecessorIndexes[0][0]contains the array[-1, -1]because the upper left corner has no predecessor
- For example: if
- Provide 2 strings (edit
testSequences2D array) & runcalculateAndPrintOptimalAlignment() - Optionally change the penalties by passing in arguments to the non-default constructor
CurrentlyvowelVowelMismatchPenalty,consonantConsonantMismatchPenalty&numberNumberMismatchPenaltyare the same, but all are arbitrary
seq1&seq2have a leading space added (after sanitizing illegal & whitespace characters)
This is so thatseq1.charAt(i)&seq2.charAt(j)work in the loops & the string indexes match up
It causes some adjustments for the array sizes.memoTable = new int[seq1.length()][seq2.length()]uses the exact value oflength()which includes the space- Loops like
for(int i=0; i < seq1.length(); i++)usei < seq1.length()to not go out of bounds of the string indexes - In
findAlignment(),int i = seq1.length() - 1;&int j = seq2.length() - 1;since the 2 sequences have leading spaces & arrays indexes start from0 findAlignment()retraces each calculation in thememoTableto see where it came from
- String Alignment using Dynamic Programming - Gina M. Cannarozzi to retrace the memo table & find the alignment

