Identity dot-plots Print
Algorithms - Dot-plots
Written by Jan Schulz   
Tuesday, 10 June 2008 11:23

Identity dot-plots

 

For the further references, tables and images please refer to the 'Introduction to dot-plots' article on this site.

Algorithm

A robust approach to enhance the method of the initial example is the use of a frame with a window size of more than one symbol (McLachlan 1971). Inside the sliding window pairwise matching characters are counted. This allows comparing substrings of defined lengths from the sequences (Figure 3), which act as templates.

 

Figure 3: Example of dot plot creation with a window size of w = 4. In the window, two out of four symbols match (space and ‘E’) and result in a grey level of 50% for the respective substring comparison.

 

 

 

Formally expressed it means sequence T1 and T2 are data words from a finite alphabet. Length of T1 is m and length of T2 is n. Symbols within T1 are addressed by index i, in T2 by index j. The term T1[i] refers to the ith symbol in the sequence T1. Identity dot plots compare all possible substrings with the length of the window size w with each other. The number of comparable substrings of length w from T1 is k=m-w+1 and the respective number of substrings from T2 is l=n-w+1. Results of the comparisons are stored in a l×k matrix M. For every element of the matrix M(i, j), with 1 i l and 1 j k, we compare the substrings ST1,j and ST2,i. Substrings ST1,j and ST2,i are defined as the w consecutive characters from T2[i] to T2[i+w-1] and T1[j] to T1[j+w-1]. In the substrings the pairwise matching symbols are counted. This is similar to the calculation of a Hamming distance (Hamming 1950), except that matches instead of mismatches contribute to the result. The resulting M is a characteristic presentation of the recurrent patterns in the included sequences T1 and T2 with respect to window size w. This means it is unique, although larger patterns remain when w is modified. Applying the method again to the DNA sequence EU127468 (Table 1), the difference becomes obvious (Figure 2, lower panel).

Source

Type t2dArrayInt = Array of Array of Integer;

Function CreateDot plotIdentity (aSequence1, aSequence2 : String; MatchWindowSize : Integer; Var ResultMatrix : t2dArrayInt) : Boolean;
// The function CreateDot plotIdentity creates a matrix representing
// the identity dot plot result for comparison among two sequences
// with a given window size. (c) Dr. Jan Schulz, 21-25. May 2007
// www.code10.info
Const aScore = 1;

Var AlignsSeq1 : Integer;
AlignsSeq2 : Integer;
aPosInSeq1 : Integer;
aPosInSeq2 : Integer;
aTempResult : Integer;
aCounter : Integer;
Begin
// hoping for the best we expect no errors
Result := True;

//calculate number of aligns to render
AlignsSeq1 := Length (aSequence1) - MatchWindowSize + 1;
AlignsSeq2 := Length (aSequence2) - MatchWindowSize + 1;

// number of both aligns must be > 0
If (AlignsSeq1 <= 0) Or (AlignsSeq2 <= 0) THen
Begin
// Return empty but defined ResultMatrix, notify error & terminate
SetLength (ResultMatrix, 0, 0);
Result := False;
Exit;
end;

//allocate ResultMatrix
SetLength (Resultmatrix, AlignsSeq2, AlignsSeq2);

// in sequence 1 a Matchwindow will be applied back off to every position ...
aPosInSeq1 := AlignsSeq1;
Repeat

// same with sequence 2
aPosInSeq2 := AlignsSeq2;
Repeat

//prior to every new comparison reset the variables
aCounter := MatchWindowSize - 1;
aTempResult := 0;

// compare all symbols within the Matchwindow
Repeat

// when two characters match they contribute to the results
If aSequence1 [aPosInSeq1 + aCounter] = aSequence2 [aPosInSeq2 + aCounter] THen
Begin
aTempResult := aTempResult + aScore;
end;

// decrease the number of remaining characters to compare
aCounter := aCounter - 1;

// and find out whether finished with this sliding window
Until aCounter < 0;

// after comparison store result in ResultMatrix
ResultMatrix [aPosInSeq2 - 1, aPosInSeq1-1] := aTempResult;

// decrease remaining comparisons for sequence 2 with the resp. pos in the first
aPosinSeq2 := aPosInSeq2 - 1;
Until aPosInSeq2 <= 0;

//decrease the number of comparisons to be made for sequence 1
aPosinSeq1 := aPosInSeq1 - 1;

//exit when finished
Until aPosInSeq1 <= 0;
end;

 

Last Updated on Friday, 18 March 2011 17:58