Bray-Curtis dissimilarity
Written by Jan Schulz
Wednesday, 12 September 2007 23:15

Objective

The non-metric Bray-Curtis dissimilarity (Bray & Curtis 1957) delivers robust and reliable dissimilarity results for a wide range of applications. It is one of the most commonly applied measures for expressing relationships between objects in ecology, environmental sciences and related fields.

Equation

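Bray-Curtis is a modified Manhattan measure in which the summed absolute differences between the variables of two objects are standardised by the summed values of the variables of both objects. The general equation of the Bray-Curtis dissimilarity is:

$$d^{BCD}_{i,j} = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{\sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$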

In the equation, dBCD is the Bray-Curtis dissimilarity between the objects i and j, k is the index of a variable y and n is the total number of variables. The Bray-Curtis similarity dBCS is the complement of the dissimilarity and can be calculated directly from it:

dBCS = 1 - dBCD

In contrast to the dissimilarity, where 0 indicates identical objects, a dBCS value of 0 means a complete absence of relationship.
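The calculation for a single pair of objects can be sketched as follows. This is a minimal, self-contained illustration and not the author's library code: the type TVector and the functions MakeVector and BrayCurtisDissimilarity are names assumed for this sketch, and the two vectors are Case1 and Case2 from the Example section.

```pascal
program BrayCurtisPair;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Math;

type
  TVector = array of Double;   // hypothetical helper type for this sketch

// Copies an open array into a dynamic array (helper for this sketch only).
function MakeVector (const Values : array of Double) : TVector;
var
  k : Integer;
begin
  SetLength (Result, Length (Values));
  for k := 0 to High (Values) do Result [k] := Values [k];
end;

// Bray-Curtis dissimilarity of two objects. Returns NaN if the vectors
// differ in length or if the denominator is zero.
function BrayCurtisDissimilarity (const A, B : TVector) : Double;
var
  k           : Integer;
  Numerator   : Double;
  Denominator : Double;
begin
  Result := NaN;
  if Length (A) <> Length (B) then Exit;
  Numerator   := 0;
  Denominator := 0;
  for k := 0 to High (A) do
  begin
    Numerator   := Numerator   + Abs (A [k] - B [k]);
    Denominator := Denominator + (A [k] + B [k]);
  end;
  if Denominator <> 0 then Result := Numerator / Denominator;
end;

var
  dBCD : Double;
begin
  dBCD := BrayCurtisDissimilarity (MakeVector ([1, 1, 1]), MakeVector ([1, 1, 0]));
  WriteLn ('dBCD = ', dBCD:0:3);        // numerator 1, denominator 5
  WriteLn ('dBCS = ', (1 - dBCD):0:3);  // similarity as complement
end.
```

For these two objects the numerator is 1 and the denominator is 5, so the sketch prints a dissimilarity of 0.200 and a similarity of 0.800.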

Synonyms

Bray-Curtis similarity and dissimilarity values are often multiplied by 100 and given as percentages. The measure is very similar in definition to the Sørensen distance. Sometimes the term Czekanowski’s coefficient is erroneously used for Bray-Curtis indices.
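The similarity to the Sørensen measure can be made explicit for presence/absence data. As a sketch, assume every variable is coded 0 or 1, and let a be the number of variables present in both objects, b the number present only in object i and c the number present only in object j. The symbols a, b and c are notation introduced here for illustration. Under these assumptions the Bray-Curtis dissimilarity reduces to the complement of the Sørensen coefficient:

```latex
\begin{aligned}
\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right| &= b + c \\
\sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right) &= (a + b) + (a + c) = 2a + b + c \\
d^{BCD}_{i,j} &= \frac{b + c}{2a + b + c} = 1 - \frac{2a}{2a + b + c}
\end{aligned}
```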

Usage

When the data cover a wide range of values, it can be useful to transform them beforehand. When choosing a statistic to evaluate the output matrix, keep in mind that Bray-Curtis is not a metric. When all data are ≥ 0, the Bray-Curtis similarity lies within the range of 0 to 1. A value of 1 indicates that the two data records match completely in the n-dimensional space.
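The effect of a prior transformation can be sketched with a small program. The text does not name a particular transformation; the square root used here is only one common choice and is an assumption of this sketch, as are the helper names (TVector, MakeVector, SqrtTransform, BrayCurtisDissimilarity) and the two made-up data records.

```pascal
program BrayCurtisTransform;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVector = array of Double;   // hypothetical helper type for this sketch

// Copies an open array into a dynamic array (helper for this sketch only).
function MakeVector (const Values : array of Double) : TVector;
var
  k : Integer;
begin
  SetLength (Result, Length (Values));
  for k := 0 to High (Values) do Result [k] := Values [k];
end;

// Returns a copy of V with the square root applied to every value.
// Assumes all values are >= 0.
function SqrtTransform (const V : TVector) : TVector;
var
  k : Integer;
begin
  SetLength (Result, Length (V));
  for k := 0 to High (V) do Result [k] := Sqrt (V [k]);
end;

// Bray-Curtis dissimilarity; assumes equal lengths and a non-zero denominator.
function BrayCurtisDissimilarity (const A, B : TVector) : Double;
var
  k           : Integer;
  Numerator   : Double;
  Denominator : Double;
begin
  Numerator   := 0;
  Denominator := 0;
  for k := 0 to High (A) do
  begin
    Numerator   := Numerator   + Abs (A [k] - B [k]);
    Denominator := Denominator + (A [k] + B [k]);
  end;
  Result := Numerator / Denominator;
end;

var
  ObjA, ObjB : TVector;
begin
  // made-up records whose first variable is much larger than the others
  ObjA := MakeVector ([100, 4, 1]);
  ObjB := MakeVector ([64, 16, 9]);
  // raw data: 56 / 194, the first variable supplies 36 of the 56
  WriteLn ('raw         : ', BrayCurtisDissimilarity (ObjA, ObjB):0:3);
  // square roots (10, 2, 1) and (8, 4, 3): 6 / 28, each variable supplies 2
  WriteLn ('square root : ',
           BrayCurtisDissimilarity (SqrtTransform (ObjA), SqrtTransform (ObjB)):0:3);
end.
```

On the raw data the first variable accounts for most of the numerator, whereas after the square-root transformation all three variables contribute equally.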

Higher values influence the Bray-Curtis similarity more strongly, which implies that variables with high values are the most likely to discriminate between objects. The measure is not affected by joint zeros (Field et al. 1982), but the result is undefined when all variables of both objects are 0, because the denominator then becomes 0. For this case Clarke et al. (2006) suggest using a zero-adjusted Bray-Curtis coefficient that includes a virtual dummy variable with the value 1 for all objects. In the numerator this variable cancels to zero and in the denominator it sums to 2:

$$d^{BCD,adj}_{i,j} = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{2 + \sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$

The effect is that objects whose variables are all zero now have one variable in common, so a dissimilarity of zero is returned for them.
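A minimal sketch of the zero-adjusted coefficient, following the description of the dummy variable above. The names TVector, MakeVector and ZeroAdjustedBrayCurtis are assumptions of this sketch and are not part of the source code on this page.

```pascal
program BrayCurtisZeroAdjusted;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  TVector = array of Double;   // hypothetical helper type for this sketch

// Copies an open array into a dynamic array (helper for this sketch only).
function MakeVector (const Values : array of Double) : TVector;
var
  k : Integer;
begin
  SetLength (Result, Length (Values));
  for k := 0 to High (Values) do Result [k] := Values [k];
end;

// Zero-adjusted Bray-Curtis dissimilarity. The dummy variable (value 1 in
// both objects) adds 0 to the numerator and 2 to the denominator, so the
// denominator cannot become zero for data >= 0. Assumes equal lengths.
function ZeroAdjustedBrayCurtis (const A, B : TVector) : Double;
var
  k           : Integer;
  Numerator   : Double;
  Denominator : Double;
begin
  Numerator   := 0;
  Denominator := 2;   // contribution of the dummy variable (1 + 1)
  for k := 0 to High (A) do
  begin
    Numerator   := Numerator   + Abs (A [k] - B [k]);
    Denominator := Denominator + (A [k] + B [k]);
  end;
  Result := Numerator / Denominator;
end;

begin
  // two objects without any non-zero variable: 0 / 2
  WriteLn ('all zero     : ',
           ZeroAdjustedBrayCurtis (MakeVector ([0, 0, 0]), MakeVector ([0, 0, 0])):0:3);
  // Case1 and Case2 from the Example section: 1 / (5 + 2)
  WriteLn ('Case1, Case2 : ',
           ZeroAdjustedBrayCurtis (MakeVector ([1, 1, 1]), MakeVector ([1, 1, 0])):0:3);
end.
```

Note that the adjustment also changes the values of pairs that were already defined: for Case1 and Case2 the result is 1/7 instead of 1/5. The effect is largest when the summed values are small.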

Algorithm

The algorithm first checks whether the input data matrix is rectangular. If it is not, the function returns FALSE and a defined, but empty output matrix. If the matrix is rectangular, the Bray-Curtis dissimilarity is calculated. To this end, the dimensions of the output matrix are set and the titles for its rows and columns are copied from the row titles of the input matrix. As the result is a square matrix that is mirrored along the diagonal, only the values for one triangular part and the diagonal are computed, and each value is written to both sides of the diagonal. If errors occur during computation, the function returns FALSE.

To calculate the Bray-Curtis similarity the Bray-Curtis dissimilarity matrix is computed first and thereafter transformed.

Source

```pascal
Function dist_BrayCurtisDissimilarity (InputMatrix : T2dVariantArrayDouble; Var OutputMatrix : T2dVariantArrayDouble) : Boolean;
// The function dist_BrayCurtisDissimilarity calculates the Bray-Curtis dissimilarity
// matrix between several cases, which are expected in the rows. The variables are
// expected in the columns. Function returns FALSE if at least one cell can not be
// calculated. The result matrix is returned in OutputMatrix.
// (c) Jan Schulz, 24.December 2005; www.code10.info
Var  InputCols        : Integer;
     InputRows        : Integer;
     OutputMatrixSize : Integer;
     RunnerY          : Integer;
     RunnerX          : Integer;
     Numerator        : Double;
     Denominator      : Double;
     i                : Integer;
     FirstVal         : Double;
     SecondVal        : Double;
     Dissimilarity    : Double;
Begin
  // if one dimension is zero or matrix is not rectangular quit
  If Not mtx_IsRectangular (InputMatrix, InputRows, InputCols) Then
  Begin
    // create an empty matrix, return FALSE and exit
    mtx_Create (OutputMatrix, 1, 1, NaN, 'Erroneous Bray-Curtis dissimilarity matrix');
    dist_BrayCurtisDissimilarity := False;
    Exit;
  end;

  // let's expect the best case ...
  dist_BrayCurtisDissimilarity := True;

  // create an output matrix of required size
  mtx_Create (OutputMatrix, InputRows, InputRows, NaN, 'Bray-Curtis dissimilarity matrix');

  // copy the respective titles
  For RunnerY := Low (InputMatrix.RowTitle) to High (InputMatrix.RowTitle) do
  Begin
    // names for rows and columns are the same in this symmetric matrix
    OutputMatrix.RowTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
    OutputMatrix.ColTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
  end;

  // compare every object
  For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do
  Begin
    // with every other
    For RunnerX := Low (OutputMatrix.Cells) to RunnerY do
    Begin
      Numerator   := 0;
      Denominator := 0;

      // use all variables of each object under comparison
      For i := 0 to High (InputMatrix.Cells [0]) do
      Begin
        FirstVal  := InputMatrix.Cells [RunnerX, i];
        SecondVal := InputMatrix.Cells [RunnerY, i];

        If Not (IsNan (FirstVal) Or IsNan (SecondVal)) Then
        Begin
          Numerator   := Numerator   + Abs (FirstVal - SecondVal);
          Denominator := Denominator + (FirstVal + SecondVal)
        end
        Else
        Begin
          // a missing value is skipped, but the result is flagged as erroneous
          dist_BrayCurtisDissimilarity := False;
        end;
      end;

      // can we calculate a Bray-Curtis dissimilarity value for these two objects?
      If Denominator <> 0 Then Dissimilarity := Numerator / Denominator
                          Else
                          Begin
                            // can not calculate as denominator is zero
                            Dissimilarity := NaN;
                            dist_BrayCurtisDissimilarity := False;
                          end;

      // set the value on both sides of the diagonal or diagonal itself
      OutputMatrix.Cells [RunnerX, RunnerY] := Dissimilarity;
      OutputMatrix.Cells [RunnerY, RunnerX] := Dissimilarity;
    end;
  end;
end;

Function dist_BrayCurtisSimilarity (InputMatrix : T2dVariantArrayDouble; Var OutputMatrix : T2dVariantArrayDouble) : Boolean;
// The function dist_BrayCurtisSimilarity calculates the Bray-Curtis similarity
// matrix between several cases, which are expected in the rows. The variables are
// expected in the columns. Function returns FALSE if at least one cell can not be
// calculated. The result matrix is returned in OutputMatrix. This function depends
// on the function dist_BrayCurtisDissimilarity.
// (c) Dr. Jan Schulz, 24.December 2005; www.code10.info
Var RunnerX : Integer;
    RunnerY : Integer;
Begin
  // calculate the Bray-Curtis dissimilarity matrix
  Result := dist_BrayCurtisDissimilarity (InputMatrix, OutputMatrix);

  // convert dissimilarity matrix into a similarity matrix
  For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do
  Begin
    For RunnerX := Low (OutputMatrix.Cells [RunnerY]) to High (OutputMatrix.Cells [RunnerY]) do
    Begin
      OutputMatrix.Cells [RunnerY, RunnerX] := 1 - OutputMatrix.Cells [RunnerY, RunnerX];
    end;
  end;

  If Result Then OutputMatrix.MatrixName := 'Bray-Curtis similarity matrix'
            Else OutputMatrix.MatrixName := 'Erroneous Bray-Curtis similarity matrix';
end;
```

Example

For a data matrix aInputMatrix of the type t2dVariantArrayDouble, populated with:

Data    Var1  Var2  Var3
Case1      1     1     1
Case2      1     1     0
Case3      2     2     2
Case4     10    10    10
Case5     11    11    11
Case6     10     5     0

the call of:

aBooleanVar := dist_BrayCurtisDissimilarity (aInputMatrix, aOutputMatrix);

returns the respective Bray-Curtis dissimilarity matrix in aOutputMatrix:

Bray-Curtis dissimilarity  Case1  Case2  Case3  Case4  Case5  Case6
Case1                      0      0.200  0.333  0.818  0.833  0.778
Case2                      0.200  0      0.500  0.875  0.886  0.765
Case3                      0.333  0.500  0      0.667  0.692  0.619
Case4                      0.818  0.875  0.667  0      0.048  0.333
Case5                      0.833  0.886  0.692  0.048  0      0.375
Case6                      0.778  0.765  0.619  0.333  0.375  0

Although the Euclidean distance between the objects Case1 and Case3 is the same as that between Case4 and Case5, the Bray-Curtis dissimilarity indicates a closer relationship between Case4 and Case5. This is because the analysis gives more weight to variables with higher values: the same absolute difference is standardised by a larger sum. The measure is therefore very useful for analyses in which high joint presences are more important than sparse ones. This effect can be weakened by initial transformations.
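The comparison can be checked by hand with the values from the data matrix above. The superscript Euclid is notation chosen here for the Euclidean distance:

```latex
\begin{aligned}
d^{Euclid}_{Case1,Case3} &= \sqrt{(1-2)^2 + (1-2)^2 + (1-2)^2} = \sqrt{3} \approx 1.732 \\
d^{Euclid}_{Case4,Case5} &= \sqrt{(10-11)^2 + (10-11)^2 + (10-11)^2} = \sqrt{3} \approx 1.732 \\
d^{BCD}_{Case1,Case3} &= \frac{1 + 1 + 1}{3 + 6} = \frac{3}{9} \approx 0.333 \\
d^{BCD}_{Case4,Case5} &= \frac{1 + 1 + 1}{30 + 33} = \frac{3}{63} \approx 0.048
\end{aligned}
```

The numerators are identical, so the difference between the two Bray-Curtis values comes entirely from the denominators, 9 versus 63.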

Literature

Bray J.R., Curtis J.T. (1957): An ordination of the upland forest communities of Southern Wisconsin. Ecological Monographs 27:325-349.

Clarke K.R., Somerfield P.J., Chapman M.G. (2006): On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. Journal of Experimental Marine Biology and Ecology 330:55-80.

Field J.G., Clarke K.R., Warwick R.M. (1982): A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8:37-52.

Last Updated on Monday, 23 November 2015 17:21