Re: Maximal Difference (MAXD)

August 19, 2005 at 4:19 am #478

Keymaster

MAXD tries to find the k most different objects as a subset of N objects forming the complete data set. It does this by searching a standard (lower symmetric) PATN association (dissimilarity) matrix. MAXD also provides for reading a set of sequence numbers of objects in a subset of the complete N objects, and an association matrix, and then calculating the overall maximal difference according to the selected criteria.

The ‘maximal difference’ idea has a number of applications. For example, it could be the driving force behind the question “If a subset (only) of taxa can be reserved, which taxa should they be?” The argument goes that the subset should be those that have in some sense maximal (potentially genetic?) differences. As another example, phase-one of the ALOC algorithm uses a simple form of maximal difference in an attempt to ensure that the multivariate space is adequately sampled by ‘seed’ objects.

The algorithms in MAXD could also be used as robust alternatives of the ‘minimal set’ algorithm as proposed by Margules, Nicholls and Pressey (1988; see the MSET module). The ‘robustness’ in this instance may be justified by making the reasonable assumption that using a matrix of association measures has integrated species information into a more realistic estimate of site differences. Species are usually equally weighted in most pattern analysis. However, retrospective estimation of species ‘weights’ after classification or ordination (from the association matrix) will reveal that species contribute differentially; some strongly revealing underlying gradients while others either noisy or seemingly ‘marching to a different drum’.

Option 1
Is due to me and is very simple! It first finds the two objects separated by the greatest dissimilarity value. It next tests all candidates looking for the object that has maximal total distance to all of the already selected objects. This object is then entered and the remaining candidates are tested in the same way against all previously selected objects.

Option 2
The second option even simpler. It looks for the candidate that has the largest distance to its closest already-selected object. That is, it finds the candidate object with the maxim distance to the closest of the objects already-selected.

Option 3
This option is due to the work of Dan Faith (Australian Museum, Sydney) and relates to a cladistics environment. The first step is the same as option 1. The second and subsequent steps look for that object ‘j’, which has a MAXIMUM of its MINIMAL distance to all other PAIRS of objects. Using ‘k’ and ‘l’ as a typical pair, the distance is calculated as

Distance=(Djk+Djl-Dkl)/2

This corresponds to a taxonomic path length (number of character differences) of the added branch. The concept here is to find the taxa with maximal character difference to those already selected.

Cheers

Lee

PATN - Finding Patterns in Data