MAXD – maximally different objects

This topic is empty.

Viewing 2 posts - 1 through 2 (of 2 total)

Author
Posts
August 18, 2005 at 5:31 am #412
dillewh
Member
Dear Forum,
The DOS-version of PATN contains an experimental module MAXD which interatively selects and lists sites that are most different from those already selected plus indicates a value at which these sites are added to the selected set. Can anyone tell me what this value represents and how it is calculated ?
Thanks in advance
August 19, 2005 at 4:19 am #478
lee
Keymaster
MAXD tries to find the k most different objects as a subset of N objects forming the complete data set. It does this by searching a standard (lower symmetric) PATN association (dissimilarity) matrix. MAXD also provides for reading a set of sequence numbers of objects in a subset of the complete N objects, and an association matrix, and then calculating the overall maximal difference according to the selected criteria.
The ‘maximal difference’ idea has a number of applications. For example, it could be the driving force behind the question “If a subset (only) of taxa can be reserved, which taxa should they be?” The argument goes that the subset should be those that have in some sense maximal (potentially genetic?) differences. As another example, phase-one of the ALOC algorithm uses a simple form of maximal difference in an attempt to ensure that the multivariate space is adequately sampled by ‘seed’ objects.
The algorithms in MAXD could also be used as robust alternatives of the ‘minimal set’ algorithm as proposed by Margules, Nicholls and Pressey (1988; see the MSET module). The ‘robustness’ in this instance may be justified by making the reasonable assumption that using a matrix of association measures has integrated species information into a more realistic estimate of site differences. Species are usually equally weighted in most pattern analysis. However, retrospective estimation of species ‘weights’ after classification or ordination (from the association matrix) will reveal that species contribute differentially; some strongly revealing underlying gradients while others either noisy or seemingly ‘marching to a different drum’.
Option 1
Is due to me and is very simple! It first finds the two objects separated by the greatest dissimilarity value. It next tests all candidates looking for the object that has maximal total distance to all of the already selected objects. This object is then entered and the remaining candidates are tested in the same way against all previously selected objects.
Option 2
The second option even simpler. It looks for the candidate that has the largest distance to its closest already-selected object. That is, it finds the candidate object with the maxim distance to the closest of the objects already-selected.
Option 3
This option is due to the work of Dan Faith (Australian Museum, Sydney) and relates to a cladistics environment. The first step is the same as option 1. The second and subsequent steps look for that object ‘j’, which has a MAXIMUM of its MINIMAL distance to all other PAIRS of objects. Using ‘k’ and ‘l’ as a typical pair, the distance is calculated as
Distance=(Djk+Djl-Dkl)/2
This corresponds to a taxonomic path length (number of character differences) of the added branch. The concept here is to find the taxa with maximal character difference to those already selected.
Cheers
Lee
Author
Posts

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic.

PATN - Finding Patterns in Data