Two Step vs Bray & Curtis
- May 5, 2008 at 12:51 am #429 Mark (Member)
First I would like to say that the introduction of PATN into my work has been an absolute godsend. It is doing what it should do and that is making greater sense of my data. Thanks Lee for the development of this software.
Onto my question:
In my analysis I am examining vegetation plot data taken from a variety of ‘named’ vegetation types within a large regional context (central to upper Hunter Valley, NSW). At this stage I have included 95 samples with 380 intrinsic variables (species). I have numerous extrinsic variables as well (geology, soils, aspect, topographic position, season etc).
Many of my colleagues who have analysed datasets for the region have used a PATN analysis (the DOS version, I suspect). One of these analyses, the most relevant to my area of interest, used Bray and Curtis for the association and Kulczynski for the ordination. In my analysis I initially used Bray Curtis for both, then Two Step for both, then a Bray Curtis/Two Step hybrid.
Firstly, none of my analyses was completed in a manner consistent with other regional analyses. Does this represent a problem when attempting to compare my results with those of others?
Secondly, the web worked examples suggest that Two Step be used for larger datasets (which I classify mine as being). Is Two Step to be applied to both association and ordination, or just the ordination? When comparing the stress levels (SL) between analyses solely using Bray Curtis and those solely using Two Step, there were clear differences, with Two Step appearing more desirable (Bray Curtis SL = 0.23 compared to Two Step SL = 0.12). Based on the SL I would prefer to use Two Step, but is it a less powerful association measure?
Finally, why would my colleague use Kulczynski in preference to Two Step (as recommended by one of the worked examples)?
- May 9, 2008 at 4:41 am #509 lee (Keymaster)
With 380 species (and a high ordination stress), it seems appropriate to deal with the species first. There are two ways that this can be done.
You could run a classification of the sites (using Bray and Curtis or Kulczynski, as both are very similar) and then look at the Box and Whisker plots to get an idea of how the species are discriminating the groups (and to some extent, relating to each other). You could also run a classification of species using two-step to produce a dendrogram and species groups. In reality, PATN does both in a single ‘analysis’ run anyway.
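To illustrate why the two site measures behave so similarly, here is a minimal sketch of the textbook Bray and Curtis and Kulczynski dissimilarities for two sites. The function names and sample values are illustrative assumptions, and PATN's own implementation may differ in detail.

```python
def bray_curtis(x, y):
    """Textbook Bray-Curtis dissimilarity for two non-negative abundance vectors."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # two empty sites: treated here as identical (an assumption)
    return 1.0 - 2.0 * shared / total


def kulczynski(x, y):
    """Textbook Kulczynski dissimilarity for two non-negative abundance vectors."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    sum_x, sum_y = sum(x), sum(y)
    if sum_x == 0 or sum_y == 0:
        # Degenerate case handled by assumption: identical only if both are empty.
        return 0.0 if sum_x == sum_y else 1.0
    return 1.0 - 0.5 * (shared / sum_x + shared / sum_y)


# Hypothetical cover scores for four species at two sites.
site_a = [3, 0, 2, 1]
site_b = [1, 1, 2, 0]
bc = bray_curtis(site_a, site_b)
kz = kulczynski(site_a, site_b)
```

Both measures are built from the same shared-abundance term and differ only in how it is scaled by the site totals, so they coincide whenever the two sites have equal totals.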
In fact, when you put the site and species classification together in a two-way table, you have probably the best way of looking at this style of dataset. This is the table that I would spend a fair time with – it will show you the interaction between sites and species. If you can’t make a good story from that, you have other problems.
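For anyone wanting to rebuild such a two-way table outside PATN, a rough sketch follows. It assumes you have exported the site-by-species matrix plus one group label per site and per species; the function name and data are hypothetical.

```python
def two_way_table(data, site_groups, species_groups):
    """Reorder rows by site group and columns by species group.

    data           -- list of rows (sites), each a list of species scores
    site_groups    -- one group label per row
    species_groups -- one group label per column
    """
    # sorted() is stable, so the original order is kept within each group.
    row_order = sorted(range(len(site_groups)), key=lambda i: site_groups[i])
    col_order = sorted(range(len(species_groups)), key=lambda j: species_groups[j])
    table = [[data[i][j] for j in col_order] for i in row_order]
    return table, row_order, col_order


# Hypothetical 3 sites x 3 species example.
data = [[1, 0, 2],
        [0, 3, 0],
        [4, 0, 5]]
table, rows, cols = two_way_table(data, site_groups=[2, 1, 2], species_groups=[1, 2, 1])
```

With the rows and columns grouped this way, blocks of non-zero scores show which species groups characterise which site groups.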
Before I go further, what values are you using? Counts, abundance, presence/absence? With this style of data, a large proportion of the ‘information’ in the dataset is in the presence/absence component. Adding counts/cover/abundance adds a lot of noise, so if you really want to take some form of abundance into account, I’d recommend considering a transformation onto a 0-5 type scale. This reduces the noise (and the stress!). Just use the most appropriate transformation to get the data into this form. I usually transform such data into integers 0,1,2,3,4 and 5.
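As a sketch of that kind of recoding, assuming the raw values are percent cover: the class boundaries below are purely illustrative and not taken from PATN or from this thread, so substitute whatever breaks suit your data.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds (percent cover) for classes 1 to 4;
# anything above the last bound falls into class 5.
BREAKS = [5, 25, 50, 75]


def recode(value, breaks=BREAKS):
    """Map a raw cover value onto an integer class from 0 to len(breaks) + 1."""
    if value <= 0:
        return 0  # absent
    return 1 + bisect_left(breaks, value)


def presence_absence(value):
    """Reduce a raw value to 0 (absent) or 1 (present)."""
    return 1 if value > 0 else 0


raw = [0, 2, 30, 60, 100]
classes = [recode(v) for v in raw]
present = [presence_absence(v) for v in raw]
```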
You would not normally use two-step for site classification or anything other than two-step for species classifications! Also, whatever association measure is used for the SITE classification IS used for the ordination. Full stop.
As my help notes state, I’d like to see ordination stress below ~0.15 to be happy about publishing anything. To achieve this, a range of options is available (which can be combined in various configurations):
- Recode as noted above.
- Eliminate redundant and noisy species using the species results information noted above.
- Produce species groups and then use these as new variables for a site classification. This last option is pretty neat, especially when combined with the elimination of noisy, redundant or ubiquitous/common species.
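The last two options can be sketched roughly as follows. The 90% frequency cut-off and the choice to sum scores within a species group are assumptions made for illustration, not PATN defaults, and the function names are hypothetical.

```python
def non_ubiquitous_species(data, max_freq=0.9):
    """Return column indices of species present in at most max_freq of the sites."""
    n_sites = len(data)
    keep = []
    for j in range(len(data[0])):
        freq = sum(1 for row in data if row[j] > 0) / n_sites
        if freq <= max_freq:
            keep.append(j)
    return keep


def group_variables(data, species_groups):
    """Collapse species columns into one summed column per species group."""
    labels = sorted(set(species_groups))
    return [
        [sum(row[j] for j, g in enumerate(species_groups) if g == label)
         for label in labels]
        for row in data
    ]


# Hypothetical 3 sites x 4 species example.
data = [[1, 0, 2, 1],
        [1, 3, 0, 0],
        [2, 0, 0, 4]]
kept = non_ubiquitous_species(data)
grouped = group_variables(data, ["A", "A", "B", "B"])
```

The grouped matrix has one column per species group and can then be fed back in as the variables for a fresh site classification.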
I’ve never found these techniques to fail but iterations of the analysis are required – but this is normal anyway.
Hope this helps.
Lee
- May 9, 2008 at 4:55 am #511 Mark (Member)
Thanks for your detailed response.
I am using cover/abundance so my data is 0,1,2,3,4,5,6,7 (with no scores being above 5).
Quote: “You would not normally use two-step for site classification or anything other than two-step for species classifications! Also, whatever association measure is used for the SITE classification IS used for the ordination. Full stop.”
OK. Got you. So to better frame my question: what my colleagues must be doing is using Bray & Curtis for row (site) association/classification and Kulczynski for column (species) association/classification. I’m assuming that this is appropriate.
Thanks again. I will mull this over and give it a try. I’ll let you know how I go.