Forum Replies Created
“PATN should identify an outlier and this ‘group’ should only attract objects that are closer to it than any other group.”
Yes, it did, and it assigned a group to that one object. I found this unsatisfactory, given that all my other centroids represented 6+ objects.
“Not sure what Monte-Carlo process you are referring to”
I ran the PCC and Monte-Carlo (MCAO) routine in a hierarchical clustering analysis of the imported group centroids derived from the non-hierarchical analysis. That’s all right, is it not?
“The non-hierarchical process in PATN v3+ uses random seeds (see the help for the complete process). While this sounds odd, it works fine in practice.”
Yes, I understand this and am happy with it.
RE: CANOCO – I agree with your sentiment, but I’m not a statistician and thus get a bit lost in the maths. I like the idea that environmental gradients should be examined through the species variables alone (in this case) rather than integrating other factors as intrinsics. It seems a bit odd to combine a variety of measures into one analysis – bias could prevail (unless the data is standardised – but there could even be problems there, right?)
The only danger I see with the non-hierarchical clustering analysis is if the centroid is ‘skewed’ by a couple of objects. For instance, I’ve noticed that the Monte-Carlo permutations were ‘nasty’ with the non-hierarchical clustering – which is partly due to the reduction of objects in the ordination space (to be expected) but also (I think) the skewed centroids.
I attempted to overcome this issue (partly successfully too) by using the hierarchical analysis to first identify obvious outliers, sequentially removing them, reconfirming the data spread with a subsequent hierarchical analysis, and then performing the non-hierarchical clustering analysis. My guess is that this increased the reliability of the centroid being in the right spot. Does this make sense?
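To illustrate what I mean by a ‘skewed’ centroid, here is a minimal Python sketch. It is not PATN's own procedure: the coordinates are made-up 2-D ordination scores, and `centroid` is a hypothetical helper that summarises a group coordinate by coordinate.

```python
from statistics import mean, median

def centroid(points, stat=mean):
    """Summarise a group of objects by applying `stat` to each coordinate."""
    return tuple(stat(axis) for axis in zip(*points))

# Hypothetical scores for one group: five tightly packed objects plus one outlier.
group = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (1.0, 1.2), (6.0, 7.0)]

mean_with = centroid(group)                 # mean centroid, outlier included
mean_without = centroid(group[:-1])         # mean centroid, outlier removed
median_with = centroid(group, stat=median)  # median centroid, outlier included

print("mean, outlier in:   ", mean_with)
print("mean, outlier out:  ", mean_without)
print("median, outlier in: ", median_with)
```

In this toy case the single outlier drags the mean centroid well away from the main cluster, while removing it (or summarising by medians) keeps the centroid close to where most of the objects sit.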
Since that post I have made ‘quantum’ leaps on this topic. I even bravely ventured into the non-hierarchical clustering analysis, imported the row group stats (medians) and then ran a hierarchical analysis with ordination. Neat.
I still haven’t quite understood the ‘weaknesses’ of doing what I did above. The advantage was that all quadrats and species were included (minus a couple of obvious outliers identified in a preliminary hierarchical analysis). This increases the robustness of the analysis.
However, choosing the right number of groups is a bit tricky – it matters!
What weaknesses are there in comparison to the hierarchical clustering?
Also, I have researched CANOCO a bit. It seems PATN and CANOCO do similar things but in different ways. That is, CANOCO can directly evaluate environmental gradients along with the variables whereas PATN cannot (what I mean is that PATN can only infer the environmental gradients from the variables – species in my case). Is this correct?
I know extrinsics can represent your environmental variables, such as aspect etc., and thus be analysed in PATN, but this seems less robust when compared to CANOCO. Is this correct or am I missing something?
Can I do everything I need to do in PATN or does CANOCO offer me more in other areas?
No worries. Always happy to contribute – it helps me articulate my thoughts better and, better still, I get exposed to new ideas. Look forward to chatting re: application of PATN.
By the way, I’m not actually doing a regional veg study. What I am doing is trying to fit my data analysis (regional data) to a published regional study and its associated map. It hasn’t been the most straightforward analysis – too many ‘factors’ to consider, with perhaps the most difficult being the broad community transitions, compounded by limited survey sites and disturbances.
I notice you’re in the Lower Hunter. Are you using PATN for work too, or for uni research?
I have not attempted such a task as yet. I will probably experiment with this sometime in the future, but based on the type of data I’ve got and the apparent variability in the PATN analysis for the regional vegetation study, I’m not sure if the Fidelity classifications would be all that informative.
I did perform a rather exhaustive Google search for the Fidelity stuff and came up with the same link. I don’t think you can import the .ptn file, and yes, Mike’s program uses the DOS version. From talking with Lee I understand that there may be Fidelity classification built into the next version. That would be nice if it is truly a value-adding tool.
Still a bit swamped by the whole thing atm, but from persisting with countless analyses and re-analyses I suspect a better way forward would be to look really closely at the two way tables and box-and-whisker plots rather than rely on the Fidelity classification. I could be wrong though, and would be happy to be told so whilst I muddle on with my work in isolation ❗
I spent considerable time reading and re-reading the worked examples. After about the 10th time (it takes a while for my brain to switch on) I started to really get the gist of it. If you haven’t invested the time in the worked examples, or don’t yet thoroughly understand them, then I really recommend looking at them some more. I am still reading them on occasion; there is a lot of insight in those examples!
Have a good weekend.
Thanks again for your advice. I will seek out that paper and digest it. It seems from its title to be most appropriate.
Yes, I will be publishing the stress value. I hope you didn’t get the impression that I wasn’t. I asked the question because I could not find the stress level for the PATN analysis in the regional vegetation study that relates to my area of interest. As such, I had no way of comparing my data set against that ‘benchmark’ vegetation study. I felt like I was in a bit of a black hole.
Nonetheless, I reviewed the dendrogram of that regional study and ‘compared’ it with mine, and the dichotomies were relatively similar. On that basis I suspect the stress levels may very well be similar.
“disjunctions = high stress” hmmm food for thought and thanks.
Thanks again, as I have really appreciated your assistance. Statistics has never been my strong suit, but it is nonetheless an interest, to say the least.
It was Mike’s fidelity work.
OK, I had another long session of reading, toggling and analysis, applying your advice, and reduced stress to 0.1946. This is as low as I can get it.
I must admit to neglecting the value of the two way table. Never again. It was quite valuable in helping me gain a better understanding of the data. It is clear from the analysis that there are multiple sources of noise, with most of the noise coming from disturbance-related factors (i.e. remnant size and a vast array of past land uses, including temporal variability). Unfortunately these factors represent the truth in the area investigated, so there is no escaping them.
The other noisy component of the data is the presence of a regional intergrade between at least two, if not three, vegetation communities. One could possibly interpret that as a signal, which is how I’m viewing it, as the presence of intergrades is natural and expected.
One last question. Should one report the resultant stress level? I would think that this is a responsible thing to do as it conveys to the reader the ‘strength’ of the analysis. It can also be used to demonstrate that the data patterns are quite complex, perhaps being an expected outcome?
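For anyone wondering what a stress figure summarises, here is a minimal Python sketch of Kruskal's stress formula 1. Whether PATN reports exactly this quantity is an assumption on my part, and the distances and disparities below are made-up numbers rather than output from my analysis.

```python
from math import sqrt

def kruskal_stress1(distances, disparities):
    """Kruskal's stress formula 1.

    distances   -- inter-object distances in the ordination space
    disparities -- the fitted (monotone-regressed) values for the same pairs
    """
    numerator = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return sqrt(numerator / denominator)

# Hypothetical values for three pairs of objects.
distances = [1.0, 2.0, 3.0]
disparities = [1.0, 2.5, 2.5]

stress = kruskal_stress1(distances, disparities)
print(round(stress, 4))
```

A value of 0 would mean the ordination distances match the fitted values exactly; larger values mean the low-dimensional picture distorts the original associations more.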
Thanks for your pointers Lee and cheers till later.
I’ll check to see if it was Mike’s fidelity work or yours.
Thanks for your detailed response.
I am using cover/abundance so my data is 0,1,2,3,4,5,6,7 (with no scores being above 5).
Quote: “You would not normally use two-step for site classification or anything other than two-step for species classifications! Also, whatever association measure is used for the SITE classification IS used for the ordination. Full stop.”
OK. Got you. So to better frame my question – what my colleagues must be doing is using Bray & Curtis for row (site) association/classification and Kulczynski for column (species) association/classification. I’m assuming that this is appropriate.
Thanks again. I will mull this over and give it a try. I’ll let you know how I go.
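For reference, the two measures named above can be sketched in Python as follows. These are the commonly quoted textbook forms; PATN's own implementations may differ in detail, and the cover scores in `matrix` are hypothetical.

```python
def bray_curtis(x, y):
    """Bray & Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 0.0  # two empty vectors are treated as identical in this sketch
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def kulczynski(x, y):
    """Quantitative Kulczynski dissimilarity: 1 - 0.5 * (shared/sum(x) + shared/sum(y))."""
    sum_x, sum_y = sum(x), sum(y)
    if sum_x == 0 or sum_y == 0:
        raise ValueError("Kulczynski is undefined when a vector sums to zero")
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 0.5 * (shared / sum_x + shared / sum_y)

# Hypothetical cover/abundance scores: rows are sites, columns are species.
matrix = [
    [3, 0, 2, 5],
    [1, 4, 2, 0],
    [0, 5, 1, 0],
]

# Row (site) association: compare two rows directly.
site_d = bray_curtis(matrix[0], matrix[1])

# Column (species) association: transpose so each species becomes a vector.
species = list(zip(*matrix))
species_d = kulczynski(species[0], species[1])

print(round(site_d, 4), round(species_d, 4))
```

The only structural difference between the site and species runs here is the transpose: the same matrix is read by rows for sites and by columns for species.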