- June 2, 2008 at 10:50 am #431
I don’t know if it’s sheer coincidence or a fluke on my part, but after a considerable number of analyses to reduce stress I decided to think a bit more about what I was doing (and it hurt!).
Stress is essentially a measure of overall data fit in an ordinated space, right? Well, I tested this by including only variables with a PCC (i.e. correlation) greater than 0.15, treating those excluded from the analysis as the ‘noisy’ variables. And guess what: it just so happens that my SL for that first analysis was 0.1491.
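For anyone who wants to see what a stress value is measuring, here is a minimal sketch of Kruskal's stress formula 1 in plain Python. It is not PATN's code. The function names are made up for illustration, and for simplicity the input dissimilarities are compared directly with the ordination distances (a non-metric MDS would compare monotone-regression disparities instead).

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two points in the ordination space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stress1(dissim, coords):
    """Kruskal stress-1 between input dissimilarities and ordination distances.

    dissim: dict mapping (i, j) with i < j to the input dissimilarity.
    coords: list of coordinate tuples, one per object, in the reduced space.
    (Both argument layouts are assumptions made for this sketch.)
    """
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = euclidean(coords[i], coords[j])
        num += (dissim[(i, j)] - d) ** 2  # misfit for this pair
        den += d ** 2                     # scaling term
    return math.sqrt(num / den)

# Three objects laid out on a line reproduce these dissimilarities exactly.
perfect = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
line = [(0.0,), (1.0,), (2.0,)]
print(stress1(perfect, line))
```

A value of zero means the ordination distances reproduce the input dissimilarities exactly; the worse the fit, the larger the value.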
Well, I haven’t tested the theory on a new dataset as yet, so it’s a one-off outcome to date, but I’d be interested to know if anyone else has approached their analysis in a similar way and ended up with this result.
- July 11, 2008 at 6:44 am #521
Masking out (intrinsic) variables with low PCC correlation should reduce stress in ordinations. Ditto if you used the Kruskal-Wallis values from the box and whisker plots (but this is one step removed).
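As a rough illustration of the masking step, the sketch below splits variables into 'keep' and 'noisy' by how well each one is fitted by a two-axis ordination. It assumes that the PCC value behaves like a multiple linear correlation between a variable and the ordination axes, and it uses the textbook two-predictor formula for that correlation. The function names and the data layout are hypothetical, and the 0.15 cut-off is simply the one quoted in the first post.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable shows no trend
    return sxy / math.sqrt(sxx * syy)

def multiple_r(values, axis1, axis2):
    """Multiple correlation of one variable with two ordination axes."""
    r1 = pearson(values, axis1)
    r2 = pearson(values, axis2)
    r12 = pearson(axis1, axis2)
    denom = 1 - r12 * r12
    if denom <= 1e-12:
        raise ValueError("ordination axes are collinear")
    r_sq = (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / denom
    return math.sqrt(max(0.0, min(1.0, r_sq)))  # clip rounding error

def split_by_fit(variables, axis1, axis2, threshold=0.15):
    """Return (keep, noisy) lists of variable names."""
    keep, noisy = [], []
    for name, values in variables.items():
        fitted = multiple_r(values, axis1, axis2) > threshold
        (keep if fitted else noisy).append(name)
    return keep, noisy

# Made-up scores for four objects on two ordination axes.
axis1 = [-1, 1, -1, 1]
axis2 = [-1, -1, 1, 1]
variables = {
    "sp_a": [1, 3, 1, 3],  # follows axis 1
    "sp_b": [2, 1, 1, 2],  # unrelated to either axis
    "sp_c": [0, 2, 2, 4],  # follows both axes
}
keep, noisy = split_by_fit(variables, axis1, axis2)
print(keep, noisy)
```

The ordination would then be rerun on the 'keep' variables only, and the new stress compared with the old.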
It is important, however, that the process not be totally ‘mechanical’. Always seek to understand the trends in the ordination space and the identity of any groups from a classification. In both cases, examination of both intrinsic and extrinsic variables is important.
- July 12, 2008 at 2:48 pm #522
Since that post I have made ‘quantum’ leaps on this topic. I even bravely ventured into the non-hierarchical clustering analysis, imported the row group stats (medians) and then ran a hierarchical analysis with ordination. Neat.
I still haven’t quite understood the ‘weaknesses’ of doing what I did above. The advantage was that all quadrats and species were included (minus a couple of obvious outliers identified in a preliminary hierarchical analysis). This increases the robustness of the analysis.
However, choosing the right number of groups is a bit tricky – it matters!
What weaknesses are there in comparison to hierarchical clustering?
Also, I have researched CANOCO a bit. It seems PATN and CANOCO do similar things but in different ways. That is, CANOCO can directly evaluate environmental gradients along with the variables whereas PATN cannot (what I mean is that PATN can only infer the environmental gradients from the variables – species in my case). Is this correct?
I know extrinsics can represent your environmental variables, such as aspect etc., and so can be analysed in PATN, but this seems less robust when compared to CANOCO. Is this correct or am I missing something?
Can I do everything I need to do in PATN or does CANOCO offer me more in other areas?
- July 14, 2008 at 4:22 am #523
I tend to like non-hierarchical classification. Hierarchical classification is designed to optimize the hierarchy while the non-hierarchical classification optimizes the groups.
Your approach (non-hierarchical classification, then take the centroids and use them in hierarchical classification and ordination) is a standard one. It is by far the most expedient strategy when you have a lot of objects or complexity.
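The two-stage idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not PATN's algorithm: the non-hierarchical step is a simple nearest-centre allocation with user-supplied seeds, group centres are column medians (as in the post above), distances are Euclidean, and the hierarchical step is average-linkage fusion. All names are hypothetical.

```python
import math
from statistics import median

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def allocate(objects, seeds, iterations=10):
    """Non-hierarchical step: assign each object to its nearest centre,
    then move each centre to the column medians of its members."""
    centres = [tuple(s) for s in seeds]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for obj in objects:
            nearest = min(range(len(centres)),
                          key=lambda g: dist(obj, centres[g]))
            groups[nearest].append(obj)
        new_centres = [
            tuple(median(col) for col in zip(*members)) if members else centres[g]
            for g, members in enumerate(groups)
        ]
        if new_centres == centres:
            break  # allocation has stabilised
        centres = new_centres
    return centres, groups

def agglomerate(centres):
    """Hierarchical step: average-linkage fusion of the group centres.
    Returns the fusions in order as (cluster_a, cluster_b, distance)."""
    def avg(a, b):
        total = sum(dist(centres[i], centres[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(centres))]
    fusions = []
    while len(clusters) > 1:
        a, b = min(
            ((p, q) for k, p in enumerate(clusters) for q in clusters[k + 1:]),
            key=lambda pair: avg(*pair),
        )
        fusions.append((a, b, avg(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return fusions

# Nine made-up objects in three obvious clumps, with one seed per clump.
objects = [(0, 0), (0, 1), (1, 0),
           (10, 10), (10, 11), (11, 10),
           (30, 0), (31, 0), (30, 1)]
centres, groups = allocate(objects, seeds=[(0, 0), (10, 10), (30, 0)])
fusions = agglomerate(centres)
print(centres)
print(fusions)
```

The hierarchy is built on three centres rather than nine objects, which is where the saving comes from with large datasets.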
But as you say, how many groups? I don’t find this all that hard, as I tend to focus on what results/outcomes I’m trying to communicate. If, for example, I’m dealing with ‘management’, I produce 5 groups, as ‘5’ different things is about as much as most managers want to hear about.
It comes down to ‘naming’ the groups. There is good reason to take the number of groups to the point where naming is not easy. Then back off, knowing that you have a good notion of what the variation is and whether the groups are, say, ‘ecologically justifiable’, reproducible etc.
CANOCO – I don’t like it for a number of reasons, the most important of which is that it blends exploratory analysis (ordination) with confirmatory analysis (regression). I think this is philosophically abhorrent. In PATN, I take the approach of letting the data produce the patterns (ordination/groups) and then interpreting these patterns with extrinsic/environmental data. I don’t believe in mixing the two. Anyway, that’s my rationale and, for the foreseeable future, that’s what PATN will look like.
My aim is to make the analysis more powerful (yet simple in concept), more robust, the tools easier to use and the interface/graphics fun.
Lee
- July 14, 2008 at 5:45 am #524
The only danger I see with the non-hierarchical clustering analysis is if the centroid is ‘skewed’ by a couple of objects. For instance, I’ve noticed that the Monte-Carlo permutations were ‘nasty’ with the non-hierarchical clustering, which is partly due to the reduction of objects in the ordination space (to be expected) but also (I think) the skewed centroids.
I attempted to overcome this issue (partly successfully too) by using the hierarchical analysis to first identify obvious outliers, sequentially remove them, reconfirm the data spread with a subsequent hierarchical analysis, and then perform the non-hierarchical clustering analysis. My guess is that this increased the reliability of the centroid being in the right spot. Does this make sense?
- July 14, 2008 at 6:41 am #525
Classification in general is far less affected (badly) by outliers than is ordination. MDS is based on a regression (between input associations and reduced space associations). Non-hierarchical classification should not be unduly affected by outliers. PATN should identify an outlier and this ‘group’ should only attract objects that are closer to it than any other group.
Not sure what Monte-Carlo process you are referring to.
The non-hierarchical process in PATN v3+ uses random seeds (see the help for the complete process). While this sounds odd, it works fine in practice. In the DOS version of PATN, you could use any seed file.
Lee
- July 14, 2008 at 7:05 am #526
“PATN should identify an outlier and this ‘group’ should only attract objects that are closer to it than any other group.”
Yes it did and assigned a group to the one object. I found this unsatisfactory given that all my other centroids represented 6+ objects.
“Not sure what Monte-Carlo process you are referring to”
I ran the PCC and Monte-Carlo (MCAO) routine in a hierarchical clustering analysis of the imported group centroids derived from the non-hierarchical analysis. That’s all right, is it not?
“The non-hierarchical process in PATN v3+ uses random seeds (see the help for the complete process). While this sounds odd, it works fine in practice.”
Yes, I understand this and am happy with it.
RE: CANOCO – I agree with your sentiment but I’m not a statistician and thus get a bit lost in the maths. I like the idea that environmental gradients should be examined by the species variables alone (in this case) rather than integrating other factors as intrinsics. It seems a bit odd to combine a variety of measures into one analysis – bias could prevail (unless the data is standardised – but there could even be problems there, right?)
- July 28, 2008 at 5:15 am #527
Once outliers are identified, they disappear as a problem. Admittedly, you need to get rid of them for ordination to be most effective.
Re Monte-Carlo, ok. I should have referred to ‘MCAO’ as a permutation test rather than MC. Oh well, it’s in the system now.
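For readers unfamiliar with the term, the general logic of a permutation test looks something like the sketch below. It is generic and is not the MCAO routine itself: the statistic used here (absolute Pearson correlation of a variable with a single ordination axis), the function names and the default of 999 permutations are all assumptions made for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def abs_correlation_with(axis):
    """Build a statistic: |correlation| of a variable with one axis."""
    return lambda values: abs(pearson(values, axis))

def permutation_p_value(values, statistic, n_perm=999, seed=0):
    """Share of random reorderings whose statistic is at least as large
    as the observed one (the observed ordering counts as one of them)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = statistic(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between values and objects
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# A made-up variable that tracks the axis perfectly.
axis = list(range(10))
values = [2 * a + 1 for a in axis]
print(permutation_p_value(values, abs_correlation_with(axis)))
```

A small value says that random reassignment of the values to objects rarely matches the observed fit, so the trend is unlikely to be a chance arrangement.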
Re CANOCO, it is up to you in PATN which variables you use in an ordination (SSH), i.e. the intrinsic variables, and which you use as ‘environmental variables’, i.e. the extrinsic variables. Standardizing variables is a serious issue in itself. Abundance data usually needs some transformation, but not standardization. Environmental data is the opposite.
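To make the distinction concrete, here is a small sketch of one transformation and one standardization. The particular choices (log10(x + 1) for abundances, rescaling to the 0..1 range for environmental variables) are common examples picked for illustration, not a recommendation from this thread, and the function names are hypothetical.

```python
import math

def log_transform(abundances):
    """Transformation: log10(x + 1) damps large counts, keeps zeros at zero.
    Applied value by value, so it does not depend on the other objects."""
    return [math.log10(x + 1) for x in abundances]

def range_standardize(values):
    """Standardization: rescale one variable to 0..1 using its own minimum
    and maximum, so variables in different units become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant variable: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

# Made-up species counts and a made-up environmental variable.
print(log_transform([0, 9, 99]))
print(range_standardize([100, 150, 200]))
```

The transformation changes the shape of the abundance scale while leaving species comparable in their original units; the standardization puts differently measured environmental variables on a common footing.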