August 9, 2005 at 6:10 am #409 Marc (Member)
Hi PATN People,
I wonder if I could just ask some advice on how to reduce a large dataset? I have 2100 vegetation sites by 730 species and would like to carry out a non-hierarchical cluster analysis and then further analyse groups of interest.
I understand how to push the buttons and get PATN to group the data, and view box-whisker plots but I don’t have a good grasp on how to proceed from here.
Sometimes I think I’m getting the hang of it, and it’s a real buzz, but then I come to a grinding halt. I don’t have a detailed background in this kind of thing but I have read heaps and I’m really enjoying it. I’m just having a bit of trouble putting the whole process together.
Does anyone know of some literature with basic worked examples, or have any other suggestions?
A little bit of guidance would be appreciated.
Marc
August 10, 2005 at 5:19 am #476 lee (Keymaster)
I was hoping some of the users would chime in on this. I’d be interested to see what steps they would take. By the way, I have another case study (eco data) to be released with v3.04.
The most important outcome from PATN is to name the groups. Box and Whisker plots are most helpful for this. I’d consider using two-step association on your species (and name the resulting groups).
I’d then use a two-way table to help confirm your suspicions. Enough or too many groups?
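For readers who want to see the idea outside PATN, here is a minimal sketch of a two-way table in plain Python. It is not PATN's implementation: the abundance values, the site and species names, and both sets of group labels are made up for illustration, and `two_way_table` is a hypothetical helper. The point is only that sorting rows by site group and columns by species group makes block structure easy to spot.

```python
# Illustrative only: all data and group labels below are invented.
# In practice the groups would come from the site and species classifications.

# Hypothetical abundance data: site -> {species: abundance}
abundance = {
    "s1": {"spA": 5, "spB": 3, "spC": 0, "spD": 0},
    "s2": {"spA": 0, "spB": 0, "spC": 4, "spD": 2},
    "s3": {"spA": 4, "spB": 2, "spC": 0, "spD": 1},
    "s4": {"spA": 0, "spB": 1, "spC": 3, "spD": 5},
}
site_group = {"s1": 1, "s2": 2, "s3": 1, "s4": 2}
species_group = {"spA": "X", "spB": "X", "spC": "Y", "spD": "Y"}


def two_way_table(abundance, site_group, species_group):
    """Return (sites, species, rows) with sites and species ordered by group."""
    sites = sorted(abundance, key=lambda s: (site_group[s], s))
    species = sorted(species_group, key=lambda sp: (species_group[sp], sp))
    rows = [[abundance[s].get(sp, 0) for sp in species] for s in sites]
    return sites, species, rows


sites, species, rows = two_way_table(abundance, site_group, species_group)

# Print the reordered table: one row per site, one column per species.
print("site grp " + " ".join(f"{sp:>4}" for sp in species))
for s, row in zip(sites, rows):
    print(f"{s:>4} {site_group[s]:>3} " + " ".join(f"{v:>4}" for v in row))
```

If the groups are sensible, high values cluster into blocks along the diagonal; many scattered off-block values hint at too few groups or noisy species.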
I usually find that I have to iterate through sites and more likely species to reduce noise. B&W and species classification is basic for this.
What about environment data and correlates? What’s driving the variation between the groups? This is basic.
These are just a few ideas for you to kick around.
Lee
August 15, 2005 at 1:30 am #477 Marc (Member)
I have been exploring my data and trying your suggestions. The analysis hasn’t always been easy to interpret so I have returned to my raw data and tried to reduce the noise. I have removed the annual species to partially avoid the influence of seasonality between surveys.
Things are a bit clearer but, as you suggest, the process is iterative and so there is still lots to do and learn. I look forward to the eco data in v3.04.
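The annual-species filter described above could be sketched like this in Python, for anyone preparing the data before import. Everything here is an assumption for illustration: the data layout (site to species to abundance), the species names, the list of annuals, and the `drop_species` helper. Dropping sites left with no records is also my own choice, not something PATN requires.

```python
def drop_species(abundance, species_to_drop):
    """Return a copy of the site-by-species data without the listed species.

    Sites with no remaining non-zero records are dropped too (an assumption;
    keep them if empty sites are meaningful in your survey).
    """
    drop = set(species_to_drop)
    filtered = {}
    for site, counts in abundance.items():
        kept = {sp: n for sp, n in counts.items() if sp not in drop}
        if any(n > 0 for n in kept.values()):
            filtered[site] = kept
    return filtered


# Hypothetical data: three sites, two perennials and one annual.
abundance = {
    "site1": {"perennial_a": 3, "annual_x": 7, "perennial_b": 0},
    "site2": {"perennial_a": 0, "annual_x": 2, "perennial_b": 0},
    "site3": {"perennial_a": 1, "annual_x": 0, "perennial_b": 4},
}
annuals = ["annual_x"]  # hypothetical list, e.g. from a life-form lookup

reduced = drop_species(abundance, annuals)
print(reduced)
```

Here `site2` disappears because the annual was its only recorded species, which is worth checking for before re-running the classification.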
Marc
November 9, 2005 at 12:54 am #485 lee (Keymaster)
Again, I was hoping that some of the talented PATN users out there would come out of the woodwork and offer some of their experiences. I am certainly not the font of wisdom, just another view.
OK, on a dataset of this size non-hierarchical clustering is appropriate. Reducing the complexity by classification is what PATN is about (among other things). Run the classification, label the groups by looking at the box and whisker plots, and then focus in on groups of interest by eliminating the objects in other groups in the Data Table (first save the analysis though!). See other postings I did yesterday on this.
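As a rough illustration of that classify-then-subset workflow outside PATN, here is a small Python sketch. It does not reproduce PATN's own non-hierarchical algorithm: it uses a generic medoid-based allocation, with Bray-Curtis dissimilarity chosen as an assumption because it is common for abundance data. The site vectors, seed sites, and the `allocate` helper are all hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def allocate(data, seeds, max_iter=20):
    """Assign each site to its nearest medoid, re-pick medoids, and repeat."""
    medoids = list(seeds)
    groups = {}
    for _ in range(max_iter):
        # Allocation step: nearest medoid wins (ties go to the lower group index).
        groups = {
            site: min(range(len(medoids)),
                      key=lambda g: bray_curtis(vec, data[medoids[g]]))
            for site, vec in data.items()
        }
        # Update step: the new medoid is the member closest to its group mates.
        new_medoids = []
        for g, old in enumerate(medoids):
            members = [s for s in data if groups[s] == g]
            if not members:  # empty group: keep the previous medoid
                new_medoids.append(old)
                continue
            new_medoids.append(min(
                members,
                key=lambda m: sum(bray_curtis(data[m], data[o]) for o in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return groups


# Hypothetical site-by-species abundances (four species per site).
data = {
    "s1": [5, 3, 0, 0],
    "s2": [0, 0, 4, 2],
    "s3": [4, 2, 0, 1],
    "s4": [0, 1, 3, 5],
    "s5": [6, 2, 1, 0],
}
groups = allocate(data, seeds=["s1", "s2"])

# Focus on a group of interest by dropping the sites in every other group.
groups_of_interest = {1}
subset = {s: v for s, v in data.items() if groups[s] in groups_of_interest}
print(groups)
print(subset)
```

The subset can then be re-classified on its own, which is the same iterate-and-narrow idea as eliminating objects in the Data Table.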
Depending on the size of the dataset, run non-hierarchical or hierarchical (as the latter will give substructure), ordinate and then analyze the results. The ordination plot is the result to aim at with PATN V3 as there is so much that you can do to help to understand your data with this display-base.
Sad to say, there isn’t any really good literature I can point you to. This advice that I seem to offer regularly is forcing me to consider writing a book myself.
Hope that helps,