beta values for UPGMA


Viewing 6 posts - 1 through 6 (of 6 total)

Author
Posts
April 7, 2005 at 6:45 am #403
jbruhl
Member
Dear Lee and Co.
A general recommendation over the years, including recently by Dan Faith, is to use a beta value of -0.25, but I notice that the default is -0.1.
Lee, can you please comment on the choice of default value and on the relative merit of -0.25?
Cheers
Jeremy
April 8, 2005 at 1:03 pm #459
lee
Keymaster
Hi Jeremy
When Godfrey Lance and Bill Williams developed their combinatorial algorithm, they applied their beta value to what they called flexible WPGMA (Weighted Pair-Group Method using Arithmetic averaging). This clustering strategy weights GROUPS equally, which implies that objects are weighted differently during the fusion process. Using this approach, they tended not to like what they called ‘chaining’, the situation where a joins b, then c joins a and b, then d joins a, b and c, and so on. It just made the dendrogram difficult to interpret. They liked the ‘tidier’ groups when beta was set to -0.25 (as did many others!). This value effectively dilates the data space, making groups separate from each other as fusion progresses (like what is happening now with stars).
I prefer ‘reality’.
Flexible UPGMA (developed by myself, Dan Faith and Glenn Milligan) is the unweighted pair-group counterpart to flexible WPGMA: it weights objects equally throughout the fusion process (so group weighting changes). The beta values in the two approaches are not totally equivalent. Simulation studies (with seriously complex, but known, data that there is no space to elaborate on here) suggested that a beta value of -0.1 best recovers known groupings.
Decreasing the beta value (making it more negative) will tend to make the groups more equal in size. While this makes a dendrogram easier to interpret, the cost is a greater probability of misclassifications. Using a beta value of -0.1 is conservative. I would not, however, default to values as low as -0.25.
The negative beta value does have the effect of neatly countering any underestimation of association between distant objects. See my SSH algorithm for more on this issue.
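For readers who want to see the role of beta concretely, here is a minimal sketch of the Lance-Williams combinatorial update for the two flexible strategies discussed above. The coefficient forms are the ones commonly quoted for flexible WPGMA and flexible UPGMA; they are not taken from this thread and this is not PATN's code. The function name `flexible_update` and its parameters are hypothetical, chosen for illustration only.

```python
def flexible_update(d_hi, d_hj, d_ij, n_i, n_j, beta=-0.1, weighted=False):
    """Distance from group h to the group formed by fusing groups i and j.

    Lance-Williams form: d(h, ij) = a_i * d(h, i) + a_j * d(h, j) + beta * d(i, j)
    with a_i + a_j + beta = 1 (assumed coefficient forms, see the note above).
    """
    if weighted:
        # Flexible WPGMA: the two fused GROUPS get equal weight,
        # regardless of how many objects each contains.
        alpha_i = alpha_j = (1.0 - beta) / 2.0
    else:
        # Flexible UPGMA: OBJECTS get equal weight, so each group is
        # weighted by its share of the objects in the new group.
        total = n_i + n_j
        alpha_i = (1.0 - beta) * n_i / total
        alpha_j = (1.0 - beta) * n_j / total
    return alpha_i * d_hi + alpha_j * d_hj + beta * d_ij


# Group i has 1 object, group j has 3; h is 4.0 from i and 6.0 from j,
# and i and j were fused at distance 2.0.
plain = flexible_update(4.0, 6.0, 2.0, 1, 3, beta=0.0)       # ordinary UPGMA
mild = flexible_update(4.0, 6.0, 2.0, 1, 3, beta=-0.1)       # the default above
strong = flexible_update(4.0, 6.0, 2.0, 1, 3, beta=-0.25)    # the older advice
print(plain, mild, strong)
```

In this small case the distance to the new group grows as beta becomes more negative, which is the ‘dilation’ of the data space mentioned earlier in the post.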
Does that help?
Lee
April 10, 2005 at 12:02 am #460
jbruhl
Member
Thanks, Lee. Your reply is most helpful, and I wonder whether I was misquoting Dan in my previous email (gosh perhaps it was Les or Mike).
OK, I see your argument for preferring -0.1 over -0.25, but I guess one would have more confidence in the pattern if the same pattern were recovered in both cases, agree? And much more so if both phenograms agreed with an SSH ordination of the same data, right?
Though I do wonder what you would do, Lee, if you had a -0.25 phenogram that agreed in detail with an SSH ordination but somewhat disagreed with a -0.1 phenogram. I would still go in favour of the congruence between analyses. How about you (or is this too hypothetical for your liking)?
Cheers
Jeremy
April 15, 2005 at 5:30 am #461
lee
Keymaster
Hi Jeremy
Sorry for the delay. I’ve been up in the Tasmanian mountains all week.
Yes, I would agree that ‘congruence’ between an SSH result and a classification would be comforting. This is not necessarily easy to accomplish, though. ‘Visual’ checks may be OK on very small datasets. Even so, classifying a (Euclidean) ultrametric matrix from SSH and comparing it with the original classification has its complications. For example, you would need to check how well each point has been handled by SSH. In comparing classification with SSH, I’d normally expect the classification to be more robust, unless the stress was VERY low (below about 0.05).
At the moment, PATN doesn’t produce ultrametrics or an individual stress breakdown, but we could.
Congruence between beta=-0.1 and beta=-0.25 would be nice!
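One simple way to put a number on that kind of congruence is to cut both dendrograms into groups and count how often the two groupings agree about pairs of objects. The sketch below uses the plain Rand index for this. That is a swapped-in, generic measure, not something PATN is stated to provide, and the function name `rand_index` and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two groupings agree.

    A pair 'agrees' if both groupings put the two objects together,
    or both put them apart. Group names themselves are ignored.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must cover the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical group memberships for six objects under two beta settings.
groups_beta_010 = [1, 1, 1, 2, 2, 3]
groups_beta_025 = [1, 1, 2, 2, 3, 3]
print(rand_index(groups_beta_010, groups_beta_025))
```

Because only co-membership is compared, the result does not depend on how the groups happen to be numbered in either analysis.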
There is no doubt that situations will occur where more strongly negative beta values will produce a ‘better’ result, even if you know what the truth is.
I will take another look at beta using simulation and see what I come up with. Any other user feedback on this issue would be warmly welcomed.
Lee
April 29, 2005 at 2:48 am #464
Derek Johnson
Member
Hi Lee and Jeremy (and others)
I’m glad there has been some discussion on beta values. I often try out extreme beta settings (plus some in between) to see how the dendrograms re-arrange themselves, then look for a congruence between them and the ordination, but I’m only working with small data sets. I tried switching to groups to compare the two types of analyses and this made it easier. The only minor difficulty was comparing the allocated names of the groups. The dendrogram uses the farthest branch for each group name, but the SSH uses consecutive numbers. If it used (as an option) the label closest to the centroid instead, it might be a bit easier to compare to the dendrogram group names. I’m not sure – just a thought.
Derek.
May 15, 2005 at 11:24 pm #465
lee
Keymaster
Hi Guys
I have been pondering what Derek said about groups. I don’t think I understand the issue. The groups are defined either by the dendrogram ordering or by non-hierarchical classification. In the case of the dendrogram, the group labelling follows a simple algorithm – starting from the top of the dendrogram (the right side in PATN), the group containing the lowest sequenced object (the highest row or leftmost column in the Data Table) is ‘rotated’ to the top of the dendrogram. The process is repeated down the dendrogram. Group ‘1’ will therefore always contain the object in row 1 or the variable in column 1.
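As a rough illustration of the property described above (group ‘1’ always contains the first object), the sketch below renumbers an arbitrary set of group labels in order of each group's lowest sequenced member. This is a simplification and an assumption on my part: it does not reproduce the dendrogram rotation itself, so groups after the first may be numbered differently from PATN's output. The function name `relabel_by_first_member` is hypothetical.

```python
def relabel_by_first_member(labels):
    """Renumber groups 1, 2, 3, ... in order of first appearance.

    `labels` lists the group of each object in Data Table order, so the
    group holding object 1 becomes group 1, the next new group met while
    reading down the rows becomes group 2, and so on.
    """
    mapping = {}
    relabelled = []
    for label in labels:
        if label not in mapping:
            # First time this group is seen: give it the next number.
            mapping[label] = len(mapping) + 1
        relabelled.append(mapping[label])
    return relabelled


# Hypothetical labels from two analyses that found the same groups
# but named them differently.
print(relabel_by_first_member(["c", "a", "c", "b", "a"]))
print(relabel_by_first_member([7, 3, 7, 9, 3]))
```

Applying the same renumbering to two sets of labels makes identical groupings receive identical numbers, which can make side-by-side comparison a little easier.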
Variations in classification strategy (changing the association measure or beta value, or adding or removing a few objects, for example) should produce groups that are similar in definition, but the more radical the change, the less guarantee there is.
Non-hierarchical classification will produce groups that are also generally in the order of the sequence of the rows. Object 1 is likely (but not guaranteed) to be in group ‘1’.
Once groups are defined, PATN maintains their definition. In SSH (and all other post-classification options), the group numbers displayed will be those as defined by classification.
Lee

PATN - Finding Patterns in Data