Re: Beta values


Hi Jeremy

When Godfrey Lance and Bill Williams developed their combinatorial algorithm, they applied their beta value to what they called flexible WPGMA (weighted pair-group method using arithmetic averaging). This clustering strategy weights GROUPS equally, which means that individual objects are weighted unequally during the fusion process. Working with this approach, they disliked what they called ‘chaining’: a joins b, then c joins (a, b), then d joins (a, b, c), and so on. Chaining simply made the dendrogram difficult to interpret. They preferred the ‘tidier’ groups obtained when beta was set to -0.25 (as did many others!). A negative beta effectively dilates the data space, so groups move apart from each other as fusion progresses (rather like what is happening to the stars right now).
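To make the combinatorial update concrete, here is a minimal Python sketch. It assumes the usual textbook form of the Lance-Williams recurrence, d(k, i+j) = alpha_i d(k, i) + alpha_j d(k, j) + beta d(i, j) + gamma |d(k, i) - d(k, j)|, with the flexible settings alpha_i = alpha_j = (1 - beta)/2 and gamma = 0, so each group gets the same weight whatever its size. The function name flexible_wpgma and the dict-of-pairs input are hypothetical, chosen for illustration, and the brute-force search is only meant for toy data.

```python
from itertools import combinations


def flexible_wpgma(labels, dist, beta=-0.25):
    """Naive agglomerative clustering with the flexible (beta) WPGMA update.

    labels: iterable of object labels.
    dist:   dict mapping frozenset({a, b}) -> dissimilarity for every pair.
    Returns the fusions as (cluster_i, cluster_j, height) tuples, in order.
    """
    labels = list(labels)
    # Each cluster is a tuple of the original labels it contains.
    clusters = [(x,) for x in labels]
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for a, b in combinations(labels, 2)}
    # Equal weight for each GROUP, regardless of how many objects it holds.
    alpha = (1.0 - beta) / 2.0
    fusions = []
    while len(clusters) > 1:
        # Fuse the closest pair of current clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        d_ij = d[frozenset((i, j))]
        fusions.append((i, j, d_ij))
        new = tuple(sorted(i + j))
        clusters = [c for c in clusters if c not in (i, j)]
        # Lance-Williams update with gamma = 0.
        for k in clusters:
            d[frozenset((k, new))] = (alpha * d[frozenset((k, i))]
                                      + alpha * d[frozenset((k, j))]
                                      + beta * d_ij)
        clusters.append(new)
    return fusions


dist = {frozenset("ab"): 1.0, frozenset("ac"): 4.0, frozenset("bc"): 6.0}
for beta in (0.0, -0.25):
    heights = [h for _, _, h in flexible_wpgma("abc", dist, beta)]
    print(beta, heights)
```

With three objects where d(a, b) = 1, d(a, c) = 4 and d(b, c) = 6, beta = 0 places the final fusion at 5.0 (plain WPGMA), while beta = -0.25 pushes it out to 6.0. That is the space-dilating effect of a negative beta.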

I prefer ‘reality’.

Flexible UPGMA (developed by myself, Dan Faith and Glenn Milligan) is the unweighted pair-group counterpart to flexible WPGMA: it weights objects equally throughout the fusion process, so the weight given to each group changes as the group grows. The beta values in the two approaches are not directly equivalent. Simulation studies (using seriously complex but known data structures that there is no space to elaborate on here) suggested that a beta value of -0.1 best recovers known groupings.
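For comparison, a minimal Python sketch of a flexible UPGMA update. It assumes the size-weighted coefficients alpha_i = (1 - beta) n_i / (n_i + n_j), alpha_j = (1 - beta) n_j / (n_i + n_j) and gamma = 0, where n_i and n_j are the numbers of objects in the two groups being fused; with beta = 0 this reduces to ordinary UPGMA. The function name flexible_upgma and the input layout are hypothetical, and the brute-force search is for toy data only.

```python
from itertools import combinations


def flexible_upgma(labels, dist, beta=-0.1):
    """Naive agglomerative clustering with a flexible (beta) UPGMA update.

    labels: iterable of object labels.
    dist:   dict mapping frozenset({a, b}) -> dissimilarity for every pair.
    Returns the fusions as (cluster_i, cluster_j, height) tuples, in order.
    """
    labels = list(labels)
    # Each cluster is a tuple of the original labels it contains.
    clusters = [(x,) for x in labels]
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for a, b in combinations(labels, 2)}
    fusions = []
    while len(clusters) > 1:
        # Fuse the closest pair of current clusters.
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        d_ij = d[frozenset((i, j))]
        fusions.append((i, j, d_ij))
        n_i, n_j = len(i), len(j)
        # Weights proportional to group size, so every OBJECT counts equally.
        a_i = (1.0 - beta) * n_i / (n_i + n_j)
        a_j = (1.0 - beta) * n_j / (n_i + n_j)
        new = tuple(sorted(i + j))
        clusters = [c for c in clusters if c not in (i, j)]
        # Lance-Williams update with gamma = 0.
        for k in clusters:
            d[frozenset((k, new))] = (a_i * d[frozenset((k, i))]
                                      + a_j * d[frozenset((k, j))]
                                      + beta * d_ij)
        clusters.append(new)
    return fusions


dist = {frozenset("ab"): 1.0, frozenset("ac"): 3.0, frozenset("bc"): 3.0,
        frozenset("ad"): 6.0, frozenset("bd"): 6.0, frozenset("cd"): 9.0}
for beta in (0.0, -0.1):
    heights = [h for _, _, h in flexible_upgma("abcd", dist, beta)]
    print(beta, [round(h, 4) for h in heights])
```

In this toy example a and b fuse at 1, c joins them at 3.0 (beta = 0) or 3.2 (beta = -0.1), and d joins last at 7.0 or roughly 7.75. With beta = 0 the final height of 7.0 is the simple average of d's dissimilarities to a, b and c (6, 6 and 9); a group-weighted WPGMA update would instead give 0.5 × 9 + 0.5 × 6 = 7.5, because the singleton c would count as much as the pair (a, b).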

Decreasing the beta value will tend to make the groups more equal in size. While this makes the dendrogram easier to interpret, the cost is a greater probability of misclassification. A beta value of -0.1 is the conservative choice; I would not, however, use values as low as -0.25 by default.

The negative beta value does have the effect of neatly countering any underestimation of association between distant objects. See my SSH algorithm for more on this issue.

Does that help?