For a recent working paper, I did some cluster analyses. Plural, my friend, because there's not one way. Before reading further, please understand that cluster analysis is an exploratory, non-inferential method.
Problem
The Stata manual says it all: "Some researchers claim that there are as many clustering methods as there are researchers. This is untrue, there are many more methods than researchers." The wording may not be exact, but I agree with the statement. In this post, I will address some difficulties.
Issues
- Linkage method: there are several linkage methods, which define how cases get grouped, that is, which distances between clusters to look at. There is no optimal method: single linkage causes chaining (one case after another joining the same cluster), average linkage and Ward's linkage are sensitive to outliers, the distance measure of Ward's linkage is not possible to interpret, and centroid linkage will even refuse to return a dendrogram.
- Similarity/distance measure: there are many distance measures: simple matching, Euclidean, city block, Mahalanobis, ... Which one to choose? There's no truth.
- Cases: excluding cases may skew the central values of clusters, so that the remaining cases end up in the wrong cluster.
- Order: in two-step cluster analysis, some pre-clustering is done before tackling the full data set (because it may be too large). Shuffle your deck and you'll get different results.
- Variables: ideally you cluster on orthogonal factors. Even then, there is in general no reason to assume that each variable/factor is equally important for clustering. They will all have the same a priori impact on the clustering, though.
- Success guaranteed: you'll always get a result. Does that mean you found something? No.
- Stopping rules: there are many rules to determine the optimal number of clusters (see Milligan and Cooper, 1985). The Calinski-Harabasz rule is the default in Stata. There are some issues though, such as multiple optima or none at all. And does it make sense to choose a 23-cluster solution?
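To make the linkage and distance points concrete, here is a minimal sketch in plain Python. It is not Stata and not how any package implements clustering: the toy data, the function names `agglomerate`, `euclidean` and `city_block`, and the naive merge loop are all assumptions made for illustration. It shows that, on the same six cases, single linkage chains while complete linkage gives a different two-cluster solution, and that the choice of distance can flip which case counts as "nearest".

```python
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def agglomerate(points, k, linkage=min, dist=euclidean):
    """Naive agglomerative clustering (illustrative, not optimized).

    Repeatedly merges the two closest clusters until k remain.
    linkage=min gives single linkage, linkage=max complete linkage.
    Returns clusters as sorted lists of case indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(
                dist(points[i], points[j])
                for i in clusters[pair[0]]
                for j in clusters[pair[1]]
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical one-dimensional cases with slowly growing gaps.
cases = [(0,), (10,), (21,), (33,), (46,), (60,)]

single = agglomerate(cases, k=2, linkage=min)
complete = agglomerate(cases, k=2, linkage=max)
print(single)    # [[0, 1, 2, 3, 4], [5]]  (chaining)
print(complete)  # [[0, 1, 2, 3], [4, 5]]

# Same query, two candidates: the "nearest" one depends on the distance.
q, a, b = (0, 0), (3, 3), (0, 5)
print(euclidean(q, a) < euclidean(q, b))    # True: a is nearer
print(city_block(q, a) < city_block(q, b))  # False: b is nearer
```

Neither solution is "the" answer; the sketch only illustrates that two defensible settings disagree on the very same data.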
Conclusion
Cluster analysis is a wonderful way to reduce and explore data. However, I would recommend experimenting with different approaches and keeping your hands off it if the results bear no similarity. Heck, a cluster analysis of the outcomes would be useful!
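One way to put a number on "bear no similarity" is the Rand index, the share of case pairs on which two solutions agree (both put the pair together, or both keep it apart). The sketch below is a plain-Python illustration under stated assumptions: the helper name `rand_index` and the two label vectors are hypothetical, and the index is only one of several possible agreement measures.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of case pairs treated the same way by two clusterings.

    1.0 means identical partitions (label names do not matter).
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical cluster memberships of six cases under two settings.
solution_1 = [0, 0, 0, 0, 0, 1]
solution_2 = [0, 0, 0, 0, 1, 1]

print(rand_index(solution_1, solution_2))      # 10 of 15 pairs agree
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0, same partition
```

The raw Rand index is not corrected for chance agreement, so treat a value like this as a rough signal rather than a test.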
Judgement: not to be trusted