On Day 4 of the Business Analytics class we were taught Hierarchical Clustering. Hierarchical clustering is a widely used data
analysis tool. The idea is to build a binary tree of the data that successively
merges similar groups of points. Visualizing this tree provides a useful
summary of the data.
In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for
hierarchical clustering generally fall into two types:
- Agglomerative: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy (see the sketch after this list).
- Divisive: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
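Here is a minimal sketch of the agglomerative ("bottom-up") approach; the toy data, the use of SciPy, and the choice of average linkage are all illustrative assumptions, not part of the class material.

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Six 2-D points; each observation starts in its own cluster.
    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])

    # linkage() repeatedly merges the two most similar clusters,
    # building the binary tree from the bottom up.
    Z = linkage(X, method='average')

    # Each row of Z records one merge: cluster i, cluster j,
    # their distance, and the size of the newly formed cluster.
    print(Z)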
Hierarchical vs. k-means clustering:
- Recall that k-means requires:
  • A number of clusters k
  • An initial assignment of data points to clusters
  • A distance measure between data points, d(x_n, x_m)
- Hierarchical clustering only requires a measure of similarity between groups of data points (a short sketch contrasting the two inputs follows).
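As a rough sketch of this difference in inputs (toy data, SciPy, and scikit-learn are assumptions here): k-means needs the number of clusters and an initialization scheme up front, while the hierarchical routine takes only pairwise distances and a linkage rule.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage
    from sklearn.cluster import KMeans

    X = np.random.RandomState(0).rand(10, 2)  # toy data

    # k-means: the number of clusters k must be fixed up front, and the
    # initial assignment comes from the chosen initialization scheme.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Hierarchical clustering: only the pairwise distances d(x_n, x_m) and a
    # linkage rule (similarity between groups) are needed; no k at this stage.
    Z = linkage(pdist(X), method='average')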
Agglomerative Clustering:
• Each level of the resulting tree is a
segmentation of the data
• The algorithm results in a sequence of
groupings
• It is up to the user to choose a “natural” clustering from this sequence (see the sketch below)
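A small sketch of choosing one segmentation from the merge sequence, again assuming SciPy, average linkage, and toy data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
    Z = linkage(X, method='average')   # the full sequence of merges

    # Every level of the tree is a segmentation; the user picks one,
    # e.g. by asking for exactly two clusters ...
    labels_k = fcluster(Z, t=2, criterion='maxclust')

    # ... or by cutting the tree at a chosen distance threshold.
    labels_d = fcluster(Z, t=1.0, criterion='distance')
    print(labels_k, labels_d)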
Dendrogram:
• Agglomerative clustering is monotonic: the similarity between merged clusters is monotone decreasing with the level of the merge
• Dendrogram: Plot each merge at the
(negative) similarity between the two merged groups
• Provides an interpretable visualization of
the algorithm and data
• Useful summarization tool, and part of why hierarchical clustering is popular (a plotting sketch follows)
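For instance, a dendrogram for the toy data above can be drawn with SciPy and matplotlib (a sketch; the data and linkage choice are assumed):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
    Z = linkage(X, method='average')

    # Because the merge distances are monotone increasing up the tree,
    # plotting each merge at its distance gives a readable summary.
    dendrogram(Z)
    plt.xlabel('observation index')
    plt.ylabel('merge distance')
    plt.show()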