Monday, September 10, 2012

Team F (Rachit Duggad)


On Day 4 of the Business Analytics class we were taught hierarchical clustering. Hierarchical clustering is a widely used data analysis tool: the idea is to build a binary tree of the data that successively merges similar groups of points. Visualizing this tree provides a useful summary of the data.

In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
  • Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
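
To make the bottom-up idea concrete, here is a minimal sketch using SciPy's hierarchical clustering on a few made-up 2-D points (the data and parameter choices are illustrative assumptions, not from the class): each row of the returned linkage matrix records one merge, working upward from single-point clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical toy data: five 2-D points forming two tight groups and one outlier.
points = np.array([[1.0, 1.0],
                   [1.2, 0.9],
                   [5.0, 5.1],
                   [5.2, 4.9],
                   [9.0, 0.5]])

# Bottom-up merging: each row of Z is (cluster A, cluster B, merge distance, new cluster size).
Z = linkage(points, method="average", metric="euclidean")
print(Z)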

Hierarchical vs. k-means clustering:

• Recall that k-means requires:
  • A number of clusters k
  • An initial assignment of data to clusters
  • A distance measure between data points d(x_n, x_m)

• Hierarchical clustering only requires a measure of similarity between groups of data points (see the sketch just below).
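
As a hedged illustration of that last point, the sketch below runs SciPy's linkage directly on a hypothetical pairwise dissimilarity matrix: no k, no initial assignment, and no raw feature vectors are needed, only the dissimilarities between items.

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

# Hypothetical symmetric dissimilarity matrix for four items (made up for illustration).
D = np.array([[0.0, 0.3, 4.0, 4.2],
              [0.3, 0.0, 3.8, 4.1],
              [4.0, 3.8, 0.0, 0.4],
              [4.2, 4.1, 0.4, 0.0]])

# linkage accepts a condensed distance vector, so pairwise dissimilarities are enough.
Z = linkage(squareform(D), method="complete")
print(Z)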

Agglomerative Clustering:
• Each level of the resulting tree is a segmentation of the data
• The algorithm results in a sequence of groupings
• It is up to the user to choose a “natural” clustering from this sequence (see the cut-selection sketch below)
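
A minimal sketch of picking a clustering from that sequence, again on made-up points and assuming SciPy is available: fcluster cuts the same merge tree at different levels, and each cut is one segmentation of the data.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])
Z = linkage(points, method="average")

# Cutting the tree into 2, 3, or 4 clusters gives different groupings from one run.
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, labels)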

Dendrogram:
• Agglomerative clustering is monotonic: the similarity between merged clusters is monotone decreasing with the level of the merge.
• A dendrogram plots each merge at the (negative) similarity between the two merged groups.
• It provides an interpretable visualization of the algorithm and the data.
• It is a useful summarization tool, and part of why hierarchical clustering is popular; a plotting sketch follows this list.
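
The sketch below draws the dendrogram for the same hypothetical points (matplotlib assumed available); merge heights increase from leaves to root for average linkage, which is the monotonicity the plot relies on.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]])
Z = linkage(points, method="average")

# Each merge is plotted at its dissimilarity, so the tree summarizes the whole merge sequence.
dendrogram(Z, labels=["x%d" % i for i in range(len(points))])
plt.ylabel("merge dissimilarity")
plt.show()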

 From - 
Rachit Duggad
Team F
