On Day 4 of the Business Analytics class we were taught Hierarchical Clustering. Hierarchical clustering is a widely used data
analysis tool. The idea is to build a binary tree of the data that successively
merges similar groups of points. Visualizing this tree provides a useful
summary of the data.
In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for
hierarchical clustering generally fall into two types:
- Agglomerative: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy (see the sketch after this list).
- Divisive: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
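Here is a minimal sketch of the agglomerative ("bottom-up") approach; the toy data, the use of SciPy, and the choice of average linkage are all illustrative assumptions, not part of the class material.

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Six 2-D points; each observation starts in its own cluster.
    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])

    # linkage() repeatedly merges the two most similar clusters,
    # building the binary tree from the bottom up.
    Z = linkage(X, method='average')

    # Each row of Z records one merge: cluster i, cluster j,
    # their distance, and the size of the newly formed cluster.
    print(Z)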
Hierarchical vs. k-means clustering:
- Recall that k-means requires:
  • A number of clusters k
  • An initial assignment of data points to clusters
  • A distance measure between data points, d(x_n, x_m)
- Hierarchical clustering only requires a measure of similarity between groups of data points (a short sketch contrasting the two inputs follows).
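As a rough sketch of this difference in inputs (toy data, SciPy, and scikit-learn are assumptions here): k-means needs the number of clusters and an initialization scheme up front, while the hierarchical routine takes only pairwise distances and a linkage rule.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage
    from sklearn.cluster import KMeans

    X = np.random.RandomState(0).rand(10, 2)  # toy data

    # k-means: the number of clusters k must be fixed up front, and the
    # initial assignment comes from the chosen initialization scheme.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Hierarchical clustering: only the pairwise distances d(x_n, x_m) and a
    # linkage rule (similarity between groups) are needed; no k at this stage.
    Z = linkage(pdist(X), method='average')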
Agglomerative Clustering:
• Each level of the resulting tree is a
segmentation of the data
• The algorithm results in a sequence of
groupings
• It is up to the user to choose a “natural” clustering from this sequence (see the sketch below)
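A small sketch of choosing one segmentation from the merge sequence, again assuming SciPy, average linkage, and toy data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
    Z = linkage(X, method='average')   # the full sequence of merges

    # Every level of the tree is a segmentation; the user picks one,
    # e.g. by asking for exactly two clusters ...
    labels_k = fcluster(Z, t=2, criterion='maxclust')

    # ... or by cutting the tree at a chosen distance threshold.
    labels_d = fcluster(Z, t=1.0, criterion='distance')
    print(labels_k, labels_d)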
Dendrogram:
• Agglomerative clustering is monotonic: the similarity between merged clusters is monotone decreasing with the level of the merge
• Dendrogram: Plot each merge at the
(negative) similarity between the two merged groups
• Provides an interpretable visualization of
the algorithm and data
• Useful summarization tool, and part of why hierarchical clustering is popular (a plotting sketch follows)
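For instance, a dendrogram for the toy data above can be drawn with SciPy and matplotlib (a sketch; the data and linkage choice are assumed):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
    Z = linkage(X, method='average')

    # Because the merge distances are monotone increasing up the tree,
    # plotting each merge at its distance gives a readable summary.
    dendrogram(Z)
    plt.xlabel('observation index')
    plt.ylabel('merge distance')
    plt.show()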