Wednesday, September 5, 2012

Day 3 - Team H


Non-Hierarchical Clustering

K-Means- It is a clustering technique in which the expected number of clusters is specified up front to obtain the desired level of clustering. The required number of clusters can be obtained by first running a hierarchical clustering and reading it off the Dendrogram.
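
To make the idea concrete, here is a minimal sketch of K-Means in plain Python. It assumes Euclidean distance and seeds the centroids with the first k points; the function name `kmeans` and the sample points are hypothetical and only for illustration, not what SPSS does internally.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain K-Means loop; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, k=2)
print(labels)   # cluster index of each point
print(centers)  # final centroid of each cluster
```

Note that k has to be passed in, which is exactly why the number of clusters must be decided before K-Means is run.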
How to use a Dendrogram-


* * * * * * H I E R A R C H I C A L  C L U S T E R   A N A L Y S I S * * * * * *


 Dendrogram using Average Linkage (Between Groups)

                         Rescaled Distance Cluster Combine

    C A S E      0         5        10        15        20        25
  Label     Num  +---------+---------+---------+---------+---------+

  billsms     4   ─┬─┐
  billothr    5   ─┘ ├─────────────────────────────────────────────┐
  billfix     2   ─┬─┘                                             │
  billtalk    3   ─┘                                               │
  mntspend    1   ─────────────────────────────────────────────────┘

To prepare a Dendrogram we need the following inputs-
A)     Names of the variables/cases to be clustered
B)     Distance measure, chosen by the type of data-
a.      Interval- Euclidean
b.      Binary- Euclidean/Jaccard
c.      Count
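Once these inputs are given, a Dendrogram similar to the one in the figure above is generated. The number of clusters is then decided by drawing a cut-off line across the Dendrogram at the chosen distance level; every branch the line crosses counts as one cluster. Generally, the line is drawn where the next merge would happen only at a relatively large distance. In the figure above, for example, the four bill variables combine at small distances while mntspend joins them only at distance 25, so a cut between those levels gives two clusters.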
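
The cut-off rule can be sketched in a few lines of Python. This is only one possible reading of "a relatively long distance": it places the cut in the widest gap between consecutive merge distances. The function name and the distances are illustrative assumptions, not values exported from SPSS, and the sketch needs at least two merges to work.

```python
def clusters_from_largest_gap(merge_distances):
    """Number of clusters left when the cut sits in the widest gap
    between consecutive merge distances."""
    heights = sorted(merge_distances)
    n_items = len(heights) + 1  # n items are joined by n - 1 merges
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    widest = gaps.index(max(gaps))  # gap that follows merge number `widest`
    merges_kept = widest + 1        # merges that happen below the cut
    return n_items - merges_kept    # each kept merge removes one cluster

# Illustrative rescaled distances for five items (assumed, not SPSS output).
print(clusters_from_largest_gap([1, 1, 3, 25]))
```

With these assumed distances the widest gap lies between 3 and 25, so three merges are kept and the five items fall into two clusters.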

How to calculate Jaccard distance-
Jaccard distance depends on how often two binary variables agree on a positive response. For e.g.- if A & B are two variables whose possible values are YES & NO, the Jaccard similarity between A & B is calculated using the formula:
Jaccard similarity = Number of YES matches / (Total number of responses - Number of NO matches)
Jaccard distance = 1 - Jaccard similarity
Responses where both A & B are NO are left out of the count altogether. So, the more similar the two variables are, the smaller the Jaccard distance.
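
A small Python sketch of this calculation, assuming the usual definition (distance = 1 - YES matches / responses that are not NO-NO matches). The function name and the sample responses for A and B are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equally long lists of "YES"/"NO" answers."""
    both_yes = sum(1 for x, y in zip(a, b) if x == "YES" and y == "YES")
    both_no = sum(1 for x, y in zip(a, b) if x == "NO" and y == "NO")
    compared = len(a) - both_no  # NO-NO matches are ignored
    if compared == 0:
        return 0.0  # convention assumed here when every response is NO-NO
    return 1 - both_yes / compared

# Hypothetical responses of five cases on variables A and B.
A = ["YES", "YES", "NO", "NO", "YES"]
B = ["YES", "NO", "NO", "YES", "YES"]
print(jaccard_distance(A, B))
```

Here there are 2 YES matches and 1 NO match out of 5 responses, so the similarity is 2 / (5 - 1) = 0.5 and the distance is 0.5.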

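How to calculate Euclidean distance-
Euclidean distance is the straight-line distance between two points: the square root of the sum of the squared differences between their values. When clusters contain more than one point, there are a number of ways to turn these point distances into a distance between two clusters, one of them being the average method (the Average Linkage used in the Dendrogram above). In the average method we calculate the Euclidean distance between every pair of points, one from each cluster, and the average of all these distances is taken as the distance between the two clusters.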
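
As a rough illustration, the two steps (distance between points, then the average over all cross-cluster pairs) might look like this in Python. The function names and the sample clusters are assumptions made for the example.

```python
import math
from itertools import product

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over every pair with one point per cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

# Hypothetical clusters of 2-D points.
cluster_a = [(0.0, 0.0), (0.0, 3.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]
print(euclidean((0.0, 0.0), (4.0, 3.0)))
print(average_linkage(cluster_a, cluster_b))
```

The four pairwise distances here are 4, 5, 5 and 4, so the average-method distance between the two clusters is 4.5.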

Author-
Kuldeep R.
