Monday, September 10, 2012

DAY:4 Team:D VINOD JOSHI



What is K-Means Clustering?
Simply put, K-means is an algorithm that groups objects into K clusters based on their attributes (features), where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of its cluster. Thus, the purpose of K-means clustering is to partition the data into groups of similar cases. At times a cluster contains only 1 or 2 people, which suggests outliers. We then have to validate these outliers to see whether they are genuine, and if necessary select the data cases in a way that excludes them.
For example, select only the cases with Monthly expenditure < 600. To recognise outliers we use a box plot. A box plot is a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). A box plot may also indicate which observations, if any, might be considered outliers.
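
The workflow above can be sketched in plain Python. This is only a minimal illustration, not the procedure used in class: the expenditure figures are made up, the 1.5 × IQR fence for flagging outliers is a common convention assumed here (the notes do not name a rule), and the function names `five_number_summary`, `box_plot_outliers` and `kmeans` are hypothetical. The K-means part follows the usual assign-then-update loop (Lloyd's algorithm) with randomly chosen starting centroids.

```python
import random
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def box_plot_outliers(values, whisker=1.5):
    """Flag values lying more than whisker * IQR outside the quartiles.

    The 1.5 * IQR fence is an assumed convention, not taken from the notes.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels, within-cluster sum of squared distances)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Made-up monthly expenditure figures with one extreme case.
spend = [120, 150, 180, 200, 210, 250, 300, 320, 2000]
print(five_number_summary(spend))
print(box_plot_outliers(spend))

# Keep only cases below the cutoff mentioned above, then cluster them.
kept = [(v,) for v in spend if v < 600]
print(kmeans(kept, k=2))
```

The result depends on the starting centroids, so in practice the algorithm is usually run several times and the run with the smallest sum of squares is kept.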


Hierarchical Clustering
Hierarchical algorithms produce a tree-like diagram called a dendrogram.
·         At the bottom of the tree, each observation is represented as a separate “cluster”.
·         At intermediate levels, observations are grouped into fewer “clusters” than at the lower levels.
·         At the top, all of the observations are merged into one “cluster”.
·         In some problems, the entire tree structure may be of interest.
·         In others, the tree is just a convenient tool for obtaining a partition.
·         This is done by cutting the tree at a suitable level, which forces a particular partition.
·         Some hierarchical algorithms form the tree from the top down in a divisive fashion, but most work agglomeratively from the bottom up.
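
The agglomerative idea in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: single linkage (cluster distance = closest pair of members) is chosen arbitrarily since the notes do not name a linkage, the function name `agglomerative` and the sample points are hypothetical, and the brute-force search is only suitable for small data. Stopping the merging once `k` clusters remain gives the same partition as cutting the finished tree at the level that has `k` branches.

```python
import math


def agglomerative(points, k=1):
    """Single-linkage agglomerative clustering.

    Starts with one cluster per observation and repeatedly merges the two
    closest clusters until only k remain. Returns (clusters, merges), where
    clusters holds lists of point indices and merges records each merge as
    (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]

# Build the whole tree: n observations always need n - 1 merges.
_, merges = agglomerative(points, k=1)
for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 3))

# "Cut" the tree so that three clusters remain.
clusters, _ = agglomerative(points, k=3)
print(clusters)
```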

Dendrogram:
• Agglomerative clustering is monotonic
• The similarity between merged clusters is monotone decreasing with the level of the merge.
• Dendrogram: Plot each merge at the (negative) similarity between the two merged groups
• Provides an interpretable visualization of the algorithm and data
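
To make the monotonicity point concrete, the sketch below records the height (dissimilarity) of every merge and draws a crude text version of the plot. It is only an illustration: complete linkage is assumed here (single and average linkage also give monotone heights), the 1-D values are made up, and `merge_heights` is a hypothetical helper. Heights that never decrease correspond to similarity that never increases from one merge to the next.

```python
def merge_heights(values):
    """Complete-linkage agglomeration of 1-D values.

    Returns a list of (height, merged_cluster) in the order the merges happen.
    """
    clusters = [[v] for v in values]
    history = []
    while len(clusters) > 1:
        # Complete linkage: cluster distance = largest pairwise distance.
        height, a, b = min(
            (max(abs(x - y) for x in clusters[a] for y in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        history.append((height, merged))
    return history


history = merge_heights([1, 2, 6, 7, 20])
heights = [h for h, _ in history]

# Monotone: every merge happens at a height at least as large as the last one.
print(heights == sorted(heights))

# Text stand-in for the dendrogram: one bar per merge, as long as its height.
for height, members in history:
    print(f"{height:>3} | {'#' * height} {members}")
```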



