Monday, September 10, 2012

Team F ( Pixy Raina Balmuchu )


Cluster Analysis
Cluster analysis is an exploratory procedure used to identify groups of similar objects (people, stimuli, books, singers, and so on) in a large collection of objects. The identified groups have members that are similar to one another and different from the members of other groups. The approach is similar in spirit to MDS, but it produces discrete groups without any spatial representation. The identified groups can be used in subsequent analyses.
K-means
This method of clustering is very different from hierarchical clustering and Ward's method, which are applied when there is no prior knowledge of how many clusters there may be or what characterizes them. K-means clustering is used when you already have a hypothesis about the number of clusters in your cases or variables: you may want to 'tell' the computer to form exactly three clusters that are as distinct as possible. This is the type of research question that the k-means algorithm can address. In general, the k-means method will produce exactly the k clusters demanded, with the greatest possible distinction between them.

Very frequently, the hierarchical and k-means techniques are used in succession. The former (e.g. Ward's method) gives some sense of the likely number of clusters and of the way they merge, as seen in the dendrogram; the clustering is then rerun with k-means, placing all the cases into the chosen optimum number of clusters.

One of the biggest problems with cluster analysis is identifying the optimum number of clusters. As the fusion process continues, increasingly dissimilar clusters must be fused, and the classification becomes increasingly artificial. Deciding on the optimum number of clusters is largely subjective, although looking at a dendrogram may help. Clusters are interpreted solely in terms of the variables included in them, and each cluster should contain at least four elements; once a cluster drops to three or two elements, it ceases to be meaningful.
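As a concrete sketch, the standard k-means procedure (Lloyd's algorithm) can be written in a few lines of pure Python. The toy data set and the choice of k = 2 below are illustrative assumptions, not taken from the post:

```python
# Minimal sketch of Lloyd's k-means algorithm (pure Python, 2-D points).
# The data and k = 2 are illustrative assumptions only.
import random

def kmeans(points, k, iters=100, seed=0):
    """Partition `points` (list of (x, y) tuples) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                        + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return clusters, centroids

# Two well-separated blobs, so asking for exactly k = 2 clusters is natural.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),
        (8.0, 8.0), (8.5, 8.2), (7.8, 9.0)]
clusters, centroids = kmeans(data, k=2)
```

Note how the number of clusters is fixed in advance, exactly as described above: the algorithm always returns k groups, however natural or artificial that partition is for the data.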







Exhibit: finding clusters with k-means.




Hierarchical
The hierarchical algorithms result in a tree-like dendrogram.
· At the top of the tree, each observation is represented as a separate "cluster".
· At intermediate levels, observations are grouped into fewer "clusters" than at the higher levels.
· At the bottom, all of the observations are merged into one "cluster".
· In some problems, the entire tree structure may be of interest.
· In others, the tree is just a convenient tool for obtaining a partition; this is done by cutting the tree at a suitable level, which forces a particular partition.
· Some hierarchical algorithms form the tree from the bottom up in a divisive fashion, but most work agglomeratively from the top down.
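The agglomerative process described in the bullets above can be sketched as single-linkage clustering in pure Python; the four toy points are an illustrative assumption. Each recorded merge corresponds to one fusion level of the dendrogram:

```python
# Minimal sketch of agglomerative (single-linkage) hierarchical clustering.
# The toy points are illustrative assumptions only.
def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]  # start: every observation is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance
        # (the minimum distance between any member of one and any member of the other).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i][:], clusters[j][:], d ** 0.5))  # record fusion height
        clusters[i] = clusters[i] + clusters[j]  # fuse the closest pair...
        del clusters[j]                          # ...and drop the absorbed cluster
    return merges

# Two tight pairs far apart: the first two merges happen at distance 1,
# the final merge at distance 5.
history = single_linkage([(0, 0), (0, 1), (5, 0), (5, 1)])
```

"Cutting the tree" then simply means stopping before the merges whose fusion height exceeds the chosen threshold; cutting this example anywhere between 1 and 5 yields the obvious two-cluster partition.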

Exhibit: a dendrogram created by a hierarchical algorithm.
