Cluster Analysis
Cluster analysis is an exploratory procedure used to identify groups of similar objects (e.g. people, stimuli, books, singers) in a large collection of objects. The identified groups have members that are similar to one another and different from the members of other groups. The approach is similar in spirit to MDS, but it produces discrete groups without any spatial representation. The identified groups can be used in subsequent analyses.
K-means
This method of clustering is quite different from hierarchical clustering and Ward’s method, which are applied when there is no prior knowledge of how many clusters there may be or how they are characterized. K-means clustering is used when you already have hypotheses about the number of clusters in your cases or variables. You may want to ‘tell’ the computer to form exactly three clusters that are as distinct as possible. This is the type of research question that the k-means clustering algorithm can address. In general, the k-means method will produce exactly the k clusters demanded, made as distinct as possible. Very frequently, the hierarchical and k-means techniques are used successively. The former (Ward’s method) is used to get some sense of the possible number of clusters and of the way they merge, as seen in the dendrogram. The clustering is then rerun with only the chosen optimum number of clusters, into which all the cases are placed (k-means clustering). One of the biggest problems with cluster analysis is identifying this optimum number. As the fusion process continues, increasingly dissimilar clusters must be fused, i.e. the classification becomes increasingly artificial. Deciding upon the optimum number of clusters is therefore largely subjective, although looking at a dendrogram may help. Clusters are interpreted solely in terms of the variables included in them. A cluster should also contain at least four elements; once it drops to three or two elements, it ceases to be meaningful.
Exhibit: finding the number of clusters (k) for k-means.
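To make the two-stage procedure concrete, here is a minimal sketch in Python, assuming SciPy and scikit-learn are available; the synthetic dataset and the final choice of k = 3 are illustrative assumptions, not prescriptions.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 150 cases measured on 4 variables.
X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Stage 1: Ward's hierarchical clustering, to get a sense of the possible
# number of clusters from the dendrogram (look for large fusion distances).
Z = linkage(X, method="ward")
dendrogram(Z)
plt.show()

# Stage 2: rerun with the chosen optimum number of clusters; k-means forms
# exactly k groups that are as distinct as possible.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster membership of the first 10 cases
print(km.inertia_)      # within-cluster sum of squares (lower is tighter)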
Hierarchical
Hierarchical algorithms result in a tree-like dendrogram.
· At the top of the tree, each observation is represented as a separate “cluster”.
· At intermediate levels, observations are grouped into fewer “clusters” than at the higher levels.
· At the bottom, all of the observations are merged into one “cluster”.
· In some problems, the entire tree structure may be of interest.
· In others, the tree is just a convenient tool for obtaining a partition.
· This is done by cutting the tree at a suitable level, which forces a particular partition (see the sketch below).
· Some hierarchical algorithms form the tree divisively, starting from one all-inclusive cluster and repeatedly splitting it, but most work agglomeratively, starting from the individual observations and repeatedly merging the closest clusters.
Exhibit: dendrogram produced by a hierarchical algorithm.
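As an illustration of cutting the tree to obtain a partition, the following sketch builds an agglomerative tree with SciPy; the synthetic data, the average-linkage choice, and the cut into three groups are assumptions of the sketch.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in data: 30 observations on 2 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# Agglomerative construction: repeatedly merge the two closest clusters.
Z = linkage(X, method="average")

# If the entire tree structure is of interest, inspect the dendrogram.
dendrogram(Z)
plt.show()

# Otherwise, cut the tree at the level that forces a 3-cluster partition.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # cluster membership (values 1..3) for each observation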