As my teammate Rohit mentioned in his previous blog post on hierarchical clustering, we now introduce the critical element of this type of clustering and our first key learning: the dendrogram.
A dendrogram is basically a tree diagram that visually represents correlated data. The distance between two clusters can be calculated as:
D = 1 - C
where D = distance and C = correlation between spot clusters
Highly correlated spots have a correlation close to 1, so the distance between them is very close to zero. Highly correlated clusters therefore sit nearer the left end of the dendrogram, and as we move away from it, the clusters get bigger.
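To make the D = 1 - C relationship concrete, here is a minimal sketch in plain Python. It assumes C is the Pearson correlation, which the class notes do not state explicitly, and the spot profiles are made-up numbers used only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Distance D = 1 - C, where C is the correlation (assumed Pearson here)."""
    return 1 - pearson(x, y)

# Hypothetical spot profiles, not taken from the class data set
spot_a = [1.0, 2.0, 3.0, 4.0]
spot_b = [2.0, 4.0, 6.0, 8.0]   # moves exactly with spot_a
spot_c = [4.0, 3.0, 2.0, 1.0]   # moves exactly against spot_a

print(correlation_distance(spot_a, spot_b))  # highly correlated, so distance near zero
print(correlation_distance(spot_a, spot_c))  # anti-correlated, so distance is large
```

Spots that move together end up with a distance near zero and would be merged first, which is why they appear at the near end of the dendrogram.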
Finding the number of clusters within a dendrogram:
As we know, cutting a dendrogram at a certain point gives a set of clusters. This brings us to our second key learning: where should we cut the dendrogram?
Strictly from a theoretical point of view, there is no definitive answer to this, since cluster analysis is essentially an exploratory approach and the interpretation of the resulting hierarchical structure depends entirely on context. What we learnt in class was to choose a cutoff by looking at the Agglomeration Schedule.
In this technique we look at the differences between successive values of the coefficients, as shown in the figure above. Here we notice that the first three differences are very close to one another, i.e. the change is not pronounced. However, when we proceed to the fourth one, we see a large jump in the difference of the coefficients, so we place our cutoff at this point.
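The cutoff rule described above can be sketched in a few lines of Python. The coefficient values below are hypothetical (they are not the ones from the figure), and the helper name `largest_jump` is made up for this illustration. The cluster count relies on the standard fact that after k merges of n items, n - k clusters remain.

```python
def largest_jump(coefficients):
    """Return (position, diffs): the 1-based position of the largest
    difference between successive coefficients, and all the differences."""
    diffs = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    position = max(range(len(diffs)), key=lambda i: diffs[i]) + 1
    return position, diffs

# Hypothetical agglomeration-schedule coefficients, one per merge stage
coefficients = [0.10, 0.15, 0.21, 0.26, 0.90]

position, diffs = largest_jump(coefficients)
print(position)  # which difference shows the big jump

# n items need n - 1 merge stages, so 5 stages imply 6 items.
n_items = len(coefficients) + 1
# Cutting just before the expensive merge keeps `position` merges.
n_clusters = n_items - position
print(n_clusters)
```

This is only a rule of thumb in code form; as noted above, the final choice of cutoff still depends on context.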
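The method of analysis adopted today was the Jaccard method, for which we calculated the Jaccard Index:
JI = Yes Matches / (Total Matches - No Matches)
Here Yes Matches counts the attributes present in both items, No Matches counts the attributes absent from both, and Total Matches is the total number of attributes compared.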
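As a worked example of the Jaccard Index formula, here is a small sketch for two binary (0/1) profiles. The two profiles are invented for illustration, and returning 0.0 when both profiles are all zeros is an assumption of this sketch, since the index is undefined in that case.

```python
def jaccard_index(a, b):
    """Jaccard Index for two equal-length 0/1 sequences:
    yes matches / (total comparisons - no matches)."""
    yes_matches = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    no_matches = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denominator = len(a) - no_matches
    # Assumption: treat the undefined all-zeros case as zero similarity
    return yes_matches / denominator if denominator else 0.0

# Hypothetical yes/no answers (1 = yes, 0 = no) for two respondents
respondent_1 = [1, 1, 0, 0, 1]
respondent_2 = [1, 0, 0, 1, 1]

# 2 yes matches, 1 no match, 5 comparisons: 2 / (5 - 1)
print(jaccard_index(respondent_1, respondent_2))
```

Joint absences are left out of the denominator, so two items are not counted as similar merely because neither has a given attribute.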
OLAP Cubes
The third key learning in class today was Online Analytical Processing, or OLAP, cubes. An OLAP cube is a method of storing data in a multidimensional form, generally for reporting purposes. It is defined by two kinds of parameters: summary variables and grouping variables. Using this technique, which incidentally is very user friendly and convenient, we can build up a story and arrive at our hypothesis.
We applied this technique to work out how to bundle different Value Added Services from the viewpoint of mobile service providers.
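To illustrate how summary variables and grouping variables fit together, here is a minimal sketch of an OLAP-style cube in plain Python. It is not the tool used in class; the function name `build_cube`, the field names, and the records are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict

def build_cube(records, grouping_vars, summary_var):
    """Summarise `summary_var` for every combination of `grouping_vars`."""
    cells = defaultdict(list)
    for row in records:
        key = tuple(row[g] for g in grouping_vars)
        cells[key].append(row[summary_var])
    return {
        key: {"n": len(values),
              "sum": sum(values),
              "mean": sum(values) / len(values)}
        for key, values in cells.items()
    }

# Made-up usage records for Value Added Services
records = [
    {"plan": "prepaid",  "service": "caller tunes", "spend": 30},
    {"plan": "prepaid",  "service": "caller tunes", "spend": 50},
    {"plan": "prepaid",  "service": "news alerts",  "spend": 20},
    {"plan": "postpaid", "service": "caller tunes", "spend": 40},
]

# Grouping variables: plan and service. Summary variable: spend.
cube = build_cube(records, ["plan", "service"], "spend")
for cell, stats in sorted(cube.items()):
    print(cell, stats)
```

Each cell of the cube holds the summary statistics for one combination of grouping values, which is the kind of view that helps when deciding which services to bundle together.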
By: Trilochan Pariyar, Team C