Wednesday, September 5, 2012

Day 3 of Business Analytics: Team C



As my teammate Rohit mentioned in his previous blog post on hierarchical clustering, we now introduce the critical element of this type of clustering and our first key learning: the dendrogram.

A dendrogram is a tree diagram that shows how closely correlated the data points are and the order in which they merge into clusters. The distance between two clusters can be calculated as:
                                                            D = 1 - C
where D = distance and C = correlation between the spot clusters.
Highly correlated spots have a correlation close to 1, so the distance between them is very close to zero. Therefore, highly correlated clusters sit nearer the left end of the dendrogram, and as we move away from it the clusters get bigger.
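To make the D = 1 - C relationship concrete, here is a minimal Python sketch. It assumes C is the Pearson correlation (the class notes only say "correlation"), and the three profiles are made-up numbers for illustration, not data from class.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Distance D = 1 - C, where C is the correlation between x and y."""
    return 1.0 - pearson(x, y)


# Hypothetical profiles, invented purely for illustration.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # moves exactly with a, so C = 1
c = [4.0, 3.0, 2.0, 1.0]  # moves exactly against a, so C = -1

d_ab = correlation_distance(a, b)  # close to 0: merges early in the dendrogram
d_ac = correlation_distance(a, c)  # close to 2: merges late
```

With this definition the distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), which is why strongly correlated items join near the zero end of the dendrogram.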
Finding the number of clusters within a dendrogram:
As we know, cutting a dendrogram at a given point yields a set of clusters. This brings us to our second key learning: where should we cut the dendrogram?
Strictly from a theoretical point of view, there is no definitive answer, since cluster analysis is essentially an exploratory approach and the interpretation of the resulting hierarchical structure is entirely context dependent. What we learnt in class was to choose the cutoff by looking at the Agglomeration Schedule.

In this technique we look at the differences between successive coefficient values, as shown in the figure above. For the first three, the differences are close to one another, i.e. not pronounced. However, when we move on to the fourth, we see a much larger jump in the coefficient. So we place our cutoff at this point.
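The rule of thumb above can be sketched in a few lines of Python. This is only an illustration under assumptions: the coefficient values are hypothetical (they are not the ones from the class figure), and the helper name `suggest_cut` is made up. It picks the largest single jump, which is one simple way to formalise "a vast change".

```python
def suggest_cut(coefficients, n_observations):
    """Find the largest jump between successive agglomeration coefficients.

    Returns (stages_kept, n_clusters): how many merge stages to keep before
    the big jump, and how many clusters remain at that point.
    """
    # jumps[i] is the increase from stage i+1 to stage i+2 (1-based stages).
    jumps = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    biggest = max(range(len(jumps)), key=lambda i: jumps[i])
    stages_kept = biggest + 1
    # Every merge stage reduces the cluster count by one.
    n_clusters = n_observations - stages_kept
    return stages_kept, n_clusters


# Hypothetical schedule for 6 observations (5 merge stages), for illustration.
coefficients = [1.0, 1.2, 1.5, 1.7, 4.9]
stages_kept, n_clusters = suggest_cut(coefficients, n_observations=6)
```

Here the first three differences are small and the fourth is large, so the sketch keeps four merge stages and stops before the fifth. Since the choice of cutoff is context dependent, treat the result as a starting point rather than an answer.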
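The similarity measure adopted today was the Jaccard method, for which we calculated the Jaccard Index:

                                    JI = Yes Matches / (Total Matches - No Matches)

Here, Yes Matches are the attributes present in both items, No Matches are the attributes absent from both, and Total Matches is the total number of attributes compared.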
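As a small worked example of this formula, the Python sketch below computes the index for two yes/no (1/0) attribute vectors. The vectors are invented for illustration, and returning 0.0 when neither item has any attribute is an assumed convention, not something covered in class.

```python
def jaccard_index(x, y):
    """Jaccard Index for two equal-length sequences of 0/1 (no/yes) values."""
    if len(x) != len(y):
        raise ValueError("both items must have the same number of attributes")
    yes_matches = sum(1 for a, b in zip(x, y) if a and b)
    no_matches = sum(1 for a, b in zip(x, y) if not a and not b)
    denominator = len(x) - no_matches
    if denominator == 0:
        # Assumed convention: two all-"no" items get an index of 0.0.
        return 0.0
    return yes_matches / denominator


# Hypothetical yes/no answers for two respondents, for illustration only.
item_1 = [1, 1, 0, 0, 1]
item_2 = [1, 0, 0, 1, 1]

# 2 yes matches, 1 no match, 5 attributes in total: 2 / (5 - 1)
ji = jaccard_index(item_1, item_2)
```

Note that the shared "no" answers are left out of the denominator, so two items are not counted as similar merely because they both lack the same attributes.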

OLAP Cubes
The third key learning in class today was Online Analytical Processing (OLAP) cubes. An OLAP cube is a way of storing data in multidimensional form, generally for reporting purposes. It is defined by two kinds of parameters: summary variables (the measures to be aggregated) and grouping variables (the dimensions used to break them down). Using this technique, which is also very user friendly and convenient, we can build up a story and arrive at our desired hypothesis.
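The idea of summary and grouping variables can be sketched in plain Python. This is not how an OLAP tool is implemented; it only mimics the aggregation step, and the records, field names and the `build_cube` helper are all hypothetical.

```python
from collections import defaultdict


def build_cube(records, grouping_vars, summary_var):
    """Sum the summary variable for every combination of grouping variables."""
    cube = defaultdict(float)
    for row in records:
        key = tuple(row[g] for g in grouping_vars)
        cube[key] += row[summary_var]
    return dict(cube)


# Hypothetical subscriber records, invented for illustration.
records = [
    {"region": "North", "plan": "Prepaid", "revenue": 100.0},
    {"region": "North", "plan": "Postpaid", "revenue": 250.0},
    {"region": "South", "plan": "Prepaid", "revenue": 80.0},
    {"region": "South", "plan": "Prepaid", "revenue": 20.0},
]

# Two grouping variables give one cell per (region, plan) pair.
by_region_plan = build_cube(records, ["region", "plan"], "revenue")

# Dropping a grouping variable "rolls up" the cube to a coarser view.
by_region = build_cube(records, ["region"], "revenue")
```

Changing the list of grouping variables is what lets us look at the same summary figure from different angles while building up the story.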


We applied this technique to work out how to bundle different Value Added Services from the viewpoint of mobile service providers.


By: Trilochan Pariyar, Team C


