Tuesday, September 4, 2012


Day 2 - Team F

The day started with one more practice example, in which we tested the hypothesis developed on the given data of retail store satisfaction levels.

We were also taught that statistics should be used only as a guideline for decision-making, as there can be a lot of spurious data, and respondents sometimes rely on intuition or gut feel when giving their responses. To avoid ‘neutral’ responses, we may do well to design the response scale with 4 points instead of 5. There is also the case that, if we consider the data in its entirety (i.e. globally), positive and negative responses may cancel each other out, so the implications derived may be misleading. Here, we can consider only the sectional or local data to derive specific and accurate implications.
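The cancelling-out effect can be illustrated with a small sketch. The store names and scores below are made up for illustration (a hypothetical scale from -2, very dissatisfied, to +2, very satisfied); they are not the data used in class.

```python
from statistics import mean

# Hypothetical satisfaction scores per store section (assumed data).
responses = {
    "store_north": [2, 2, 1, 2],
    "store_south": [-2, -1, -2, -2],
}

# Global view: pool every response together.
all_scores = [s for scores in responses.values() for s in scores]
global_mean = mean(all_scores)

# Local view: look at each section separately.
local_means = {store: mean(scores) for store, scores in responses.items()}

print("global:", global_mean)
print("local:", local_means)
```

The pooled mean looks neutral even though one section is clearly satisfied and the other clearly dissatisfied, which is why the sectional view gives the more useful implication here.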

Then the new concept of ‘Clustering’ was introduced to us. Clustering is divided into two types: hierarchical and non-hierarchical.

Hierarchical clustering is further divided into two types: divisive and agglomerative. In divisive clustering, we start from one big cluster and new clusters emerge from it through successive splitting. In agglomerative clustering, we start from many clusters and merge them step by step, each time joining the pair of clusters that are most similar, until one cluster remains. We use hierarchical clustering when there are fewer than 50 objects.
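The agglomerative procedure can be sketched roughly as follows. This is only an illustration: the function name, the one-dimensional sample points, and the choice of measuring similarity by the closest pair of members are all assumptions, not something specified in class.

```python
def agglomerate(points, target_clusters):
    """Merge the two closest clusters repeatedly until target_clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed similarity rule: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

groups = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0], target_clusters=3)
print(groups)
```

Running the loop all the way down to `target_clusters=1` would give the single final cluster described above; stopping earlier gives the intermediate groupings.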

In non-hierarchical clustering we use the ‘k-means’ method. We use this method when there are more than 50 objects.
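A minimal sketch of the k-means idea, assuming one-dimensional data and hand-picked starting centres (both are illustrative choices; real tools usually pick the starting centres for you):

```python
from statistics import mean

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centre and
    moving each centre to the mean of its assigned points."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Keep the old centre if no point was assigned to it.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

centroids, groups = k_means([1.0, 1.5, 2.0, 8.0, 9.0, 10.0], centroids=[0.0, 5.0])
print(centroids, groups)
```

Unlike the hierarchical methods, the number of clusters (k) has to be chosen before the procedure starts; here it is fixed at two by the two starting centres.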

A practical application mentioned in class was ‘proportional hazards analysis’. This method is widely used in hospitals to forecast when a hospital room will become available. It is also used in the banking sector to estimate credit default.

The clustering process consists of three steps. The first step is the ‘selection of process’, which depends on the objective of the clustering. The next step is the ‘distance measurement’, for which different techniques can be used, such as correlation, probability, averages, etc. The last step is choosing the ‘clustering criteria’.

Distance is always measured between two objects. When two objects merge into one cluster, that cluster is treated as a single new object, so distances are then measured between two clusters or between a cluster and an object.
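How the distance between a cluster and an object (or between two clusters) is defined was not spelled out in class. The sketch below shows three common conventions (closest pair, farthest pair, and average over all pairs) on made-up one-dimensional values; the function names are illustrative only.

```python
from itertools import product
from statistics import mean

def pairwise(c1, c2):
    """All member-to-member distances between two clusters."""
    return [abs(a - b) for a, b in product(c1, c2)]

def single_linkage(c1, c2):
    return min(pairwise(c1, c2))   # closest pair of members

def complete_linkage(c1, c2):
    return max(pairwise(c1, c2))   # farthest pair of members

def average_linkage(c1, c2):
    return mean(pairwise(c1, c2))  # average over all pairs

cluster = [1.0, 2.0]  # two objects that have already merged
obj = [5.0]           # a single object, treated as a one-member cluster

print(single_linkage(cluster, obj), complete_linkage(cluster, obj), average_linkage(cluster, obj))
```

Treating a lone object as a one-member cluster means the same functions cover both the cluster-to-cluster and the cluster-to-object case.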

Some of the important distance measurement types are interval, count and binary. Other types of distance are the ‘city block distance’ and the ‘Euclidean distance’.
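The two named distances can be computed as in the sketch below. City block distance adds up the absolute differences along each dimension, while Euclidean distance is the straight-line distance. The sample points are made up for illustration.

```python
import math

def city_block(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

p, q = (1, 2), (4, 6)
print(city_block(p, q))  # 3 + 4 = 7
print(euclidean(p, q))   # sqrt(9 + 16) = 5.0
```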

The ‘dendrogram’ shows graphically how the clustering was done, and it also indicates how many clusters there are. It is a critical element of the hierarchical process.
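One way to read the number of clusters off a dendrogram is to draw a horizontal cut at a chosen height and count the branches it crosses. The sketch below mimics that with a list of merge heights; the heights are hypothetical, and the counting rule assumes the heights never decrease from one merge to the next.

```python
def clusters_at_cut(n_objects, merge_heights, cut):
    """Number of clusters left if we stop merging above the cut height."""
    merges_done = sum(1 for h in merge_heights if h <= cut)
    return n_objects - merges_done  # each merge reduces the count by one

# Hypothetical heights at which 5 objects were merged, in order.
merge_heights = [0.2, 0.5, 3.5, 3.8]

print(clusters_at_cut(5, merge_heights, cut=1.0))
print(clusters_at_cut(5, merge_heights, cut=4.0))
```

A low cut leaves many small clusters, while a cut above the last merge leaves a single cluster containing everything.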

-Rohan Kulkarni
