Thursday, September 6, 2012

DAY 2 - TEAM G

Analytics is a vast ocean of knowledge mining, and we have just entered it. In today’s world there is no shortage of data, only of smart people who can decipher the nuances in the data that is available!

On day 2 of the Business Analytics Workshop at SIBM Bangalore by Team nmore, we entered the domain of cluster analysis. Cluster analysis classifies data into a small number of mutually exclusive and exhaustive groups, so that there is as much likeness within groups and as much difference among groups as possible. A typical use of cluster analysis is to facilitate market segmentation by identifying subjects or individuals who have similar needs, lifestyles, or responses to marketing strategies.
The basic methods of clustering used in computer packages are of two types:
a)      Hierarchical clustering/Linkage methods
b)      Non-hierarchical clustering/Nodal methods
The first method does not take any input from us on how many clusters are to be formed. The computer provides a range of solutions, from a 1-cluster solution to an n-cluster solution (n being the number of objects in the study). The second method requires us to specify the number of clusters to be formed. The K-means method belongs to the non-hierarchical family of clustering methods.
Note: As a rule of thumb, hierarchical clustering is used when the number of objects is less than 50 and a non-hierarchical method when the number of objects is greater than 50. This keeps the range of cluster solutions we have to examine manageable.
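To make the non-hierarchical idea concrete, here is a minimal K-means sketch in plain Python. It is only an illustration, not how any particular package implements it: the function name `k_means`, the choice of the first k points as starting centroids, and the sample data are all assumptions made for this example.

```python
import math

def k_means(points, k, iterations=100):
    """Minimal K-means sketch; k (number of clusters) is chosen by the analyst."""
    # Assumption: the first k points serve as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the solution is stable
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Hypothetical data: six objects measured on two variables.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
assignment, centroids = k_means(points, k=2)
print(assignment)  # [0, 0, 1, 1, 0, 1]
```

Notice that we had to supply `k=2` ourselves, which is exactly the input the non-hierarchical method asks for.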
In each case, a distance measure has to be calculated for distances between objects/nodes being clustered. One of the most commonly used measures is Euclidean distance.
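As a quick illustration, Euclidean distance is the square root of the sum of squared differences across the variables. The helper name `euclidean` and the two sample objects below are made up for this sketch.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two objects measured on the same variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical objects, each scored on two variables.
print(euclidean((1, 2), (4, 6)))  # 5.0
```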
Generally, interval-scaled variables are ideally suited for cluster analysis. Continuous or ratio-scaled variables can also be used but the instances are rarer. Standardisation may be necessary if the variables are measured in widely different units, because otherwise the variable with the largest numerical range dominates the distance.
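One common way to standardise is the z-score: subtract the mean and divide by the standard deviation. The sketch below assumes that approach (the workshop did not prescribe a specific one), and the income and age figures are invented for illustration.

```python
import statistics

def standardise(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return [0.0 for _ in values]  # a constant variable carries no information
    return [(v - mean) / sd for v in values]

# Hypothetical variables on very different scales.
incomes = [20000, 40000, 60000]
ages = [25, 35, 45]

print(standardise(incomes))
print(standardise(ages))
```

After standardising, both variables are on the same scale, so neither one dominates the distance calculation just because of its units.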
The clustering process consists of the following 3 basic steps:
a)      Selection of variables
b)      Distance Measurement
c)       Clustering criteria
* The clustering criterion tells us how to measure the distance between two objects, between two clusters, and between an object and a cluster.
Distance (dissimilarity) measures:
a)      Interval data – Euclidean distance, Squared Euclidean distance, Chebychev, city-block, Minkowski or customised
b)      Count data – Chi-square measure or phi-square measure
c)       Binary data – Euclidean distance, Squared Euclidean distance, size difference, variance, shape or Lance and Williams. [In the software, enter the values used for Present and Absent to specify which two values are meaningful]
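The interval-data measures listed above are closely related, which a small sketch makes easier to see. The function names here are our own, not those of any package, and only the interval measures are covered.

```python
def squared_euclidean(a, b):
    """Sum of squared differences (Euclidean distance without the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def block(a, b):
    """City-block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Chebychev distance: the largest absolute difference on any one variable."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 gives block, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# Two hypothetical objects measured on two interval-scaled variables.
a, b = (1, 2), (4, 6)
print(squared_euclidean(a, b))  # 25
print(block(a, b))              # 7
print(chebychev(a, b))          # 4
print(minkowski(a, b, 2))       # 5.0
```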
Distance measurement between clusters:
a)      Nearest neighbour (single linkage) – In this method, the distance between the two nearest objects in two different clusters is taken as the distance between the clusters
b)      Farthest neighbour (complete linkage) – In this method, the distance between the two farthest objects in two different clusters is taken as the distance between the clusters
c)       Centroid clustering – In this method, the distance between the centroids of the two clusters is taken as the distance between the clusters
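The three methods above can give quite different answers for the same pair of clusters. Here is a small sketch using Euclidean distance between objects; the function names and the two tiny clusters are assumptions for illustration.

```python
import math
from itertools import product

def nearest_neighbour(c1, c2):
    """Distance between the two closest objects, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(c1, c2))

def farthest_neighbour(c1, c2):
    """Distance between the two most distant objects, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(c1, c2))

def centroid(cluster):
    """Mean of the cluster's objects on each variable."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def centroid_distance(c1, c2):
    """Distance between the centroids of the two clusters."""
    return math.dist(centroid(c1), centroid(c2))

# Two hypothetical clusters lying along a line.
cluster_a = [(0, 0), (2, 0)]
cluster_b = [(5, 0), (9, 0)]

print(nearest_neighbour(cluster_a, cluster_b))   # 3.0
print(farthest_neighbour(cluster_a, cluster_b))  # 9.0
print(centroid_distance(cluster_a, cluster_b))   # 6.0
```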
There are other methods available for distance measurement, but we do not go beyond the above 3 methods for business analytics.
Dendrogram – It is the pictorial (tree-shaped) representation of the hierarchical clustering process, showing which objects or clusters are joined at each step. It can be used to assess the cohesiveness of the clusters formed and can provide information about the appropriate number of clusters to keep.
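What a dendrogram draws is essentially the merge history of a hierarchical run. The sketch below records that history for a tiny data set, assuming nearest-neighbour linkage and Euclidean distance; the function names and the five points are made up, and no plotting is attempted.

```python
import math
from itertools import combinations

def single_linkage(c1, c2):
    """Nearest-neighbour distance between two clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points):
    """Merge the two closest clusters repeatedly, from n clusters down to 1.

    Returns a list of (clusters_remaining, merge_distance) pairs,
    which is the information a dendrogram displays.
    """
    clusters = [[p] for p in points]  # start with every object on its own
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        distance = single_linkage(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        history.append((len(clusters), distance))
    return history

# Five hypothetical objects on a single line.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
for remaining, distance in agglomerate(points):
    print(remaining, "clusters after merging at distance", distance)
```

A large jump in the merge distance from one step to the next is often read as a hint about where to stop. In this run the distances are 1, 1, 4 and 14, so the last merge joins groups that are far apart.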

References:
·         Nargundkar, R. Marketing Research – Text and Cases. Third Edition.
·         Zikmund, W. Business Research Methods. Seventh Edition.


Author:
Anand Chandran
