Thursday, September 6, 2012

Day 3 - Team A

On Day 3 we learned about Clustering.

There are two types of clustering:

1) Hierarchical

2) Non-hierarchical
 
We focused on Hierarchical Clustering. It is typically used when fewer than 50 variables are under consideration. It involves grouping variables on the basis of shared characteristics, analysing the resulting groups and building a story around them. It helps marketers with product bundling, to maximise profits and market penetration. 
 
The key output of hierarchical clustering is the dendrogram: a branching diagram representing a hierarchy of categories based on their degree of similarity or number of shared characteristics. 
To run a Hierarchical Cluster analysis in SPSS, go to:
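A dendrogram is a drawing of the order in which objects were joined and the distance at which each join happened. The sketch below is a minimal pure-Python version of agglomerative clustering that records that merge history. It assumes Euclidean distance and average linkage (SPSS lets you pick the linkage method, so your settings may differ), and the function names are hypothetical, not part of SPSS.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerate(points):
    """Average-linkage agglomerative clustering (illustrative sketch).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of point indices. A dendrogram is a
    picture of exactly this history.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for a, b in combinations(clusters, 2):
            total = sum(euclidean(points[i], points[j]) for i in a for j in b)
            d = total / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two clusters with their union and record the merge.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        history.append((a, b, d))
    return history
```

For example, `agglomerate([(1, 1), (1, 2), (5, 5), (6, 5)])` joins the two close pairs at distance 1.0 each and then merges the two pairs at about 5.72; that final large distance is what appears as a long branch in the dendrogram.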

SPSS > ANALYZE > CLASSIFY > HIERARCHICAL CLUSTER
 

Click STATISTICS and check AGGLOMERATION SCHEDULE.
 
 
Click CONTINUE, then open the PLOTS option and make sure DENDROGRAM is checked.
 
 
Next, open the METHOD option and choose the measure. For business purposes we use INTERVAL or BINARY:
INTERVAL - we use Euclidean Distance
BINARY - we use either Jaccard or Simple Matching.
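For INTERVAL data, Euclidean distance is the straight-line distance between two records. A minimal sketch, assuming each record is a tuple of numeric variables; the function name and the customer figures in the usage note are hypothetical. Note that variables on larger scales dominate this distance, so they are often standardised first.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two records measured on interval scales."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of variables")
    # Square each difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For two hypothetical customers described by (age, income in thousands), `euclidean_distance((30, 50), (34, 53))` gives 5.0, since the differences are 4 and 3.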
 
Then click CONTINUE and OK; the dendrogram appears in the Output window.
 
Statistical method for determining how many clusters exist:

A cluster ends when the next object joins at a distance that is relatively large compared with the distances within the group.
Another way to separate clusters is to draw a vertical line on the dendrogram, based on the distance at which each new object joins a cluster. The line should be drawn where the jump in joining distance is largest.
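The "largest jump" rule can be applied directly to the distances in the agglomeration schedule. A minimal sketch, assuming the merge distances are listed in stage order (as in the schedule SPSS prints); the function name is hypothetical, and this is only one common reading of the rule, not a formal test.

```python
def clusters_from_largest_jump(merge_distances):
    """Suggest a number of clusters from an agglomeration schedule.

    merge_distances[k] is the distance at which stage k + 1 joined two
    clusters. With n objects there are n - 1 stages.
    """
    if len(merge_distances) < 2:
        raise ValueError("need at least two merge stages to compare")
    n_objects = len(merge_distances) + 1
    # Gap between each stage and the next one.
    jumps = [merge_distances[k + 1] - merge_distances[k]
             for k in range(len(merge_distances) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    # Cut after stage k + 1: that leaves n_objects - (k + 1) clusters.
    return n_objects - (k + 1)
```

Given the distances 1.0, 1.0 and 5.72 for four objects, the biggest jump comes before the last merge, so the sketch suggests 2 clusters.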
 
We also learnt about the Jaccard Index:
 
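JI = Yes-Yes Matches / (Total Attributes - No-No Matches)

where Yes-Yes matches are attributes both objects have and No-No matches are attributes neither object has.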
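The Jaccard index ignores No-No matches, while Simple Matching counts them as agreement. A minimal sketch of both, assuming each object is a list of 0/1 values (1 = yes); the function names and the purchase data in the usage note are hypothetical.

```python
def match_counts(x, y):
    """Count the four kinds of agreement between two binary (0/1) profiles."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # yes-yes
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # yes-no
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # no-yes
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)  # no-no
    return a, b, c, d


def jaccard(x, y):
    """Yes-Yes matches / (total attributes - No-No matches)."""
    a, b, c, d = match_counts(x, y)
    if a + b + c == 0:
        raise ValueError("Jaccard index is undefined when both profiles are all 'no'")
    return a / (a + b + c)


def simple_matching(x, y):
    """(Yes-Yes + No-No matches) / total attributes."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)
```

For two customers' purchases of six products, `x = [1, 1, 0, 0, 0, 1]` and `y = [1, 0, 0, 0, 0, 1]`, there are 2 Yes-Yes and 3 No-No matches, so the Jaccard index is 2 / (6 - 3) = 2/3, while Simple Matching is 5/6.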
 
Another technique we learnt today was OLAP (Online Analytical Processing):
 
An OLAP cube represents data in a meaningful form for study and analysis. OLAP is characterised by dynamic, multi-dimensional analysis of consolidated enterprise data.
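The multi-dimensional idea can be sketched with a tiny fact table: each row has dimension values plus a measure, and analysis means slicing on some dimensions and totalling (rolling up) over the others. This is a minimal in-memory illustration, not how an OLAP server or SPSS implements cubes; the table, dimension names and functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales)
FACTS = [
    ("North", "Soap", "Q1", 100),
    ("North", "Soap", "Q2", 120),
    ("North", "Shampoo", "Q1", 80),
    ("South", "Soap", "Q1", 90),
    ("South", "Shampoo", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, by):
    """Total the sales measure over every dimension not listed in `by`."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


def slice_cube(facts, **fixed):
    """Keep only rows where each named dimension equals the given value."""
    idx = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return [row for row in facts if all(row[i] == v for i, v in idx.items())]
```

Here `roll_up(FACTS, ["region"])` gives North 300 and South 150, and `roll_up(slice_cube(FACTS, quarter="Q1"), ["product"])` gives Soap 190 and Shampoo 80 for the first quarter.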