Day 2 - Team F
The day started with one more practice example, in which we tested the hypothesis developed from the given data on retail store satisfaction levels.
We were also taught that statistics should be used only as a guideline for decision-making, because the data may contain a lot of spurious values and respondents sometimes answer on intuition or gut feel. To avoid ‘neutral’ responses, we can design the response scale with 4 points instead of 5. Also, if we consider the data as a whole (i.e. globally), positive and negative responses may cancel each other out, so the implications derived may be misleading. In such cases we can look only at the sectional (local) data to derive more specific and accurate implications.
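The cancelling-out effect can be sketched with a few numbers. The scores below are made up for illustration (a scale from -2 to +2 and two hypothetical store sections, "billing" and "parking"); they are not the data used in class.

```python
from statistics import mean

# Hypothetical satisfaction scores on a -2..+2 scale (made-up data).
responses = {
    "billing": [2, 2, 1, 2, 1],       # mostly satisfied
    "parking": [-2, -1, -2, -2, -1],  # mostly dissatisfied
}

# Global view: pool every response together.
all_scores = [s for scores in responses.values() for s in scores]
global_mean = mean(all_scores)

# Local view: one mean per section.
local_means = {section: mean(scores) for section, scores in responses.items()}

print("global mean:", global_mean)
print("local means:", local_means)
```

With these numbers the global mean suggests respondents are indifferent, while the sectional means show one clearly liked and one clearly disliked section.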
Then the new concept of ‘Clustering’ was introduced to us. Clustering is divided into two types: hierarchical and non-hierarchical.
Hierarchical clustering is further divided into two types: divisive and agglomerative. In divisive clustering, we start from one big cluster and split it step by step into smaller clusters. In agglomerative clustering, each object starts as its own cluster, and at every step the most similar pair of clusters is merged until one cluster remains. We use hierarchical clustering if there are fewer than 50 objects.
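The agglomerative idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own: the points are made up, similarity is taken as Euclidean distance, and the distance between two clusters is taken as the distance between their closest members. The function names `closest_pair_distance` and `agglomerate` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def closest_pair_distance(a, b):
    """Distance between two clusters, taken here as their closest pair of members."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points, target=1):
    """Repeatedly merge the two closest clusters until `target` clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    merges = []                       # history of (distance, merged cluster)
    while len(clusters) > target:
        # Find the pair of clusters with the smallest distance between them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: closest_pair_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        d = closest_pair_distance(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        merges.append((d, merged))
        # j is always greater than i, so delete j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges

# Made-up 2-D objects: two tight pairs and one outlier.
points = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 0)]
clusters, merges = agglomerate(points, target=3)
print(clusters)
```

Calling `agglomerate(points)` with the default `target=1` carries on until everything is in one cluster, and `merges` then holds the full merge history.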
We use the ‘k-means’ method in non-hierarchical clustering. We use this method if there are more than 50 objects.
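A minimal sketch of k-means in plain Python is given below. The points are made up, the function name `k_means` is hypothetical, and using the first k points as starting centres is a simplifying assumption (real tools usually pick the starting centres more carefully).

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def k_means(points, k, iterations=10):
    """Plain k-means; the first k points serve as starting centres (an assumption)."""
    centres = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centres[c]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        new_centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:  # nothing moved, so the result is stable
            break
        centres = new_centres
    return centres, groups

# Made-up 2-D objects forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, groups = k_means(points, k=2)
print(centres)
print(groups)
```

Unlike the hierarchical methods, the number of clusters k has to be chosen before the method is run.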
The practical example given to us was ‘proportional hazards analysis’. This method is widely used in hospitals to forecast when a hospital room will become available. It is also used in the banking sector to estimate credit default.
The clustering process consists of three steps. The first step is the ‘selection of process’, which depends on the objective of clustering. The next step is ‘distance measurement’, where different techniques such as correlation, probability and averages can be used. The last step is the ‘clustering criteria’.
Distance is always measured between two objects. If two objects merge to become one cluster, that cluster is then treated as one new object. Distances are then measured between two clusters or between a cluster and an object.
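How the distance between a cluster and an object is computed depends on the criterion chosen. The sketch below shows three common choices (closest pair, farthest pair and average of all pairs); these particular criteria and the function names are my own illustration and were not necessarily the ones covered in class.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)
from statistics import mean

def pair_distances(a, b):
    """All distances between a member of cluster a and a member of cluster b."""
    return [dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    return min(pair_distances(a, b))   # closest pair of members

def complete_linkage(a, b):
    return max(pair_distances(a, b))   # farthest pair of members

def average_linkage(a, b):
    return mean(pair_distances(a, b))  # average over all pairs

cluster = [(0, 0), (0, 2)]  # two objects that have already merged
obj = [(3, 0)]              # a lone object is treated as a cluster of size one

print(single_linkage(cluster, obj))
print(complete_linkage(cluster, obj))
print(average_linkage(cluster, obj))
```

Treating a single object as a cluster of size one lets the same functions handle object-to-object, cluster-to-object and cluster-to-cluster distances.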
Some of the important types of distance measures are interval, count and binary. Other types of distance are ‘city block distance’ and ‘Euclidean distance’.
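The two named distances can be written out as a short sketch. The points are made up and the function names `city_block` and `euclidean` are hypothetical.

```python
from math import sqrt

def city_block(p, q):
    """Sum of absolute differences along each coordinate."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a, b = (1, 2), (4, 6)
print(city_block(a, b))  # 3 + 4 = 7
print(euclidean(a, b))   # sqrt(9 + 16) = 5.0
```

The city block distance is never smaller than the Euclidean distance, because it follows the coordinate axes instead of cutting straight across.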
The ‘dendrogram’ shows graphically how the clustering was done. It also tells us how many clusters there are. It is the critical element of the hierarchical process.
- Rohan Kulkarni