Tuesday, September 4, 2012

Day: 2, Team: D


 Cluster Analysis
Cluster analysis is an exploratory data analysis tool for solving classification problems. Its objective is to sort cases (people, things, events, etc.) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Each cluster thus describes, in terms of the data collected, the class to which its members belong, and with use this description can be generalized from the particular cases to the general class or type.
Cluster analysis is thus a tool of discovery.  It may reveal associations and structure in data which, though not previously evident, nevertheless are sensible and useful once found.  The results of cluster analysis may contribute to the definition of a formal classification scheme, such as a taxonomy for related animals, insects or plants; or suggest statistical models with which to describe populations; or indicate rules for assigning new cases to classes for identification and diagnostic purposes; or provide measures of definition, size and change in what previously were only broad concepts; or find exemplars to represent classes.
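To make the idea of "strong association within a cluster, weak association between clusters" concrete, here is a minimal sketch using k-means, which is only one of many clustering methods and is chosen here purely for illustration. The data points, the function name k_means and the choice of starting centroids are all assumptions made for this example, not part of any particular package.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, initial_centroids, iterations=20):
    """Group points into clusters with a basic k-means loop (illustrative only)."""
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each case joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its current members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical cases measured on two variables; two groups are visibly separate.
cases = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(cases, initial_centroids=[cases[0], cases[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The starting centroids are passed in explicitly because k-means is sensitive to where it starts; in practice they are usually chosen at random or by a seeding rule, and the number of clusters itself has to be decided by the analyst.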






Cluster analysis is the statistical method of partitioning a sample into homogeneous classes to produce an operational classification.  Such a classification may help:
  • Formulate hypotheses concerning the origin of the sample, e.g. in evolution studies.
  • Describe a sample in terms of a typology, e.g. for market analysis or administrative purposes.
  • Predict the future behavior of population types, e.g. in modeling economic prospects for different industry sectors.
  • Optimize functional processes, e.g. business site locations or product design.
  • Assist in identification, e.g. in diagnosing diseases.
  • Measure the different effects of treatments on classes within the population, e.g. with analysis of variance.
Chi-Square Test
Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected values. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results.
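The offspring example above can be checked numerically. The sketch below computes the chi-square statistic as the sum, over categories, of (observed - expected)² / expected. It assumes the remaining 12 of the 20 offspring are female, uses the standard table critical value of 3.841 for 1 degree of freedom at the 0.05 level, and the function name is made up for this illustration.

```python
from math import erfc, sqrt


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12]   # males, females (12 females is implied by 20 offspring in total)
expected = [10, 10]  # the 1:1 ratio expected under the hypothesis

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 0.8

# Two categories give 2 - 1 = 1 degree of freedom.
CRITICAL_VALUE_DF1_05 = 3.841  # standard chi-square table value (df = 1, alpha = 0.05)
print(stat > CRITICAL_VALUE_DF1_05)  # False: the deviation is consistent with chance

# For 1 degree of freedom only, the p-value has the closed form erfc(sqrt(stat / 2)).
p_value = erfc(sqrt(stat / 2))
print(p_value)
```

Because the statistic is well below the critical value, the null hypothesis is not rejected here: observing 8 males where 10 were expected is a deviation that chance alone can easily produce.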
Chi-Square Formula
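The statistic is usually written in the standard Pearson form shown here, where the symbols O_i (observed count in category i) and E_i (expected count in category i) are notation assumed for this note:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```

For the offspring example, taking the other 12 offspring to be female, this gives (8 - 10)²/10 + (12 - 10)²/10 = 0.4 + 0.4 = 0.8.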




References:
en.wikipedia.org/wiki/Cluster_analysis
http://www.clustan.com/what_is_cluster_analysis.html
www.ndsu.edu/pubweb/~mcclean/plsc431/mendel/mendel4.htm

BY: Abhisek Machama
