Application of Descriptive Statistics and Eigenvalues
Descriptive statistics include
the numbers, tables, charts, and graphs used to organize, summarize, and
present raw data. Descriptive
statistics are most often used to examine:
- Central tendency (location) of data, i.e. where data tend to fall, as measured by the mean, median, and mode.
- Dispersion (variability) of data, i.e. how spread out data are, as measured by the variance and its square root, the standard deviation.
- Skew (symmetry) of data, i.e. how concentrated data are at the low or high end of the scale, as measured by the skew index.
- Kurtosis (peakedness) of data, i.e. how concentrated data are around a single value, as measured by the kurtosis index.
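As a rough illustration of the four measures listed above, the following Python sketch computes them with the standard library only. The function name `describe` and the sample values are made up for this example, and the moments use the population (divide-by-n) form, which is only one of several conventions for the skew and kurtosis indices.

```python
import math
import statistics

def describe(data):
    """Return central tendency, dispersion, skew, and kurtosis for a list of numbers."""
    n = len(data)
    mean = statistics.mean(data)
    # Population (divide-by-n) central moments; an assumption made for simplicity.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": m2,
        "std_dev": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,      # 0 for perfectly symmetric data
        "kurtosis": m4 / m2 ** 2,    # 3 for a normal distribution
    }

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 10]  # made-up illustrative data
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For these values the skew index is positive, reflecting the long tail toward the single high score of 10.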
Advantages. Descriptive statistics can:
- be essential for arranging and displaying data
- form the basis of rigorous data analysis
- be much easier to work with, interpret, and discuss than raw data
- help examine the tendencies, spread, normality, and reliability of a data set
- be rendered both graphically and numerically
- include useful techniques for summarizing data in visual form
- form the basis for more advanced statistical methods
Disadvantages. Descriptive statistics can:
- be misused, misinterpreted, and incomplete
- be of limited use when samples and populations are small
- demand a fair amount of calculation and explanation
- fail to fully specify the extent to which non-normal data are a problem
- offer little information about causes and effects
- be dangerous if not analysed completely
Any description of a data set
should examine all four properties: central tendency, dispersion, skew, and kurtosis.
As a rule, looking only at central tendency via the mean, median, and mode
and dispersion via the variance or standard deviation is not sufficient. Descriptive
statistics are recommended when the objective is to describe and discuss a data
set more generally and conveniently than would be possible using raw data
alone. They are routinely used in reports
that contain a significant amount of qualitative or quantitative data.
Descriptive statistics help
summarize and support assertions of fact. Note that a thorough understanding of
descriptive statistics is essential for the appropriate and effective use of
all normative and cause-and-effect statistical techniques, including hypothesis
testing, correlation, and regression analysis.
Unless descriptive statistics are
fully grasped, data can easily be misunderstood and, thereby, misrepresented. All
four moments (mean, variance, skew, and kurtosis) should be explored whenever possible. Skew and kurtosis should be examined any time
you deal with interval data, since they jointly help determine whether the
variable underlying a frequency distribution is normally distributed. Because normality is a key assumption
behind many statistical techniques, the skew and kurtosis of any interval data
set must be analysed. Data that show significant
variation, skew, or kurtosis should not be used in making inferences, drawing
conclusions, or espousing recommendations.
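One common way to look at skew and kurtosis jointly is the Jarque-Bera statistic, JB = n/6 · (S² + (K − 3)²/4), where S is the skew index and K the kurtosis index. The text above does not name this test, so the sketch below is only one possible screening approach. The data are made up, and the chi-square comparison is a large-sample approximation that is unreliable for samples as small as these.

```python
def jarque_bera(data):
    """Jarque-Bera statistic built from population skew (S) and kurtosis (K)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # Both terms are zero for an exactly normal shape (S = 0, K = 3).
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# 5% critical value of the chi-square distribution with 2 degrees of freedom.
CHI2_CRIT = 5.991

symmetric = list(range(1, 21))                      # made-up, evenly spread values
incomes = [20, 22, 23, 25, 25, 26, 28, 30, 31, 33,
           35, 38, 40, 45, 250]                     # made-up, one extreme value

for label, sample in [("symmetric", symmetric), ("incomes", incomes)]:
    jb = jarque_bera(sample)
    verdict = "departs from normality" if jb > CHI2_CRIT else "no strong evidence against normality"
    print(f"{label}: JB = {jb:.2f} ({verdict})")
```

A large statistic, as for the `incomes` sample with its single extreme value, signals that methods assuming normality deserve extra caution.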
Eigenvalues (latent values): In multivariate statistics,
eigenvalues give the variance of a linear function of the
variables. Eigenvalues measure the amount of variation explained by each
principal component (PC) and will be largest for the first PC and smaller for
the subsequent PCs. With standardized data, an eigenvalue greater than 1 indicates that the PC accounts for
more variance than any single original variable does.
This is commonly used as a cut-off point for deciding which PCs are retained.
One of the most important statistical
applications in which eigenvalues of the covariance matrix play a key role is
Principal Component Analysis (PCA). It is a linear dimensionality reduction
procedure, which can also be thought of as a model selection technique.
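To make the eigenvalue cut-off concrete, here is a minimal pure-Python sketch that standardizes a small made-up data set, builds its correlation matrix, and extracts the eigenvalues with cyclic Jacobi rotations. The helper names `correlation_matrix` and `symmetric_eigenvalues` are hypothetical, chosen for this example; in practice a numerical library would normally do this step.

```python
import math

def correlation_matrix(rows):
    """Correlation matrix of data given as a list of observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    # Standardize so that every variable has mean 0 and variance 1.
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)] for a in range(p)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:  # off-diagonal mass is negligible: diagonal holds the eigenvalues
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (i, j) entry.
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki + s * akj
                    a[k][j] = -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik + s * ajk
                    a[j][k] = -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)

# Made-up data: three variables, the first two strongly correlated.
rows = [[2.0, 4.1, 7.0], [3.0, 6.2, 1.0], [4.0, 7.9, 5.0],
        [5.0, 10.1, 3.0], [6.0, 12.2, 8.0], [7.0, 13.8, 2.0]]

eigs = symmetric_eigenvalues(correlation_matrix(rows))
for k, e in enumerate(eigs, start=1):
    keep = "retain" if e > 1 else "drop"
    print(f"PC{k}: eigenvalue = {e:.3f}, share of variance = {e / sum(eigs):.1%} ({keep})")
```

Because the data are standardized, the eigenvalues sum to the number of variables, so each eigenvalue divided by that total gives the share of variance explained by its PC.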