Outliers
Outliers are cases whose data values are very different from the data values
for the majority of cases in the data set. Outliers are important because they
can change the results of a data analysis. Whether to include or exclude
outliers from an analysis depends on the reason why the case is an outlier and
on the purpose of the analysis.
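As a small illustration of how a single outlier can change a result, the sketch below compares the mean and the median of a data set with and without one extreme case. The numbers are hypothetical and chosen only for the example.

```python
from statistics import mean, median

# Hypothetical measurements, invented for illustration only.
values = [12, 13, 14, 15, 16]
# The same data with one extreme case added.
with_outlier = values + [95]

# Without the outlier, the mean and the median agree.
print("without outlier:", mean(values), median(values))

# With the outlier, the mean moves a long way while the median barely changes.
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

In this example the mean nearly doubles when the extreme case is added, while the median moves by only half a unit. This is one reason the decision to keep or drop an outlier matters for the analysis.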
Investigating outliers carefully
Outliers often contain valuable information about the process under
investigation or about the data gathering and recording process. Before
considering eliminating these points from the data, one should try to
understand why they appeared and whether similar values are likely to continue
to appear. Of course, some outliers are simply bad data points, such as
measurement or recording errors.
Box Plot: To Detect Outliers
Graphs > Legacy > Boxplots
A box plot is a graphical representation of data that shows a data set's
lowest value, highest value, median, and first and third quartiles. The box
plot is useful for analyzing small data sets that do not lend themselves easily
to histograms. Because a box plot is compact, it is easy to display and compare
several box plots in a small space. A box plot is a good alternative or
complement to a histogram and is usually better for showing several
simultaneous comparisons.
The steps for using a box plot are as follows:
1. Collect the data.
2. Calculate the depth of the median.
3. Draw and label the axes of the graph.
4. Draw the box plots: construct the boxes, insert the median points, and
attach the upper and lower adjacent limits. Mark outliers (values outside the
upper and lower adjacent limits) with asterisks.
5. Analyze the results: a box plot shows the distribution of the data. The line
between the lower adjacent limit and the bottom of the box, together with any
outliers below it, represents about one-fourth of the data. About one-fourth of
the data falls between the bottom of the box and the median, and another
one-fourth between the median and the top of the box. The line between the top
of the box and the upper adjacent limit, together with any outliers above it,
represents the final one-fourth of the data observations. Once the pattern of
data variation is clear, the next step is to develop an explanation for the
variation.
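The calculation behind these steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any particular statistics package: it takes the depth of the median as (n + 1) / 2, finds the box edges (hinges) by depth in the same way, and places the outlier cutoffs 1.5 box lengths beyond the box. The 1.5 multiplier is a common convention that the text above does not specify. The function names and the sample data are hypothetical.

```python
import math

def value_at_depth(sorted_vals, depth, from_top=False):
    """Value at a given depth counted from one end of the sorted data.

    A half-integer depth (e.g. 3.5) averages the two neighbouring values.
    """
    vals = sorted_vals[::-1] if from_top else sorted_vals
    lo = math.floor(depth)
    hi = math.ceil(depth)
    return (vals[lo - 1] + vals[hi - 1]) / 2

def box_plot_summary(data, k=1.5):
    """Summary values for a box plot; k = 1.5 is an assumed convention."""
    vals = sorted(data)
    n = len(vals)
    median_depth = (n + 1) / 2                        # depth of the median
    hinge_depth = (math.floor(median_depth) + 1) / 2  # depth of the box edges
    med = value_at_depth(vals, median_depth)
    q1 = value_at_depth(vals, hinge_depth)
    q3 = value_at_depth(vals, hinge_depth, from_top=True)
    box_length = q3 - q1
    lower_cutoff = q1 - k * box_length
    upper_cutoff = q3 + k * box_length
    # Adjacent limits: the most extreme values still inside the cutoffs.
    inside = [v for v in vals if lower_cutoff <= v <= upper_cutoff]
    outliers = [v for v in vals if v < lower_cutoff or v > upper_cutoff]
    return {
        "median_depth": median_depth,
        "median": med,
        "q1": q1,
        "q3": q3,
        "lower_adjacent": inside[0],
        "upper_adjacent": inside[-1],
        "outliers": outliers,  # these would be drawn as asterisks
    }

# Hypothetical sample, invented for illustration only.
sample = [12, 13, 14, 15, 16, 17, 18, 19, 45]
summary = box_plot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the box runs from 14 to 18 with the median at 16, the adjacent limits are 12 and 19, and the value 45 is flagged as an outlier. Other software may compute quartiles slightly differently, so the box edges can differ a little from package to package.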