Box Plot Graphical Presentation:
What is it?
The box plot is a graphical representation of data that shows a data set’s
lowest value, highest value, median, and first and third quartiles. The box
plot is useful for analyzing small data sets that do not lend themselves easily
to histograms. Because a box plot is compact, it is easy to display and compare
several box plots in a small space. A box plot is a good alternative or
complement to a histogram and is usually better for showing several
simultaneous comparisons.
How to read it?
The box-and-whisker plot is an exploratory graphic, created by John W. Tukey,
used to show the distribution of a dataset at a glance. Think of the type of
data you might use a histogram with, and the box-and-whisker plot (or box plot,
for short) could probably be useful. The box plot, although very useful, seems
to get lost in areas outside of statistics, but I'm not sure why. It could be
that people don't know about it, or aren't sure how to interpret it. In any
case, here's how you read a box plot.
How to use it?
1. Collect and arrange data.
Collect the data and arrange it into an ordered set from lowest value to
highest.
2. Calculate the depth of the median.
d(M) = (n + 1) / 2
Where:
d = depth; the number of observations to count from the beginning of the
ordered data set
M = median
n = number of observations in the set of data
If the ordered data set contains an odd number of values, the depth is a whole
number and identifies which of the values is the median. If the ordered data
set contains an even number of values, the depth ends in .5 and the median is
midway between the two values on either side of that position.
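As a rough illustration of the depth formula, here is a minimal Python sketch. The function name median_by_depth and the sample numbers are made up for this example; they are not part of the original procedure.

```python
import math

def median_by_depth(values):
    """Return (depth, median) using d(M) = (n + 1) / 2 on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    depth = (n + 1) / 2                  # 1-based position in the ordered set
    lower = ordered[math.floor(depth) - 1]
    upper = ordered[math.ceil(depth) - 1]
    # Odd n: lower and upper are the same item. Even n: take the midpoint.
    return depth, (lower + upper) / 2

print(median_by_depth([7, 3, 9, 5, 1]))      # odd count: depth 3.0, median 5.0
print(median_by_depth([7, 3, 9, 5, 1, 11]))  # even count: depth 3.5, median 6.0
```

With five values the depth lands exactly on the third item; with six it falls halfway between the third and fourth.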
3. Calculate the depth of the first quartile.
d(Q1) = (n + 2) / 4
Where:
d = depth; the number of observations to count from the beginning of the
ordered data set
Q1 = the first quartile
n = number of observations in the set of data
The first quartile will be the value of the data item identified by this
formula.
4. Calculate the depth of the third quartile.
d(Q3) = (3n + 2) / 4
Where:
d = depth; the number of observations to count from the beginning of the
ordered data set
Q3 = the third quartile
n = number of observations in the set of data
The third quartile will be the value of the data item identified by this
formula.
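The two quartile depth formulas can be sketched in Python as follows. The post does not say what to do when a depth is not a whole number, so this sketch assumes linear interpolation between the two neighbouring items; that choice, the helper names, and the sample data are assumptions for illustration only.

```python
import math

def value_at_depth(ordered, depth):
    """Value at a 1-based depth; interpolates linearly for fractional depths
    (an assumption, since the procedure does not cover this case)."""
    lower_pos = math.floor(depth)
    frac = depth - lower_pos
    lower = ordered[lower_pos - 1]
    if frac == 0:
        return float(lower)
    upper = ordered[lower_pos]
    return lower + frac * (upper - lower)

def quartiles_by_depth(values):
    """Return (Q1, Q3) using d(Q1) = (n + 2) / 4 and d(Q3) = (3n + 2) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = value_at_depth(ordered, (n + 2) / 4)
    q3 = value_at_depth(ordered, (3 * n + 2) / 4)
    return q1, q3

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]   # n = 10, so depths are 3 and 8
print(quartiles_by_depth(data))            # (4.0, 10.0)
```

For ten observations the depths are whole numbers (3 and 8), so the quartiles are simply the third and eighth ordered values.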
5. Calculate the interquartile range (IQR).
This range is the difference between the third and first quartile values:
IQR = Q3 - Q1
6. Calculate the upper adjacent limit.
This is the largest data value that is less than or equal to the third quartile
plus 1.5 × IQR, that is, the largest value not exceeding
Q3 + [(Q3 - Q1) × 1.5]
7. Calculate the lower adjacent limit.
This is the smallest data value that is greater than or equal to the first
quartile minus 1.5 × IQR, that is, the smallest value not below
Q1 - [(Q3 - Q1) × 1.5]
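To make the IQR and the two adjacent limits concrete, here is a small Python sketch. The function name adjacent_limits, the parameter k, and the sample data are hypothetical; the quartile values 4 and 10 are simply supplied by hand for this data set.

```python
def adjacent_limits(values, q1, q3, k=1.5):
    """Return (lower_adjacent, upper_adjacent) for the given quartiles."""
    iqr = q3 - q1                       # interquartile range, Q3 - Q1
    upper_threshold = q3 + k * iqr      # Q3 + [(Q3 - Q1) x 1.5]
    lower_threshold = q1 - k * iqr      # Q1 - [(Q3 - Q1) x 1.5]
    # The adjacent limits are actual data values, not the thresholds themselves.
    upper_adjacent = max(v for v in values if v <= upper_threshold)
    lower_adjacent = min(v for v in values if v >= lower_threshold)
    return lower_adjacent, upper_adjacent

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]
print(adjacent_limits(data, q1=4, q3=10))   # (2, 12)
```

Here the IQR is 6, so the thresholds are -5 and 19. The value 30 lies above 19, which makes 12 the upper adjacent limit; nothing lies below -5, so the lower adjacent limit is the minimum, 2.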
8. Draw and label the axes of the graph.
The scale of the vertical axis must be large enough to encompass the full
range of the data sets, from the smallest value to the greatest. The horizontal
axis must be large enough to encompass the number of box plots to be drawn.
9. Draw the box plots.
Construct the boxes, insert median points, and attach upper and lower adjacent
limits. Identify outliers (values outside the upper and lower adjacent limits)
with asterisks.
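Picking out the points to mark with asterisks can be sketched like this. The function name find_outliers and the sample data are invented for the example, and the adjacent limits 2 and 12 are assumed to have been worked out already for this data.

```python
def find_outliers(values, lower_adjacent, upper_adjacent):
    """Return the values that fall outside the adjacent limits, in order."""
    return sorted(v for v in values if v < lower_adjacent or v > upper_adjacent)

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]
outliers = find_outliers(data, lower_adjacent=2, upper_adjacent=12)
for value in outliers:
    print(f"* {value}")   # each outlier would be drawn as an asterisk
```

Only 30 falls outside the limits here, so it is the single point that would get an asterisk on the plot.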
10. Analyze the results.
A box plot shows the distribution of data. The line between the lower adjacent limit and the bottom of the box represents one-fourth of the data. One-fourth of the data falls between the bottom of the box and the median, and another one-fourth between the median and the top of the box. The line between the top of the box and the upper adjacent limit represents the final one-fourth of the data observations. Once the pattern of data variation is clear, the next step is to develop an explanation for the variation.
Author: Akanksha Durgvanshi