More Examples - Descriptive Statistics
When raw data have been sorted into classes, they are called grouped data.
Height of students:
(171, 161, 155, 155, 183, 191, 185, 170, 172, 177, 183, 190, 139, 149, 150, 150, 152, 158, 159, 174, 178, 179,
190, 170, 143, 165, 167, 187, 169, 182, 163, 149, 174, 174, 177, 181, 170, 182, 170, 145, 143)
This is raw/ungrouped data.
The following table shows the grouped data obtained from the raw data above.
NOTE: The grouped-data mean is explained later in this blog. Cumulative frequency is covered in a separate post.
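As a rough illustration of how the raw heights can be turned into grouped data, here is a minimal Python sketch. The class width of 10 is an assumption made for this example and may differ from the classes used in the original table.

```python
from collections import Counter

# Raw (ungrouped) heights from the text above.
heights = [
    171, 161, 155, 155, 183, 191, 185, 170, 172, 177, 183, 190, 139, 149,
    150, 150, 152, 158, 159, 174, 178, 179, 190, 170, 143, 165, 167, 187,
    169, 182, 163, 149, 174, 174, 177, 181, 170, 182, 170, 145, 143,
]

CLASS_WIDTH = 10  # assumed class width, not taken from the original table


def class_interval(value, width=CLASS_WIDTH):
    """Return the (lower, upper) bounds of the class that contains value."""
    lower = (value // width) * width
    return (lower, lower + width)


# Count how many heights fall into each class.
frequency = Counter(class_interval(h) for h in heights)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower} to under {upper}: {count}")
```

Each printed row is one class with its frequency, which is the form grouped data usually takes.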
Before we study grouped and ungrouped data further, it is important to understand what we
mean by “Central Tendencies”.
As the name suggests, central tendency has something to do with the center: it describes the
central location of a distribution. Common measures of central tendency include the mean,
median, mode, geometric mean and harmonic mean. Percentiles and quartiles are closely related
measures of position, while the interquartile range is a measure of variability and is covered
further below. The most commonly used measures are discussed below.
(i) MODE: The most frequently occurring item/value in a dataset is called the mode. A dataset
is bimodal when two values tie for the highest frequency, and multimodal when more than
two values share the highest frequency.
e.g. 7, 11, 14.25, 15, 15, 15, 15, 15, 19, 19, 29, 81. Mode is 15
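The mode example above can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library (Python 3.8+), which returns every value tied for the highest frequency. The second list is a made-up sample added only to show a bimodal case.

```python
from statistics import multimode

# Dataset from the mode example in the text.
data = [7, 11, 14.25, 15, 15, 15, 15, 15, 19, 19, 29, 81]

# multimode returns a list of all values that share the highest frequency.
modes = multimode(data)
print(modes)

# Hypothetical sample (not from the text): 2 and 5 both occur twice.
bimodal = multimode([2, 2, 5, 5, 9])
print(bimodal)
```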
(ii) MEDIAN: The median of a dataset is the middlemost value in the ordered
arrangement of the values in the dataset.
NOTE: For a dataset with an odd number of values, the median is the middle value. For a dataset
with an even number of values, the median is the average of the two middle values.
e.g. 15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4
Ordered: 3, 4, 5, 7, 9, 11, 14, 15, 16, 16, 17, 19, 20, 21, 22. There are 15 values, so the
median is the 8th value, i.e. 15
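Here is a small Python sketch of the odd/even rule from the note, using the dataset from the example. The even-sized case is produced by dropping the largest value, which is done purely for illustration.

```python
from statistics import median

# Dataset from the median example in the text (15 values, an odd count).
data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

ordered = sorted(data)

# Odd count: the median is the single middle value of the ordered data.
odd_median = median(ordered)

# Even count (illustration only): drop the largest value to get 14 values,
# so the median becomes the average of the two middle values.
even_median = median(ordered[:-1])

print(odd_median, even_median)
```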
(iii) MEAN: Also known as the arithmetic average. It is calculated as the sum of all
values divided by the number of values.
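The definition translates directly into code. This sketch reuses the dataset from the median example, since the text gives no separate example for the mean, and compares the manual calculation with `statistics.mean`.

```python
from statistics import mean

# Dataset borrowed from the median example in the text.
data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

# Mean = sum of all values / number of values.
manual_mean = sum(data) / len(data)

# The standard library gives the same result.
library_mean = mean(data)

print(round(manual_mean, 3), round(library_mean, 3))
```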
(iv) PERCENTILE: Percentiles are measures of position that divide an ordered group of data into
100 parts. The nth percentile is the value with about n percent of the values at or below it and
about (100-n) percent of the values above it.
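The position of the Pth percentile among N ordered values is found from:
i = (P/100)*N
i: percentile position
P: percentile of interest
N: number of values in the dataset
(a) If ‘i’ is a whole number, then the percentile is the average of the values at the ‘i’ and ‘i+1’ positions.
(b) If ‘i’ is not a whole number, then the percentile is the value at the position given by the whole-number part of ‘i’, plus 1.
e.g. for the 70th percentile of 1450 values:
i = (70/100)*1450
i = 1015
Since 1015 is a whole number, the 70th percentile is the average of the 1015th and 1016th values.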
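The position rule above can be sketched in Python as follows. The function name `percentile_value` and both sample datasets are made up for this illustration, and the sketch assumes P is a whole number strictly between 0 and 100. Integer arithmetic (`divmod`) is used so that a case like (70/100)*1450 is recognised as exactly 1015 rather than being thrown off by floating-point rounding.

```python
def percentile_value(values, p):
    """Return the p-th percentile using the position rule i = (P/100)*N.

    Assumes p is an integer with 0 < p < 100 (an assumption of this sketch).
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # whole is the whole-number part of i; remainder == 0 means i is whole.
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # Rule (a): average the values at positions i and i+1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rule (b): take the value at position whole + 1 (1-based).
    return ordered[whole]


# Hypothetical data: the integers 1..1450, so position and value coincide.
p70 = percentile_value(range(1, 1451), 70)

# Hypothetical data: i = (30/100)*8 = 2.4, so the 3rd value is used.
p30 = percentile_value([10, 20, 30, 40, 50, 60, 70, 80], 30)

print(p70, p30)
```

Other percentile conventions exist (for example, interpolating between neighbours), so library functions may return slightly different values from this rule.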
(v) QUARTILE: Quartiles divide an ordered group of data into four sub-parts. The first, second and third quartiles correspond to the 25th, 50th and 75th percentiles.
NOTE: Measures of variability enable a better description of the data.
For example, two curves can have the same mean but different scatter.
(i) RANGE: The difference between the largest value and the smallest value in a dataset is called the
range of the dataset. The range therefore depends only on the end/extreme values.
The range helps in the construction of control charts on the data.
(ii) INTERQUARTILE RANGE: The interquartile range is the difference between the third and
first quartiles (Q3 - Q1). It covers the middle 50% of the data.
It comes in handy because users are often more interested in the middle values than the extreme ends.
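A minimal Python sketch of the range and the interquartile range follows. The sample values are hypothetical. Quartiles are taken from `statistics.quantiles` with its default method. This is just one of several quartile conventions, so other tools may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Hypothetical sample, not taken from the text.
data = [3, 7, 8, 5, 12, 14, 21]

# Range: largest value minus smallest value.
value_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print(value_range, q1, q3, iqr)
```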
(iii) MEAN ABSOLUTE DEVIATION: It is the average of the absolute values of deviations
around the mean of the dataset.
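As a quick sketch, the mean absolute deviation can be computed in a few lines of Python. The five values below are a hypothetical sample chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical sample, not taken from the text.
data = [5, 9, 16, 17, 18]

mean_value = sum(data) / len(data)  # 65 / 5 = 13

# Absolute deviation of each value from the mean: 8, 4, 3, 4, 5.
absolute_deviations = [abs(x - mean_value) for x in data]

# Mean absolute deviation: the average of those absolute deviations.
mad = sum(absolute_deviations) / len(data)

print(mad)
```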
(iv) VARIANCE: It is the average of the squared deviations about the arithmetic mean for a set of numbers.
NOTE: The final result is expressed in terms of the squared unit of measurement.
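The text does not say whether the population or the sample variance is meant, so this Python sketch shows both on a hypothetical sample. The population version divides the sum of squared deviations by N, and the sample version divides by N - 1.

```python
from statistics import pvariance, variance

# Hypothetical sample, not taken from the text.
data = [5, 9, 16, 17, 18]

mean_value = sum(data) / len(data)

# Sum of squared deviations about the mean: 64 + 16 + 9 + 16 + 25 = 130.
sum_of_squares = sum((x - mean_value) ** 2 for x in data)

population_variance = sum_of_squares / len(data)    # divide by N
sample_variance = sum_of_squares / (len(data) - 1)  # divide by N - 1

# The standard library functions agree with the manual calculation.
print(population_variance, pvariance(data))
print(sample_variance, variance(data))
```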
(v) STANDARD DEVIATION: It is the square root of the variance.
e.g. the standard deviation of the data in the above example is 6.086
NOTE: Standard deviations are used in computing confidence intervals and hypothesis testing.
The standard deviation has the same unit as the raw data.
“The real usage of standard deviation can be understood through the Empirical rule and
Chebyshev’s Theorem. Both will be discussed in detail in upcoming blogs”
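To show how the standard deviation relates to the variance, here is a short Python sketch on a hypothetical sample (not the example that gave 6.086 above). It assumes the population formulas. Use `statistics.stdev` instead for the sample version.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical sample, not taken from the text.
data = [5, 9, 16, 17, 18]

# Population variance is in squared units of the data.
var = pvariance(data)

# Standard deviation is the square root of the variance,
# so it is back in the same unit as the raw data.
std_manual = math.sqrt(var)
std_library = pstdev(data)

print(round(std_manual, 3), round(std_library, 3))
```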
(vi) COEFFICIENT OF VARIATION: It is the ratio of the standard deviation to the mean of
the data.
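The ratio can be sketched in Python as below, again on a hypothetical sample and assuming the population standard deviation. The coefficient of variation is often reported as a percentage, so the sketch prints both forms.

```python
from statistics import mean, pstdev

# Hypothetical sample, not taken from the text.
data = [5, 9, 16, 17, 18]

# Coefficient of variation: standard deviation divided by the mean.
cv = pstdev(data) / mean(data)

# Often quoted as a percentage of the mean.
cv_percent = cv * 100

print(round(cv, 4), round(cv_percent, 2))
```

Because it is a ratio, the coefficient of variation has no unit, which makes it handy for comparing the spread of datasets measured on different scales.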
Mode = The mode of grouped data is the class midpoint of the modal class, i.e. the class with the
highest frequency. The max frequency in the above example, 19, is for the interval 7 to 9. Hence, the
mode is (7+9)/2 = 8
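The grouped-data mode can be sketched in Python as follows. The frequency table is hypothetical: only the modal class "7 to 9" with frequency 19 comes from the text, and the other classes and counts are made up so that the sketch runs.

```python
# Hypothetical frequency table: ((lower, upper), frequency).
# Only the class 7 to 9 with frequency 19 is taken from the text.
table = [
    ((1, 3), 4),
    ((3, 5), 12),
    ((5, 7), 13),
    ((7, 9), 19),
    ((9, 11), 7),
    ((11, 13), 5),
]

# The modal class is the class with the highest frequency.
(lower, upper), max_frequency = max(table, key=lambda row: row[1])

# The grouped-data mode is the midpoint of the modal class.
grouped_mode = (lower + upper) / 2

print((lower, upper), max_frequency, grouped_mode)
```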
Abbreviations:
f: frequency
N: total frequency
i: initial point, i.e. the lower limit of the class that contains the median. N/2 gives the location
of the median value, i.e. 30 in the above example. 29 entries fall in the classes below the
interval “7 to 9”, so the 30th value lies in that interval. Hence, the value of ‘i’ is 7.
MED: the frequency of the class that contains the median. For the above example the value of
MED=19.
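Putting the abbreviations together, the grouped-data median is commonly computed as i + ((N/2 - cf) / MED) * W. Here cf (the cumulative frequency of the classes below the median class) and W (the class width) are symbols assumed for this sketch rather than taken from the text. The frequency table is also hypothetical and was chosen only to agree with the figures quoted above (N/2 = 30, 29 entries below the class "7 to 9", MED = 19).

```python
# Hypothetical frequency table: ((lower, upper), frequency).
# Chosen to match the quoted figures: N = 60, 29 entries below class 7 to 9.
table = [
    ((1, 3), 4),
    ((3, 5), 12),
    ((5, 7), 13),
    ((7, 9), 19),
    ((9, 11), 7),
    ((11, 13), 5),
]


def grouped_median(rows):
    """Median of grouped data: i + ((N/2 - cf) / MED) * W."""
    half = sum(freq for _, freq in rows) / 2  # N/2, location of the median
    cumulative = 0  # cf: total frequency of the classes seen so far
    for (lower, upper), freq in rows:
        if cumulative + freq >= half:
            # This class contains the median: lower is i, freq is MED.
            width = upper - lower  # W
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequency table is empty")


median_value = grouped_median(table)
print(round(median_value, 3))
```

With these numbers the calculation is 7 + ((30 - 29) / 19) * 2, so the median lands just above the lower limit of the class "7 to 9".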