More Examples - Descriptive Statistics


What is grouped data?

When raw data have been organized into classes, they are said to be grouped data.

For example, consider the following:

Height of students:
(171,161,155,155,183,191,185,170,172,177,183,190,139,149,150,150,152,158,159,174,178,179,
190,170,143,165,167,187,169,182,163,149,174,174,177,181,170,182,170,145,143): This is
raw/ungrouped data.

The following table shows the grouped data from the above mentioned raw data
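
One way to build such a grouped table from the raw heights is sketched below. The class width of 10 and the starting point of 130 are assumptions made for illustration; they are not necessarily the classes used in the original table.

```python
from collections import Counter

# Raw heights copied from the list above.
heights = [
    171, 161, 155, 155, 183, 191, 185, 170, 172, 177, 183, 190, 139, 149,
    150, 150, 152, 158, 159, 174, 178, 179, 190, 170, 143, 165, 167, 187,
    169, 182, 163, 149, 174, 174, 177, 181, 170, 182, 170, 145, 143,
]

START = 130  # assumed lower limit of the first class
WIDTH = 10   # assumed class width

# Each height falls in class k when START + k*WIDTH <= height < START + (k+1)*WIDTH.
counts = Counter((h - START) // WIDTH for h in heights)

for k in sorted(counts):
    lower = START + k * WIDTH
    print(f"{lower} to under {lower + WIDTH}: {counts[k]}")
```

Each printed row is one class with its frequency; the frequencies add up to the number of raw values.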

NOTE: The grouped-data mean is explained later in this blog.

Before we study grouped and ungrouped data further, it is important to understand what we
mean by “Central Tendencies”.

As the name suggests, central tendencies have something to do with the center. A measure of
central tendency describes the central location of a distribution. Common measures include the
mean, mode, median, geometric mean, and harmonic mean. Percentiles and quartiles, which are
strictly measures of position, are usually covered alongside them. The most common of these
measures are discussed below.

(i) MODE: The most frequently occurring value in a dataset is called the mode. A dataset is
bimodal when two values tie for the highest frequency, and multimodal when more than two
values share the highest frequency.

e.g. 7,11,14.25,15,15,15,15,15,19,19,29,81. The mode is 15.
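
As a quick check, the mode of this example can be found with Python's standard library. This is only a sketch; `multimode` returns every value tied for the highest frequency, so it also covers the bimodal and multimodal cases. The second list is a made-up example of a tie.

```python
from statistics import multimode

data = [7, 11, 14.25, 15, 15, 15, 15, 15, 19, 19, 29, 81]

# multimode returns a list of all values that share the highest frequency.
modes = multimode(data)
print(modes)

# Hypothetical bimodal dataset: 1 and 2 both occur twice.
tied_modes = multimode([1, 1, 2, 2, 3])
print(tied_modes)
```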

(ii) MEDIAN: The median of a dataset is the middle value in the ordered arrangement of the
values in the dataset.
NOTE: For a dataset with an odd number of values, the median is the middle value. For a dataset
with an even number of values, the median is the average of the two middle values.

e.g. 15,11,14,3,21,17,22,16,19,16,5,7,9,20,4

Let’s arrange this data in ascending order:

3,4,5,7,9,11,14,15,16,16,17,19,20,21,22. There are n = 15 values, so the median is at position
(n+1)/2 = (15+1)/2 = 16/2 = 8. The 8th value is 15, so the median is 15.
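
The position rule can be checked in a few lines of Python. This is a minimal sketch that sorts the 15 values from the example, picks the value at position (n+1)/2, and compares it with the standard library's `median`. The four-value list at the end is a made-up example of the even case.

```python
from statistics import median

data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

ordered = sorted(data)
n = len(ordered)

# For odd n, the median sits at 1-based position (n + 1) / 2.
position = (n + 1) // 2
manual_median = ordered[position - 1]

print(ordered)
print(position, manual_median, median(data))

# For an even number of values, the two middle values are averaged.
even_median = median([1, 2, 3, 4])
```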

Advantage of Median: It is not influenced by extreme values, so it remains robust to outliers.

“The data must be at least ordinal for the median to be meaningful”

(iii) MEAN: Also known as the arithmetic average. It is calculated as the sum of all
values divided by the number of values.

e.g. The mean of “15,11,14,3,21,17,22,16,19,16,5,7,9,20,4” is 13.26667.
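
The same figure can be reproduced directly from the definition, as in this short sketch:

```python
data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

# Mean = sum of all values / number of values.
total = sum(data)
count = len(data)
mean_value = total / count

print(f"{total} / {count} = {mean_value:.5f}")
```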

(iv) PERCENTILE: Percentiles are measures of position that divide an ordered group of data into
100 parts. The nth percentile is the value such that at least n percent of the data are at or
below it and at most (100-n) percent are above it.

Now, let’s see how to calculate percentiles.

STEP 1: Arrange the data in ascending order.

STEP 2: Compute the location of the percentile:

i = (P/100)*N

i: percentile position

N: total number of values in the dataset

P: the percentile of interest.

STEP 3: Determine the location by either (a) or (b).

(a) If ‘i’ is a whole number, the percentile is the average of the values at positions ‘i’ and ‘i+1’.

(b) If ‘i’ is not a whole number, the percentile is the value at the position given by the
whole-number part of ‘i’ plus 1.

e.g. Suppose we want to determine the 70th percentile of 1450 numbers.

i = (70/100)*1450
i = 1015

Since ‘i’ is a whole number, P = (1015th number + 1016th number)/2
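
The three steps can be sketched as a small Python function. The name `percentile_value` is hypothetical, and the function assumes P is an integer strictly between 0 and 100. It uses integer arithmetic so that the whole-number test in step 3 is exact. The 1450 numbers in the usage lines are simply 1 to 1450, chosen for illustration.

```python
def percentile_value(values, p):
    """Return the p-th percentile using the location rule i = (P/100)*N."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                      # STEP 1: ascending order
    whole, remainder = divmod(p * len(ordered), 100)  # STEP 2: i = whole + remainder/100
    if remainder == 0:
        # (a) i is a whole number: average the values at positions i and i+1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # (b) i is not a whole number: take the value at position (whole part of i) + 1.
    return ordered[whole]


# Illustrative data: the numbers 1..1450, so position k holds the value k.
numbers = list(range(1, 1451))
p70 = percentile_value(numbers, 70)
print(p70)
```

With this illustrative data, i = 1015 is a whole number, so the result is the average of the 1015th and 1016th numbers.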

(v) QUARTILE: Quartiles are measures of position that divide an ordered group of data into four
sub-parts.

First Quartile = 25th percentile

Second Quartile = 50th percentile

Third Quartile = 75th percentile

Fourth Quartile = 100th percentile.

NOTE: The second quartile is equal to the median of the data.
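
As a sketch, the first three quartiles of the 15-value dataset used for the median can be computed with the percentile location rule from the previous section. The helper name `percentile_value` is hypothetical. Library routines may use other interpolation rules and can give slightly different quartiles.

```python
from statistics import median


def percentile_value(ordered, p):
    """Percentile by the location rule i = (P/100)*N on already-sorted data."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]
ordered = sorted(data)

q1 = percentile_value(ordered, 25)  # first quartile
q2 = percentile_value(ordered, 50)  # second quartile
q3 = percentile_value(ordered, 75)  # third quartile

print(q1, q2, q3)
print(q2 == median(data))  # the second quartile equals the median
```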

Understanding the measures of variability of ungrouped data.

Measures of variability describe the spread or scatter of the dataset.

NOTE: Knowing the variability of a dataset allows us to describe the data better.

For example, two distributions can have the same mean while their scatter is different.

(i) RANGE: The difference between the largest value and the smallest value in a dataset is
called the range of the dataset. The range depends only on the extreme values.
The range is also used in the construction of control charts.

(ii) INTERQUARTILE RANGE: The interquartile range is the difference between the third and
first quartiles (Q3 - Q1).

It is useful when the middle 50% of the values is of more interest than the extreme ends.

(iii) MEAN ABSOLUTE DEVIATION: It is the average of the absolute values of deviations
around the mean of the dataset.
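
The first three measures can be sketched in Python on the 15-value dataset used earlier for the median and mean. Reusing that dataset here is an assumption for illustration, and the quartiles follow the percentile location rule described above (hypothetical helper `percentile_value`).

```python
def percentile_value(ordered, p):
    """Percentile by the location rule i = (P/100)*N on already-sorted data."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]
ordered = sorted(data)
n = len(ordered)

# Range: largest value minus smallest value.
data_range = ordered[-1] - ordered[0]

# Interquartile range: third quartile minus first quartile.
iqr = percentile_value(ordered, 75) - percentile_value(ordered, 25)

# Mean absolute deviation: average of |x - mean|.
mean_value = sum(ordered) / n
mad = sum(abs(x - mean_value) for x in ordered) / n

print(data_range, iqr, round(mad, 4))
```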

(iv) VARIANCE: It is the average of the squared deviations about the arithmetic mean for a set of numbers.

NOTE: The final result is expressed in terms of the squared unit of measurement.

(v) STANDARD DEVIATION: It is the square root of the variance.

e.g. The standard deviation of the data in the above example is 6.086.

NOTE: Standard deviations are used in computing confidence intervals and in hypothesis testing.
The standard deviation has the same unit as the raw data.
“The real usage of standard deviation can be understood through the Empirical rule and
Chebyshev’s Theorem. Both will be discussed in detail in upcoming blogs”
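
Variance and standard deviation can be sketched with Python's `statistics` module. The text does not say whether the population form (divide by N) or the sample form (divide by N-1) is intended, so both are shown. The dataset is the 15-value example from the median and mean sections, reused here as an assumption, so the results are not expected to match the 6.086 quoted above.

```python
import math
import statistics

data = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

# Population variance: sum of squared deviations divided by N.
population_variance = statistics.pvariance(data)
# Sample variance: sum of squared deviations divided by N - 1.
sample_variance = statistics.variance(data)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = statistics.stdev(data)

print(round(population_variance, 4), round(population_sd, 4))
print(round(sample_variance, 4), round(sample_sd, 4))
```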

(vi) COEFFICIENT OF VARIATION: It is the ratio of the standard deviation to the mean of
the data, expressed as a percentage.

e.g. The coefficient of variation in the above example is (6.086/9.4)*100 = 64.7%.
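
The arithmetic can be reproduced with a one-line function. The function name is hypothetical; the inputs are the standard deviation and mean quoted in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv = coefficient_of_variation(6.086, 9.4)
print(f"{cv:.1f}%")
```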

Calculating measures of central tendencies of grouped data.

Consider the following data:

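Mean = ∑fx/N = 6.93, where x is the midpoint of each class.

Median = i + ((N/2 - cf)/MED) × CW = 7 + ((30 - 29)/19) × 2 = 7.105, where cf = 29 is the
cumulative frequency of the classes before the median class.

Mode = The mode of grouped data is the midpoint of the modal class, i.e. the class with the
highest frequency. The maximum frequency in the above example is 19, for the interval 7 to 9.
Hence, the mode is 8.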

Abbreviations:

f: frequency of a class

N: total frequency

CW: class width

i: lower limit of the median class. N/2 gives the location of the median value, i.e. 30 in the
above example. 29 entries fall in the classes before the interval “7 to 9”, so the 30th entry
lies in that class and the value of ‘i’ is 7.

MED: the frequency of the class that contains the median. For the above example, MED = 19.
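
The three grouped-data calculations can be sketched in Python. The frequency table below is hypothetical: its classes and frequencies were chosen only to agree with the figures quoted in the text (N/2 = 30, 29 entries before the class “7 to 9”, MED = 19, class width 2) and are not taken from the original table.

```python
# Hypothetical frequency table consistent with the figures quoted above.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
freqs = [4, 12, 13, 19, 7, 5]

N = sum(freqs)
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Mean = sum(f * x) / N, with x the class midpoint.
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / N

# Median = i + ((N/2 - cf) / MED) * CW, evaluated in the class containing position N/2.
half = N / 2
cf = 0  # cumulative frequency of the classes before the current one
for (lower, upper), f in zip(classes, freqs):
    if cf + f >= half:
        grouped_median = lower + (half - cf) / f * (upper - lower)
        break
    cf += f

# Mode = midpoint of the class with the highest frequency.
grouped_mode = midpoints[freqs.index(max(freqs))]

print(round(grouped_mean, 2), round(grouped_median, 3), grouped_mode)
```

The loop stops at the first class whose cumulative frequency reaches N/2, which is how the median class is located in the text.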