Business Statistics & Analytics For Decision Making Assignment 1 Franklin Babu


BUSINESS

STATISTICS &
ANALYTICS FOR
DECISION MAKING
MBA - 105

ASSIGNMENT -1

NAME – FRANKLIN BABU


SEMESTER – 1
DECEMBER 2021
Q1. Describe statistics with summary measures.

A1. In descriptive statistics, summary statistics are used to summarize a set
of observations in order to communicate the largest amount of information as
simply as possible. Statisticians commonly try to describe the observations using:
• a measure of location, or central tendency, such as the arithmetic mean
• a measure of statistical dispersion, such as the standard deviation or the
mean absolute deviation
• a measure of the shape of the distribution, such as skewness or kurtosis
• if more than one variable is measured, a measure of statistical dependence, such
as a correlation coefficient
A common collection of order statistics used as summary statistics is the
five-number summary, sometimes extended to a seven-number summary, together
with the associated box plot.
Entries in an analysis of variance table can also be regarded as summary statistics.
Examples:

Location
Common measures of location, or central tendency, are the arithmetic
mean, median, mode, and interquartile mean.
Spread
Common measures of statistical dispersion are the standard
deviation, variance, range, interquartile range, absolute deviation, mean absolute
difference and the distance standard deviation. Measures that assess spread in
comparison to the typical size of data values include the coefficient of variation.
The Gini coefficient was originally developed to measure income inequality and is
equivalent to one of the L-moments.
A simple summary of a dataset is sometimes given by quoting particular order
statistics as approximations to selected percentiles of a distribution.
Shape
Common measures of the shape of a distribution are skewness and kurtosis, while
alternatives can be based on L-moments. A different measure is the distance
skewness, for which a value of zero implies central symmetry.
Dependence
The common measure of dependence between paired random variables is
the Pearson product-moment correlation coefficient, while a common alternative
summary statistic is Spearman's rank correlation coefficient. A value of zero for
the distance correlation implies independence.
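
The location and spread measures listed above, together with the five-number summary, can be sketched with Python's standard `statistics` module. This is only an illustration: the `observations` list is made up for the example and is not part of the assignment, and the "inclusive" quartile method is one of several conventions (it needs Python 3.8 or later).

```python
import statistics

# Illustrative values only; they are not taken from the assignment.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

# Location
mean_value = statistics.mean(observations)
median_value = statistics.median(observations)
mode_value = statistics.mode(observations)

# Spread (population formulas, i.e. dividing by n)
variance = statistics.pvariance(observations)
std_dev = statistics.pstdev(observations)
value_range = max(observations) - min(observations)

# Five-number summary: minimum, Q1, median, Q3, maximum.
# Quartile conventions differ between textbooks; "inclusive" is one of them.
q1, q2, q3 = statistics.quantiles(observations, n=4, method="inclusive")
five_number_summary = (min(observations), q1, q2, q3, max(observations))

print("mean:", mean_value, "median:", median_value, "mode:", mode_value)
print("variance:", variance, "std dev:", std_dev, "range:", value_range)
print("five-number summary:", five_number_summary)
```

For these values the mean is 5, the median 4.5, the mode 4 and the population standard deviation 2. A different quartile convention can give slightly different Q1 and Q3.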

Q2. Explain measures of central tendency with suitable numerical example.

A2. MEASURES OF CENTRAL TENDENCY

Definition
Central tendency is a statistical measure that represents an entire
distribution or dataset by a single value. It aims to provide an accurate
description of all the data in the distribution.

Measures of Central Tendency


The central tendency of a dataset can be described using three main
measures, namely the mean, median and mode.

1. Mean
The mean represents the average value of the dataset. It is calculated as
the sum of all the values in the dataset divided by the number of values.
Unless stated otherwise, the mean refers to the arithmetic mean. Some other
types of mean used to measure central tendency are:

• Geometric Mean
• Harmonic Mean
• Weighted Mean
If all the values in the dataset are the same, then the arithmetic,
geometric and harmonic means are all equal to that value. If there is
variability in the data, these means differ from one another. Calculating
the mean is straightforward. The formula for the arithmetic mean is
Mean = (x1 + x2 + ... + xn) / n
where x1, x2, ..., xn are the values in the dataset and n is the number of
values.
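
The statement that the arithmetic, geometric and harmonic means coincide for identical values and differ otherwise can be checked with a short sketch. The input lists and the `weighted_mean` helper are assumptions made for illustration. `geometric_mean` requires Python 3.8 or later and positive values.

```python
from statistics import mean, geometric_mean, harmonic_mean


def three_means(values):
    """Return (arithmetic, geometric, harmonic) means of positive values."""
    return mean(values), geometric_mean(values), harmonic_mean(values)


def weighted_mean(values, weights):
    """Hypothetical helper: sum of weight * value divided by sum of weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Illustrative inputs, not taken from the assignment.
equal_means = three_means([4, 4, 4])    # all three are (approximately) 4
varied_means = three_means([1, 2, 4])   # arithmetic > geometric > harmonic

print([round(m, 3) for m in equal_means])
print([round(m, 3) for m in varied_means])
print(round(weighted_mean([1, 2, 4], [3, 1, 1]), 3))
```

With equal weights the weighted mean reduces to the arithmetic mean. Unequal weights pull the result towards the more heavily weighted values.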

The histogram given below shows the mean value of symmetric continuous
data and of skewed continuous data.

In a symmetric data distribution, the mean is located exactly at the
centre. In a skewed continuous distribution, however, the extreme values in
the extended tail pull the mean away from the centre. The mean is therefore
recommended mainly for symmetric distributions.
2. Median
The median is the middle value of the dataset when the values are arranged
in ascending or descending order. When the dataset contains an even number
of values, the median is found by taking the mean of the two middle values.
Median for an odd number of values (N)
((N + 1)/2)th term
Median for an even number of values (N)
Mean of the (N/2)th and ((N/2) + 1)th terms
Consider the given dataset with the odd number of observations arranged in
descending order – 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2

Here 12 is the middle or median number that has 6 values above it and 6
values below it.
Now, consider another example with an even number of observations that
are arranged in descending order – 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23,
22, 19, and 17
In this dataset, the two middle values are 29 and 27.
Now, find the mean of these two numbers,
i.e., (27 + 29)/2 = 28.
Therefore, the median for the given data distribution is 28.
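
The two median calculations above can be reproduced with a small sketch. The function name `median_of` is chosen here for illustration. It sorts the values, then returns the middle term for an odd count or the mean of the two middle terms for an even count.

```python
def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1)/2)th term
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle terms


# The two datasets from the examples above.
odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2]
even_data = [40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, 17]

print(median_of(odd_data))   # 12
print(median_of(even_data))  # 28.0
```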

3. Mode
The mode represents the most frequently occurring value in the dataset.
A dataset may contain multiple modes, and in some cases it does not
contain any mode at all.
Consider the given dataset 5, 4, 2, 3, 2, 1, 5, 4, 5

The mode is the most common value. Here, 5 occurs three times, more often
than any other value, so the mode of the given dataset is 5.
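
A minimal sketch of finding the mode by counting occurrences follows. The function name `modes` and the convention of returning an empty list when no value repeats are assumptions made for this example, since texts differ on how to report a dataset with no mode.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


dataset = [5, 4, 2, 3, 2, 1, 5, 4, 5]  # dataset from the example above

print(modes(dataset))          # [5]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
print(modes([1, 2, 3]))        # []      (no mode)
```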
The measure of central tendency is selected based on the properties of
the data.
• If you have a symmetrical distribution of continuous data, all three
measures of central tendency work well. Most of the time, however, the
analyst uses the mean because it involves all the values in the
distribution or dataset.
• If you have a skewed distribution, the best measure of central
tendency is the median.
• If you have ordinal data, the median and the mode are the best
choices for measuring central tendency.
• If you have categorical data, the mode is the best choice for
measuring central tendency.

Q3. Explain measure of dispersion with suitable numerical example.

A3. Measures of Dispersion


In statistics, measures of dispersion help to interpret the variability of data, i.e.
to know how homogeneous or heterogeneous the data are. In simple terms, they
show how squeezed or scattered the values of a variable are.

Types of Measures of Dispersion


There are two main types of measures of dispersion in statistics:

• Absolute Measure of Dispersion
• Relative Measure of Dispersion

Absolute Measure of Dispersion


An absolute measure of dispersion is expressed in the same unit as the original
data set. Absolute measures express the variation in terms of the average
deviation of the observations, such as the standard deviation or mean deviation.
They include the range, standard deviation, quartile deviation, etc.
The types of absolute measures of dispersion are:

1. Range: It is simply the difference between the maximum value and the
minimum value in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
2. Standard Deviation: The square root of the variance is known as the
standard deviation, i.e. S.D. = σ = √(σ²), where σ² is the variance.
3. Quartiles and Quartile Deviation: The quartiles are values that divide a list
of numbers into quarters. The quartile deviation is half of the distance
between the third and the first quartile.
4. Mean and Mean Deviation: The average of numbers is known as the mean
and the arithmetic mean of the absolute deviations of the observations from
a measure of central tendency is known as the mean deviation (also called
mean absolute deviation).
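
The four absolute measures can be worked through on the values 1, 3, 5, 6, 7 from the range example. This is a sketch under stated assumptions: it uses the population variance (dividing by n), takes the mean deviation about the mean, and uses the default quartile method of `statistics.quantiles` (Python 3.8 or later). Other quartile conventions can give a different quartile deviation.

```python
import math
import statistics

data = [1, 3, 5, 6, 7]  # values from the range example above
n = len(data)

# 1. Range: maximum minus minimum
value_range = max(data) - min(data)  # 6

# 2. Variance and standard deviation (population form, dividing by n)
mean_value = statistics.mean(data)
variance = sum((x - mean_value) ** 2 for x in data) / n
std_dev = math.sqrt(variance)

# 3. Quartile deviation: half the distance between Q3 and Q1
q1, _, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# 4. Mean deviation about the mean: average absolute deviation
mean_deviation = sum(abs(x - mean_value) for x in data) / n

print("range:", value_range)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 3))
print("quartile deviation:", quartile_deviation)
print("mean deviation:", round(mean_deviation, 2))
```

Here the mean is 4.4, the variance 4.64, the standard deviation about 2.154 and the mean deviation 1.92, all in the same unit as the data (the variance in squared units).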

Relative Measure of Dispersion


Relative measures of dispersion are used to compare the dispersion of two or
more data sets. They are ratios without units, so data sets measured in
different units can be compared. Common relative dispersion methods include:

1. Coefficient of Range
2. Coefficient of Standard Deviation
3. Coefficient of Quartile Deviation
4. Coefficient of Mean Deviation
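
The four coefficients can be sketched as follows. The formulas are commonly used textbook definitions that the text above does not spell out, so treat them as assumptions: range and quartile coefficients as (difference)/(sum), and the standard deviation and mean deviation coefficients as ratios to the mean. The data are the values 1, 3, 5, 6, 7 from the range example, and the quartiles use the default method of `statistics.quantiles`.

```python
import math
import statistics

data = [1, 3, 5, 6, 7]  # values from the range example
n = len(data)
mean_value = statistics.mean(data)

# Absolute measures needed for the ratios (population form, dividing by n)
std_dev = math.sqrt(sum((x - mean_value) ** 2 for x in data) / n)
mean_deviation = sum(abs(x - mean_value) for x in data) / n
q1, _, q3 = statistics.quantiles(data, n=4)

# Assumed textbook definitions of the relative measures
coeff_range = (max(data) - min(data)) / (max(data) + min(data))
coeff_std_dev = std_dev / mean_value
coeff_quartile_dev = (q3 - q1) / (q3 + q1)
coeff_mean_dev = mean_deviation / mean_value

print("coefficient of range:", coeff_range)
print("coefficient of standard deviation:", round(coeff_std_dev, 4))
print("coefficient of quartile deviation:", round(coeff_quartile_dev, 4))
print("coefficient of mean deviation:", round(coeff_mean_dev, 4))
```

Because each coefficient divides one quantity by another in the same unit, the results carry no unit, which is what allows comparison across data sets.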

EG:- RANGE
EG:- MEAN DEVIATION

Q4. Differentiate between correlation and regression with suitable example.
