Levels of Data


WEBINAR ON

USE OF

STATISTICAL
TOOLS
IN SOCIAL SCIENCE RESEARCH

1
2
When analysing the data, there are different types of
statistical measures that can be used to help to
interpret the numbers.
However, based on what type of data is available, not
all statistics can be used.
It is important to understand what type of data one
has available in order to determine what type of
statistical analysis can be completed.

3
• The nominal scale, the most elementary scale of measurement, does no more than
identify the category into which an individual, event or object may
be classified.

• When referring to nominal data, it helps to realize that the numbers
are completely arbitrary. They don't really mean anything. Any question
that groups people into categories is usually considered nominal.

• For example, when tabulating a survey, one could assign the
number 1 to one group of respondents and the number 2 to another group of
respondents. These numbers are completely arbitrary. One could just
as easily have assigned the numbers 51 and 885. The
numbers don't mean anything in relation to the categories.

• Examples: gender, eye colour, religion, caste
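
A minimal Python sketch of why nominal codes are arbitrary. The responses and category names are hypothetical; the 1/2 and 51/885 codings echo the example above. Counts per category come out the same under either coding, while an average of the codes changes with the labels and so carries no meaning.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers for a two-category nominal question.
responses = ["A", "B", "A", "A", "B"]

# Two equally valid (and equally arbitrary) numeric codings.
coding_1 = {"A": 1, "B": 2}
coding_2 = {"A": 51, "B": 885}

codes_1 = [coding_1[r] for r in responses]
codes_2 = [coding_2[r] for r in responses]

# Category counts do not depend on which numbers were chosen.
counts_1 = sorted(Counter(codes_1).values())
counts_2 = sorted(Counter(codes_2).values())
print(counts_1 == counts_2)

# The "mean code" depends entirely on the labels, so it is not meaningful.
print(mean(codes_1), mean(codes_2))
```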

4
• Ordinal data requires that there is order in the data.
Ordinal numbers are used to indicate rank and nothing
more. The ordinal scale is used to arrange individuals or
objects in a series ranging from the highest to the lowest
according to the particular characteristic being measured.
• One cannot assume that the intervals are equal.
• For example, when ranking the outcomes of a race, the
first-place winner is 1 and the second-place winner is 2.
There is a reason and an order behind the assignment of the
numbers. However, this doesn't tell us that the number 1
finisher is twice as good as the number 2 finisher.
• Examples: intelligence, job satisfaction

5
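Apart from rank-ordering the data, the interval
scale allows one to state precisely how far apart
the individuals, objects or events that form
the focus of the enquiry are. It permits certain
mathematical procedures, such as addition and subtraction,
that are untenable at the other two levels.
Example: temperature in degrees Celsius, calendar year. Age, income,
height and weight are often analysed as interval data too, although
they have a true zero and are strictly ratio-scale.
Interestingly enough, the way you phrase the
question is directly related to the type of data that
you get back. Asking for an exact age, for instance, yields
interval or ratio data, whereas asking for an age bracket yields
ordinal data.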

6
The ratio scale is the highest level of measurement. It subsumes
the other three. A ratio scale includes an absolute
zero.
Because it has an absolute zero, it is possible to perform all
the arithmetic operations: addition, subtraction,
multiplication and division. Educational and
psychological tests can rarely assume ratio-level
measurement; it is most used in the physical sciences.
Example: With weight, zero means no mass at
all. 1000 grams is 400 grams heavier than 600
grams and twice as heavy as 500 grams.

7
A discrete variable is a variable which can take on
only numerals or values that are specific, distinct points on
the scale.
Example: Gender is a discrete variable. It can take
only a fixed number of values. Other examples: eye colour,
religion, caste.

8
A continuous variable can take on any value between the points on a
scale. It can be measured with differing degrees of
exactness depending on the measuring instrument.
Example: Weight. It can take any value
from zero upwards. Other examples: height, time, age.

9
• The first step in any data analysis strategy is to
calculate summary measures to get a general feel for
the data. Summary measures for a data set are often
referred to as descriptive statistics. Descriptive
statistics fall into three main categories:
  • Measures of position (or central tendency)
  • Measures of variability
  • Measures of skewness
They can be useful for beginning data analysis, for
comparing multiple data sets, and for reporting final
results of a survey.
10
• The mean is simply the average of all the items in a sample.
To compute a sample mean, add up all the sample values
and divide by the size of the sample.
• The cotinine values for seven smokers are
73, 58, 67, 93, 33, 18, and 147.
If you add up these values you get a sum of 489.
Divide that sum by 7 to get a mean of 69.9.
• We will sometimes make the distinction between the
sample mean and the population mean. The population
mean (often represented by the Greek letter mu, μ) is simply
the average of all the items in a population. Because a
population is usually very large, the population mean is
usually an unknown constant.
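
The arithmetic for the sample mean can be checked with a short Python sketch. It uses the seven values quoted above and the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

# The seven sample values quoted above.
values = [73, 58, 67, 93, 33, 18, 147]

total = sum(values)          # add up all the sample values
sample_mean = mean(values)   # divide the sum by the sample size

print(total, round(sample_mean, 1))
```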
11
The median is the middle observation in a data set.
That is, 50 per cent of the observations are above the
median and 50 per cent are below the median (for
sets with an even number of observations, the median
is the average of the middle two observations).
The median is often used when a data set is not
symmetrical, or when there are outlying observations.
For example, median income is generally reported
rather than mean income because of outlying
observations.
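
A small Python sketch of how an outlying observation moves the mean but not the median. It reuses the seven values quoted for the mean; the second list, in which the largest value is inflated to 1470, is a hypothetical variation added purely for illustration.

```python
from statistics import mean, median

values = [73, 58, 67, 93, 33, 18, 147]
# Hypothetical variation: the largest value becomes an extreme outlier.
with_outlier = [73, 58, 67, 93, 33, 18, 1470]

# The median is the middle value of the sorted data in both cases.
print(median(values), median(with_outlier))

# The mean is pulled strongly towards the outlier.
print(round(mean(values), 1), round(mean(with_outlier), 1))

# With an even number of observations, the two middle values are averaged.
even_median = median([18, 33, 58, 67])
print(even_median)
```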

12
The mode is the value around which the greatest
number of observations are concentrated, or quite
simply the most common observation. The mode is often
used with nominal data, but is not the preferred
measure for other types of data.
A data set can have more than one mode, for example two or three.
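
A brief Python sketch of finding the mode of nominal data, including the case of more than one mode. The eye-colour observations are hypothetical, and `statistics.multimode` requires Python 3.8 or later.

```python
from statistics import multimode

# Hypothetical nominal observations.
eye_colours = ["brown", "blue", "brown", "green", "blue", "hazel"]

# multimode returns every most-common value, in order of first appearance.
modes = multimode(eye_colours)
print(modes)

# A data set with a single most-common value has exactly one mode.
single_mode = multimode(["brown", "brown", "blue"])
print(single_mode)
```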

13
While measures of position describe where the data
points are concentrated, measures of variability
measure the dispersion (or spread) of the data set.

14
• The range is the difference between the largest and the
smallest observations in the data set. This is a limited
measure because it depends on only two of the numbers in
the data set.
• Using the above data set again, the range is 129 (147 minus 18), but that
does not provide any information regarding the
concentration of the data at the low end of the scale.
Another limitation of the range is that it is affected by the
number of observations in the data set. Generally, the
more observations there are, the more spread out they will
be. One use of the range in everyday life is in newspaper stock
market summaries, which give the day's high and low
numbers.
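
A short Python sketch of the range and its main limitation. It uses the seven values quoted for the mean (largest 147, smallest 18); the second list is hypothetical and shares those two extremes, so it has the same range even though most of its values sit at the low end.

```python
values = [73, 58, 67, 93, 33, 18, 147]
# Hypothetical data with the same extremes but clustered at the low end.
clustered = [18, 19, 20, 21, 22, 23, 147]

# The range uses only the largest and smallest observations.
range_values = max(values) - min(values)
range_clustered = max(clustered) - min(clustered)

print(range_values, range_clustered)
```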

15
Unlike the range, variance takes into consideration all the
data points in the data set. If all the observations are
the same, the variance is zero. The more
spread out the observations are, the larger the
variance.

16
Standard deviation is the positive square root of the
variance, and is the most common measure of
variability. Standard deviation indicates how close to
the mean the observations are. The larger the
standard deviation, the more variation there is in the
data set.
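
A minimal Python sketch of variance and standard deviation, using the seven values quoted for the mean. It assumes the sample versions of both measures (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute; the slides do not say which version is intended.

```python
from math import sqrt
from statistics import stdev, variance

values = [73, 58, 67, 93, 33, 18, 147]

sample_var = variance(values)  # sample variance, divides by n - 1
sample_sd = stdev(values)      # positive square root of the sample variance

print(round(sample_var, 1), round(sample_sd, 1))

# If all the observations are the same, the variance is zero.
constant_var = variance([5, 5, 5, 5])
print(constant_var)
```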

17
Measures of position and variability tell us where the
data are located and how dispersed they are.
Measures of skewness are concerned with whether
the data are symmetrically distributed, that is, with the shape of
the distribution.
Most people are familiar with the distribution
referred to as the normal, or bell-shaped, curve. Many
of the statistics we use assume the data are
distributed normally.
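
One way to put a number on skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 1.5. The Python sketch below assumes that definition (the slides do not name a specific formula); the helper name `skewness` is illustrative. A symmetric data set gives zero, and a long right tail gives a positive value.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(data)
    centre = mean(data)
    m2 = sum((x - centre) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - centre) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [73, 58, 67, 93, 33, 18, 147]  # values quoted for the mean

print(skewness(symmetric))
print(skewness(right_tailed) > 0)
```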

18
The first step in any data analysis strategy is to
determine what you want to know, or your purpose in
analyzing the data. Ideally, you should have
determined this before collecting your data, but all
too frequently this is not the case. Many of the
commonly used statistical tests can be classified into
one of three categories:
Description
Comparison
Association

19
The purpose of descriptive statistics is to describe the
data. The type of data will determine which
descriptive statistic is appropriate. Specifically, one
can only calculate a mean with interval or ratio data,
whereas a mode can be calculated with nominal,
ordinal, interval or ratio data.
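
The rule above can be sketched as a small lookup in Python. The mean and mode entries follow the text; the median entry (ordinal and above) is an added assumption based on the usual convention. The names `LEVELS`, `MINIMUM_LEVEL` and `allowed_measures` are hypothetical.

```python
# Measurement levels ordered from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Lowest level at which each measure of position is meaningful.
# The "median" entry is an assumption (standard convention), not from the text.
MINIMUM_LEVEL = {"mode": "nominal", "median": "ordinal", "mean": "interval"}

def allowed_measures(level):
    """Return the measures of position that can be used at the given level."""
    rank = LEVELS.index(level)
    return [measure for measure, lowest in MINIMUM_LEVEL.items()
            if LEVELS.index(lowest) <= rank]

for level in LEVELS:
    print(level, allowed_measures(level))
```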

20
A common goal in conducting research is to
determine if differences exist between two or more
groups.
For example, we may be interested in determining if
people who defect to another service provider are
different from those who choose to remain. While the
most common examples of this type of analysis focus
on differences of means and variances, it is important
to note we can analyze many types of differences
including correlation coefficients, proportions and
percentages. The statistic used is determined by the
type of data you have.
21
Example: Chi-Square
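
The chi-square test of independence is commonly used to compare groups on counts in categories. The Python sketch below computes the statistic by hand for a hypothetical 2x2 table (two customer groups, defected versus remained); all counts are invented for illustration. The comparison value 3.841 is the tabulated critical value for a 5 per cent significance level with 1 degree of freedom.

```python
# Hypothetical 2x2 table of counts: rows are two customer groups,
# columns are "defected" and "remained".
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_05_DF1 = 3.841  # tabulated critical value, alpha = 0.05, df = 1

print(chi_square, df, chi_square > CRITICAL_05_DF1)
```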

22
Example: t-test
One of the most common statistical tests to use for
comparisons with interval data is the t-test. The t-test
compares the means of two groups, and then
determines whether those two means are different
enough to be statistically significant.
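
A minimal Python sketch of the two-sample t statistic, assuming the pooled-variance (equal variances) form of the test; the slides do not say which variant is meant. Both groups of scores are hypothetical. The comparison value 2.306 is the tabulated two-tailed critical value for a 5 per cent significance level with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical interval-level scores for two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]
n_a, n_b = len(group_a), len(group_b)

# Pool the two sample variances, weighting each by its degrees of freedom.
pooled_var = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)

# Difference in means divided by its standard error.
t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
df = n_a + n_b - 2
CRITICAL_05_DF8 = 2.306  # tabulated two-tailed critical value, alpha = 0.05, df = 8

print(round(t_stat, 3), df, abs(t_stat) > CRITICAL_05_DF8)
```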

23
Example: One-way ANOVA (interval)
While the t-test is useful for testing differences
between two groups, frequently we are interested in
more than two groups. In those cases, we often rely
on the Analysis of Variance (ANOVA) to tell us if
those groups are different on some variable of
interest. For example, if the training example from
above included a third group (i.e., a combination of
on- and off-site training) it would require use of the
ANOVA instead of the t-test.
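
A short Python sketch of the one-way ANOVA F statistic for three groups, computed by hand as the between-group mean square divided by the within-group mean square. The group labels follow the training example mentioned above; the scores are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three training groups.
groups = {
    "on-site": [1, 2, 3],
    "off-site": [2, 3, 4],
    "combined": [6, 7, 8],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between-group variation: group means around the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-group variation: scores around their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, f_stat)
```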

24
Example: Factorial ANOVA
Frequently we are interested in understanding the
effects of varying levels of two or more variables on a
third variable. In such a case, we are unable to use the
One-way ANOVA because it is limited to comparisons
of the effects of one variable on another. Essentially, a
factorial ANOVA analyzes the impact of both the
variables independently as well as jointly to
determine how they affect another variable of
interest.
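
A Python sketch of a two-factor (2x2) factorial ANOVA computed by hand, assuming a balanced design with the same number of observations in every cell. The factors A and B and all scores are hypothetical. It yields one F statistic for each factor on its own and one for their interaction (their joint effect).

```python
from statistics import mean

# Hypothetical balanced design: cells[(a, b)] holds the scores for
# level a of factor A and level b of factor B.
cells = {
    (0, 0): [1, 3],
    (0, 1): [3, 5],
    (1, 0): [5, 7],
    (1, 1): [11, 13],
}
levels_a = sorted({a for a, _ in cells})
levels_b = sorted({b for _, b in cells})
n = len(cells[(0, 0)])  # observations per cell (balanced design assumed)

grand = mean([x for s in cells.values() for x in s])
cell_mean = {k: mean(s) for k, s in cells.items()}
mean_a = {a: mean([x for (i, _), s in cells.items() if i == a for x in s])
          for a in levels_a}
mean_b = {b: mean([x for (_, j), s in cells.items() if j == b for x in s])
          for b in levels_b}

# Sums of squares for each main effect, the interaction, and the error.
ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
ss_ab = n * sum((cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                for a, b in cells)
ss_within = sum((x - cell_mean[k]) ** 2 for k, s in cells.items() for x in s)

df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
ms_within = ss_within / (len(cells) * (n - 1))

f_a = (ss_a / df_a) / ms_within
f_b = (ss_b / df_b) / ms_within
f_ab = (ss_ab / (df_a * df_b)) / ms_within

print(f_a, f_b, f_ab)
```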

25
26
