Main Title: Planning Data Analysis Using Statistical Data


Lesson 5
PURPOSE OF DATA ANALYSIS PLAN
• Describe data sets;
• Determine the degree of relationship between variables;
• Determine the differences between variables;
• Predict outcomes; and
• Compare variables.
DATA ANALYSIS STRATEGIES
• Exploratory data analysis
-this type of analysis is used when it is not clear what to expect from the data.
This strategy uses numerical and visual presentations such as graphs.
• Descriptive data analysis
-this data analysis is used to describe, show or summarize data in a
meaningful way, leading to a simple interpretation. The common
descriptive statistics are frequency, percentage, measures of central
tendency and measures of dispersion.
• Inferential data analysis
-this data analysis tests hypotheses about a set of data to reach
conclusions. It includes tests of significance of difference, such as the
t-test and analysis of variance (ANOVA), and tests of relationship, such as the
product-moment coefficient of correlation (Pearson r), Spearman rho,
linear regression and the chi-square test.
QUANTITATIVE ANALYSIS
The following are the levels of measurement
scales:
NOMINAL SCALE
ORDINAL SCALE
INTERVAL SCALE
RATIO SCALE
• Nominal scale
- a nominal scale of measurement is used to
label variables. Data on this scale are sometimes
called categorical data.
• Ordinal scale
- an ordinal scale of measurement assigns an order
to items on the characteristic being measured.
The order may be one of rank (e.g., first, second, third), of
agreement (e.g., agree, disagree) or of economic
status (e.g., low, average, high).
• Interval scale
- a scale that has equal units of measurement,
which makes it possible to interpret both the
order of the scale scores and the distance between
them. However, interval scales do not have a true
zero. Scores can therefore be added or
subtracted, but they cannot be meaningfully multiplied or divided.
• Ratio scale
- it is considered the highest level of
measurement. It has the characteristics of the
interval scale and, in addition, a true zero point. All
descriptive and inferential statistics may be
applied.
A. DESCRIPTIVE DATA ANALYSIS

1. MEASURES OF CENTRAL TENDENCY
-The common measures of central
tendency, sometimes called measures
of location or center, include the mean,
median and mode.
1.1 MEAN
-often called the arithmetic average of a set
of data. It is frequently used for interval or ratio
data. The symbol x̄ (x bar) denotes the arithmetic
mean.
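• For ungrouped data (mean)
Formula for ungrouped data: x̄ = Σx / n, where Σx is the sum of the values and n is the number of values.
• For grouped data (mean)
Formula for grouped data: x̄ = Σfx / n, where f is the frequency of each class, x is the class midpoint and n = Σf.
• The weighted mean
Formula for the weighted mean: x̄ = Σwx / Σw, where w is the weight attached to each value x.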
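A minimal Python sketch of the three means, assuming the standard definitions: the ungrouped mean is the sum of the values divided by their count, the grouped mean is the sum of frequency times class midpoint divided by the total frequency, and the weighted mean is the sum of weight times value divided by the sum of the weights. The function names and all numbers are hypothetical illustrations, not data from this lesson.

```python
def mean_ungrouped(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Grouped mean: sum of (frequency x class midpoint) over the total frequency."""
    total = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / total


def mean_weighted(values, weights):
    """Weighted mean: sum of (weight x value) over the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical data, for illustration only.
print(mean_ungrouped([80, 85, 90, 95]))          # 87.5
print(mean_grouped([5, 15, 25], [2, 5, 3]))      # 16.0
print(mean_weighted([90, 80, 70], [3, 2, 1]))    # 500 / 6, about 83.33
```

In the grouped case each class is represented by its midpoint, so the result is an approximation of the mean of the underlying raw values.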
1.2 MEDIAN
- is the midpoint of the distribution. It represents the
value below which 50% of the values fall and above which
the other 50% fall.

• For ungrouped data (median)
- the median may be calculated from ungrouped data by
doing the following steps.
1. Arrange the items (scores, responses, observations) from
lowest to highest.
2. Count to the middle value. For an odd number of values
arranged from lowest to highest, the median corresponds to
the middle value. If the array contains an even number of observations,
the median is the average of the two middle values.
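The two steps above can be sketched in Python as follows. The function name and the sample values are hypothetical and serve only as an illustration.

```python
def median_ungrouped(values):
    ordered = sorted(values)      # step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2                  # step 2: count to the middle
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([7, 3, 5]))       # 5
print(median_ungrouped([4, 1, 3, 2]))    # 2.5
```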
• For grouped data (median)
Example and formula:
1.3 MODE
The mode is the most frequently
occurring value in a set of observations. In
cases where more than one value shares the
highest frequency,
the distribution is bimodal (two such
values) or multimodal (more than
two such values). In cases where every
value occurs an equal number of times,
there is no mode. The mode is appropriate for
nominal data.
EXAMPLES:
The ages of ten (10) persons assembled in a room are
as follows:
16, 18, 18, 25, 25, 25, 30, 34, 36 and 38
Solution:
An age of 25 is the mode because it has been recorded
three times in the sample, more than any other age.

The number of hours spent by 10 students in an internet
café was as follows:
2, 2, 2, 3, 3, 4, 4, 4, 5, 5
Solution:
Both 2 and 4 have a frequency of 3. The data is therefore
bimodal.
Answer: mode = 2 and 4
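A short Python sketch of finding the mode, using the two examples above. The hours list is taken here as ten values in which 2 and 4 each occur three times, which is what the stated solution requires. The function name is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency; [] if there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If all distinct values occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


ages = [16, 18, 18, 25, 25, 25, 30, 34, 36, 38]
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
print(modes(ages))     # [25]
print(modes(hours))    # [2, 4]
```

A result with two entries corresponds to a bimodal distribution, and more than two entries to a multimodal one.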
2. MEASURES OF DISPERSION
The extent of the spread, or
dispersion, of the data is described by
a group of measures called measures
of dispersion, also called measures of
variability. The measures to be
considered are the range, the average or
mean deviation, the standard deviation
and the variance.
2.1 THE RANGE
Range is the difference between the largest and the
smallest values in a set of data.
Consider the following scores obtained by ten (10) students
participating in a mathematics contest:
6, 10, 12, 15, 18, 18, 20, 23, 25, 28
Thus, the range is 28 - 6 = 22. The scores range from 6 to 28.
2.2 AVERAGE (MEAN) DEVIATION
This measure of spread is defined as the sum of the absolute
deviations of the values in a set of data from the mean,
divided by the total number of values in the set of data.
In mathematics, the term “absolute”, represented by the sign
“| |”, simply means taking the value of a number without regard
to its positive or negative sign.
FOR UNGROUPED DATA
Mean deviation = Σ|x - x̄| / n
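As a sketch, the mean deviation for ungrouped data can be computed in Python by averaging the absolute distances from the mean. The scores are the ten contest scores from the range example in 2.1; the function name is hypothetical.

```python
def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]
print(max(scores) - min(scores))    # range: 22
print(mean_deviation(scores))       # 5.4
```

Here the mean is 17.5 and the absolute deviations add up to 54, so the mean deviation is 54 / 10 = 5.4.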
2.3 STANDARD DEVIATION
The standard deviation (SD) is a
measure of the spread or variation of
data about the mean.
SD is computed as the square root of the
average squared distance of the values
from the mean.
FORMULA:

EXAMPLE:
Let us consider the same data used in the illustration for the range.
The values are 6, 10, 12, 15, 18, 18, 20, 23, 25, 28.
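The slide's own formula did not survive extraction, so it is not clear whether the population form (dividing by n) or the sample form (dividing by n - 1) was intended. The sketch below therefore computes both for the ten scores, using Python's standard `statistics` module. Treat it as an illustration under that assumption rather than the lesson's worked solution.

```python
import statistics

scores = [6, 10, 12, 15, 18, 18, 20, 23, 25, 28]

# Population form: summed squared deviations divided by n.
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

# Sample form: summed squared deviations divided by n - 1.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(round(population_sd, 2))    # 6.55
print(round(sample_sd, 2))        # 6.9
```

In both forms the variance is the square of the standard deviation. The summed squared deviations from the mean of 17.5 are 428.5.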
INTERPRETATION OF STANDARD DEVIATION
The standard deviation allows you to reach
conclusions about scores in the distribution. The
following conclusions can be reached if the
distribution of scores is normal:
1. Approximately 68% of the scores in the sample
fall within one standard deviation of the mean.
2. Approximately 95% of the scores in the sample
fall within two standard deviations of the mean.
3. Approximately 99.7% of the scores in the sample
fall within three standard deviations of the mean.
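These percentages can be checked against the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```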
TEST OF SIGNIFICANCE OF DIFFERENCE (T-TEST)
• Between means
- For independent samples
(when the respondents come from different groups, such as boys and girls)
- For correlated/dependent samples
(when the same set of respondents or paired
sets of respondents are involved)
• Between proportions or percentages
- For independent samples
- For correlated/dependent samples
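A hedged Python sketch of the two t-tests between means. The independent-samples version assumes the pooled-variance form (equal population variances), which may differ from the formula on the original slide. The paired version works on the differences between matched scores. All names and numbers are hypothetical, and the tests between proportions are not covered here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def t_independent(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def t_paired(before, after):
    """t statistic and degrees of freedom for correlated (paired) samples."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical scores, for illustration only.
boys, girls = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
print(t_independent(boys, girls))      # t is about -1.0 with 8 degrees of freedom

before, after = [10, 12, 14, 16], [11, 14, 15, 18]
print(t_paired(before, after))         # t is about 5.196 with 3 degrees of freedom
```

The computed t is then compared with the tabled critical value for the given degrees of freedom to decide whether the difference is significant.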


ANALYSIS OF VARIANCE (ANOVA)
ANOVA is used when the significance of the difference among the
means of two or more groups is to be determined at one time.
• One-way analysis of variance
A typical ANOVA table:

ANOVA relies on the F-ratio to test the hypothesis that the two variance estimates
(between groups and within groups) are equal; that is, that the subgroups are from the same population. “Between
groups” refers to the variation between each group mean and the grand or
overall mean. “Within groups” refers to the variation of each observation from its own group mean.
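A minimal Python sketch of a one-way ANOVA, assuming the usual table layout of sum of squares, degrees of freedom and mean square for each source of variation. The function name and the three small groups are hypothetical.

```python
def one_way_anova(groups):
    """Return the rows of a one-way ANOVA table and the F-ratio."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between groups: each group mean against the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: each observation against its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within

    # Columns: source of variation, sum of squares, df, mean square.
    table = [
        ("Between groups", ss_between, df_between, ms_between),
        ("Within groups", ss_within, df_within, ms_within),
        ("Total", ss_between + ss_within, df_between + df_within, None),
    ]
    return table, f_ratio


table, f_ratio = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
for row in table:
    print(row)
print("F =", f_ratio)    # F = 3.0
```

In this example both sums of squares equal 6, so the mean squares are 3 and 1 and the F-ratio is 3.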
2. TEST OF RELATIONSHIP
• Spearman rank-order correlation or Spearman rho
- this is used when the data available are expressed in terms of ranks
(ordinal variables).
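A small Python sketch of Spearman rho, assuming the common shortcut formula rho = 1 - 6Σd² / (n(n² - 1)), where d is the difference between paired ranks. This form is only valid when there are no tied ranks. The function name and the two rankings are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rho from two lists of ranks (assumes no tied ranks)."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical rankings of five contestants by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))    # 0.8
```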

• Chi-square test for independence
- this is used when the data available are expressed in terms of
frequencies or percentages (nominal variables).
Case 1: Multinomial
Case 2: Contingency table
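Both cases rest on the same idea: compare each observed frequency with its expected frequency and add up (observed - expected)² / expected. The Python sketch below assumes that standard form; the function names and counts are hypothetical.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Case 1 (multinomial): compare observed counts with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Case 2 (contingency table): expected = row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df


# Hypothetical frequencies, for illustration only.
print(chi_square_goodness_of_fit([20, 30, 10], [20, 20, 20]))    # 10.0
print(chi_square_independence([[10, 20], [20, 10]]))             # about 6.67 with df = 1
```

As with the t-test, the computed value is compared with the tabled critical value for the given degrees of freedom.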
• Product-moment coefficient of correlation or Pearson r
- this is used when data are expressed in terms of scores,
such as weights and heights or scores in a test (ratio or
interval data).
Case 1: When deviations from the mean are used

Case 2: When raw scores on the original observations are used
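The two cases give the same coefficient and differ only in the arithmetic. The Python sketch below assumes the standard deviation-score and raw-score forms of the formula; the function names and paired scores are hypothetical.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """Case 1: use the deviation of each score from its mean."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_r_raw(xs, ys):
    """Case 2: use the raw scores directly."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2, sum_y2 = sum(x * x for x in xs), sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


# Hypothetical paired scores, for illustration only.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
print(round(pearson_r_deviation(x_scores, y_scores), 4))    # 0.7746
print(round(pearson_r_raw(x_scores, y_scores), 4))          # 0.7746
```

The raw-score form avoids computing the means first, which is convenient when working by hand or with a calculator.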
• T-test to test the significance of Pearson r
This t-test is used to determine whether the
value of the computed coefficient of correlation is significant. That is,
does it represent a real correlation, or was the obtained coefficient of
correlation merely brought about by chance?

The coefficient of determination (r²) can also be used to indicate what
proportion of the total variation in the dependent variable is explained
by the linear relationship with the independent variable. You can
multiply by 100 to convert the coefficient of determination to a percentage.
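A brief Python sketch of both ideas, assuming the usual test statistic t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The function names and the values of r and n are hypothetical.

```python
from math import sqrt


def t_for_pearson_r(r, n):
    """t statistic and degrees of freedom for testing whether r differs from zero.

    Not defined for r = 1 or r = -1, where the denominator is zero.
    """
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2


def percent_explained(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return r ** 2 * 100


r, n = 0.6, 18                    # hypothetical correlation and sample size
print(t_for_pearson_r(r, n))      # t is about 3.0 with 16 degrees of freedom
print(percent_explained(r))       # about 36 percent
```

With r = 0.6, about 36% of the variation in the dependent variable is explained by the linear relationship, and the remaining 64% is not.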
TESTING THE HYPOTHESIS
Lesson 6
POPULATION AND SAMPLE
- A measure based on a population is
called a parameter, while a measure based on
a sample is called a statistic.
- Inferential statistics requires that
a sample be drawn by random sampling.
STATISTICAL SIGNIFICANCE
- This means that a relationship between two or more variables is
caused by something other than random chance. “Significant” also means
probably true (not due to chance).
- Statistical hypothesis testing is used to determine whether the
result of a data set is statistically significant.

HYPOTHESIS
- is a preconceived idea, assumed to be true, that has to be tested for
truth or falsity.
There are two types of hypothesis: the null hypothesis and the alternative
hypothesis.
• Null hypothesis
- indicates that there is NO difference between the group means in
the comparison.
• Alternative hypothesis
- indicates that there is a TRUE difference between the group means.
TYPE I AND II ERRORS
- Two types of errors are involved in hypothesis testing. A Type I
error is committed when the researcher rejects the null hypothesis even though
it is true. A Type II error occurs when the data from the sample produce
results that fail to reject the null hypothesis when in fact the null
hypothesis is false and should be rejected.
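One way to see a Type I error is by simulation. The Python sketch below repeatedly draws samples for which the null hypothesis really is true and counts how often a two-sided z-test rejects it anyway. The 0.05 significance level, the z-test with a known standard deviation of 1, and every name in the code are assumptions made for illustration; none of them come from the lesson.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05                                        # assumed significance level
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)    # about 1.96 for a two-sided test


def type_i_error_rate(trials=2000, n=25, seed=0):
    """Share of trials in which a TRUE null hypothesis (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis holds: the population mean really is 0.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) * sqrt(n)                  # z = mean / (sigma / sqrt(n)), sigma = 1
        if abs(z) > critical_z:
            rejections += 1                         # rejecting here is a Type I error
    return rejections / trials


print(type_i_error_rate())    # expected to be close to ALPHA
```

The rejection rate should settle near the chosen significance level, which is why that level is often described as the probability of a Type I error.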

PARAMETRIC AND NONPARAMETRIC STATISTICS

- Parametric tests are used for interval and ratio scales of
measurement. They assume that the samples and observations come from
normally distributed populations, and the populations
should have equal variances.
- Nonparametric tests do not require a normally distributed population
or similarity of variances. They use nominal and ordinal data.
ACTIVITIES
