D1UA401B Research Methodology-UNIT-4 Pazhanisamy-BBA IV Semester Section19


Data Processing, Univariate and Bivariate Analysis of Data, Hypothesis Testing, Analysis of Variance Techniques (Z-Test, t-Test, F-Test), Non-Parametric Tests: Chi-square Test.
Advanced Data Analysis Techniques: Correlation and Regression Analysis (Simple and Multiple), Quantitative Estimate of Linear Correlation, Testing the Significance of the Correlation Coefficient, Tests of Significance of Regression Parameters, Uses of Regression Analysis.

Prepared by: R. Pazhanisamy, Assistant Professor of Economics, SOB, GU.


Understand . . .
 How to do univariate analysis for business
 How predictions are made in business using univariate and bivariate analysis

18-2
Understand . . .
 How to test regression models for linearity
and whether the equation is effective in
fitting the data.
 Nonparametric measures of association and
the alternatives they offer when key
assumptions and requirements for parametric
techniques cannot be met.

18-3
UNIVARIATE ANALYSIS

Univariate analysis focuses on analyzing a single variable at a time. It involves summarizing and describing the characteristics and distribution of a single variable without considering relationships with other variables.
Common techniques used in univariate analysis include
measures of central tendency (e.g., mean, median, mode),
measures of dispersion (e.g., variance, standard deviation),
graphical representations (e.g., histograms, box plots, bar
charts), and frequency distributions.
Univariate analysis provides insights into the distribution,
variability, and patterns within individual variables,
allowing researchers to understand their properties and
identify outliers, trends, or abnormalities.
18-4
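As a quick illustration of these measures, here is a minimal Python sketch (the sales figures are invented for illustration) computing the central-tendency and dispersion statistics named above:

```python
# Univariate summary of one hypothetical variable (monthly sales).
import statistics as st

sales = [12, 15, 15, 18, 21, 22, 25, 40]  # hypothetical observations

print("mean:  ", st.mean(sales))      # central tendency
print("median:", st.median(sales))
print("mode:  ", st.mode(sales))
print("var:   ", st.variance(sales))  # sample variance (dispersion)
print("stdev: ", st.stdev(sales))     # sample standard deviation
```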
BIVARIATE ANALYSIS
Bivariate analysis examines the relationship between two variables simultaneously. It involves exploring how changes in one variable are associated with changes in another variable. Common techniques used in bivariate analysis include correlation analysis, scatter plots, contingency tables, and cross-tabulations.

Correlation analysis quantifies the strength and direction of the relationship between two continuous variables using correlation coefficients (e.g., Pearson correlation coefficient, Spearman rank correlation coefficient). Scatter plots visually represent the relationship between two continuous variables.
For categorical variables, contingency tables and cross-tabulations display the
frequency distribution of one variable across different categories of another
variable, allowing for comparisons and assessment of associations.
Bivariate analysis helps identify patterns, associations, and dependencies
between variables, facilitating hypothesis testing, prediction, and understanding
of causal relationships.

18-5
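To make these techniques concrete, here is a hedged Python sketch (all data invented; assumes scipy and pandas are available) computing Pearson and Spearman coefficients for two continuous variables and a cross-tabulation for two categorical ones:

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

ad_spend = [10, 20, 30, 40, 50]   # hypothetical continuous variable
revenue  = [15, 24, 33, 45, 52]   # hypothetical continuous variable

r, p = pearsonr(ad_spend, revenue)          # strength/direction of linear relation
rho, p_rho = spearmanr(ad_spend, revenue)   # rank-based alternative
print(f"Pearson r = {r:.3f} (p = {p:.3f}), Spearman rho = {rho:.3f}")

# Cross-tabulation of two hypothetical categorical variables
df = pd.DataFrame({"region": ["N", "N", "S", "S", "N", "S"],
                   "buys":   ["yes", "no", "yes", "yes", "no", "no"]})
print(pd.crosstab(df["region"], df["buys"]))
```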
Pearson correlation coefficient: for continuous, linearly related variables
Correlation ratio (eta): for nonlinear data, or for relating a main effect to a continuous dependent variable
Biserial: one continuous and one dichotomous variable with an underlying normal distribution
Partial correlation: three variables; relating two with the third's effect taken out
Multiple correlation: three variables; relating one variable with two others
Bivariate linear regression: predicting one variable from another's scores
18-6
Inductive Reasoning
Deductive Reasoning

17-7
Inferential Statistics
Descriptive Statistics

17-8
As Abacus states in this ad, when researchers "sift through the chaos" and "find what matters" they experience the "ah ha!" moment.

17-10
Classical statistics
 Objective view of probability
 Established hypothesis is rejected or fails to be rejected
 Analysis based on sample data

Bayesian statistics
 Extension of the classical approach
 Analysis based on sample data
 Also considers established subjective probability estimates

17-11
Null
 H0: µ = 50 mpg
 H0: µ ≤ 50 mpg
 H0: µ ≥ 50 mpg
Alternate
 HA: µ ≠ 50 mpg
 HA: µ > 50 mpg
 HA: µ < 50 mpg

17-12
 True value of parameter
 Alpha level selected
 One- or two-tailed test used
 Sample standard deviation
 Sample size
17-20
Stages of hypothesis testing:
1. State the null hypothesis
2. Choose the statistical test
3. Select the level of significance
4. Compute the difference value
5. Obtain the critical test value
6. Interpret the test
17-22
Parametric Nonparametric

17-23
 Independent observations
 Normal distribution
 Equal variances
 Interval or ratio scales

17-24
 Easy to understand and use
 Usable with nominal data
 Appropriate for ordinal data
 Appropriate for non-normal population distributions

17-28
How many samples are involved?

If two or more samples: are the individual cases independent or related?

Is the measurement scale nominal, ordinal, interval, or ratio?

17-29
Recommended statistical techniques by measurement scale:

Nominal
• One-sample case: Binomial; χ² one-sample test
• Two related samples: McNemar
• Two independent samples: Fisher exact test; χ² two-samples test
• k related samples: Cochran Q
• k independent samples: χ² for k samples

Ordinal
• One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
• Two related samples: Sign test; Wilcoxon matched-pairs test
• Two independent samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
• k related samples: Friedman two-way ANOVA
• k independent samples: Median extension; Kruskal-Wallis one-way ANOVA

Interval and Ratio
• One-sample case: t-test; Z test
• Two related samples: t-test for paired samples
• Two independent samples: t-test; Z test
• k related samples: Repeated-measures ANOVA
• k independent samples: One-way ANOVA; n-way ANOVA
17-30
 Is there a difference between observed
frequencies and the frequencies we would
expect?
 Is there a difference between observed and
expected proportions?
 Is there a significant difference between some
measures of central tendency and the
population parameter?

17-31
Z-test t-test

17-32
Null                 H0: µ = 50 mpg
Statistical test     t-test
Significance level   .05, n = 100
Calculated value     1.786
Critical test value  1.66 (from Appendix C, Exhibit C-2)
Since 1.786 > 1.66, reject H0.

17-33
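A quick scipy check of the table above (a one-tailed test with df = n - 1 = 99, as the critical value 1.66 implies):

```python
# Verify the critical value and get the p-value for the mpg example.
from scipy.stats import t

df = 99                  # n = 100
print(t.ppf(0.95, df))   # one-tailed critical t at .05: ~1.66
print(t.sf(1.786, df))   # p-value for the calculated t: below .05, so reject H0
```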
The Z-test and the chi-square test are statistical tests used to compare groups or test hypotheses. The Z-test is used when the sample size is large and the population standard deviation is known; it tests hypotheses about the mean of a normal population, and it can compare two groups through their population proportions.
The chi-square test is used when the sample size is small; it tests hypotheses about the distribution of a categorical variable. It can compare the difference in population proportions between two or more groups, or compare one group to a fixed value. A chi-square test for the equality of two proportions is mathematically equivalent to a z-test.

18-39
Living Arrangement                  Intend to Join   Number Interviewed   Percent (no. interviewed/200)   Expected Frequencies (percent x 60)
Dorm/fraternity                     16               90                   45                              27
Apartment/rooming house, nearby     13               40                   20                              12
Apartment/rooming house, distant    16               40                   20                              12
Live at home                        15               30                   15                              9
Total                               60               200                  100                             60

17-40
Null                 H0: O = E (observed frequencies equal expected frequencies)
Statistical test     One-sample chi-square
Significance level   .05
Calculated value     9.89
Critical test value  7.82 (from Appendix C, Exhibit C-3)
Since 9.89 > 7.82, reject H0.

17-41
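The calculated chi-square above can be reproduced from the observed "intend to join" counts and the expected frequencies in the table, for example with scipy:

```python
# One-sample chi-square for the living-arrangement example.
from scipy.stats import chisquare

observed = [16, 13, 16, 15]   # intend to join
expected = [27, 12, 12, 9]    # expected frequencies from the table

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")  # ~9.89, exceeds 7.82: reject H0
```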
There was no significant relationship between handedness and nationality, χ²(1, N = 428) = 0.44, p = .505.

18-43
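As a sanity check on that report, a chi-square of 0.44 with 1 df corresponds to a p-value of roughly .51 (small differences from the reported .505 reflect rounding of the statistic):

```python
from scipy.stats import chi2
print(chi2.sf(0.44, 1))  # ~0.507, clearly non-significant
```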
 Use a Z test when you need to compare group means. Use the one-sample analysis to determine whether a population mean differs from a hypothesized value, or the two-sample version to determine whether two population means differ.
 A Z test is a form of inferential statistics: it uses samples to draw conclusions about populations. For example:
 One sample: Do employees in a training program have an average IQ score different from a hypothesized value of 100?
 Two sample: Do two IQ-boosting programs produce different mean scores for employees?
(The Z test requires the population standard deviation to be known.)
18-44
 This analysis uses sample data to evaluate hypotheses that refer to population means (µ). The hypotheses depend on whether you are assessing one or two samples.
 One-Sample Z Test Hypotheses
 Null hypothesis (H0): The population mean equals a hypothesized value (µ = µ0).
 Alternative hypothesis (HA): The population mean does not equal the hypothesized value (µ ≠ µ0).
 When the p-value is less than or equal to your significance level (e.g., 0.05), reject the null hypothesis: the sample data support the notion that the population mean does not equal the hypothesized value.

18-45
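A minimal one-sample z-test sketch following this logic; the function and its inputs are illustrative (hypothetical IQ figures), and sigma is assumed known, as the Z test requires:

```python
import math
from scipy.stats import norm

def one_sample_z(x_bar, mu0, sigma, n):
    """Return (z, two-tailed p) for H0: mu = mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return z, 2 * norm.sf(abs(z))

# Hypothetical numbers: sample mean 103.5, hypothesized mean 100
z, p = one_sample_z(x_bar=103.5, mu0=100, sigma=15, n=40)
print(f"z = {z:.3f}, p = {p:.4f}")  # reject H0 if p <= 0.05
```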
 Two-Sample Z Test Hypotheses
 Null hypothesis (H0): Two population means are
equal (µ1 = µ2).
 Alternative hypothesis (HA): Two population means
are not equal (µ1 ≠ µ2).
 Again, when the p-value is less than or equal to
your significance level, reject the null hypothesis.
The difference between the two means is
statistically significant. Your sample data support
the idea that the two population means are
different.
18-46
 Example: one-sample t-test (population σ unknown; sample s given)
 A farming company wants to know whether a new fertilizer has improved crop yield.
 Historical data show the farm's average yield is 20 tonnes per acre. The company tests a new organic fertilizer on a smaller sample of farms and observes a new yield of 20.175 tonnes per acre, with a standard deviation of 3.02 tonnes across 12 different farms.
 Did the new fertilizer work?

18-48
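A quick check of the arithmetic in this example (since the population sigma is unknown, the sample standard deviation and a t-test are used):

```python
import math
from scipy.stats import t

mu0, x_bar, s, n = 20, 20.175, 3.02, 12
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p = t.sf(t_stat, n - 1)  # one-tailed: did yield improve?
print(f"t = {t_stat:.3f}, p = {p:.3f}")  # t ~ 0.20, p ~ 0.42: no evidence of improvement
```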
 Suppose the IQ levels among individuals in two different cities are known to be normally distributed, each with a population standard deviation of 15.
 A scientist wants to know whether the mean IQ levels of individuals in city A and city B differ, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
 To test this, she performs a two-sample z-test at significance level α = 0.05 with:
 x̄1 (sample 1 mean IQ) = 100.65
 n1 (sample 1 size) = 20
 x̄2 (sample 2 mean IQ) = 108.8
 n2 (sample 2 size) = 20
 Since the p-value (0.0858) is not less than the significance level (.05), the scientist fails to reject the null hypothesis.
18-50
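The z statistic and p-value quoted above follow directly from these inputs; a short verification:

```python
import math
from scipy.stats import norm

x1, x2, sigma, n1, n2 = 100.65, 108.8, 15, 20, 20
z = (x1 - x2) / math.sqrt(sigma**2 / n1 + sigma**2 / n2)
p = 2 * norm.sf(abs(z))             # two-tailed
print(f"z = {z:.3f}, p = {p:.4f}")  # z ~ -1.72, p ~ 0.0858: fail to reject H0
```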
T-Test: Formula and solved examples (collegedunia.com)

18-53
Null                 H0: A sales = B sales
Statistical test     t-test
Significance level   .05 (one-tailed)
Calculated value     1.97, d.f. = 20
Critical test value  1.725 (from Appendix C, Exhibit C-2)
Since 1.97 > 1.725, reject H0.

17-54
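A quick verification of the critical value used above (one-tailed, alpha = .05, d.f. = 20):

```python
from scipy.stats import t
print(t.ppf(0.95, 20))  # ~1.725, matching the table
print(t.sf(1.97, 20))   # one-tailed p ~ 0.03: reject H0
```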
 The F test is usually used as a general procedure for comparing two variances (H0: the two population variances are equal; H0 cannot be rejected when the variance ratio is near 1).

18-55
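A hedged sketch of such a variance-ratio F test; the sample variances and sizes here are invented for illustration:

```python
from scipy.stats import f

s1_sq, n1 = 9.8, 15   # hypothetical sample variance and size, group 1
s2_sq, n2 = 4.1, 12   # hypothetical sample variance and size, group 2

F = s1_sq / s2_sq                # put the larger variance in the numerator
p = 2 * f.sf(F, n1 - 1, n2 - 1)  # two-tailed p-value
print(f"F = {F:.3f}, p = {p:.4f}")  # reject H0 (equal variances) only if p <= alpha
```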
 The purpose of a one-way ANOVA test is to
determine the existence of a statistically significant
difference among several group means.

All data are hypothetical.
17-57
 Suppose we want to know whether or not three different exam
prep programs lead to different mean scores on a certain exam.
To test this, we recruit 30 students to participate in a study and
split them into three groups.
 The students in each group are randomly assigned to use one of
the three exam prep programs for the next three weeks to
prepare for an exam. At the end of the three weeks, all of the
students take the same exam.

 From the output table we see that the F test statistic is 2.358 and
the corresponding p-value is 0.11385.
 Since this p-value is not less than 0.05, we fail to reject the null
hypothesis.
 This means we don’t have sufficient evidence to say that there is
a statistically significant difference between the mean exam
scores of the three groups.

18-58
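The reported p-value can be recovered from the F statistic: with 30 students in 3 groups, the degrees of freedom are (3 - 1, 30 - 3) = (2, 27). A one-line scipy check:

```python
from scipy.stats import f
print(f.sf(2.358, 2, 27))  # ~0.11385: greater than .05, fail to reject H0
```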
SSE = 21.4 + 10 + 5.4 + 10.6 = 47.4

18-60
 Suppose we have the following dataset with one response variable y and two predictor variables X1 and X2.

Next, make the following regression sum calculations:

Σx1² = ΣX1² - (ΣX1)²/n = 38,767 - (555)²/8 = 263.875
Σx2² = ΣX2² - (ΣX2)²/n = 2,823 - (145)²/8 = 194.875
Σx1y = ΣX1y - (ΣX1Σy)/n = 101,895 - (555 x 1,452)/8 = 1,162.5
Σx2y = ΣX2y - (ΣX2Σy)/n = 25,364 - (145 x 1,452)/8 = -953.5
Σx1x2 = ΣX1X2 - (ΣX1ΣX2)/n = 9,859 - (555 x 145)/8 = -200.375
18-62
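From these sums, the coefficients of the fitted equation y-hat = b0 + b1*X1 + b2*X2 follow from the standard two-predictor normal equations; a sketch of that arithmetic (using the totals ΣX1 = 555, ΣX2 = 145, Σy = 1,452, n = 8 from the example):

```python
# Regression sums computed above
Sx1x1, Sx2x2 = 263.875, 194.875
Sx1y, Sx2y, Sx1x2 = 1162.5, -953.5, -200.375
n, SX1, SX2, Sy = 8, 555, 145, 1452

denom = Sx1x1 * Sx2x2 - Sx1x2**2
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / denom
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / denom
b0 = Sy / n - b1 * SX1 / n - b2 * SX2 / n
print(f"y-hat = {b0:.3f} + {b1:.3f}*X1 + {b2:.3f}*X2")
# roughly: y-hat = -6.867 + 3.148*X1 - 1.656*X2
```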
 a priori contrasts
 Alternative hypothesis
 Analysis of variance (ANOVA)
 Bayesian statistics
 Chi-square test
 Classical statistics
 Critical value
 F ratio
 Inferential statistics
 K-independent-samples tests
 K-related-samples tests
 Level of significance
 Mean square
 Multiple comparison tests (range tests)
 Nonparametric tests
 Normal probability plot
17-65
Remember

 Null hypothesis
 Observed significance level
 One-sample tests
 One-tailed test
 p value
 Parametric tests
 Power of the test
 Practical significance
 Region of acceptance
 Region of rejection
 Statistical significance
 t distribution
 Trials
 t-test
 Two-independent-samples tests
17-66
Remember

 Two-related-samples tests
 Two-tailed test
 Type I error
 Type II error
 Z distribution
 Z test

17-67
Phi: chi-square based; for 2x2 tables
Cramer's V: chi-square based; adjustment when one table dimension > 2
Contingency coefficient C: chi-square based; flexible data and distribution assumptions
Lambda: PRE-based interpretation
Goodman & Kruskal's tau: PRE-based, with emphasis on table marginals
Uncertainty coefficient: useful for multidimensional tables
Kappa: agreement measure

18-68
Is there a relationship between X and Y?

What is the magnitude of the relationship?

What is the direction of the relationship?

18-69
X causes Y

Y causes X

X and Y are activated by one or more other variables

X and Y influence each other reciprocally
18-73
A coefficient is not remarkable simply
because it is statistically significant!
It must be practically meaningful.

18-75
X: Average Temperature (Celsius)     Y: Price per Case (FF)
12                                   2,000
16                                   3,000
20                                   4,000
24                                   5,000
Mean = 18                            Mean = 3,500

18-78
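The four temperature/price pairs above happen to lie on an exact straight line, which makes them a handy check of the correlation and regression arithmetic:

```python
import numpy as np

x = np.array([12, 16, 20, 24])           # average temperature (Celsius)
y = np.array([2000, 3000, 4000, 5000])   # price per case (FF)

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)
print(f"r = {r:.2f}, y-hat = {intercept:.0f} + {slope:.0f}x")  # r = 1.00, y-hat = -1000 + 250x
```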
Y is completely unrelated to X, and no systematic pattern is evident

There are constant values of Y for every value of X

The data are related, but represented by a nonlinear function

18-79
18-80
r² is the total proportion of variance in Y explained by X
Desired r²: 80% or more

18-81
 Artifact correlations
 Bivariate correlation analysis
 Bivariate normal distribution
 Chi-square-based measures
 Contingency coefficient C
 Cramer's V
 Phi
 Coefficient of determination (r²)
 Concordant
 Correlation matrix
 Discordant
 Error term
 Goodness of fit
 Lambda
18-85
• Linearity
• Method of least squares
• Ordinal measures
• Gamma
• Somers's d
• Proportional reduction in error (PRE)
• Spearman's rho
• Regression analysis
• tau b
• Regression coefficients
• tau c
• Pearson correlation coefficient
• Prediction and confidence bands

18-86
• Intercept
• Slope
• Residual
• Scatterplot
• Simple prediction
• tau

18-87
Dependency Interdependency

19-88
Multiple Regression

Discriminant Analysis

MANOVA

Structural Equation Modeling (SEM)

Conjoint Analysis
19-89
Uses of multiple regression:
• Develop a self-weighting estimating equation to predict values for a DV
• Control for confounding variables
• Test and explain causal theories
19-90
Forward

Backward

Stepwise

19-93
Collinearity statistics (VIF): 1.000, 2.289, 2.289, 2.748, 3.025, 3.067

Remedies for collinearity:
• Choose one of the variables and delete the other
• Create a new variable that is a composite of the others
19-94
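For reference, VIF values like those above can be computed with statsmodels; this is a hedged sketch on invented predictor data (x2 is built to be correlated with x1, so its VIF is inflated):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=100)  # correlated with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):  # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.3f}")
```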
A. Predicted Success

Actual Group       Number of Cases     0         1
Unsuccessful (0)   15                  13        2
                                       86.70%    13.30%
Successful (1)     15                  3         12
                                       20.00%    80.00%
Note: Percent of "grouped" cases correctly classified: 83.33%

B.          Unstandardized   Standardized
X1          .36084           .65927
X2          2.61192          .57958
X3          .53028           .97505
Constant    12.89685

19-95
Factor Analysis

Cluster Analysis

Multidimensional Scaling

19-98
If X is sales and Y is profit, then for any value of X (sales) the corresponding profit can be estimated from the regression equation.
19-104
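A minimal sketch of that sales-to-profit idea (all figures invented): fit a line on observed (sales, profit) pairs, then estimate profit for a new sales value:

```python
import numpy as np

sales  = np.array([100, 150, 200, 250, 300])  # hypothetical X values
profit = np.array([12, 20, 26, 35, 41])       # hypothetical Y values

b1, b0 = np.polyfit(sales, profit, 1)         # slope, intercept
new_sales = 220
print(f"estimated profit at sales = {new_sales}: {b0 + b1 * new_sales:.1f}")
```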
1. Select sample to cluster
2. Define variables
3. Compute similarities
4. Select mutually exclusive clusters
5. Compare and validate clusters
19-105
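A hedged sketch mapping these steps onto scipy's hierarchical clustering with the average linkage method (a key term below); the sample observations are invented:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Steps 1-2: a sample of observations described by two variables
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 8.0],
              [5.2, 7.9], [9.0, 1.0], [8.8, 1.2]])

Z = linkage(X, method="average")                 # step 3: compute similarities
labels = fcluster(Z, t=3, criterion="maxclust")  # step 4: mutually exclusive clusters
print(labels)  # step 5: inspect memberships to compare and validate the clusters
```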
 Average linkage method
 Backward elimination
 Beta weights
 Centroid
 Cluster analysis
 Collinearity
 Communality
 Confirmatory factor analysis
 Conjoint analysis
 Dependency techniques
 Discriminant analysis
 Dummy variable
 Eigenvalue
 Factor analysis

19-106
 Factors
 Forward selection
 Holdout sample
 Interdependency techniques
 Loadings
 Metric measures
 Multicollinearity
 Multidimensional scaling (MDS)
 Multiple regression
 Multivariate analysis
 Multivariate analysis of variance (MANOVA)
 Nonmetric measures
 Path analysis

19-107
 Path diagram
 Principal components analysis
 Rotation
 Specification error
 Standardized coefficients
 Stepwise selection
 Stress index
 Structural equation modeling
 Utility score

19-108
