D1UA401B Research Methodology-UNIT-4 Pazhanisamy-BBA IV Semester Section19


Data Processing, Univariate and Bivariate Analysis of Data, Hypothesis Testing, Analysis of Variance Techniques (Z-Test, t-Test, F-Test), Non-Parametric Tests: Chi-square Test.
Advanced Data Analysis Techniques: Correlation and Regression Analysis (Simple and Multiple), Quantitative Estimate of Linear Correlation, Testing the Significance of the Correlation Coefficient, Tests of Significance of Regression Parameters, Uses of Regression Analysis.

Prepared by: R. Pazhanisamy, Assistant Professor of Economics, SOB, GU.


Understand . . .
 How to do univariate analysis for business
 How predictions are made in business using univariate and bivariate analysis

18-2
Understand . . .
 How to test regression models for linearity
and whether the equation is effective in
fitting the data.
 Nonparametric measures of association and
the alternatives they offer when key
assumptions and requirements for parametric
techniques cannot be met.

18-3
UNIVARIATE ANALYSIS

Univariate analysis focuses on analyzing a single variable at a time. It involves summarizing and describing the characteristics and distribution of a single variable without considering relationships with other variables.
Common techniques used in univariate analysis include
measures of central tendency (e.g., mean, median, mode),
measures of dispersion (e.g., variance, standard deviation),
graphical representations (e.g., histograms, box plots, bar
charts), and frequency distributions.
Univariate analysis provides insights into the distribution,
variability, and patterns within individual variables,
allowing researchers to understand their properties and
identify outliers, trends, or abnormalities.
18-4
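As a quick illustration of these measures, here is a minimal Python sketch (the sales figures are invented for illustration) computing the central-tendency and dispersion statistics named above:

```python
# Univariate summary of one hypothetical variable (monthly sales).
import statistics as st

sales = [12, 15, 15, 18, 21, 22, 25, 40]  # hypothetical observations

print("mean:  ", st.mean(sales))      # central tendency
print("median:", st.median(sales))
print("mode:  ", st.mode(sales))
print("var:   ", st.variance(sales))  # sample variance (dispersion)
print("stdev: ", st.stdev(sales))     # sample standard deviation
```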
BIVARIATE ANALYSIS
Bivariate analysis examines the relationship between two variables simultaneously. It involves exploring how changes in one variable are associated with changes in another variable. Common techniques used in bivariate analysis include correlation analysis, scatter plots, contingency tables, and cross-tabulations.

Correlation analysis quantifies the strength and direction of the relationship between two continuous variables using correlation coefficients (e.g., Pearson correlation coefficient, Spearman rank correlation coefficient). Scatter plots visually represent the relationship between two continuous variables.
For categorical variables, contingency tables and cross-tabulations display the
frequency distribution of one variable across different categories of another
variable, allowing for comparisons and assessment of associations.
Bivariate analysis helps identify patterns, associations, and dependencies
between variables, facilitating hypothesis testing, prediction, and understanding
of causal relationships.

18-5
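To make these techniques concrete, here is a hedged Python sketch (all data invented; assumes scipy and pandas are available) computing Pearson and Spearman coefficients for two continuous variables and a cross-tabulation for two categorical ones:

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr

ad_spend = [10, 20, 30, 40, 50]   # hypothetical continuous variable
revenue  = [15, 24, 33, 45, 52]   # hypothetical continuous variable

r, p = pearsonr(ad_spend, revenue)          # strength/direction of linear relation
rho, p_rho = spearmanr(ad_spend, revenue)   # rank-based alternative
print(f"Pearson r = {r:.3f} (p = {p:.3f}), Spearman rho = {rho:.3f}")

# Cross-tabulation of two hypothetical categorical variables
df = pd.DataFrame({"region": ["N", "N", "S", "S", "N", "S"],
                   "buys":   ["yes", "no", "yes", "yes", "no", "no"]})
print(pd.crosstab(df["region"], df["buys"]))
```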
Pearson correlation coefficient: for continuous, linearly related variables
Correlation ratio (eta): for nonlinear data, or for relating a main effect to a continuous dependent variable
Biserial: one continuous and one dichotomous variable with an underlying normal distribution
Partial correlation: three variables; relating two with the third's effect taken out
Multiple correlation: three variables; relating one variable with two others
Bivariate linear regression: predicting one variable from another's scores
18-6
Inductive Reasoning
Deductive Reasoning

17-7
Inferential Statistics
Descriptive Statistics

17-8
As Abacus states in this ad, when researchers "sift through the chaos" and "find what matters" they experience the "ah ha!" moment.

17-10
Classical statistics
 Objective view of probability
 Established hypothesis is rejected or fails to be rejected
 Analysis based on sample data

Bayesian statistics
 Extension of the classical approach
 Analysis based on sample data
 Also considers established subjective probability estimates

17-11
Null
 H0: µ = 50 mpg
 H0: µ ≤ 50 mpg
 H0: µ ≥ 50 mpg
Alternate
 HA: µ ≠ 50 mpg
 HA: µ > 50 mpg
 HA: µ < 50 mpg

17-12
 True value of parameter
 Alpha level selected
 One- or two-tailed test used
 Sample standard deviation
 Sample size
17-20
Stages of hypothesis testing:
1. State the null hypothesis
2. Choose the statistical test
3. Select the level of significance
4. Compute the difference value
5. Obtain the critical test value
6. Interpret the test
17-22
Parametric Nonparametric

17-23
 Independent observations
 Normal distribution
 Equal variances
 Interval or ratio scales

17-24
 Easy to understand and use
 Usable with nominal data
 Appropriate for ordinal data
 Appropriate for non-normal population distributions

17-28
How many samples are involved?

If two or more samples: are the individual cases independent or related?

Is the measurement scale nominal, ordinal, interval, or ratio?

17-29
Recommended statistical techniques by measurement scale:

Nominal
• One-sample case: Binomial; χ² one-sample test
• Two related samples: McNemar
• Two independent samples: Fisher exact test; χ² two-samples test
• k related samples: Cochran Q
• k independent samples: χ² for k samples

Ordinal
• One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
• Two related samples: Sign test; Wilcoxon matched-pairs test
• Two independent samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
• k related samples: Friedman two-way ANOVA
• k independent samples: Median extension; Kruskal-Wallis one-way ANOVA

Interval and Ratio
• One-sample case: t-test; Z test
• Two related samples: t-test for paired samples
• Two independent samples: t-test; Z test
• k related samples: Repeated-measures ANOVA
• k independent samples: One-way ANOVA; n-way ANOVA
17-30
 Is there a difference between observed
frequencies and the frequencies we would
expect?
 Is there a difference between observed and
expected proportions?
 Is there a significant difference between some
measures of central tendency and the
population parameter?

17-31
Z-test t-test

17-32
Null                 H0: µ = 50 mpg
Statistical test     t-test
Significance level   .05, n = 100
Calculated value     1.786
Critical test value  1.66 (from Appendix C, Exhibit C-2)
Since 1.786 > 1.66, reject H0.

17-33
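A quick scipy check of the table above (a one-tailed test with df = n - 1 = 99, as the critical value 1.66 implies):

```python
# Verify the critical value and get the p-value for the mpg example.
from scipy.stats import t

df = 99                  # n = 100
print(t.ppf(0.95, df))   # one-tailed critical t at .05: ~1.66
print(t.sf(1.786, df))   # p-value for the calculated t: below .05, so reject H0
```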
The Z-test and the chi-square test are statistical tests used to compare groups or test hypotheses. The Z-test is used when the sample size is large and the population standard deviation is known; it tests hypotheses about the mean of a normal population, and it can compare two groups through their population proportions.
The chi-square test is used when the sample size is small; it tests hypotheses about the distribution of a categorical variable. It can compare the difference in population proportions between two or more groups, or compare one group to a fixed value. A chi-square test for the equality of two proportions is mathematically equivalent to a z-test.

18-39
Living Arrangement                  Intend to Join   Number Interviewed   Percent (no. interviewed/200)   Expected Frequencies (percent x 60)
Dorm/fraternity                     16               90                   45                              27
Apartment/rooming house, nearby     13               40                   20                              12
Apartment/rooming house, distant    16               40                   20                              12
Live at home                        15               30                   15                              9
Total                               60               200                  100                             60

17-40
Null                 H0: O = E (observed frequencies equal expected frequencies)
Statistical test     One-sample chi-square
Significance level   .05
Calculated value     9.89
Critical test value  7.82 (from Appendix C, Exhibit C-3)
Since 9.89 > 7.82, reject H0.

17-41
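The calculated chi-square above can be reproduced from the observed "intend to join" counts and the expected frequencies in the table, for example with scipy:

```python
# One-sample chi-square for the living-arrangement example.
from scipy.stats import chisquare

observed = [16, 13, 16, 15]   # intend to join
expected = [27, 12, 12, 9]    # expected frequencies from the table

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")  # ~9.89, exceeds 7.82: reject H0
```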
There was no significant relationship between handedness and nationality, χ²(1, N = 428) = 0.44, p = .505.

18-43
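As a sanity check on that report, a chi-square of 0.44 with 1 df corresponds to a p-value of roughly .51 (small differences from the reported .505 reflect rounding of the statistic):

```python
from scipy.stats import chi2
print(chi2.sf(0.44, 1))  # ~0.507, clearly non-significant
```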
 Use a Z test when you need to compare group means. Use the one-sample analysis to determine whether a population mean differs from a hypothesized value, or the two-sample version to determine whether two population means differ.
 A Z test is a form of inferential statistics: it uses samples to draw conclusions about populations. For example:
 One sample: Do employees in a training program have an average IQ score different from a hypothesized value of 100?
 Two sample: Do two IQ-boosting programs produce different mean scores for employees?
(The Z test requires the population standard deviation to be known.)
18-44
 This analysis uses sample data to evaluate hypotheses that refer to population means (µ). The hypotheses depend on whether you are assessing one or two samples.
 One-Sample Z Test Hypotheses
 Null hypothesis (H0): The population mean equals a hypothesized value (µ = µ0).
 Alternative hypothesis (HA): The population mean does not equal the hypothesized value (µ ≠ µ0).
 When the p-value is less than or equal to your significance level (e.g., 0.05), reject the null hypothesis: the sample data support the notion that the population mean does not equal the hypothesized value.

18-45
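A minimal one-sample z-test sketch following this logic; the function and its inputs are illustrative (hypothetical IQ figures), and sigma is assumed known, as the Z test requires:

```python
import math
from scipy.stats import norm

def one_sample_z(x_bar, mu0, sigma, n):
    """Return (z, two-tailed p) for H0: mu = mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return z, 2 * norm.sf(abs(z))

# Hypothetical numbers: sample mean 103.5, hypothesized mean 100
z, p = one_sample_z(x_bar=103.5, mu0=100, sigma=15, n=40)
print(f"z = {z:.3f}, p = {p:.4f}")  # reject H0 if p <= 0.05
```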
 Two-Sample Z Test Hypotheses
 Null hypothesis (H0): Two population means are
equal (µ1 = µ2).
 Alternative hypothesis (HA): Two population means
are not equal (µ1 ≠ µ2).
 Again, when the p-value is less than or equal to
your significance level, reject the null hypothesis.
The difference between the two means is
statistically significant. Your sample data support
the idea that the two population means are
different.
18-46
 Example: one-sample t-test (population σ unknown; sample s given)
 A farming company wants to know whether a new fertilizer has improved crop yield.
 Historical data show the farm's average yield is 20 tonnes per acre. The company tests a new organic fertilizer on a smaller sample of farms and observes a new yield of 20.175 tonnes per acre, with a standard deviation of 3.02 tonnes across 12 different farms.
 Did the new fertilizer work?

18-48
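A quick check of the arithmetic in this example (since the population sigma is unknown, the sample standard deviation and a t-test are used):

```python
import math
from scipy.stats import t

mu0, x_bar, s, n = 20, 20.175, 3.02, 12
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p = t.sf(t_stat, n - 1)  # one-tailed: did yield improve?
print(f"t = {t_stat:.3f}, p = {p:.3f}")  # t ~ 0.20, p ~ 0.42: no evidence of improvement
```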
 Suppose the IQ levels among individuals in two different cities are known to be normally distributed, each with a population standard deviation of 15.
 A scientist wants to know whether the mean IQ levels of individuals in city A and city B differ, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
 To test this, she performs a two-sample z-test at significance level α = 0.05 with:
 x̄1 (sample 1 mean IQ) = 100.65
 n1 (sample 1 size) = 20
 x̄2 (sample 2 mean IQ) = 108.8
 n2 (sample 2 size) = 20
 Since the p-value (0.0858) is not less than the significance level (.05), the scientist fails to reject the null hypothesis.
18-50
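The z statistic and p-value quoted above follow directly from these inputs; a short verification:

```python
import math
from scipy.stats import norm

x1, x2, sigma, n1, n2 = 100.65, 108.8, 15, 20, 20
z = (x1 - x2) / math.sqrt(sigma**2 / n1 + sigma**2 / n2)
p = 2 * norm.sf(abs(z))             # two-tailed
print(f"z = {z:.3f}, p = {p:.4f}")  # z ~ -1.72, p ~ 0.0858: fail to reject H0
```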
T-Test: Formula and solved examples (collegedunia.com)

18-53
Null                 H0: A sales = B sales
Statistical test     t-test
Significance level   .05 (one-tailed)
Calculated value     1.97, d.f. = 20
Critical test value  1.725 (from Appendix C, Exhibit C-2)
Since 1.97 > 1.725, reject H0.

17-54
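A quick verification of the critical value used above (one-tailed, alpha = .05, d.f. = 20):

```python
from scipy.stats import t
print(t.ppf(0.95, 20))  # ~1.725, matching the table
print(t.sf(1.97, 20))   # one-tailed p ~ 0.03: reject H0
```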
 The F test is usually used as a general procedure for comparing two variances (H0: the two population variances are equal; H0 cannot be rejected when the variance ratio is near 1).

18-55
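A hedged sketch of such a variance-ratio F test; the sample variances and sizes here are invented for illustration:

```python
from scipy.stats import f

s1_sq, n1 = 9.8, 15   # hypothetical sample variance and size, group 1
s2_sq, n2 = 4.1, 12   # hypothetical sample variance and size, group 2

F = s1_sq / s2_sq                # put the larger variance in the numerator
p = 2 * f.sf(F, n1 - 1, n2 - 1)  # two-tailed p-value
print(f"F = {F:.3f}, p = {p:.4f}")  # reject H0 (equal variances) only if p <= alpha
```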
 The purpose of a one-way ANOVA test is to
determine the existence of a statistically significant
difference among several group means.

All data are hypothetical.
17-57
 Suppose we want to know whether or not three different exam
prep programs lead to different mean scores on a certain exam.
To test this, we recruit 30 students to participate in a study and
split them into three groups.
 The students in each group are randomly assigned to use one of
the three exam prep programs for the next three weeks to
prepare for an exam. At the end of the three weeks, all of the
students take the same exam.

 From the output table we see that the F test statistic is 2.358 and
the corresponding p-value is 0.11385.
 Since this p-value is not less than 0.05, we fail to reject the null
hypothesis.
 This means we don’t have sufficient evidence to say that there is
a statistically significant difference between the mean exam
scores of the three groups.

18-58
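The reported p-value can be recovered from the F statistic: with 30 students in 3 groups, the degrees of freedom are (3 - 1, 30 - 3) = (2, 27). A one-line scipy check:

```python
from scipy.stats import f
print(f.sf(2.358, 2, 27))  # ~0.11385: greater than .05, fail to reject H0
```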
SSE = 21.4 + 10 + 5.4 + 10.6 = 47.4

18-60
 Suppose we have the following dataset with one response variable y and two predictor variables X1 and X2.

Next, make the following regression sum calculations:

Σx1² = ΣX1² - (ΣX1)²/n = 38,767 - (555)²/8 = 263.875
Σx2² = ΣX2² - (ΣX2)²/n = 2,823 - (145)²/8 = 194.875
Σx1y = ΣX1y - (ΣX1Σy)/n = 101,895 - (555 x 1,452)/8 = 1,162.5
Σx2y = ΣX2y - (ΣX2Σy)/n = 25,364 - (145 x 1,452)/8 = -953.5
Σx1x2 = ΣX1X2 - (ΣX1ΣX2)/n = 9,859 - (555 x 145)/8 = -200.375
18-62
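From these sums, the coefficients of the fitted equation y-hat = b0 + b1*X1 + b2*X2 follow from the standard two-predictor normal equations; a sketch of that arithmetic (using the totals ΣX1 = 555, ΣX2 = 145, Σy = 1,452, n = 8 from the example):

```python
# Regression sums computed above
Sx1x1, Sx2x2 = 263.875, 194.875
Sx1y, Sx2y, Sx1x2 = 1162.5, -953.5, -200.375
n, SX1, SX2, Sy = 8, 555, 145, 1452

denom = Sx1x1 * Sx2x2 - Sx1x2**2
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / denom
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / denom
b0 = Sy / n - b1 * SX1 / n - b2 * SX2 / n
print(f"y-hat = {b0:.3f} + {b1:.3f}*X1 + {b2:.3f}*X2")
# roughly: y-hat = -6.867 + 3.148*X1 - 1.656*X2
```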
 a priori contrasts
 Alternative hypothesis
 Analysis of variance (ANOVA)
 Bayesian statistics
 Chi-square test
 Classical statistics
 Critical value
 F ratio
 Inferential statistics
 K-independent-samples tests
 K-related-samples tests
 Level of significance
 Mean square
 Multiple comparison tests (range tests)
 Nonparametric tests
 Normal probability plot
17-65
Remember

 Null hypothesis
 Observed significance level
 One-sample tests
 One-tailed test
 p value
 Parametric tests
 Power of the test
 Practical significance
 Region of acceptance
 Region of rejection
 Statistical significance
 t distribution
 Trials
 t-test
 Two-independent-samples tests
17-66
Remember

 Two-related-samples tests
 Two-tailed test
 Type I error
 Type II error
 Z distribution
 Z test

17-67
Phi: chi-square based; for 2x2 tables
Cramer's V: chi-square based; adjustment when one table dimension > 2
Contingency coefficient C: chi-square based; flexible data and distribution assumptions
Lambda: PRE-based interpretation
Goodman & Kruskal's tau: PRE-based, with emphasis on table marginals
Uncertainty coefficient: useful for multidimensional tables
Kappa: agreement measure

18-68
Is there a relationship between X and Y?

What is the magnitude of the relationship?

What is the direction of the relationship?

18-69
X causes Y

Y causes X

X and Y are activated by one or more other variables

X and Y influence each other reciprocally
18-73
A coefficient is not remarkable simply
because it is statistically significant!
It must be practically meaningful.

18-75
X: Average Temperature (Celsius)     Y: Price per Case (FF)
12                                   2,000
16                                   3,000
20                                   4,000
24                                   5,000
Mean = 18                            Mean = 3,500

18-78
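The four temperature/price pairs above happen to lie on an exact straight line, which makes them a handy check of the correlation and regression arithmetic:

```python
import numpy as np

x = np.array([12, 16, 20, 24])           # average temperature (Celsius)
y = np.array([2000, 3000, 4000, 5000])   # price per case (FF)

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, 1)
print(f"r = {r:.2f}, y-hat = {intercept:.0f} + {slope:.0f}x")  # r = 1.00, y-hat = -1000 + 250x
```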
Y is completely unrelated to X, and no systematic pattern is evident

There are constant values of Y for every value of X

The data are related, but represented by a nonlinear function

18-79
18-80
r² is the total proportion of variance in Y explained by X
Desired r²: 80% or more

18-81
 Artifact correlations
 Bivariate correlation analysis
 Bivariate normal distribution
 Chi-square-based measures
 Contingency coefficient C
 Cramer's V
 Phi
 Coefficient of determination (r²)
 Concordant
 Correlation matrix
 Discordant
 Error term
 Goodness of fit
 Lambda
18-85
• Linearity
• Method of least squares
• Ordinal measures
• Gamma
• Somers's d
• Proportional reduction in error (PRE)
• Spearman's rho
• Regression analysis
• tau b
• Regression coefficients
• tau c
• Pearson correlation coefficient
• Prediction and confidence bands

18-86
• Intercept
• Slope
• Residual
• Scatterplot
• Simple prediction
• tau

18-87
Dependency Interdependency

19-88
Multiple Regression

Discriminant Analysis

MANOVA

Structural Equation Modeling (SEM)

Conjoint Analysis
19-89
Uses of multiple regression:
• Develop a self-weighting estimating equation to predict values for a DV
• Control for confounding variables
• Test and explain causal theories
19-90
Forward

Backward

Stepwise

19-93
Collinearity statistics (VIF): 1.000, 2.289, 2.289, 2.748, 3.025, 3.067

Remedies for collinearity:
• Choose one of the variables and delete the other
• Create a new variable that is a composite of the others
19-94
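For reference, VIF values like those above can be computed with statsmodels; this is a hedged sketch on invented predictor data (x2 is built to be correlated with x1, so its VIF is inflated):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=100)  # correlated with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):  # skip the constant column
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.3f}")
```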
A. Predicted Success

Actual Group       Number of Cases     0         1
Unsuccessful (0)   15                  13        2
                                       86.70%    13.30%
Successful (1)     15                  3         12
                                       20.00%    80.00%
Note: Percent of "grouped" cases correctly classified: 83.33%

B.          Unstandardized   Standardized
X1          .36084           .65927
X2          2.61192          .57958
X3          .53028           .97505
Constant    12.89685

19-95
Factor Analysis

Cluster Analysis

Multidimensional Scaling

19-98
If X is sales and Y is profit, then for any value of X (sales) the corresponding profit can be estimated from the regression equation.
19-104
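A minimal sketch of that sales-to-profit idea (all figures invented): fit a line on observed (sales, profit) pairs, then estimate profit for a new sales value:

```python
import numpy as np

sales  = np.array([100, 150, 200, 250, 300])  # hypothetical X values
profit = np.array([12, 20, 26, 35, 41])       # hypothetical Y values

b1, b0 = np.polyfit(sales, profit, 1)         # slope, intercept
new_sales = 220
print(f"estimated profit at sales = {new_sales}: {b0 + b1 * new_sales:.1f}")
```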
1. Select sample to cluster
2. Define variables
3. Compute similarities
4. Select mutually exclusive clusters
5. Compare and validate clusters
19-105
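A hedged sketch mapping these steps onto scipy's hierarchical clustering with the average linkage method (a key term below); the sample observations are invented:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Steps 1-2: a sample of observations described by two variables
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 8.0],
              [5.2, 7.9], [9.0, 1.0], [8.8, 1.2]])

Z = linkage(X, method="average")                 # step 3: compute similarities
labels = fcluster(Z, t=3, criterion="maxclust")  # step 4: mutually exclusive clusters
print(labels)  # step 5: inspect memberships to compare and validate the clusters
```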
 Average linkage method
 Backward elimination
 Beta weights
 Centroid
 Cluster analysis
 Collinearity
 Communality
 Confirmatory factor analysis
 Conjoint analysis
 Dependency techniques
 Discriminant analysis
 Dummy variable
 Eigenvalue
 Factor analysis

19-106
 Factors
 Forward selection
 Holdout sample
 Interdependency techniques
 Loadings
 Metric measures
 Multicollinearity
 Multidimensional scaling (MDS)
 Multiple regression
 Multivariate analysis
 Multivariate analysis of variance (MANOVA)
 Nonmetric measures
 Path analysis

19-107
 Path diagram
 Principal components analysis
 Rotation
 Specification error
 Standardized coefficients
 Stepwise selection
 Stress index
 Structural equation modeling
 Utility score

19-108
