Multivariate Analysis



Classification of MV techniques
• Dependence methods

• Interdependence methods
Are some of the variables dependent on others?
• Yes: dependence methods
• No: interdependence methods
What are dependence methods?
• If a multivariate technique attempts to explain or predict the dependent variable(s) on the basis of two or more independent variables, then we are analyzing dependence.

Techniques
• Multiple regression analysis
• Multiple discriminant analysis
• Multivariate analysis of variance (MANOVA)
• Canonical correlation analysis
These are all dependence methods.
Analysis of Interdependence
• The goal of interdependence methods is to give meaning to a set of variables or to seek to group things together.
• No one variable or variable subset is to be predicted from the others or explained by them.
Some methods
• The most common of these methods are factor analysis, cluster analysis and multidimensional scaling. A manager might use these techniques to identify profitable market segments or clusters. They can also be used to classify similar cities on the basis of population size, income distribution, etc.
• As in other forms of data analysis, the nature of the measurement scales will determine which MV technique is appropriate for the data.
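To make the idea of cluster analysis concrete, the following is a minimal sketch using k-means, which is one common clustering algorithm (the slides do not name a specific one). The city data, the starting centroids and the function names are hypothetical. Population and income sit on very different scales here, so in practice the variables would usually be standardized first.

```python
# Minimal k-means sketch: one possible cluster-analysis method.
# Hypothetical data: (population in millions, median income in thousands).


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid to
    the mean of its assigned points; repeat. Returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
              for p in points]
    return labels, centroids


cities = [(0.2, 40), (0.3, 42), (0.25, 38), (5.0, 70), (6.0, 72), (5.5, 68)]
labels, centers = kmeans(cities, centroids=[cities[0], cities[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The three small, lower-income cities fall into one group and the three large, higher-income cities into the other.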
Classification of dependence methods
How many variables are dependent?
• One dependent variable
  - Metric: multiple regression
  - Non-metric: multiple discriminant analysis
• Several dependent variables
  - Metric: MANOVA
  - Non-metric: conjoint analysis
• Multiple dependent and independent variables: canonical analysis
Analysis of dependence: multiple regression
• An extension of bivariate regression analysis.
• Allows for the simultaneous investigation of the effect of two or more independent variables on a single interval-scaled dependent variable.
• In reality, several factors are likely to affect such a dependent variable.
The model
• An example of a multiple regression equation is

$$Y = a + B_1X_1 + B_2X_2 + B_3X_3 + \cdots + B_nX_n + e$$

• where $a$ (written $B_0$ in some texts) = a constant, the value of Y when all X values = 0
• $B_i$ = the slope of the regression surface; $B_i$ is the regression coefficient associated with each $X_i$
• $e$ = an error term, normally distributed about a mean of 0
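The constant a and the coefficients Bi are normally estimated by ordinary least squares. The following is a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y; the function names and the data (generated exactly from Y = 1 + 2X1 + 3X2, with no error term) are hypothetical, and statistical packages use more numerically robust methods than this.

```python
# Minimal multiple-regression sketch: estimate a, B1..Bk by ordinary least squares.


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Return [a, B1, ..., Bk] for Y = a + B1*X1 + ... + Bk*Xk + e.
    X is a list of observations, each a list of k predictor values."""
    rows = [[1.0] + list(obs) for obs in X]  # leading 1 estimates the constant a
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, obs):
    """Predicted Y for one observation from fitted coefficients."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], obs))


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coef = fit_ols(X, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```

Using the fitted equation, `predict(coef, [3, 2])` gives 13, matching 1 + 2*3 + 3*2.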
Example
• $Y = 102.18 + 387X_1 + 115.2X_2 + 6.73X_3$
• R² = 0.845
• F value = 14.6
Regression coefficients
• Regression coefficients can be stated either in raw-score units (actual X values) or as standardized coefficients (in terms of the variables' standard deviations).
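One standard way to convert a raw-score coefficient into a standardized one is to multiply it by the standard deviation of its X and divide by the standard deviation of Y. The following sketch assumes that formula; the function name and data are hypothetical. It also shows the ratio used in the .60 versus .20 beta-weight comparison.

```python
# Raw-score coefficient -> standardized (beta) coefficient:
#   beta_i = B_i * stdev(X_i) / stdev(Y)
from statistics import stdev


def standardized_coefficient(b_raw, x_values, y_values):
    """Express a raw regression coefficient in standard-deviation units."""
    return b_raw * stdev(x_values) / stdev(y_values)


# Hypothetical data in which Y rises 2 units for each unit of X.
x = [1, 2, 3]
y = [2, 4, 6]
beta = standardized_coefficient(2.0, x, y)
print(beta)  # 1.0

# Relative influence of two beta weights, e.g. .60 versus .20.
print(round(0.60 / 0.20, 6))  # 3.0
```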


Interpretation
• When regression coefficients are standardized, they are called beta weights (β), and their values indicate the relative importance of the associated X variables, especially when the predictors are unrelated.
• If β1 = .60 and β2 = .20, then X1 has three times the influence on Y that X2 has.
• In multiple regression, the coefficients B1, B2, etc. are called partial regression coefficients because the independent variables are correlated with the other independent variables; each coefficient reflects the effect of its X on Y with the other X's held constant.
• The coefficient of multiple determination (R²) indicates the percentage of variation in Y explained by the variation in the independent variables.
• R² = .845 tells us that the variation in the independent variables accounted for 84.5% of the variance in the dependent variable.
• Adding more independent variables to the equation explains more of the variation in Y; R² never decreases when a variable is added.
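R² can be computed as 1 minus the ratio of the error sum of squares to the total sum of squares. This is a minimal sketch; the function name and the observed and fitted values are hypothetical.

```python
# Coefficient of multiple determination: R^2 = 1 - SSE / SST


def r_squared(y, y_hat):
    """Share of the variation in y accounted for by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # error sum of squares
    sst = sum((obs - mean_y) ** 2 for obs in y)                # total sum of squares
    return 1 - sse / sst


# Hypothetical observed and fitted values.
observed = [3, 5, 7, 9]
fitted = [3.5, 4.5, 7.5, 8.5]
print(round(r_squared(observed, fitted), 4))  # 0.95
```

Read in the same way as the .845 example, 0.95 means the fitted equation accounts for 95% of the variation in Y.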
• To test for statistical significance, an F-test comparing the different sources of variation is necessary. The F-test compares the relative magnitudes of the sum of squares due to the regression (SSr) and the error sum of squares (SSe), each divided by its appropriate degrees of freedom.
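With k predictors and n observations, the regression sum of squares has k degrees of freedom and the error sum of squares has n - k - 1. The following sketch computes the F ratio on that basis, both from the sums of squares and from R². The function names and input values are hypothetical. The result would then be compared with a critical F value for (k, n - k - 1) degrees of freedom.

```python
# Overall F statistic for a regression with k predictors and n observations:
#   F = (SSr / k) / (SSe / (n - k - 1))
# Equivalent form using R^2:  F = (R^2 / k) / ((1 - R^2) / (n - k - 1))


def f_from_sums(ss_regression, ss_error, k, n):
    """F ratio from the regression and error sums of squares."""
    return (ss_regression / k) / (ss_error / (n - k - 1))


def f_from_r2(r2, k, n):
    """The same F ratio computed from the coefficient of multiple determination."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical values: SSr = 90, SSe = 10 (so R^2 = 0.9), k = 2, n = 13.
print(f_from_sums(90, 10, k=2, n=13))       # 45.0
print(round(f_from_r2(0.9, k=2, n=13), 6))  # 45.0
```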
• As in bivariate regression, a continuous, interval-scaled dependent variable is required in multiple regression.
• Interval scaling is also required for the independent variables.
• However, dummy variables, such as the binary variable in our example, may be used.
• A dummy variable is one that has two distinct levels, coded 0 and 1; a categorical variable with more than two levels is represented by several dummy variables.
Uses of multiple regression
• It is often used to develop a self-weighting estimating equation by which to predict values for a criterion variable (DV) from the values of several predictor variables (IVs).


• A descriptive application of multiple regression calls for controlling for confounding variables to better evaluate the contribution of other variables; for example, controlling for brand in order to study the effect of price alone.
• To test and explain causal theories: in this use, referred to as path analysis, regression is used to describe an entire structure of linkages that have been advanced from causal theories.
• As an inference tool to test hypotheses and estimate population values.
How can equations be built?
• A regression equation can be built with:
  - specific variables
  - all variables
  - a method that sequentially adds or removes variables.
Methods for adding and removing variables
1. Forward selection starts with the constant and adds the variables that result in large R² increases.
2. Backward elimination begins with a model containing all independent variables and removes the variables that change R² the least.
Stepwise selection
The most popular method; it combines the two.
• The independent variable that contributes the most to explaining the dependent variable is added first.
• Subsequent variables are added based on their incremental contribution over the first variable whenever they meet the criterion for entering the equation (e.g., a significance level of .01). Variables may be removed at each step if they meet the removal criterion, which is a larger significance level than the one for entry.
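The "adding" half of these procedures can be sketched as forward selection. As a simplifying assumption, a variable enters only if it raises R² by at least a chosen `min_gain`. Packages normally use an F test or significance level instead, and full stepwise selection also re-checks entered variables for removal, which this sketch omits. All names and data are hypothetical.

```python
# Forward-selection sketch. ASSUMPTION: entry criterion = R^2 gain >= min_gain
# (real packages use F / significance tests; the stepwise removal step is omitted).


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r2_for(columns, y):
    """R^2 from an OLS regression of y on a constant plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def forward_selection(predictors, y, min_gain=0.001):
    """Repeatedly add the predictor giving the largest R^2; stop when the gain < min_gain."""
    chosen, current_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {
            name: r2_for([predictors[c] for c in chosen] + [col], y)
            for name, col in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] - current_r2 < min_gain:
            break
        chosen.append(best)
        current_r2 = scores[best]
        del remaining[best]
    return chosen, current_r2


# Hypothetical data built so that y = 3*x1 + x2 exactly; x3 is not in that equation.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 0, 0, 1, 1, 0],
    "x3": [1, 1, 0, 0, 1, 0],
}
y = [3 * a + b for a, b in zip(data["x1"], data["x2"])]
chosen, r2 = forward_selection(data, y)
print(chosen, round(r2, 6))  # ['x1', 'x2'] 1.0
```

x1 enters first because it explains the most variation on its own. x2 enters next, and x3 adds nothing once x1 and x2 are in the equation.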
Collinearity and Multicollinearity
• Collinearity and multicollinearity describe a situation where two or more of the independent variables are highly correlated; this can have a damaging effect on the multiple regression.
• When this condition exists, the estimated regression coefficients can fluctuate widely from sample to sample, making it risky to interpret a coefficient as an indicator of the importance of its predictor variable.
Just how high can acceptable correlations between independent variables be?
• There is no definitive answer, but correlations of .80 or more should be dealt with in one of the following two ways:
1. Choose one of the variables and delete the other.
2. Create a new variable that is a composite of the highly intercorrelated variables, and use this variable in place of its components.
Making this decision with a correlation matrix alone is not sufficient.
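One simple way to build such a composite is to convert each variable to z-scores, so they share a common scale, and then average them for each observation. This is one option among several (factor scores are another). The function names and data below are hypothetical.

```python
# Composite of highly intercorrelated predictors: average of their z-scores.
from statistics import mean, stdev


def z_scores(values):
    """Standardize a variable to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*variables):
    """Average of the standardized variables, one value per observation."""
    standardized = [z_scores(v) for v in variables]
    return [sum(obs) / len(obs) for obs in zip(*standardized)]


# Hypothetical pair of strongly correlated predictors.
x1 = [1, 2, 3]
x2 = [2, 4, 6]
print(composite(x1, x2))  # [-1.0, 0.0, 1.0]
```

The composite then replaces x1 and x2 as a single predictor in the regression.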
The exhibit shows a VIF (variance inflation factor) index. This is a measure of the effect of the other independent variables on a regression coefficient.
Large values, of 10 or more, suggest collinearity or multicollinearity. With only 3 predictors, this is not a problem.
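The VIF for predictor Xj is 1 / (1 - Rj²), where Rj² is the R² from regressing Xj on the other predictors. With only two predictors, Rj² is simply their squared correlation, and this two-predictor case is what the sketch below assumes. The function names and data are hypothetical. The example shows that a correlation of exactly .80 gives a VIF of only about 2.78 in the two-predictor case, so the .80 rule of thumb and the VIF cut-off of 10 are not the same test.

```python
# Variance inflation factor, two-predictor case: VIF = 1 / (1 - r^2).
# With more predictors, r^2 would be replaced by R_j^2 from regressing X_j on all the others.
from math import sqrt


def pearson_r(x, z):
    """Pearson correlation coefficient between two variables."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)


def vif_two_predictors(x, z):
    """VIF of either predictor when the model contains only these two."""
    r = pearson_r(x, z)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4]
x2 = [1, 3, 2, 4]
print(round(pearson_r(x1, x2), 4))           # 0.8
print(round(vif_two_predictors(x1, x2), 2))  # 2.78
```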
Difficulties in regression
Another difficulty with regression occurs when researchers fail to evaluate the equation with data beyond those originally used to calculate it.
A solution would be to set aside a portion of the data and use only the remainder to estimate the equation. This is called a holdout sample.
One then uses the equation on the holdout data to calculate R². This can then be compared to the original R² to determine how well the equation predicts beyond the data used to estimate it.
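The holdout procedure can be sketched as follows. The sketch makes two simplifying assumptions: the last `n_holdout` rows serve as the holdout sample (a random split is more usual), and OLS is solved through the normal equations. The function names and data are hypothetical. The data contain no error term, so both R² values come out at essentially 1. With real data, the holdout R² is usually lower than the estimation R².

```python
# Holdout check: fit on the estimation rows, then compute R^2 on the set-aside rows.


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Return [a, B1, ..., Bk] by ordinary least squares."""
    rows = [[1.0] + list(obs) for obs in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST for observed y and predicted y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((o - f) ** 2 for o, f in zip(y, y_hat))
    sst = sum((o - mean_y) ** 2 for o in y)
    return 1 - sse / sst


def holdout_r2(X, y, n_holdout):
    """Return (R^2 on the estimation rows, R^2 on the holdout rows)."""
    X_fit, y_fit = X[:-n_holdout], y[:-n_holdout]
    X_hold, y_hold = X[-n_holdout:], y[-n_holdout:]
    coef = fit_ols(X_fit, y_fit)

    def predict(obs):
        return coef[0] + sum(b * v for b, v in zip(coef[1:], obs))

    return (r_squared(y_fit, [predict(o) for o in X_fit]),
            r_squared(y_hold, [predict(o) for o in X_hold]))


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no error term.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [3, 2], [1, 3], [4, 0]]
y = [1, 3, 4, 6, 8, 13, 12, 9]
fit_r2, hold_r2 = holdout_r2(X, y, n_holdout=2)
print(round(fit_r2, 6), round(hold_r2, 6))  # 1.0 1.0
```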
