Research Methodology - Multivariate Analysis


QUANTITATIVE AND RESEARCH METHODS IN BUSINESS

STUDY NOTES: RESEARCH METHODOLOGY
Dr. O S SARAVANAN, M.Com., M.Sc., M.B.A., Ph.D., I.C.W.A. (I)

SYNOPSIS
1. Multivariate Analysis
2. Factor Analysis
3. Multiple Regression Analysis
4. Discriminant Analysis
5. Cluster Analysis
6. Conjoint Analysis
7. Statistical Packages

MULTIVARIATE ANALYSIS
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.

Uses for multivariate analysis include:


1. Consumer and market research
2. Quality control and quality assurance across a range of industries such as food and beverage, paint, pharmaceuticals, chemicals, energy, telecommunications, etc.
3. Process optimization and process control
4. Research and development
5. Design for capability (also known as capability-based design)
6. Inverse design, where any variable can be treated as an independent variable
7. Analysis of Alternatives (AoA), the selection of concepts to fulfill a customer need
8. Analysis of concepts with respect to changing scenarios
9. Identification of critical design drivers and correlations across hierarchical levels

Process of multivariate analysis

1. Obtain a summary or overview of the table. This analysis is often called Principal Component Analysis or Factor Analysis. In the overview, it is possible to identify the dominant patterns in the data, such as groups, outliers, trends, and so on. The patterns are displayed as two plots.

2. Analyze groups in the table: how these groups differ, and to which group individual table rows belong. This type of analysis is called Classification and Discriminant Analysis.

3. Find relationships between columns in data tables, for instance relationships between process operating conditions and product quality. The objective is to use one set of variables (columns) to predict another, for the purpose of optimization, and to find out which columns are important in the relationship. The corresponding analysis is called Multiple Regression Analysis or Partial Least Squares (PLS), depending on the size of the data table.
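As an illustration of the first step, here is a minimal sketch of a PCA "overview" of a data table, assuming Python with NumPy and scikit-learn (neither is prescribed by these notes); the table itself is synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 50 rows (observations) x 4 columns (variables): two pairs of strongly
# correlated columns, so two "dominant patterns" exist in the table
x = rng.normal(size=(50, 2))
table = np.column_stack([x[:, 0], x[:, 0] + 0.1 * rng.normal(size=50),
                         x[:, 1], x[:, 1] + 0.1 * rng.normal(size=50)])

# Standardize the columns, then summarize the table with two components
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(table))
print(pca.explained_variance_ratio_.sum())  # close to 1: two patterns dominate
```

Plotting the component scores and loadings would give the two overview plots (scores for groups/outliers among rows, loadings for relationships among columns) mentioned above.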

******* FACTOR ANALYSIS


Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved, uncorrelated variables called factors. In other words, it is possible, for example, that variations in three or four observed variables mainly reflect the variations in fewer such unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modeled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset.

Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.

Factor analysis is related to principal component analysis (PCA), but the two are not identical. Latent variable models, including factor analysis, use regression modelling techniques to test hypotheses producing error terms, while PCA is a descriptive statistical technique.

Types of factor analysis


Exploratory factor analysis (EFA) is used to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory, and one uses factor loadings to intuit the factor structure of the data.

Confirmatory factor analysis (CFA) seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesizes beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors. The researcher seeks to determine, for instance, whether measures created to represent a latent variable really belong together.
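A minimal exploratory-factor-analysis sketch, assuming Python with scikit-learn (these notes do not prescribe a package); the six observed variables are synthetic, generated from two latent factors plus noise:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 2))                  # two unobserved factors
# True loadings: variables 0-2 reflect factor 1, variables 3-5 factor 2
loadings = np.array([[1.0, 0.0], [0.9, 0.0], [0.8, 0.1],
                     [0.0, 1.0], [0.1, 0.9], [0.0, 0.8]])
observed = latent @ loadings.T + 0.3 * rng.normal(size=(n, 6))

# Ask EFA to recover two factors from the six correlated observed variables
fa = FactorAnalysis(n_components=2, random_state=0).fit(observed)
# fa.components_ holds the estimated loadings (factors x variables)
print(np.round(fa.components_, 2))
```

Inspecting the estimated loadings shows the joint variation: one group of variables loads on one factor, the other group on the second, which is the "factor structure" EFA intuits from the data.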

Factor analysis in marketing

The basic steps are:

1. Identify the salient attributes consumers use to evaluate products in this category.
2. Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.
3. Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of underlying attributes (or factors).
4. Use these factors to construct perceptual maps and other product positioning devices.

Advantages of Factor Analysis

1. Both objective and subjective attributes can be used, provided the subjective attributes can be converted into scores.
2. Factor analysis can be used to identify hidden dimensions or constructs which may not be apparent from direct analysis.
3. It is easy and inexpensive to do.

Disadvantages of Factor Analysis

1. Usefulness depends on the researchers' ability to collect a sufficient set of product attributes. If important attributes are missed, the value of the procedure is reduced.
2. If sets of observed variables are highly similar to each other and distinct from other items, factor analysis will assign a single factor to them. This may make it harder to identify factors that capture more interesting relationships.
3. Naming the factors may require background knowledge or theory, because multiple attributes can be highly correlated for no apparent reason.

******* MULTIPLE REGRESSION ANALYSIS


WHAT IS MULTIPLE REGRESSION?

Multiple regression is a statistical technique that allows us to predict someone's score on one variable on the basis of their scores on several other variables. An example might help. Suppose we were interested in predicting how much an individual enjoys their job. Variables such as salary, extent of academic qualifications, age, sex, number of years in full-time employment and socioeconomic status might all contribute towards job satisfaction. If we collected data on all of these variables, perhaps by surveying a few hundred members of the public, we would be able to see how many and which of these variables gave rise to the most accurate prediction of job satisfaction. We might find that job satisfaction is most accurately predicted by type of occupation, salary and years in full-time employment, with the other variables not helping us to predict job satisfaction.

When using multiple regression in psychology, many researchers use the term "independent variables" to identify those variables that they think will influence some other "dependent variable". We prefer to use the term "predictor variables" for those variables that may be useful in predicting the scores on another variable that we call the "criterion variable". Thus, in our example above, type of occupation, salary and years in full-time employment would emerge as significant predictor variables, which allow us to estimate the criterion variable: how satisfied someone is likely to be with their job. As we have pointed out before, human behaviour is inherently noisy, and therefore it is not possible to produce totally accurate predictions; but multiple regression allows us to identify a set of predictor variables which together provide a useful estimate of a participant's likely score on a criterion variable.
In simple words, multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more other variables, also called the predictors.

More precisely, multiple regression analysis helps us to predict the value of Y for given values of X1, X2, ..., Xk. For example, the yield of rice per acre depends upon quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique. An additional advantage of this technique is that it also enables us to study the individual influence of these variables on yield.

THE MULTIPLE REGRESSION MODEL

In general, the multiple regression equation of Y on X1, X2, ..., Xk is given by:

Y = b0 + b1 X1 + b2 X2 + ... + bk Xk

USAGE

Multiple regression analysis is used when one is interested in predicting a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used.

ASSUMPTIONS

The multiple regression technique does not test whether the data are linear. On the contrary, it proceeds by assuming that the relationship between Y and each of the Xi is linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, Xi), i = 1, 2, ..., k. If any plot suggests non-linearity, one may use a suitable transformation to attain linearity. Another important assumption is the non-existence of multicollinearity: the independent variables are not related among themselves. At a very basic level, this can be tested by computing the correlation coefficient between each pair of independent variables. Other assumptions include those of homoscedasticity and normality.
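The model Y = b0 + b1 X1 + b2 X2 + ... + bk Xk can be fitted by ordinary least squares. A minimal sketch, assuming Python with NumPy; the data here are synthetic rather than the rice-yield example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Two predictor variables and a criterion generated from known coefficients
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 5.0 + 2.0 * x1 - 3.0 * x2 + 0.1 * rng.normal(size=n)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
print(round(b0, 1), round(b1, 1), round(b2, 1))  # close to 5.0, 2.0, -3.0
```

Before trusting such a fit on real data, one would check the scatter plots of (Y, Xi) for linearity and the pairwise correlations among the Xi for multicollinearity, as noted above.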

******* DISCRIMINANT ANALYSIS


1. Discriminant analysis is a statistical method that is used by researchers to help them understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods is the nature of the dependent variable.

2. Discriminant analysis requires the researcher to have measures of the dependent variable and all of the independent variables for a large number of cases. In regression analysis and ANOVA, the dependent variable must be a "continuous variable." A numeric variable indicates the degree to which a subject possesses some characteristic, so that the higher the value of the variable, the greater the level of the characteristic. A good example of a continuous variable is a person's income.

3. In discriminant analysis, the dependent variable must be a "categorical variable." The values of a categorical variable serve only to name groups and do not necessarily indicate the degree to which some characteristic is present. An example of a categorical variable is a measure indicating to which one of several different market segments a customer belongs; another example is a measure indicating whether or not a particular employee is a "high potential" worker. The categories must be mutually exclusive; that is, a subject can belong to one and only one of the groups indicated by the categorical variable. While a categorical variable must have at least two values (as in the "high potential" case), it may have numerous values (as in the case of the market segmentation measure). As the mathematical methods used in discriminant analysis are complex, they are described here only in general terms, through an example of a simple case in which the dependent variable has only two categories.

4. Discriminant analysis is most often used to help a researcher predict the group or category to which a subject belongs. For example, when individuals are interviewed for a job, managers will not know for sure how job candidates will perform on the job if hired. Suppose, however, that a human resource manager has a list of current employees who have been classified into two groups: "high performers" and "low performers." These individuals have been working for the company for some time, have been evaluated by their supervisors, and are known to fall into one of these two mutually exclusive categories. The manager also has information on the employees' backgrounds: educational attainment, prior work experience, participation in training programs, work attitude measures, personality characteristics, and so forth. This information was known at the time these employees were hired. The manager wants to be able to predict, with some confidence, which future job candidates are high performers and which are not. A researcher or consultant can use discriminant analysis, along with existing data, to help in this task.

5. There are two basic steps in discriminant analysis. The first involves estimating coefficients, or weighting factors, that can be applied to the known characteristics of job candidates (i.e., the independent variables) to calculate some measure of their tendency or propensity to become high performers. This measure is called a "discriminant function." Second, this information can then be used to develop a decision rule that specifies some cut-off value for predicting which job candidates are likely to become high performers.

6. The equation is quite similar to a regression equation.
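The two steps above can be sketched with linear discriminant analysis, assuming Python with scikit-learn; the "employee" data and the two background measures are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Two independent variables per employee (e.g. training hours, attitude score)
high = rng.normal(loc=[8.0, 7.0], scale=1.0, size=(50, 2))
low = rng.normal(loc=[4.0, 3.0], scale=1.0, size=(50, 2))
X = np.vstack([high, low])
y = np.array([1] * 50 + [0] * 50)   # categorical dependent: 1 = high performer

# Step 1: estimate the coefficients of the discriminant function;
# Step 2: the fitted model's decision rule classifies new candidates
lda = LinearDiscriminantAnalysis().fit(X, y)
candidate = np.array([[7.5, 6.5]])
print(lda.predict(candidate))        # predicted group for the new candidate
```

The fitted coefficients (`lda.coef_`) play the role of the weighting factors in the discriminant function, and `predict` applies the cut-off rule.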

******* CLUSTER ANALYSIS


Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. Cluster analysis is a collection of statistical methods which identifies groups of samples that behave similarly or show similar characteristics; in common parlance, these are also called look-alike groups. The simplest mechanism is to partition the samples using measurements that capture similarity or distance between samples. In this way, "clusters" and "groups" are interchangeable words. Often in market research studies, cluster analysis is also referred to as a segmentation method.

In neural network concepts, the clustering method is called unsupervised learning (this refers to discovery as against prediction; even discovery in a loose sense may be called prediction, but it does not have predefined learning sets to validate the knowledge). Typically in clustering methods, all the samples within a cluster are considered to belong equally to the cluster (as against belonging with a certain probability). If each observation has its own probability of belonging to a group (cluster), and the application is more interested in these probabilities, then we have to use (binomial) multinomial models.

Cluster sampling refers to a sampling method that has the following properties.
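A minimal hard-clustering sketch, assuming Python with scikit-learn (not prescribed by these notes); two well-separated synthetic groups are partitioned by k-means, with each sample belonging wholly to one cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Two "look-alike" groups of samples, far apart in feature space
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(30, 2))
samples = np.vstack([group_a, group_b])

# Partition the samples by distance into k = 2 clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(samples)
labels = km.labels_
# Hard assignment: each sample gets exactly one cluster label,
# with no probability of partial membership
print(len(set(labels[:30])), len(set(labels[30:])))
```

A probabilistic alternative (e.g. a mixture model) would instead return, for each observation, a probability of belonging to each group, as the paragraph above notes.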

1. The population is divided into N groups, called clusters.
2. The researcher randomly selects n clusters to include in the sample.
3. The number of observations within each cluster Mi is known, and M = M1 + M2 + M3 + ... + MN-1 + MN.
4. Each element of the population can be assigned to one, and only one, cluster.

There are two types of cluster sampling methods:

One-stage sampling. All of the elements within selected clusters are included in the sample.
Two-stage sampling. A subset of elements within selected clusters is randomly selected for inclusion in the sample.


Cluster Sampling: Advantages and Disadvantages

Assuming the sample size is constant across sampling methods, cluster sampling generally provides less precision than either simple random sampling or stratified sampling. This is the main disadvantage of cluster sampling. Given this disadvantage, it is natural to ask: why use cluster sampling? Sometimes the cost per sample point is less for cluster sampling than for other sampling methods. Given a fixed budget, the researcher may be able to use a bigger sample with cluster sampling than with the other methods. When the increased sample size is sufficient to offset the loss in precision, cluster sampling may be the best choice.

When to Use Cluster Sampling

Cluster sampling should be used only when it is economically justified - when reduced costs can be used to overcome losses in precision. This is most likely to occur in the following situations.

Constructing a complete list of population elements is difficult, costly, or impossible. For example, it may not be possible to list all of the customers of a chain of hardware stores. However, it would be possible to randomly select a subset of stores (stage 1 of cluster sampling) and then interview a random sample of customers who visit those stores (stage 2 of cluster sampling).

The population is concentrated in "natural" clusters (city blocks, schools, hospitals, etc.). For example, to conduct personal interviews of operating room nurses, it might make sense to randomly select a sample of hospitals (stage 1 of cluster sampling) and then interview all of the operating room nurses at that hospital. Using cluster sampling, the interviewer could conduct many interviews in a single day at a single hospital. Simple random sampling, in contrast, might require the interviewer to spend all day traveling to conduct a single interview at a single hospital.

Even when the above situations exist, it is often unclear which sampling method should be used. Test different options, using hypothetical data if necessary. Choose the most cost-effective approach; that is, choose the sampling method that delivers the greatest precision for the least cost.
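The two-stage hospital example above can be sketched with the Python standard library alone; the hospital and nurse names below are invented for illustration:

```python
import random

random.seed(5)
# The population, organized into natural clusters (hospitals of nurses)
hospitals = {f"hospital_{i}": [f"nurse_{i}_{j}" for j in range(20)]
             for i in range(10)}

# Stage 1: randomly select n = 3 clusters out of N = 10
selected = random.sample(sorted(hospitals), k=3)

# Stage 2: randomly select a subset of elements within each selected cluster
sample = [nurse
          for h in selected
          for nurse in random.sample(hospitals[h], k=5)]
print(len(sample))  # 3 hospitals x 5 nurses = 15 interviews
```

One-stage sampling would instead keep every nurse in each selected hospital, i.e. replace the stage-2 draw with `hospitals[h]` itself.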

******* CONJOINT ANALYSIS


Conjoint analysis is a statistical technique used in market research to determine how people value different features that make up an individual product or service. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most influential on respondent choice or decision making. A controlled set of potential products or services is shown to respondents, and by analyzing how they choose between these products, the implicit valuation of the individual elements making up the product or service can be determined. These implicit valuations (utilities or part-worths) can be used to create market models that estimate market share, revenue and even profitability of new designs.

Conjoint analysis originated in mathematical psychology and was developed by marketing professor Paul Green at the University of Pennsylvania and Data Chan. Other prominent conjoint analysis pioneers include professor V. Seenu Srinivasan of Stanford University, who developed a linear programming (LINMAP) procedure for rank-ordered data as well as a self-explicated approach; Richard Johnson (founder of Sawtooth Software), who developed the Adaptive Conjoint Analysis technique in the 1980s; and Jordan Louviere (University of Iowa), who invented and developed choice-based approaches to conjoint analysis and related techniques such as MaxDiff.

Today it is used in many of the social sciences and applied sciences, including marketing, product management, and operations research. It is used frequently in testing customer acceptance of new product designs, in assessing the appeal of advertisements, and in service design. It has been used in product positioning, but there are some who raise problems with this application of conjoint analysis (see disadvantages).
Conjoint analysis techniques may also be referred to as multiattribute compositional modelling, discrete choice modelling, or stated preference research, and are part of a broader set of trade-off analysis tools used for systematic analysis of decisions. These tools include Brand-Price Trade-Off, Simalto, and mathematical approaches such as evolutionary algorithms or Rule Developing Experimentation.
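A minimal part-worth estimation sketch, assuming Python with NumPy: respondent ratings of dummy-coded product profiles are regressed to recover the implicit attribute utilities. The attributes, levels, and utility values below are invented:

```python
import itertools
import numpy as np

# Two attributes: price level (low/high) and brand (A/B) -> 4 full profiles
true_utils = {"price_low": 2.0, "brand_A": 1.0}
profiles = list(itertools.product([0, 1], [0, 1]))  # (price_low?, brand_A?)

# Respondent ratings = base utility + part-worths + small noise
rng = np.random.default_rng(6)
ratings = np.array([3.0 + p * true_utils["price_low"]
                    + b * true_utils["brand_A"] + 0.01 * rng.normal()
                    for p, b in profiles])

# Dummy-coded design matrix: intercept, price_low, brand_A.
# Least squares recovers the implicit part-worths from the ratings.
X = np.array([[1, p, b] for p, b in profiles], dtype=float)
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(np.round(coef, 1))  # approx [3.0, 2.0, 1.0]
```

Real conjoint studies use many more attributes, fractional-factorial designs, and often choice-based rather than rating-based data, but the core idea of decomposing overall preference into attribute-level utilities is the same.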

Advantages

1. Estimates the psychological tradeoffs that consumers make when evaluating several attributes together.
2. Measures preferences at the individual level.
3. Uncovers real or hidden drivers which may not be apparent to the respondents themselves.
4. Provides a realistic choice or shopping task.
5. Able to use physical objects.
6. If appropriately designed, the ability to model interactions between attributes can be used to develop needs-based segmentation.

Disadvantages


1. Designing conjoint studies can be complex.
2. With too many options, respondents resort to simplification strategies.
3. Difficult to use for product positioning research, because there is no procedure for converting perceptions about actual features to perceptions about a reduced set of underlying features.
4. Respondents are unable to articulate attitudes toward new categories, or may feel forced to think about issues they would otherwise not give much thought to.
5. Poorly designed studies may over-value emotional/preference variables and undervalue concrete variables.
6. Does not take into account the number of items per purchase, so it can give a poor reading of market share.

*******

STATISTICAL PACKAGES

Meaning

Statistical software packages are specialized computer programs for statistical analysis.

Packages

1. Aabel - graphic display and plotting of statistical data sets
2. ADAPA - batch and real-time scoring of statistical models
3. Angoss
4. ASReml - for restricted maximum likelihood analyses
5. BMDP - general statistics package

6. CalEst - general statistics and probability package with didactic tutorials
7. Data Applied - for building statistical models
8. DPS - comprehensive statistics package
9. EViews - for econometric analysis
10. FAME - a system for managing time series statistics and time series databases
11. FinMath - a .NET numerical library containing descriptive statistics, distributions, factor analysis, regression analysis and many others


12. GAUSS - programming language for statistics
13. GenStat - general statistics package
14. GLIM - early package for fitting generalized linear models
15. GraphPad InStat - very simple, with lots of guidance and explanations
16. GraphPad Prism - biostatistics and nonlinear regression with clear explanations
17. IMSL Numerical Libraries - software library with statistical algorithms
18. JMP - visual analysis and statistics package
19. LISREL - statistics package used in structural equation modeling
20. Maple - programming language with statistical features
21. Mathematica - programming language with statistical features
22. MATLAB - programming language with statistical features
23. MedCalc - for biomedical sciences
24. Mentor - for market research
25. Minitab - general statistics package
26. MLwiN - multilevel models (free to UK academics)
27. NCSS - general statistics package
28. NMath Stats - statistical package for .NET Framework
29. O-Matrix - programming language
30. OriginPro - statistics and graphing, programming access to NAG library
31. Partek - general statistics package with specific applications for genomic, HTS, and QSAR data
32. Primer-E Primer - environmental and ecological specific
33. PV-WAVE - programming language, comprehensive data analysis and visualization with IMSL statistical package

34. Q research software - quantitative data analysis software for market research
35. Quantum - part of the SPSS MR product line, mostly for data validation and tabulation in marketing and opinion research


36. RATS - comprehensive econometric analysis package
37. SAS - comprehensive statistical package
38. SHAZAM - comprehensive econometrics and statistics package
39. SigmaStat - for group analysis
40. SOCR - online tools for teaching statistics and probability theory
41. Speakeasy - numerical computational environment and programming language with many statistical and econometric analysis features


42. SPSS - comprehensive statistics package
43. Stata - comprehensive statistics package
44. Statgraphics - general statistics package
45. STATISTICA - comprehensive statistics package
46. STATIT - comprehensive statistical package
47. StatXact - package for exact nonparametric and parametric statistics
48. Systat - general statistics package
49. S-PLUS - general statistics package
50. Unistat - general statistics package that can also work as an Excel add-in
51. The Unscrambler - free-to-try commercial multivariate analysis software for Windows
52. WINKS - statistical data analysis and graphs from TexaSoft, a general statistics package designed for scientific data analysis

Add-ons


1. Analyse-it - add-on to Microsoft Excel for statistical analysis
2. Sigma Magic - add-on to Microsoft Excel for statistical analysis designed for Lean Six Sigma
3. SigmaXL - add-on to Microsoft Excel for statistical and graphical analysis
4. SPC XL - add-on to Microsoft Excel for general statistics
5. SUDAAN - add-on to SAS and SPSS for statistical surveys
6. XLfit - add-on to Microsoft Excel for curve fitting and statistical analysis
7. XLSTAT - add-on to Microsoft Excel for statistics and multivariate data analysis
8. Stats Helper - add-on to Microsoft Excel for descriptive statistics and Six Sigma

*************
