MCO-22
MCOM STUDY MATERIAL
Types of data
Information or data can be classified into two broad categories:
1. Primary data
2. Secondary data
A. Primary Data: Primary data is the data that is collected for the first time through personal experiences or
evidence. It is also described as raw data or first-hand information. The investigator supervises and controls
the data collection process directly. The data is mostly collected through observations, physical testing, mailed
questionnaires, surveys, personal interviews, telephonic interviews, case studies, and focus groups, etc.
B. Secondary Data: Secondary data is second-hand data that has already been collected and recorded by other
researchers for their own purposes, not for the current research problem. It is accessible in the form of data
collected from different sources such as government publications, censuses, internal records of the
organisation, books, journal articles, websites and reports, etc. This method of gathering data is affordable,
readily available, and saves cost and time. However, the one disadvantage is that the information assembled is
for some other purpose and may not meet the present research purpose or may not be accurate.
1) Observation method: In this method, a researcher collects data by observing the activities of the respondents.
Observation is nothing but watching the behaviour of the participants carefully. The main processes of observation
are sensation, attention and perception.
2) Interview method: This method of collecting data is one of the most popular and important. An interview is a
meeting of two persons wherein one person tries to gather information from the other by asking questions. An
interview may be a direct personal interview or an indirect personal interview.
Merits
• Interviews have high response rates.
• The method is quite reliable and valuable.
• This method is flexible: questions can be adapted according to the responses and reactions of the interviewee.
Demerits
• The interviewer must be a trained professional.
• This method is expensive and time consuming.
• It cannot be used when the field of inquiry is large.
3) Reporters: Under this method, reporters and correspondents are appointed in different areas to collect data
and information about a particular matter. This method is generally employed by government departments,
newspapers, magazines, radio and TV news channels.
Merits
• This method is more accurate and reliable.
• This method is economical.
• Information can be collected easily and quickly.
• Information can be collected from a wide area extensively.
Demerits
• This method is expensive, as many reporters have to be appointed.
• It is time consuming.
• Data may not be reliable.
4) Questionnaire: Under this method, a list of questions is sent to the informants with a request to answer them.
The questions are sent either by post, in person, or by email, and are framed keeping the interests of the
respondents in mind. The respondent is requested to answer the questions honestly.
Merits
• This method is suitable to cover a wide area.
• This method is economical.
• Reliable and concrete information may be collected through this method.
Demerits
• It is a time-consuming process.
• The informants may not answer the questions honestly.
• This method is not suitable for illiterate informants.
DESIGNING OF QUESTIONNAIRE
In designing the questionnaire, some of the important points to be kept in mind are:
1. Covering letter: Every questionnaire should contain a covering letter. The covering letter should highlight
the purpose of the study and assure the respondent that all responses will be kept confidential. It is desirable
to provide some inducement or motivation to the respondent for a better response. The objectives of the study
and the questionnaire design should be such that the respondent derives a sense of satisfaction through his or
her involvement.
2. Number of questions should be kept to the minimum: The fewer the questions, the greater the chances of
getting a better response and having all the questions answered. Otherwise, the respondent may lose interest
and provide inaccurate answers, particularly towards the end of the questionnaire.
3. Questions should be simple, short and unambiguous: The questions should be simple, short, easy to
understand and such that their answers are unambiguous.
4. Questions of sensitive or personal nature should be avoided: The questions should not be such as would
require the respondent to disclose any private, personal or confidential information. For example, questions
relating to sales, profits, material happiness etc. should be avoided as far as possible.
5. Answers to questions should not require calculations: The questions should be framed in such a way that
their answers do not require any calculations.
6. Logical arrangement: The questions should be logically arranged so that there is a continuity of responses
and the respondent does not feel the need to refer back to the previous questions.
CENSUS METHOD VERSUS SAMPLING METHOD
• Census: each and every unit of the population is covered. Sampling: only a handful of units of the population is covered.
• Census is a time-consuming process. Sampling is a fast process.
• Census is an expensive method. Sampling is an economical method.
• Census results are reliable and accurate. Sampling results are less reliable and accurate, due to the margin of error in the data collected.
• Census suits a population of heterogeneous nature. Sampling suits a population of homogeneous nature.
Classification of Data
After the data has been systematically collected and edited, the first step in presentation of data is classification.
Classification is the process of arranging the data according to points of similarity and dissimilarity. It is like
the process of sorting mail in a post office, where the mail for different destinations is placed in different
compartments after it has been carefully sorted out from the huge heap.
Objectives of Classification
The principal objectives of classifying data are:
1. To condense the mass of data in such a way that salient features can be readily noticed.
2. To facilitate comparisons between attributes of variables.
3. To prepare data which can be presented in tabular form.
4. To highlight the significant features of the data at a glance.
Types of Classification
Some common types of classification are:
1. Geographical Classification
2. Chronological Classification
3. Qualitative Classification
4. Quantitative Classification
1) Geographical Classification. In this type of classification, data is classified according to area or region. For
example, when we consider production of wheat state wise, this would be called geographical classification. The
listing of individual entries is generally done in an alphabetical order or according to size to emphasise the
importance of a particular area or region.
2) Chronological Classification. When the data is classified according to the time of its occurrence, it is known as
chronological classification. For example, sales figure of a company.
3) Qualitative Classification. When the data is classified according to some attributes (distinct categories) that
are not capable of measurement, it is known as qualitative classification. For example, the attribute education
can have different classes such as primary, middle, higher secondary, university, etc.
4) Quantitative Classification. When the data is classified according to some characteristics that can be
measured, it is called quantitative classification. For example, the employees of a company may be classified
according to their monthly salaries.
CHARTING OF DATA
Charts of frequency distributions which cover both diagrams and graphs are useful because they enable a quick
interpretation of the data. A frequency distribution can be presented by a variety of methods. The following are
four popular methods of charting a frequency distribution:
1. Bar Diagram
2. Histogram
3. Frequency Polygon
4. Ogive or Cumulative Frequency Curve
1. Bar Diagram. Bar diagrams are most popular. One can see numerous such diagrams in newspapers, journals,
exhibitions, and even on television to depict different characteristics of data. For example, population, per
capita income, sales and profits of a company can be shown easily through bar diagrams. A bar is a thick line
whose width is shown merely to attract the viewer; it is the length of the bar that represents the value. A bar diagram may be either vertical or horizontal. In order to
draw a bar diagram, we take the characteristic (or attribute) under consideration on the X-axis and the
corresponding value on the Y-axis. It is desirable to mention the value depicted by the bar on the top of the bar.
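As a sketch of this layout, a small text-based bar diagram can show one bar per characteristic with the value written at the end of each bar; the regions and sales figures below are hypothetical illustration data.

```python
# A minimal text-based horizontal bar diagram. The region labels and
# sales figures are hypothetical illustration data.

def bar_diagram(data, width=40):
    """Render one bar per category; bar length is proportional to the value."""
    max_value = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / max_value * width)
        # The value is written at the end of the bar, as recommended above.
        lines.append(f"{label:<10} {bar} {value}")
    return "\n".join(lines)

sales = {"North": 120, "South": 90, "East": 150, "West": 60}
print(bar_diagram(sales))
```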
1. ARITHMETIC MEAN
The arithmetic mean is the most commonly used and readily understood measure of central tendency. In
statistics, the term average refers to any of the measures of central tendency. The arithmetic mean is defined as
being equal to the sum of the numerical values of each and every observation divided by the total number of
observations.
Properties of A.M
• The sum of the deviations of the observations from the arithmetic mean is always zero.
• The sum of the squared deviations of the observations from the mean is minimum, i.e., the total of the
squares of the deviations from any other value than the mean value will be greater than the total sum of
squares of the deviations from mean.
• The arithmetic means of several sets of data may be combined into a single arithmetic mean for the
combined sets of data.
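The three properties listed above can be checked directly in Python; the data values are arbitrary illustration figures.

```python
from statistics import mean

data = [4, 8, 15, 16, 23, 42]          # arbitrary illustration data
m = mean(data)                          # (4+8+15+16+23+42)/6 = 18

# Property 1: the deviations from the arithmetic mean sum to zero.
assert abs(sum(x - m for x in data)) < 1e-9

# Property 2: squared deviations are minimised about the mean.
def sum_sq_dev(values, about):
    return sum((x - about) ** 2 for x in values)

assert sum_sq_dev(data, m) < sum_sq_dev(data, m + 1)
assert sum_sq_dev(data, m) < sum_sq_dev(data, m - 1)

# Property 3: means of separate sets combine into one weighted mean.
a, b = [10, 20, 30], [40, 50]
combined = (len(a) * mean(a) + len(b) * mean(b)) / (len(a) + len(b))
assert combined == mean(a + b)          # both equal 30
```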
Advantages of Mean
a) Easy and simple to understand and calculate.
b) Not affected by fluctuations.
c) It takes into account all the values in the series.
Disadvantages of Mean
a) It is affected by very high or very low scores.
b) In the absence of a single item, its value becomes inaccurate.
c) It cannot be determined by inspection.
3. MEDIAN
The median is the value of the middle observation when the data are arranged in ascending or descending order:
half of the observations lie below it and half lie above it. Being a positional measure, it is not affected by
extreme values.
4. QUANTILES
Quantiles are the related positional measures of central tendency. These are useful and frequently employed
measures of non-central location. The most familiar quantiles are the quartiles, deciles, and percentiles.
Quartiles: Quartiles are those values which divide the total data into four equal parts. Since three points divide
the distribution into four equal parts, we shall have three quartiles such as Q1, Q2, and Q3. The first quartile,
Q1, is the value such that 25% of the observations are smaller and 75% of the observations are larger. The
second quartile, Q2, is the median, 50% of the observations are smaller and 50% are larger. The third quartile,
Q3, is the value such that 75% of the observations are smaller and 25% of the observations are larger.
Percentiles: Percentiles are those values which divide the total data into hundred equal parts. Since ninety-
nine points divide the distribution into hundred equal parts, we shall have ninety-nine percentiles.
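Python's statistics.quantiles can illustrate both quartiles and percentiles; the data set below is made up for the example, and the "inclusive" method treats the data as a complete population.

```python
from statistics import median, quantiles

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]      # 11 illustration values

# Quartiles: the three cut points dividing the data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
assert q2 == median(data)          # the second quartile is the median

# Percentiles: the ninety-nine cut points dividing the data into 100 parts.
p = quantiles(data, n=100, method="inclusive")
assert (p[24], p[49], p[74]) == (q1, q2, q3)   # P25, P50, P75 = Q1, Q2, Q3
```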
5. MODE
The mode is the typical or commonly observed value in a set of data. It is defined as the value which occurs most
often or with the greatest frequency. The dictionary meaning of the term mode is most usual. For example, in
the series of numbers 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, the mode is 8 because it occurs the maximum number of times.
The calculations are different for the grouped data, where the modal class is defined as the class with the
maximum frequency and is calculated by using formula.
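The series quoted above can be checked with Python's statistics module; multimode, shown as an extra, lists every value tied for the highest frequency.

```python
from statistics import mode, multimode

series = [3, 4, 5, 5, 6, 7, 8, 8, 8, 9]    # the series from the text
assert mode(series) == 8                    # 8 occurs three times

# Unlike the mean or median, the mode also applies to qualitative data.
colours = ["red", "blue", "blue", "green"]
assert mode(colours) == "blue"

# multimode returns every value tied for the greatest frequency.
assert multimode([1, 1, 2, 2, 3]) == [1, 2]
```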
Advantages of Mode
a) Easy to understand and calculate.
b) Not affected by extreme values.
c) It can represent qualitative data.
d) It can be represented graphically.
Disadvantages of Mode
a) Mode is not well defined.
b) It does not take into account all the items in the series.
c) No further algebraic treatment is possible.
6. Geometric Mean: It is defined as the nth root of the product of all the n observations. It is not applicable in
case any variable is zero or negative. It is mostly used to know the average rate of change in population,
interest, growth rate, etc.
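For instance, an average growth factor can be checked against the nth-root definition; the growth rates below are assumed figures for illustration.

```python
from statistics import geometric_mean

# Annual growth of +10%, +20% and +5% expressed as factors (assumed figures).
factors = [1.10, 1.20, 1.05]
g = geometric_mean(factors)

# Definition check: g is the cube root of the product of the three values,
# so cubing g should recover the product. (The geometric mean is not
# applicable if any value is zero or negative, as noted above.)
assert abs(g ** 3 - 1.10 * 1.20 * 1.05) < 1e-9
```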
Types of Variation
Following are some of the well-known measures of variation:
1. Range
2. Average or Mean Deviation
3. Quartile Deviation or Semi-Interquartile Range
4. Standard Deviation
1. RANGE
The range is defined as the difference between the highest value and the lowest value in a set of data. In
symbols, this may be indicated as: R = H – L, where R = Range; H = Highest Value; L = Lowest Value. The range
is very easy to calculate. However, the range is a crude measure of variation, since it uses only two extreme
values. The concept of range is extensively used in statistical quality control. Range is helpful in studying the
variations in the prices of shares and debentures and other commodities that are very sensitive to price
changes from one period to another. For meteorological departments, the range is a good indicator for weather
forecasts. For grouped data, the range may be approximated as the difference between the upper limit of the
largest class and the lower limit of the smallest class.
2. QUARTILE DEVIATION
The quartile deviation, also known as the semi-interquartile range, is computed as one-half of the difference
between the third quartile and the first quartile. In symbols, this can be written as:
Q.D. = (Q3 – Q1)/2
where Q1 = first quartile, and Q3 = third quartile.
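Both the range and the quartile deviation can be computed on a small illustrative data set (the values are made up for the example).

```python
from statistics import quantiles

data = [5, 7, 9, 12, 15, 18, 21, 24, 30]      # illustration data

# Range: highest value minus lowest value, R = H - L.
r = max(data) - min(data)
assert r == 30 - 5 == 25

# Quartile deviation: half the difference between Q3 and Q1.
q1, _, q3 = quantiles(data, n=4, method="inclusive")
qd = (q3 - q1) / 2
assert (q1, q3) == (9.0, 21.0)
assert qd == 6.0
```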
3. AVERAGE DEVIATION
The measure of average (or mean) deviation is an improvement over the previous two measures in that it
considers all observations in the given set of data. This measure is computed as the mean of deviations from
the mean or the median. All the deviations are treated as positive regardless of sign.
4. STANDARD DEVIATION
The standard deviation is the most widely used and important measure of variation. In computing the average
deviation, the signs are ignored. The standard deviation overcomes this problem by squaring the deviations,
which makes them all positive. The square of the standard deviation is called the variance. The standard deviation
and variance become larger as the spread of the data becomes greater. More importantly, the standard deviation is
readily comparable with other standard deviations: the greater the standard deviation, the greater the variability.
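The relationship between these measures can be verified on a small data set; the values are illustrative, and pstdev/pvariance treat the data as the whole population.

```python
from statistics import mean, pstdev, pvariance

data = [2, 4, 6, 8, 10]                  # illustration data
m = mean(data)                            # 6

# Average (mean) deviation: mean of the absolute deviations, signs ignored.
avg_dev = mean(abs(x - m) for x in data)
assert avg_dev == 2.4

# Standard deviation: root of the mean of the squared deviations.
sd = pstdev(data)

# The variance is the square of the standard deviation.
assert abs(pvariance(data) - sd ** 2) < 1e-9
```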
Meaning of Probability
Probability means the chance of occurrence of an event or happening. In order to measure probability, we use the
concepts of experiment, sample space and event.
Experiment: The term experiment is used in probability theory in a much broader sense than in physics or
chemistry. Any action, whether it is the tossing of a coin or the launching of a new product in the market, constitutes
an experiment in probability theory. All such experiments have three things in common:
1. There are two or more outcomes of each experiment.
2. It is possible to specify the outcomes in advance.
3. There is uncertainty about the outcomes.
For example, tossing a coin may result in two outcomes, head or tail, which we know in advance, and we are not
sure whether a head or a tail will come up when we toss the coin.
Sample Space: The set of all possible outcomes of an experiment is defined as the sample space. Each outcome is
thus visualised as a sample point in the sample space. Thus, the set (head, tail) defines the sample space of a coin
tossing experiment. The sample space is fully determined by listing down all the possible outcomes of the
experiment.
Event: An event, in probability theory, constitutes one or more possible outcomes of an experiment. Thus, an event
can be defined as a subset of the sample space. Generally, an event refers to a particular happening or incident. But
here, we use an event to refer to a single outcome or a combination of outcomes. Suppose, as a result of a market
study experiment of a product, we find that the demand for the product for the next month is uncertain, and may
take values from 100, 101, 102, ..., 150. We can then define different events, such as the event that the demand is
at most 120, or the event that the demand exceeds 140.
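The demand example above can be written out directly: the sample space is the set {100, ..., 150}, and each event is a subset of it. The equal-likelihood assumption in the last step is added only for illustration.

```python
# Sample space for next month's demand: 100, 101, ..., 150 (from the text).
sample_space = set(range(100, 151))
assert len(sample_space) == 51

# Events are subsets of the sample space.
at_most_120 = {d for d in sample_space if d <= 120}
assert at_most_120 <= sample_space            # subset check

# If all 51 outcomes were equally likely (an assumption for illustration),
# the probability of an event would be its size over the sample-space size.
p = len(at_most_120) / len(sample_space)
assert abs(p - 21 / 51) < 1e-12
```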
Bayes’ Theorem
• It describes the probability of occurrence of an event related to any condition.
• The theorem is based on conditional probability.
• For example, if we have to calculate the probability of drawing a blue ball from a bag containing equal numbers
of black, white, red, and blue balls, then the probability of a blue ball will be ¼.
• Consider a manufacturer using a particular machine to produce a product. From earlier data, he has estimated
that the chances of the machine being set up correctly or incorrectly are 0.6 and 0.4 respectively.
• Thus, we have two mutually exclusive and collectively exhaustive events, A: the set-up is correct, and B: the
set-up is incorrect, with P(A) = 0.6 and P(B) = 0.4 (check: P(A) + P(B) = 1).
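To complete the machine set-up example, assume (purely for illustration; the text gives only the priors) that a correctly set machine produces a good item with probability 0.9 and an incorrectly set one with probability 0.3. Bayes' theorem then revises the prior once a good item is observed.

```python
# Priors from the text.
p_correct, p_incorrect = 0.6, 0.4

# Conditional probabilities of a good item -- assumed for illustration only.
p_good_given_correct = 0.9
p_good_given_incorrect = 0.3

# Total probability of observing a good item.
p_good = (p_good_given_correct * p_correct
          + p_good_given_incorrect * p_incorrect)

# Bayes' theorem: P(correct | good) = P(good | correct) P(correct) / P(good).
p_correct_given_good = p_good_given_correct * p_correct / p_good

assert abs(p_good - 0.66) < 1e-12
assert abs(p_correct_given_good - 9 / 11) < 1e-12     # about 0.818
```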
Methods of sampling
The various sampling methods can be classified into two categories:
A. Random sampling methods.
B. Non-random sampling methods.
A. Random Sampling Methods: In this method, samples of the population are selected randomly. All units in the
population have a chance of being chosen in the sample. It is the simplest of the sampling methods, and each
element of the population is involved in the sampling. Random sampling methods are further divided into five
methods:
i) Simple random sampling.
ii) Systematic sampling.
iii) Stratified random sampling.
iv) Cluster sampling.
v) Multi-stage sampling.
Simple random sampling
It is the most popular method for choosing a sample from a population for a wide range of purposes. In simple
random sampling, each member of the population is equally likely to be chosen as part of the sample. All the members
of the population are given an equal chance to be chosen. Each unit is selected randomly. This method is easy to
understand in theory, but difficult to perform in practice. This is because working with a large sample size is not
easy and it can be a challenge.
Systematic sampling
In this method, samples are chosen systematically. It involves choosing the first individual at random from the
population, and then selecting every following nth individual within the sampling frame to make up the sample. It
follows a certain pattern in sampling: for example, after every two persons, one person may be chosen as a sample
unit. So, this method follows a specific pattern or trend.
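Both simple random and systematic selection can be sketched with Python's random module, here on a population of 100 numbered units, seeded so the sketch is repeatable.

```python
import random

population = list(range(1, 101))          # 100 units numbered 1..100
random.seed(42)                            # fixed seed for repeatability

# Simple random sampling: every unit equally likely, no repeats.
simple_sample = random.sample(population, k=10)
assert len(set(simple_sample)) == 10

# Systematic sampling: a random start, then every nth unit thereafter.
n = 10                                     # sampling interval
start = random.randrange(n)                # random starting point in 0..9
systematic_sample = population[start::n]
assert len(systematic_sample) == 10
assert all(b - a == n for a, b in zip(systematic_sample, systematic_sample[1:]))
```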
Stratified random sampling
In this method, the population is first divided into homogeneous subgroups, called strata, on the basis of some
characteristic, and a random sample is then drawn from each stratum.
Advantages
a) This method is more accurate as it reflects the characteristics of the population.
b) This method is more precise and practical.
c) It saves a lot of time, money and resources for data collection because sample size is small.
Disadvantages
a) This method requires a detailed knowledge of the characteristic of the population.
b) It is a difficult task to prepare a stratified list.
c) This method is expensive as it requires the services of an expert researcher.
Cluster Sampling
Cluster sampling is a technique in which clusters of participants representing the population are identified and
included in the sample. The whole population is divided into clusters, such as districts, towns, cities, etc., and then
some clusters are selected randomly. This is a popular method in marketing research. The cluster sampling process can be
single stage or multistage.
In single stage sampling, all the members of selected clusters are included in the study. Whereas, in multistage
sampling, additional sampling methods are used to choose certain individuals within selected clusters.
For example, suppose the whole population is divided into six clusters, and any two clusters are then chosen at
random, say the first and the last, to serve as the sample.
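A sketch of both variants, using six hypothetical clusters of 20 members each:

```python
import random

# Six hypothetical clusters (e.g. towns) of 20 members each.
clusters = {f"town_{i}": [f"t{i}_member_{j}" for j in range(20)]
            for i in range(1, 7)}
random.seed(7)

# Single-stage: choose two clusters at random, include every member.
chosen = random.sample(list(clusters), k=2)
single_stage = [m for c in chosen for m in clusters[c]]
assert len(single_stage) == 40             # 2 clusters x 20 members

# Multistage: within each chosen cluster, sub-sample 5 members at random.
multistage = [m for c in chosen for m in random.sample(clusters[c], k=5)]
assert len(multistage) == 10
```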
Multi-stage sampling
It is a combination of all the above methods of sampling. Sampling is done in various stages, and it is therefore
known as multi-stage sampling. The whole population is divided into clusters and each cluster into groups and subgroups.
Advantages
a) It is less expensive and time consuming.
b) This method is more accurate and freer from bias.
c) It is more flexible as it allows different sampling procedures at different stages.
d) It can be used over a wide area. That means we can do sampling of a huge population easily by using this
method.
Disadvantages
a) All the units of population are not involved in this method.
b) The data may not be reliable and accurate.
c) It is a complex method, and the researcher must be an expert.
1. Convenience sampling: This method refers to obtaining a sample that is most conveniently available to the
researchers. Information is easily available to researchers from nearby sources. Convenience samples are best
used for exploratory research. It is useful in testing questionnaires designed on a pilot basis. It is widely
used in market research. For example, a company may collect data from its reliable customers for its product.
2. Judgement sampling: In this method of sampling, the selection of sample is based on researchers’ judgment
about a sample. It is also known as expert sampling because the opinions of experts are taken to do research.
This method is not very reliable as different experts have different opinions on a particular fact. This type of
sampling is often used to measure the performance of salespersons. It is also used to forecast election results.
3. Quota sampling: This type of method is quite popular in marketing research. The samples are collected on the
basis of some parameters like age, sex, geographical region, education, income, occupation, etc. The different
units of population are divided on the basis of specific characteristic called quota. Each quota is then analysed.
For example, to know about the diets of population, we can divide the population into vegetarians and non-
vegetarians. Thus, here these two quotas are assigned to researchers.
Meaning of Hypothesis
• The word hypothesis consists of two parts – hypo + thesis. ‘Hypo’ means tentative or subject to verification.
‘Thesis’ means a statement about the solution of a problem.
• Thus, the literal meaning of the term hypothesis is a tentative statement about the solution of a problem.
• A hypothesis offers a solution to the problem, one that is to be verified empirically and based on some rationale.
• A hypothesis keeps the research activity to the point and directed towards its destination.
• Research without a hypothesis is like a sailor at sea without a compass.
CONDITIONS FOR APPLYING THE CHI-SQUARE TEST
i) Random sample: The data used in the Chi-square test must be collected randomly. The data must represent the
whole population. Random sample data can be used to find out the significant differences between real and
expected values. The value in the Chi-square test should not be zero because in that case the difference
between observed and expected data would be zero. So, we have to judge the quality of data.
ii) Large sample size: The size of the sample should be as large as possible. A small sample size is prone to
mistakes or type-II errors, and the Chi-square test cannot then be applied meaningfully. Many researchers have
set the minimum sample size at 50.
iii) Adequate cell size: The minimum size of a cell should be 5, to avoid type-II errors. With smaller cell sizes,
the value of Chi-square will be overestimated.
iv) Independence: The sample observations must be independent from one another otherwise correct value in
Chi-square test cannot be ascertained. All the observations should be grouped in categories.
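A minimal goodness-of-fit calculation shows the statistic itself; the observed die-roll counts are hypothetical, and each expected cell count (100/6 ≈ 16.7) satisfies the minimum cell size of 5 noted above.

```python
# Hypothetical observed counts from 100 rolls of a die.
observed = [18, 22, 16, 14, 12, 18]
expected = [100 / 6] * 6                   # equal frequencies under H0

# Every expected cell count is well above the minimum of 5.
assert all(e >= 5 for e in expected)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
assert abs(chi_square - 3.68) < 1e-9
```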
Meaning of Regression
• Regression is a statistical tool which shows the nature of relationship between variables.
• It describes numerically how the dependent variable is related to the independent variable.
• It shows the cause-and-effect relationship between the variables.
• One variable is treated as independent and the other as dependent.
• It indicates the impact of independent variable on dependent variable.
• For example, consider the relationship between age and knowledge. Regression tells us how much knowledge
would increase with growing age. Thus, it shows the degree of variation between the variables.
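A least-squares sketch of the age/knowledge example; the ages and knowledge scores are made-up illustration values, not from the text.

```python
from statistics import mean

ages = [20, 25, 30, 35, 40]           # independent variable (made-up data)
scores = [52, 58, 65, 70, 75]         # knowledge scores (made-up data)

x_bar, y_bar = mean(ages), mean(scores)

# Least-squares slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),
# and intercept a = y_bar - b * x_bar.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, scores))
     / sum((x - x_bar) ** 2 for x in ages))
a = y_bar - b * x_bar

def predict(age):
    return a + b * age

assert b > 0                                   # knowledge rises with age here
assert abs(predict(x_bar) - y_bar) < 1e-9      # line passes through the means
```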