
QUANTITATIVE ANALYSIS & MANAGERIAL

APPLICATION
MCO-22
MCOM STUDY MATERIAL

Santosh Kumar Sharma


(MCOM / MBA / BED)
MCO-22
Quantitative Analysis and Managerial Application
UNIT 1 COLLECTION OF DATA

Types of data
Information or data can be classified into two broad categories, such as:
1. Primary data
2. Secondary data
A. Primary Data: Primary data is the data that is collected for the first time through personal experiences or
evidence. It is also described as raw data or first-hand information. The investigator supervises and controls
the data collection process directly. The data is mostly collected through observations, physical testing, mailed
questionnaires, surveys, personal interviews, telephonic interviews, case studies, and focus groups, etc.
B. Secondary Data: Secondary data is second-hand data that has already been collected and recorded by some
researchers for their own purpose, and not for the current research problem. It is accessible in the form of data
collected from different sources such as government publications, censuses, internal records of the
organisation, books, journal articles, websites and reports. This method of gathering data is affordable and
readily available, and it saves cost and time. However, one disadvantage is that the information assembled was
gathered for some other purpose and may not meet the present research purpose or may not be accurate.

Distinguish between Primary data and Secondary data.

Primary data vs. Secondary data:
1. Primary data are those data that are collected for the first time, whereas secondary data refer to those data that have already been collected by some other person.
2. Primary data are original because the investigator collects them for the first time, whereas secondary data are not original because someone else has collected them for his own purpose.
3. Primary data are in the form of raw materials, whereas secondary data are in the finished form.
4. Primary data are more reliable and suitable for the enquiry because they are collected for a particular purpose, whereas secondary data are less reliable and less suitable because someone else collected them and they may not perfectly match our purpose.
5. Collecting primary data is quite expensive in terms of both time and money, whereas secondary data require less time and money and are hence economical.
6. No particular precaution or editing is required while using primary data, as they were collected with a definite purpose, whereas both precaution and editing are essential for secondary data, as they were collected by someone else for his own purpose.

Methods of collecting primary data.


The data collected by the researcher for the first time are called primary data. There are basically five
methods of collecting primary data:
i) Observation method
ii) Interview method
iii) Reporters
iv) Questionnaire
v) Schedule

1) Observation method: In this method, the researcher collects data by observing the activities of the participants.
Observation is nothing but watching the behaviour of the participants carefully. The main processes involved in
observation are sensation, attention and perception.

SANTOSH KR. SHARM 1


Merits
• This method is suitable for intensive study and provides raw materials for deep study.
• This method is suitable where the participants are reluctant to share information.
• This method is quite economical.
Demerits
• The observer must be an expert otherwise he cannot predict accurately.
• This method is not suitable for extensive study purpose.
• There is every chance of wrong prediction if the participant does not behave rationally.
• This method is not reliable and generally not followed by many organisations.

2) Interview method: This method of collecting data is one of the most popular and important. An interview is a
meeting of two persons wherein one person tries to gather information from the other by asking simple
questions. The interview may be a direct personal interview or an indirect personal interview.
Merits
• Interviews have high response rates.
• The method is quite reliable and valuable.
• This method is flexible and questions can be asked according to the facial expressions of the candidate.
Demerits
• The interviewer must be a trained professional.
• This method is expensive and time consuming.
• It cannot be used when the field of inquiry is large.

3) Reporters: Under this method, reporters and correspondents are appointed in different areas to collect data
and information about a particular matter. This method is generally employed by government departments,
newspapers, magazines, radio and TV news channels.
Merits
• This method is more accurate and reliable.
• This method is economical.
• Information can be collected easily and quickly.
• Information can be collected from a wide area extensively.
Demerits
• This method is expensive as many reporters have to be appointed.
• It is time consuming.
• Data may not be reliable

4) Questionnaire: Under this method, a list of questions is sent to the informants requesting them to answer. The
questions are sent either by post, in person, or by email, and are prepared according to the interest of the
respondents. Each respondent is requested to answer the questions honestly.
Merits
• This method is suitable to cover a wide area.
• This method is economical.
• Reliable and concrete information may be collected through this method.
Demerits
• It is a time-consuming process.
• The informants may not answer the questions honestly.
• This method is not applicable to uneducated people.



5) Schedule: A schedule is also a list of questions, but it is not sent to the informants directly. Instead,
researchers or enumerators visit the informants personally, ask the questions, and fill in the schedule
themselves.
Merits
• It is applicable to uneducated people also.
• It is suitable for extensive study.
• The enumerators visit the informants personally to collect information.
Demerits
• It is expensive and time-consuming method.
• The informants may not respond honestly and thus information may not be reliable.

Sources of collecting Secondary data


The various sources of collecting secondary data can be classified into three major categories:
1. Published sources
2. Unpublished sources
3. Electronic sources
1) Published sources: Secondary data can be collected from various published sources like newspapers,
magazines, bulletins, reports, etc. These sources may be classified into the following types:
• Government publications.
• International publications
• Publications by various committees and commissions.
• Private publications.
2) Unpublished sources: Secondary data may be collected from unpublished sources. We can collect information
from the registers, files, etc. of certain research institutions, trade associations, universities, private institutions.
3) Electronic sources: This has also become one of the popular sources of collecting data in recent years. There is a
vast amount of information and data available on the Internet and the web, from where the researcher can collect
information. These sources include google.com, yahoo.com, msn.com, etc.

DESIGNING OF QUESTIONNAIRE
In designing the questionnaire, some of the important points to be kept in mind are:
1. Covering letter: Every questionnaire should contain a covering letter. The covering letter should
highlight the purpose of the study and assure the respondent that all responses will be kept confidential. It is
desirable that some inducement or motivation is provided to the respondent for better response. The
objectives of the study and questionnaire design should be such that the respondent derives a sense of
satisfaction through his involvement.
2. Number of questions should be kept to the minimum: The fewer the questions, the greater the chances of
getting a better response and having all the questions answered. Otherwise, the respondent may feel
disinterested and provide inaccurate answers particularly towards the end of the questionnaire.
3. Questions should be simple, short and unambiguous: The questions should be simple, short, easy to
understand and such that their answers are unambiguous.
4. Questions of sensitive or personal nature should be avoided: The questions should not be such as would
require the respondent to disclose any private, personal or confidential information. For example, questions
relating to sales, profits, material happiness etc. should be avoided as far as possible.
5. Answers to questions should not require calculations: The questions should be framed in such a way that
their answers do not require any calculations.
6. Logical arrangement Collection of Data: The questions should be logically arranged so that there is a
continuity of responses and the respondent does not feel the need to refer back to the previous questions. It



is desirable that the questionnaire should begin with some introductory questions followed by vital
questions crucial to the survey and ending with some light questions so that the overall impression of the
respondent is a happy one.
7. Cross-check and Footnotes: The questionnaire should contain some such questions which act as a cross-
check to the reliability of the information provided. For example, when a question relating to income is asked,
it is desirable to include a question: “are you an income tax assessee?”

Essential characteristics of Data


The following are the important features of primary and secondary data:
1. Completeness. Each questionnaire should be complete in all respects. The respondent should have answered
each and every question. If some important questions have been left unanswered, attempts should be made to
contact the respondent and get the response. If, despite all efforts, answers to vital questions are not given, such
questionnaires should be dropped from the final analysis.
2. Consistency. Questionnaire should also be checked to see that there are no contradictory answers.
Contradictory responses may arise due to wrong answers filled in by the respondents or because of carelessness
on the part of the investigator in recording the data.
3. Accuracy. The questionnaire should also be checked for the accuracy of information provided by the
respondent. It may be pointed out that this is the most difficult job of the investigator and at the same time the
most important one. If inaccuracies are permitted, this would lead to misleading results.
4. Homogeneity. It is equally important to check whether the questions have been understood in the same sense
by all the respondents. For instance, if there is a question on income, it should be very clearly stated whether it
refers to weekly, monthly, or yearly income. If it is left ambiguous then respondents may give different
responses and there will be no basis for comparison because we may take some figures which are valid for
monthly income and some for annual income.

CENSUS AND SAMPLE


Census vs. Sampling:
1. A census is a systematic method that collects and records data about the whole population, whereas sampling refers to a portion of the population selected to represent the entire group in all its characteristics.
2. In a census, each and every unit of the population is covered; in sampling, only a handful of units of the population are covered.
3. A census is a time-consuming process; sampling is a fast process.
4. A census is an expensive method; sampling is an economical method.
5. A census is reliable and accurate; sampling is less reliable and accurate, due to the margin of error in the data collected.
6. A census suits a population of heterogeneous nature; sampling suits a population of homogeneous nature.



UNIT-2: PRESENTATION OF DATA

Classification of Data
After the data has been systematically collected and edited, the first step in presentation of data is classification.
Classification is the process of arranging the data according to the points of similarities and dissimilarities. It is like
the process of sorting the mail in a post office, where the mail for different destinations is placed in different
compartments after it has been carefully sorted out from the huge heap.

Objectives of Classification
The principal objectives of classifying data are:
1. To condense the mass of data in such a way that salient features can be readily noticed.
2. To facilitate comparisons between attributes of variables.
3. To prepare data which can be presented in tabular form.
4. To highlight the significant features of the data at a glance

Types of Classification
Some common types of classification are:
1. Geographical Classification
2. Chronological Classification
3. Qualitative Classification
4. Quantitative Classification
1) Geographical Classification. In this type of classification, data is classified according to area or region. For
example, when we consider production of wheat state wise, this would be called geographical classification. The
listing of individual entries is generally done in an alphabetical order or according to size to emphasise the
importance of a particular area or region.
2) Chronological Classification. When the data is classified according to the time of its occurrence, it is known as
chronological classification. For example, sales figure of a company.
3) Qualitative Classification. When the data is classified according to some attributes (distinct categories) which
are not capable of measurement, it is known as qualitative classification. For example, the attribute education can
have different classes such as primary, middle, higher secondary, university, etc.
4) Quantitative Classification. When the data is classified according to some characteristics that can be
measured, it is called quantitative classification. For example, the employees of a company may be classified
according to their monthly salaries.

CHARTING OF DATA
Charts of frequency distributions which cover both diagrams and graphs are useful because they enable a quick
interpretation of the data. A frequency distribution can be presented by a variety of methods. The following four
popular methods of charting frequency distribution are
1. Bar Diagram
2. Histogram
3. Frequency Polygon
4. Ogive or Cumulative Frequency Curve
1. Bar Diagram. Bar diagrams are most popular. One can see numerous such diagrams in newspapers, journals,
exhibitions, and even on television to depict different characteristics of data. For example, population, per
capita income, sales and profits of a company can be shown easily through bar diagrams. A bar is a thick line
whose width is shown to attract the viewer. A bar diagram may be either vertical or horizontal. In order to
draw a bar diagram, we take the characteristic (or attribute) under consideration on the X-axis and the
corresponding value on the Y-axis. It is desirable to mention the value depicted by the bar on the top of the bar.



2. Histogram. One of the most commonly used and easily understood methods for graphic presentation of
frequency distribution is histogram. A histogram is a series of rectangles having areas that are in the same
proportion as the frequencies of a frequency distribution.
3. Frequency Polygon. The frequency polygon is a graphical presentation of frequency distribution. A polygon
is a many-sided closed figure.
4. Ogive. An ogive is the graphical presentation of a cumulative frequency distribution. When the graph of such
distribution is drawn, it is known as ogive. It can be of two types, such as less than ogive and more than ogive.
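As a sketch, the frequency distribution behind a histogram and the cumulative frequencies plotted in a less-than ogive can be tabulated in Python; the marks data and the class width below are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Illustrative raw data: marks scored by 12 students (hypothetical)
marks = [12, 27, 35, 8, 41, 22, 19, 33, 47, 25, 31, 38]

width = 10  # class width for classes 0-10, 10-20, ...
# Frequency of each class (a class is identified by its lower limit)
freq = Counter((m // width) * width for m in marks)
classes = sorted(freq)

# "Less than" cumulative frequencies, as plotted in a less-than ogive
cum = list(accumulate(freq[c] for c in classes))

for c, cf in zip(classes, cum):
    print(f"{c:2d}-{c + width:<3d} frequency={freq[c]}  cumulative={cf}")
```

The last cumulative frequency always equals the total number of observations, which is a quick consistency check on the table.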



UNIT-3: MEASURES OF CENTRAL TENDENCY

PROPERTIES OF A GOOD MEASURE OF CENTRAL TENDENCY


A good measure of central tendency should possess the following properties:
1. It should be easy to understand.
2. It should be simple to compute.
3. It should be based on all observations.
4. It should be uniquely defined.
5. It should be capable of further algebraic treatment.
6. It should not be unduly affected by extreme values.

Types of Measures of Central Tendency


Some of the important measures of central tendency commonly used in business and industry are:
1. Arithmetic Mean
2. Weighted Arithmetic Mean
3. Median
4. Quantiles
5. Mode
6. Geometric Mean
7. Harmonic Mean

1. ARITHMETIC MEAN
The arithmetic mean is the most commonly used and readily understood measure of central tendency. In
statistics, the term average refers to any of the measures of central tendency. The arithmetic mean is defined as
being equal to the sum of the numerical values of each and every observation divided by the total number of
observations.
Properties of A.M
• The sum of the deviations of the observations from the arithmetic mean is always zero.
• The sum of the squared deviations of the observations from the mean is minimum, i.e., the total of the
squares of the deviations from any other value than the mean value will be greater than the total sum of
squares of the deviations from mean.
• The arithmetic means of several sets of data may be combined into a single arithmetic mean for the
combined sets of data.
Advantages of Mean
a) Easy and simple to understand and calculate.
b) Not much affected by sampling fluctuations.
c) It takes into account all the values in the series.
Disadvantages of Mean
a) It is affected by very high or very low scores.
b) In the absence of a single item, its value becomes inaccurate.
c) It cannot be determined by inspection.
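The first and third properties of the arithmetic mean listed above can be verified numerically; the observations below are made up for illustration.

```python
# Hypothetical observations
data = [4, 7, 9, 12, 18]
mean = sum(data) / len(data)  # 50 / 5 = 10.0

# Property 1: deviations from the mean always sum to zero
deviations = [x - mean for x in data]
print(sum(deviations))  # 0 (up to floating-point rounding)

# Property 3: the means of two sets combine into a single mean
set_a, set_b = [2, 4, 6], [10, 20]
mean_a, mean_b = sum(set_a) / 3, sum(set_b) / 2
combined = (3 * mean_a + 2 * mean_b) / (3 + 2)
print(combined == sum(set_a + set_b) / 5)  # True
```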

2. WEIGHTED ARITHMETIC MEAN


The arithmetic mean gives equal importance or weight to each observation. In some cases, all observations do
not have the same importance. Therefore, we compute weighted arithmetic mean. In weighted arithmetic mean,
W are the weights assigned to the variable X.
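As a sketch, with hypothetical marks X and weights W (e.g., course credits):

```python
# Hypothetical marks (X) and their weights (W), e.g. course credits
X = [80, 65, 90]
W = [4, 2, 3]

# Weighted arithmetic mean: sum of W*X divided by sum of W
weighted_mean = sum(w * x for w, x in zip(W, X)) / sum(W)
simple_mean = sum(X) / len(X)

print(weighted_mean)  # (320 + 130 + 270) / 9 = 80.0
print(simple_mean)    # 78.33..., treats every mark as equally important
```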

3. MEDIAN



Median is that value which divides the distribution into two equal parts. Fifty per cent of the observations in the
distribution are above the value of median and other fifty per cent of the observations are below this value of
median. The median is the value of the middle observation when the series is arranged in order of size or
magnitude. If the number of observations is odd, then the median is equal to one of the original observations. If
the number of observations is even, then the median is the arithmetic mean of the two middle observations. For
example, if the income of seven persons in rupees is 1100, 1200, 1350, 1500, 1550, 1600, 1800, then the median
income would be Rs. 1500. Suppose one more person joins and his income is Rs. 1850; then the median income
of the eight persons would be Rs. 1525 (since the number of observations is now even, the median is the
arithmetic mean of the 4th and 5th observations, i.e., (1500 + 1550)/2).
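The worked example above can be checked with Python's statistics module:

```python
import statistics

# Incomes of seven persons (Rs.), from the example above
incomes = [1100, 1200, 1350, 1500, 1550, 1600, 1800]
print(statistics.median(incomes))  # 1500, the middle (4th) observation

# One more person joins with income Rs. 1850
incomes.append(1850)
print(statistics.median(incomes))  # (1500 + 1550) / 2 = 1525.0
```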
Advantages of Median
a) Easy and simple to calculate and understand.
b) Not affected by extreme values.
c) It can be represented graphically.
d) Suitable for open end distribution.
Disadvantages of Median
a) When the number of observations is even, the median is not one of the actual observed values.
b) Unsuitable for algebraic treatment.
c) Unsuitable for fractions and percentages.

4. QUANTILES
Quantiles are the related positional measures of central tendency. These are useful and frequently employed
measures of non-central location. The most familiar quantiles are the quartiles, deciles, and percentiles.
Quartiles: Quartiles are those values which divide the total data into four equal parts. Since three points divide
the distribution into four equal parts, we shall have three quartiles such as Q1, Q2, and Q3. The first quartile,
Q1, is the value such that 25% of the observations are smaller and 75% of the observations are larger. The
second quartile, Q2, is the median, 50% of the observations are smaller and 50% are larger. The third quartile,
Q3, is the value such that 75% of the observations are smaller and 25% of the observations are larger.
Percentiles: Percentiles are those values which divide the total data into hundred equal parts. Since ninety-
nine points divide the distribution into hundred equal parts, we shall have ninety-nine percentiles.
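As a sketch, the three quartiles of a small illustrative dataset can be computed with the standard library; the 'inclusive' interpolation method used below is one common convention, not the only one.

```python
import statistics

# Illustrative ordered data
data = [2, 4, 6, 8, 10, 12, 14]

# Three cut points dividing the data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4, method='inclusive')
print(q1, q2, q3)  # 5.0 8.0 11.0

# The second quartile Q2 is simply the median
print(q2 == statistics.median(data))  # True
```

Percentiles work the same way with n=100, giving ninety-nine cut points.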

5. MODE
The mode is the typical or commonly observed value in a set of data. It is defined as the value which occurs most
often or with the greatest frequency. The dictionary meaning of the term mode is most usual. For example, in
the series of numbers 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, the mode is 8 because it occurs the maximum number of times.
The calculations are different for the grouped data, where the modal class is defined as the class with the
maximum frequency and is calculated by using formula.
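The example above can be checked directly:

```python
import statistics
from collections import Counter

series = [3, 4, 5, 5, 6, 7, 8, 8, 8, 9]
print(statistics.mode(series))  # 8, the most frequent value

# The frequencies behind it: 8 occurs three times, 5 occurs twice
print(Counter(series).most_common(2))  # [(8, 3), (5, 2)]
```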
Advantages of Mode
a) Easy to understand and calculate.
b) Not affected by extreme values.
c) It can represent qualitative data.
d) It can be represented graphically.
Disadvantages of Mode
a) Mode is not well defined.
b) It does not take into account all the items in the series.
c) No further algebraic treatment is possible.

6. Geometric Mean: It is defined as the nth root of the product of all the n observations. It is not applicable
when any observation is zero or negative. It is mostly used to find the average rate of change in population,
interest, growth rate, etc.



7. Harmonic Mean: It is defined as the reciprocal of the arithmetic mean of the reciprocal of the observations. In
other words, it may be defined as the ratio of number of observations and sum of reciprocal of the values.
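Both means are available in Python's statistics module; the figures below are illustrative, and the harmonic mean example is the classic average-speed case.

```python
import statistics

# Geometric mean: nth root of the product of n observations
print(statistics.geometric_mean([2, 8]))   # square root of 16, i.e. about 4.0

# Harmonic mean: reciprocal of the mean of reciprocals.
# Average speed over two equal distances driven at 40 and 60 km/h:
print(statistics.harmonic_mean([40, 60]))  # 48.0 km/h, not 50
```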

RELATIONSHIP AMONG MEAN, MEDIAN AND MODE


A distribution in which mean, median and mode coincide is known as a symmetrical (bell shaped) distribution. If a
distribution is skewed (that is, not symmetrical) then mean, median, and mode are not equal. In a moderately skewed
distribution, a very interesting relationship exists among mean, median and mode: the distance between the
mean and the median is approximately one third of the distance between the mean and the mode. This
relationship can be expressed as follows:
Mean - Median = 1/3 (Mean - Mode)
or Mode = 3 Median - 2 Mean
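With hypothetical summary figures (mean 50, median 48), the empirical relation gives an estimated mode:

```python
# Empirical relation for a moderately skewed distribution:
# Mode = 3 * Median - 2 * Mean
mean, median = 50, 48          # hypothetical summary figures
mode_estimate = 3 * median - 2 * mean
print(mode_estimate)  # 44

# Cross-check: Mean - Median should be one third of (Mean - Mode)
print(mean - median == (mean - mode_estimate) / 3)  # True
```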



UNIT-4: MEASURES OF VARIATION AND SKEWNESS

Importance of measuring Variation


Measuring variation is significant for some of the following purposes.
1. It helps to find the reliability of an average by pointing out how far an average is representative of the entire
data.
2. To determine the nature and causes of variation in order to control the variation itself.
3. It helps in comparisons of two or more distributions with regard to their variability.
4. Measuring variability is of great importance to advanced statistical analysis. For example, sampling or
statistical inference is essentially a problem in measuring variability.

Types of Variation
Following are some of the well-known measures of variation:
1. Range
2. Quartile Deviation or Semi-Interquartile Range
3. Average or Mean Deviation
4. Standard Deviation
1. RANGE
The range is defined as the difference between the highest value and the lowest value in a set of data. In
symbols, this may be indicated as: R = H – L, where R = Range; H = Highest Value; L = Lowest Value. The range
is very easy to calculate. However, the range is a crude measure of variation, since it uses only two extreme
values. The concept of range is extensively used in statistical quality control. Range is helpful in studying the
variations in the prices of shares and debentures and other commodities that are very sensitive to price
changes from one period to another. For meteorological departments, the range is a good indicator for weather
forecast. For grouped data, the range may be approximated as the difference between the upper limit of the
largest class and the lower limit of the smallest class.
2. QUARTILE DEVIATION
The quartile deviation, also known as the semi-interquartile range, is computed as half the difference
between the third quartile and the first quartile. In symbols, this can be written as:
Q.D. = (Q3 – Q1)/2
where Q1 = first quartile, and Q3 = third quartile.
3. AVERAGE DEVIATION
The measure of average (or mean) deviation is an improvement over the previous two measures in that it
considers all observations in the given set of data. This measure is computed as the mean of deviations from
the mean or the median. All the deviations are treated as positive regardless of sign.
4. STANDARD DEVIATION
The standard deviation is the most widely used and important measure of variation. In computing the average
deviation, the signs are ignored. The standard deviation overcomes this problem by squaring the deviations,
which makes them all positive. The square of the standard deviation is called the variance. The standard deviation
and variance become larger as the spread of the data becomes greater. More importantly, the standard deviation is
readily comparable with other standard deviations: the greater the standard deviation, the greater the variability.
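As a sketch, the range, average deviation, standard deviation and variance of a small illustrative dataset (population form, i.e. dividing by n):

```python
import statistics

# Illustrative data
data = [4, 6, 8, 10, 12]
mean = statistics.fmean(data)               # 8.0

# 1. Range: highest value minus lowest value
r = max(data) - min(data)                   # 12 - 4 = 8

# 3. Average (mean) deviation: mean of absolute deviations from the mean
avg_dev = sum(abs(x - mean) for x in data) / len(data)   # 2.4

# 4. Standard deviation (population form) and variance
sd = statistics.pstdev(data)
variance = statistics.pvariance(data)       # 8.0, the square of sd
print(r, avg_dev, variance, round(sd, 3))
```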

A short note on Coefficient of Variation


A frequently used relative measure of variation is the coefficient of variation, denoted by C.V. This measure is simply
the ratio of the standard deviation to the mean, expressed as a percentage.
Coefficient of variation = C.V. = (S.D./Mean) x 100
The smaller the coefficient of variation, the less variable the data are said to be.
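A sketch comparing two hypothetical datasets on very different scales, where the C.V. allows a fair comparison of variability:

```python
import statistics

# Two hypothetical datasets on different scales
daily_sales = [4, 6, 8, 10, 12]        # mean 8
share_price = [98, 100, 102]           # mean 100

def cv(data):
    """Coefficient of variation: (S.D. / mean) x 100."""
    return statistics.pstdev(data) / statistics.fmean(data) * 100

print(round(cv(daily_sales), 1))   # about 35.4
print(round(cv(share_price), 1))   # about 1.6, so far less variable
```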



Meaning of Skewness.
• Skewness, in statistics, is the degree of asymmetry observed in a probability distribution.
• The measure of skewness tells us the direction and extent of the asymmetry of the distribution.
• Symmetry means the variables are equidistant from the average on both sides.
• Skewness can be of three types such as, positive, negative and zero.
• Negative skew refers to a longer or fatter tail on the left side of the distribution, while positive skew refers
to a longer or fatter tail on the right. Zero skewness (Bell curve) is a balanced distribution.

Graphical presentation of skewness


Skewness can be presented in three forms: symmetrical, positively skewed and negatively skewed distributions.
a) Symmetrical distribution: When the spread of the frequencies is the same on both sides of the
middle point of the frequency polygon, it is known as a symmetrical distribution.
Here, Mean = Median = Mode
b) Positively skewed distribution: When there is a longer tail towards the right-hand side of the centre of
distribution, it is known as positively skewed distribution.
Here, Mean > Median > Mode
c) Negatively skewed distribution: When there is a longer tail towards the left-hand side of the centre, it is
known as negatively skewed distribution.
Here, Mean < Median < Mode
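The text gives no skewness formula; as an assumption, the sketch below uses Karl Pearson's median-based coefficient, Sk = 3(Mean - Median)/S.D., whose sign indicates the direction of skew.

```python
import statistics

def pearson_skew(data):
    """Karl Pearson's second coefficient: 3 * (mean - median) / s.d."""
    return (3 * (statistics.fmean(data) - statistics.median(data))
            / statistics.pstdev(data))

# One large value creates a long right tail -> positive skew
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
# Mirroring the data flips the tail to the left -> negative skew
left_tailed = [-x for x in right_tailed]

print(round(pearson_skew(right_tailed), 2))  # positive value
print(round(pearson_skew(left_tailed), 2))   # negative value
```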



UNIT-5: BASIC CONCEPTS OF PROBABILITY

Meaning of Probability
Probability means the chance of occurrence of an event or happening. In order to measure probability, we use the
concepts of experiment, sample space and event.
Experiment: The term experiment is used in probability theory in a much broader sense than in physics or
chemistry. Any action, whether it is the tossing of a coin or the launching of a new product in the market, constitutes
an experiment in probability theory. All such experiments have three things in common:
1. There are two or more outcomes of each experiment.
2. It is possible to specify the outcomes in advance.
3. There is uncertainty about the outcomes.
For example, a coin tossing may result in two outcomes, in head or tail, which we know in advance, and we are not
sure whether a head or a tail will come up when we toss the coin.
Sample Space: The set of all possible outcomes of an experiment is defined as the sample space. Each outcome is
thus visualised as a sample point in the sample space. Thus, the set (head, tail) defines the sample space of a coin
tossing experiment. The sample space is fully determined by listing down all the possible outcomes of the
experiment.
Event: An event, in probability theory, constitutes one or more possible outcomes of an experiment. Thus, an event
can be defined as a subset of the sample space. Generally, an event refers to a particular happening or incident. But
here, we use an event to refer to a single outcome or a combination of outcomes. Suppose, as a result of a market
study experiment for a product, we find that the demand for the product for the next month is uncertain and may
take values from 100, 101, 102 ... 150. We can then define different events, such as the event that demand is
exactly 110, that demand lies between 120 and 130, or that demand is at least 140.
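As a sketch, the sample space and an event for the experiment of tossing a coin twice:

```python
from itertools import product

# Sample space of tossing a fair coin twice
space = list(product("HT", repeat=2))
print(space)  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# An event is a subset of the sample space, e.g. "at least one head"
event = [outcome for outcome in space if "H" in outcome]
probability = len(event) / len(space)
print(probability)  # 3/4 = 0.75
```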

Bayes’ Theorem
• It describes the probability of occurrence of an event in relation to a condition that is already known to have
occurred.
• The theorem is built on the idea of conditional probability.
• For example, the probability of taking a blue ball from a bag containing equal numbers of black, white, red and
blue balls is 1/4; conditional information about the draw can revise such a probability.
• Consider a manufacturer who is using a particular machine for producing a product. From earlier data, he has
estimated that the chances of the machine being set up correctly or incorrectly are 0.6 and 0.4 respectively.
• Thus, we have two mutually exclusive and collectively exhaustive events, A: the set-up is correct, and B: the
set-up is incorrect, with P(A) = 0.6 and P(B) = 0.4 (check: P(A) + P(B) = 1).
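Continuing the machine example, a sketch of Bayes' theorem in Python; the likelihoods of producing a good piece under each set-up (0.9 and 0.5) are assumed for illustration, since the text does not give them.

```python
# Prior probabilities from the text
p_correct = 0.6      # P(A): machine set up correctly
p_incorrect = 0.4    # P(B): machine set up incorrectly

# Hypothetical likelihoods (not given in the text):
# chance of producing a good piece under each set-up
p_good_given_correct = 0.9
p_good_given_incorrect = 0.5

# Bayes' theorem: P(A | good) = P(good | A) * P(A) / P(good)
p_good = (p_good_given_correct * p_correct
          + p_good_given_incorrect * p_incorrect)        # 0.74
posterior = p_good_given_correct * p_correct / p_good
print(round(posterior, 4))  # seeing a good piece raises P(A) above 0.6
```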



UNIT-6: DISCRETE PROBABILITY DISTRIBUTIONS

Meaning of Discrete Probability Distribution


A representation of all possible values of a discrete random variable together with their probabilities of occurrence
is called a discrete probability distribution. In discrete situations, the function that gives the probability of every
possible outcome is referred to in Probability Theory as the “probability mass function” (p.m.f.). The different
methods by which p.m.f. of a random variable can be specified are:
• Using standard functions in probability theory.
• Using past data on the random variable.
• Using subjective assessment.

Types of Discrete Probability Distribution


Bernoulli Process
Any uncertain situation or experiment that is marked by the following three properties is known as a Bernoulli
Process. Typical examples of Bernoulli process are coin-tossing and success-failure situations. In repeated tossing of
coins, for each toss, there are two mutually exclusive and collectively exhaustive events, namely, head and tail.
It assumes that:
• There are only two mutually exclusive and collectively exhaustive outcomes in the experiment.
• In repeated observations of the experiment, the probabilities of occurrence of these events remain constant.
• The observations are independent of one another.
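Counting the number of successes in n independent repetitions of a Bernoulli process gives the binomial probability; a minimal sketch using only the standard library:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(exactly k successes in n Bernoulli trials with success prob p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 4 tosses of a fair coin
print(binom_pmf(4, 2, 0.5))  # 6 / 16 = 0.375

# The probabilities over all possible outcomes sum to 1
print(sum(binom_pmf(4, k, 0.5) for k in range(5)))  # 1.0
```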



UNIT-7: CONTINUOUS PROBABILITY DISTRIBUTIONS
Normal Distribution
The Normal Distribution is the most important of all the continuous probability distributions. It is found to be useful
in statistical inferences. Some important characteristics are:
• Normal distribution is symmetric about mean.
• The data near mean are more frequent.
• It looks like a bell and therefore is also known as bell curve.

1. It has a symmetric shape, meaning it can be cut into two identical halves about the mean.


2. Kurtosis = 3. Remember that kurtosis is a measure of flatness and excess kurtosis is measured relative to 3,
the “normal kurtosis.”
3. The mean, mode, and median are all equal and lie directly in the middle of the distribution.
4. The standard deviation measures the distance from the mean to the point of inflection, which is the point
where the curve changes from an “upside-down-bowl” shape to a “right-side-up-bowl” shape.
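These characteristics can be checked with the standard library's NormalDist:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
d = NormalDist(mu=0, sigma=1)

# Symmetry about the mean: equal density at equal distances either side
print(d.pdf(1) == d.pdf(-1))  # True

# About 68.27% of values lie within one standard deviation of the mean
within_one_sd = d.cdf(1) - d.cdf(-1)
print(round(within_one_sd, 4))  # 0.6827
```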



UNIT-9: SAMPLING METHODS

Meaning of Sampling


Sampling is a process of collecting data from chosen samples rather than from every unit of the population.
Generally, it is not possible for a researcher to collect data from each member or unit of a population. Therefore,
the researcher selects small samples from the whole population, from which data are collected.

Methods of sampling
The various sampling methods can be classified into two categories:
A. Random sampling methods.
B. Non-random sampling methods.

Random Sampling vs. Non-random Sampling:
1. In random sampling, each unit of the population has an equal chance of being chosen as a sample; in non-random sampling, samples are selected on the basis of certain factors like convenience, judgement and experience of the researcher.
2. Random sampling is based on the probability of events; non-random sampling is based on some specific factors.
3. Random sampling represents the entire population; non-random sampling doesn't represent the entire population.
4. Random sampling is a simple technique of sampling; non-random sampling is complex in nature.

A. Random Sampling Methods: In this method, samples of the population are selected randomly. All units in the
population have a chance of being chosen in the sample. It is the simplest of the sampling methods, and every
element of the population can enter the sample. Random sampling methods are again divided into five methods,
such as:
i) Simple random sampling.
ii) Systematic sampling.
iii) Stratified random sampling.
iv) Cluster sampling.
v) Multi stage sampling

Simple random sampling

It is the most popular method for choosing a sample from a population for a wide range of purposes. In simple
random sampling, each member of the population is equally likely to be chosen as part of the sample. All the
members of the population are given an equal chance to be chosen, and each unit is selected randomly. This method
is easy to understand in theory but difficult to perform in practice, because working with a large population can be
a challenge.
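As a sketch (using a hypothetical population of 1,000 numbered units), simple random sampling can be done with Python's standard library:

```python
import random

random.seed(42)                      # fixed seed, for a reproducible illustration

population = list(range(1, 1001))    # a hypothetical population of 1000 units
sample = random.sample(population, k=50)   # each unit equally likely to be drawn

print(len(sample))                   # 50
print(len(set(sample)))              # 50 -- drawn without replacement, no repeats
```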

Advantages of Simple Random Sampling


a) It is the simplest method to be used in sampling if applied properly.
b) It is easy to select samples of a large population.
c) Research findings can be easily generalized due to representativeness of this sampling technique.
d) It doesn’t require any advanced technical knowledge.
e) It is economical.

Disadvantages of Simple Random Sampling


a) This can be costly and time-consuming for large studies.
b) It is not suitable for large sample size.
c) This method looks easy in theory but difficult to practice.
d) This sampling method is not suitable for studies that involve face-to-face interviews covering a large
geographical area due to cost and time considerations.

Systematic sampling
In this method, samples are chosen systematically. It involves choosing a first individual at random from the
population, then selecting every following nth individual within the sampling frame to make up the sample. It
follows a certain pattern in sampling. For example, after every two persons, one person may be chosen as a
sample. So, this method follows a specific pattern or trend.
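A sketch of the idea (with a hypothetical population of 100 units): choose a random starting unit, then take every k-th unit after it.

```python
import random

def systematic_sample(population, n):
    """Random start within the first interval, then every k-th unit (k = N // n)."""
    k = len(population) // n          # sampling interval
    start = random.randrange(k)       # random first unit
    return population[start::k][:n]

random.seed(1)
population = list(range(1, 101))
sample = systematic_sample(population, n=10)
print(sample)                         # 10 units, exactly k = 10 apart
```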

Advantage of systematic sampling


a) It is a faster and simpler way, since there is no need to generate a random number for each individual in the
sample.
b) Systematic sampling spreads the selection evenly across the population.
c) It is more systematic and scientific than simple random sampling.

Disadvantages of systematic sampling


a) This method is difficult to apply when the population cannot be arranged in a definite order or pattern.
b) All the units of population are not given an equal chance in the selection process.
c) There is a high degree of risk in data manipulation.

Stratified Random Sampling


In stratified sampling, we divide the population into relatively homogeneous groups called strata. Then we select a
sample from each stratum using the simple random sampling method. This method is used when the population is
heterogeneous rather than homogeneous. A heterogeneous population is composed of different elements, such as
male and female, rural and urban, literate and illiterate, etc.
This method can be of two types: proportional stratified sampling and disproportional stratified sampling. If the
number of sampling units drawn from each stratum is in proportion to the stratum's size, it is known as a
proportional stratified sample. If the number drawn from each stratum is not in proportion, it is known as a
disproportional stratified sample.
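A proportional stratified sample can be sketched as follows (the strata and their sizes are hypothetical): each stratum contributes to the sample in proportion to its share of the population, with simple random sampling inside each stratum.

```python
import random

random.seed(7)

# Hypothetical heterogeneous population: 600 rural and 400 urban units
strata = {
    "rural": [f"R{i}" for i in range(600)],
    "urban": [f"U{i}" for i in range(400)],
}
N = sum(len(units) for units in strata.values())
sample_size = 50

sample = []
for name, units in strata.items():
    n_h = round(sample_size * len(units) / N)    # proportional allocation
    sample.extend(random.sample(units, n_h))     # SRS within each stratum

print(len(sample))        # 50 (30 rural + 20 urban)
```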

Advantages
a) This method is more accurate as it reflects the characteristics of the population.
b) This method is more precise and practical.
c) It saves a lot of time, money and resources for data collection because sample size is small.

Disadvantages
a) This method requires a detailed knowledge of the characteristic of the population.
b) It is a difficult task to prepare a stratified list.
c) This method is expensive as it requires the services of an expert researcher.

Cluster Sampling
Cluster sampling is a technique in which clusters of participants representing the population are identified and
included in the sample. The whole population is divided into clusters such as districts, towns, cities, etc., and then
some clusters are selected randomly. This is a popular method in marketing research. The cluster sampling process
can be single stage or multistage.
In single stage sampling, all the members of selected clusters are included in the study. Whereas, in multistage
sampling, additional sampling methods are used to choose certain individuals within selected clusters.
For example, suppose the whole population is divided into six clusters and two clusters are chosen randomly, say
the first and the last; the members of these two clusters make up the sample.

Advantages of Cluster Sampling


a) It is the most time-efficient and cost-efficient probability design for large geographical areas.
b) This method is more practical and accurate.
c) Larger sample size can be used in this method.

Disadvantages of Cluster Sampling


a) It requires group-level information to be collected from the samples.
b) It is a very complex process and requires the services of experts.
c) This method has higher sampling errors than other sampling techniques.
d) Cluster sampling may fail to reflect the diversity in the sampling frame.

Multi-stage sampling

It is a combination of the above methods of sampling. Sampling is done in various stages, and it is therefore known
as multi-stage sampling. The whole population is divided into clusters, and each cluster into groups and subgroups.



Then each unit is selected randomly. For example, to gather information on MCOM students studying in different
colleges all over India, we can use this method quite effectively.
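The MCOM example can be sketched as sampling in three stages (the hierarchy below is entirely hypothetical): first pick states, then colleges within the chosen states, then students within the chosen colleges.

```python
import random

random.seed(11)

# Hypothetical hierarchy: 4 states, each with 5 colleges of 30 students
states = {f"state{s}": {f"college{c}": [f"student-{s}-{c}-{i}" for i in range(30)]
                        for c in range(5)}
          for s in range(4)}

sample = []
for state in random.sample(list(states), 2):                     # stage 1: 2 states
    for college in random.sample(list(states[state]), 2):        # stage 2: 2 colleges each
        sample.extend(random.sample(states[state][college], 5))  # stage 3: 5 students each

print(len(sample))   # 2 x 2 x 5 = 20 students
```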

Advantages
a) It is less expensive and less time-consuming.
b) This method is more accurate and freer from bias.
c) It is more flexible as it allows different sampling procedures at different stages.
d) It can be used over a wide area. That means we can do sampling of a huge population easily by using this
method.

Disadvantages
a) All the units of population are not involved in this method.
b) The data may not be reliable and accurate.
c) It is a complex method and requires an expert researcher.

B. Non-random sampling methods


In non-random sampling methods, the probability of any particular unit of the population being chosen is unknown.
The various non-random sampling methods are:
1. Convenience sampling
2. Judgment Sampling
3. Quota sampling

1. Convenience sampling: This method refers to obtaining a sample that is most conveniently available to the
researchers. Information is easily available to researchers from nearby sources. Convenience samples are best
used for exploratory research. It is useful in testing questionnaires designed on a pilot basis, and it is widely
used in market research. For example, a company may collect data from its reliable customers for its product.

2. Judgement sampling: In this method of sampling, the selection of sample is based on researchers’ judgment
about a sample. It is also known as expert sampling because the opinions of experts are taken to do research.
This method is not very reliable as different experts have different opinions on a particular fact. This type of
sampling is often used to measure the performance of salespersons. It is also used to forecast election results.

3. Quota sampling: This type of method is quite popular in marketing research. The samples are collected on the
basis of some parameters like age, sex, geographical region, education, income, occupation, etc. The different
units of population are divided on the basis of specific characteristic called quota. Each quota is then analysed.
For example, to know about the diets of population, we can divide the population into vegetarians and non-
vegetarians. Thus, here these two quotas are assigned to researchers.



UNIT-11: TESTING OF HYPOTHESES
Imp
Distinguish between Estimation and Testing of Hypothesis.

Estimation Testing of Hypothesis


Estimation is a process of estimating the unknown Testing of hypothesis is the process of either rejecting
parameter of population based on random samples or accepting a statement that has been set up about
collected. the parameter.
In estimation, the parameter to be estimated is In testing of hypothesis, a test statistic is computed
assumed to be an unknown constant. from the samples to accept or reject the statement.
Estimation may be of different types, like point The hypothesis tested may be a null, simple or
estimation, interval estimation, etc. composite hypothesis.
We estimate a parameter with the help of a single In testing of hypothesis, we may commit errors like
value known as a point estimate, or a pair of values Type-I or Type-II. When a true statement is rejected, it
known as an interval estimate. is a Type-I error; when a false statement is accepted,
it is a Type-II error.

A short note on Interval Estimation


• This is a type of estimation based on lower and upper values of the parameter.
• It may be defined as estimating an interval to which the unknown parameter may belong in all likelihood.
• For example, instead of saying that the average per capita income of India is Rs.5600, we should say that the
average income lies between Rs.5000 and Rs.6000.
• In this way, interval estimation considers two values that define a certain interval.
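The income example can be turned into a small numerical sketch. The incomes below are made up for illustration; the interval uses the usual 95% normal critical value of 1.96.

```python
import math
import statistics

# Hypothetical sample of per capita incomes (Rs.)
incomes = [5200, 5400, 5900, 6100, 5600, 5500, 5800, 5300, 5700, 5600]

n = len(incomes)
mean = statistics.mean(incomes)                    # point estimate
se = statistics.stdev(incomes) / math.sqrt(n)      # standard error of the mean

lower, upper = mean - 1.96 * se, mean + 1.96 * se  # 95% interval estimate
print(f"point: {mean:.0f}, interval: ({lower:.0f}, {upper:.0f})")
```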

Meaning of Hypothesis
• The word hypothesis consists of two words – hypo + thesis. ‘Hypo’ means tentative or subject to the
verification. ‘Thesis’ means statement about the solution of the problem.
• Thus, the literal meaning of the term hypothesis is a tentative statement about the solution of the problem.
• A hypothesis offers a solution to the problem that is to be verified empirically and is based on some rationale.
• A hypothesis keeps a research activity focused and directed towards its destination.
• Research without a hypothesis is like a sailor at sea without a compass.

Need or significance of formulating Hypothesis


1) Development of Research Techniques: There are various types of social problems which are complex in
nature, and researching them is difficult. A single technique cannot cover them; many techniques are required,
and it is the hypothesis that suggests these techniques to a researcher.
2) Separating Relevant from Irrelevant Observations: During a study, a researcher records the observations
and facts that are relevant to the condition and situation, and drops the irrelevant facts from the study. This
separation is possible because the formulated hypothesis distinguishes relevant observations from irrelevant
ones.
3) Direction of Research: Hypothesis acts as a guide master in research. It gives new knowledge and direction
to a researcher. It directs a scientist to know about the problematic situation and its causes.
4) Acts as a Guide: A hypothesis gives new ways and direction to a researcher. It acts as a guide and a leader in
various organizations or society. It is like the investigator’s eye.
5) Prevents Blind Research: A hypothesis brings light to the darkness of research. It differentiates scientific from
unscientific, and true from false, research. It prevents blind research and gives accuracy.
6) Accuracy & Precision: A hypothesis provides accuracy and precision to a research activity. Accuracy and
precision are the features of scientific investigation, and they are possible due to the hypothesis.
7) Provides an Answer to a Question: A hypothesis highlights the causes of a problematic situation. It also
suggests a solution, thereby providing an answer to the question.



8) Saves Time, Money & Energy: A hypothesis saves the time, money and energy of a researcher because it
guides the research.
9) Proper Data Collection: A hypothesis provides the basis for proper data collection. Collecting relevant and
correct information is the main function of a well-formulated hypothesis.

Requisites of a good hypothesis


a) It must be testable.
b) It must have an object.
c) Variables must be measurable.
d) It must relate to variables.
Imp
Type I and Type II Errors
There can be two types of errors such as Type I errors and Type II errors. In all tests of hypothesis, type I error is
assumed to be more serious than type II error and so the probability of type I error needs to be explicitly controlled.
This is done through specifying a significance level at which a test is conducted. The significance level, therefore, sets
a limit to the probability of Type I error and test procedures are designed so as to get the lowest probability of type
II error subject to the significance level. The probability of type I error is usually represented by the symbol α (read
as Alpha) and the probability of type II error represented by β (read as beta).

Procedure of testing hypothesis


The different steps in hypothesis are:
Step 1: State the Null and the Alternate Hypotheses.
Step 2: Choose the sample statistic that will define the critical region.
Step 3: Specify a level of significance of α.
Step 4: Define the critical region in terms of the test statistic.
Step 5: Compare the observed value of the test statistic with the cut-off value
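The five steps can be sketched with a two-tailed z-test on made-up numbers (the population standard deviation is assumed known):

```python
import math

# Step 1: H0: mu = 500 versus H1: mu != 500   (hypothetical figures)
mu0, sigma = 500, 40          # hypothesised mean, assumed known population sd
n, sample_mean = 64, 512      # sample size and observed sample mean

# Steps 2-3: the test statistic is z; significance level alpha = 0.05
z_critical = 1.96             # Step 4: two-tailed cut-off for alpha = 0.05

# Step 5: compare the observed statistic with the cut-off value
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
print(round(z, 2))                                                   # 2.4
print("reject H0" if abs(z) > z_critical else "fail to reject H0")   # reject H0
```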



UNIT-12: CHI-SQUARE TESTS Imp
Meaning of Chi-Square test.
• Chi-square test is a measure of the difference between the observed and expected frequencies in one or more
categories of variables.
• It was developed by Karl Pearson.
• A chi-square test is a statistical test used to compare observed results with expected results.
• Chi-square test is a statistical method used to determine goodness of fit. Goodness of fit refers to how close
the observed data are to the data predicted by the hypothesis.
• The purpose of this test is to determine if a difference between observed data and expected data is due to
chance, or if it is due to a relationship between the variables you are studying.
• Therefore, a chi-square test is an excellent choice to help us better understand and interpret the relationship
between our two categorical variables.
• It tells us whether two variables are independent of each other.
• It is used to assess 3 types of comparisons such as
1. goodness of fit.
2. Homogeneity.
3. Independence of variables.
• The formula to calculate the Chi-square statistic is:

  X² = ∑ (O − E)² / E

  where O is the observed value (frequency) and E is the expected value (frequency).
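A worked sketch of the formula, using made-up die-roll counts (H0: the die is fair, so each face is expected 20 times in 120 rolls):

```python
# Observed frequencies for the six faces of a hypothetical die rolled 120 times
observed = [22, 17, 20, 26, 21, 14]
expected = [20] * 6                      # fair die: 120 / 6 = 20 per face

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)                        # 4.3

# With df = 5 and alpha = 0.05 the critical value is about 11.07,
# so here the difference looks attributable to chance.
```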

Conditions for applying Chi-Square test


The following conditions must be satisfied to apply Chi-square test:
i) Random sample
ii) Large sample size
iii) Adequate cell size
iv) Independence

i) Random sample: The data used in Chi-square test must be collected randomly. The data must represent the
whole population. Random sample data can be used to find out the significant differences between real and
expected values. The value in the Chi-square test should not be zero because in that case the difference
between observed and expected data would be zero. So, we have to judge the quality of data.
ii) Large sample size: The size of the sample should be as large as possible. A small sample size is prone to
errors (type-II errors) and makes the Chi-square test inapplicable. Many researchers have set the minimum
sample size at 50.
iii) Adequate cell size: The minimum expected frequency in each cell should be 5 to avoid type-II errors. With
smaller cell sizes, the value of Chi-square is overestimated.
iv) Independence: The sample observations must be independent from one another otherwise correct value in
Chi-square test cannot be ascertained. All the observations should be grouped in categories.

Testing the Goodness of Fit


• Many times, we are interested in knowing if it is reasonable to assume that the population distribution is
Normal, Poisson, Uniform or any other known Distribution.
• Again, the conclusion is to be based on the evidence produced by a sample. Such a procedure is developed to
test how close the fit is between the observed data and the assumed distribution.
• These tests are also based on the chi-square statistic.



UNIT-13: BUSINESS FORECASTING
Methods of Forecasting
The primary purpose of forecasting is to provide valuable information for planning the design and operation of the
enterprise. Planning decisions may be classified as long term, medium term and short term. Long-term decisions
include decisions like plant expansion or new product introduction, which may require new technologies.
Some methods used in forecasting
1. Subjective or intuitive methods.
2. Methods based on averaging of past data, including simple, weighted and moving averages.
3. Regression models on historical data.
4. Causal or Econometric models.
5. Time series analysis or stochastic models.
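Method 2 (averaging of past data) can be sketched with a simple moving average on made-up monthly sales figures:

```python
# Hypothetical monthly sales figures
sales = [120, 132, 125, 140, 138, 150, 145, 158]

def moving_average_forecast(data, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    return sum(data[-window:]) / window

print(moving_average_forecast(sales))    # (150 + 145 + 158) / 3 = 151.0
```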



UNIT-14: CORRELATION Imp
Meaning of Correlation
• It is a statistical measure used to determine the relationship between two variables.
• Both the variables move together and are dependent on each other.
• Correlation shows the nature and extent of linear relationship between the variables only.
• For example, correlation can be used to find the relationship between age and knowledge. As we know,
knowledge increases with age; generally, older persons are more knowledgeable. Thus, there is a linear
relationship between the two variables, age and knowledge.
• The correlation coefficient measures the degree of association between two variables X and Y.

Practical application of Correlation


The primary purpose of correlation is to establish an association between any two random variables. The presence
of association does not imply causation, but the existence of causation certainly implies association.
• These are most commonly used techniques to find relationship between two given variables.
• Correlation quantifies the strength of linear relationship between variables.
• For example, the relationship between age and knowledge can be found by correlation.
• A value of the correlation coefficient close to +1 shows a strong positive linear relationship, and a value close
to -1 shows a strong negative linear relationship.
• A value close to zero shows little or no linear relationship between the variables.
• The value of correlation coefficient always lies between -1 and +1.
• Another major application of correlation is in forecasting with the help of Time series models.
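The age-knowledge example can be sketched numerically (the paired data below are made up). Pearson's r is the sum of cross-deviations divided by the product of the root sums of squared deviations:

```python
import math

# Hypothetical paired observations: age (years) and a knowledge-test score
age   = [20, 25, 30, 35, 40, 45, 50]
score = [52, 58, 61, 67, 70, 74, 79]

n = len(age)
mx, my = sum(age) / n, sum(score) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(age, score))
sx  = math.sqrt(sum((x - mx) ** 2 for x in age))
sy  = math.sqrt(sum((y - my) ** 2 for y in score))

r = sxy / (sx * sy)
print(round(r, 3))   # 0.997 -- close to +1: a strong positive linear relationship
```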



UNIT-15: REGRESSION

Meaning of Regression
• Regression is a statistical tool which shows the nature of relationship between variables.
• It describes numerically how the dependent variable is related to the independent variable.
• It shows the cause-and-effect relationship between the variables.
• One variable is dependent and the other is independent.
• It indicates the impact of the independent variable on the dependent variable.
• For example, finding the relationship between age and knowledge. Regression tells us how much knowledge
would be increased with growing age. Thus, it shows the degree of variation between variables.
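The age-knowledge example can be sketched with simple least-squares regression on hypothetical paired data: the slope estimates how much the score changes per year of age.

```python
# Hypothetical paired observations: age (years, independent) and score (dependent)
age   = [20, 25, 30, 35, 40, 45, 50]
score = [52, 58, 61, 67, 70, 74, 79]

n = len(age)
mx, my = sum(age) / n, sum(score) / n

# Least-squares estimates for score = a + b * age
b = sum((x - mx) * (y - my) for x, y in zip(age, score)) / \
    sum((x - mx) ** 2 for x in age)
a = my - b * mx

print(f"score = {a:.2f} + {b:.3f} * age")   # score = 35.36 + 0.871 * age
print(round(a + b * 42, 1))                 # predicted score at age 42: 72.0
```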

Distinguish between Correlation and Regression (June 2022)


Imp
Correlation Regression
Correlation is the relationship between two or more It shows the nature of relationship between
variables. variables.
Both the variables are dependent on each other. Here, one variable is dependent and the other is
independent.
It may or may not shows cause and effect It shows cause and effect relationship.
relationship between variables.
Here, the relationship is expressed in numbers. It expresses the relationship in the form of equations.
The value of correlation coefficient always lies The regression coefficient can take any value.
between -1 and +1.
Correlation has a narrow application as it studies On the other hand, regression has a wider
only linear relationship. application as it studies both linear and non-linear
relationships.



UNIT-16: TIME SERIES ANALYSIS Imp
Meaning of Time Series
• When quantitative data are arranged in the order of their occurrence, the resulting statistical series is called a
time series.
• A time series consists of statistical data which are collected and recorded over successive time.
• It helps to understand the past behaviour of the variables.
• It helps in forecasting the future behaviour of variables.
• It helps to make comparisons between different time series.
• It helps researchers, business organizations and governments to frame future growth strategies.
• Time series has 4 main components such as:
a) Trend variation -T
b) Seasonal variation- S
c) Cyclic variation- C
d) Irregular variation- I
• There are two models of time series, such as:
1. Additive model in this model the 4 components of time series are added. They are independent of each
other. Y = T + S + C + I
2. Multiplicative model in this model the 4 components of time series are multiplied. They are
interdependent on each other. Y = T × S × C × I
• Time series are also used in several non-financial contexts, such as measuring the change in population over
time, for example the growth of the U.S. population over the century from 1900 to 2000.
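The two models can be illustrated with assumed component values for a single period (the numbers are invented; in the multiplicative model the seasonal, cyclic and irregular components are usually expressed as indices around 1):

```python
# Additive model: Y = T + S + C + I (components in the same units, independent)
T, S, C, I = 100.0, 12.0, -5.0, 2.0
y_additive = T + S + C + I
print(y_additive)                    # 109.0

# Multiplicative model: Y = T * S * C * I (S, C, I as index numbers around 1)
Tm, Sm, Cm, Im = 100.0, 1.12, 0.95, 1.02
y_multiplicative = Tm * Sm * Cm * Im
print(round(y_multiplicative, 1))    # 108.5
```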

