Statistics PDF

notes of statistics

Uploaded by mda958605

What is Statistics?

Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In other words, it is the mathematical discipline used to collect and summarize data; it can also be described as a branch of applied mathematics. Two important and basic ideas are involved in statistics: uncertainty and variation. The uncertainty and variation in different fields can be determined only through statistical analysis, and these uncertainties are quantified using probability, which plays an important role in statistics.

How is statistics used in business?

Statistics is used in business for appraisal of value, consumer surveys, hiring decisions, insurance, manufacturing, online business, real estate investing, rental housing, sales, and stock markets. Data analysis, regression, forecasting, hypothesis testing, and other techniques are used in these fields.

Statistics is a tool that serves several purposes. It can give you insight into business operations, help you examine what went well (or what went wrong), and make predictions about the future.

What Is Business Statistics?

Business statistics is a sub-field of statistics in which statistical techniques and principles are applied to the data available to a company in order to gain valuable information. It consists of creating and interpreting numerical data from various sources, such as surveys, experiments, or other business information systems. The insights gained help organizations understand the reasons for various events in the present, predict the future, and make better decisions. Business statistics requires statistical reasoning, probability, and logic, and it can be used in marketing, production planning, human resource planning, finance, and other areas.

Why Is Business Statistics Important?

Companies benefit from seeing patterns in their activity, and business statistics makes this possible. By looking at past sales patterns, an organization can predict sales volumes in various situations and determine whether a business proposition is viable, a question that affects the performance of the whole company. Businesses can also find out whether a particular marketing campaign has helped to attract more customers, which helps them plan future campaigns in a better way. Business statistics is the foundation for business analytics.

To understand ways to optimise a team's performance, the company must first know its present productivity levels and weak areas. This information can be gathered from data already generated in previous projects. But simply collecting data will not give management any idea of how to improve performance; this is where business statistics helps. The data must be analysed and interpreted using statistical methods. Though software programmes can do the calculations, a human mind is needed to understand the significance of the analysis and take the necessary action.

Characteristics of Statistics

The important characteristics of statistics are as follows:

• Statistics are numerically expressed.
• Statistics are an aggregate of facts.
• Data are collected in a systematic order.
• Data should be comparable with each other.
• Data are collected for a planned purpose.

Importance of Statistics

The important functions of statistics are:

• Statistics helps in gathering appropriate quantitative data.
• It depicts complex data in graphical, tabular, and diagrammatic forms so that the data can be understood easily.
• It provides an exact description and a better understanding of the data.
• It helps in designing effective and proper planning of a statistical inquiry in any field.
• It gives valid inferences, with reliability measures, about population parameters from sample data.
• It helps to understand the pattern of variability through quantitative observations.


Types of Statistics

On a broader scale, statistics is classified into Descriptive Statistics and Inferential Statistics.

Descriptive Statistics

It involves the collection, organization, and presentation of data in such a way that it is easy to understand
and interpret. Descriptive statistics are used to answer questions like, what is the:

• average income of people in a state?
• most common type of car on the road?
• range of ages of employees working in a particular company?

It helps us better understand the data we are looking at by summarizing the important information, such as the highest and lowest values, the middle value (median), and how spread out the data is (range and variance). These pieces of information help us draw conclusions about the group we are studying. The three most common measures of central tendency are the mean, median, and mode.
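These summaries can be computed directly with Python's standard library; the data set below is made up for illustration:

```python
import statistics

# Hypothetical sample data
data = [2, 4, 4, 5, 7, 9, 9, 9, 12]

mean = statistics.mean(data)           # arithmetic average
median = statistics.median(data)       # middle value of the sorted data
mode = statistics.mode(data)           # most frequent value
data_range = max(data) - min(data)     # highest minus lowest value

print(round(mean, 2), median, mode, data_range)   # 6.78 7 9 10
```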

Inferential Statistics

Inferential statistics is the branch of statistics that makes inferences (or predictions) about a population dataset based on a sample dataset. It involves hypothesis testing, a process of using statistical methods to determine whether a hypothesis about the population is likely to be true. Inferential statistics are widely used in scientific and market research and in the social sciences to make predictions, test hypotheses, and make decisions based on a solid understanding of the data. It also helps to minimize errors and biases in the results.

What is the difference between Population and Sample?

A population is the entire group about which you want to draw conclusions, while a sample is the specific subset of that group from which data are actually collected. Numerical measures that describe a population are called parameters; the corresponding measures computed from a sample are called statistics.

Key Differences Between Descriptive and Inferential Statistics

The difference between descriptive and inferential statistics can be drawn clearly on the following
grounds:

1. Descriptive Statistics is a discipline concerned with describing the group under study. Inferential Statistics is the type of statistics that focuses on drawing conclusions about the population on the basis of sample analysis and observation.
2. Descriptive Statistics collects, organises, analyses, and presents data in a meaningful way. On the contrary, Inferential Statistics compares data, tests hypotheses, and makes predictions of future outcomes.
3. In descriptive statistics the final result is shown in a diagrammatic or tabular representation, whereas in inferential statistics the final result is expressed in the form of a probability.
4. Descriptive statistics describes a situation, while inferential statistics explains the likelihood of the occurrence of an event.
5. Descriptive statistics explains data that is already known, to summarise the sample. Conversely, inferential statistics attempts to reach conclusions about the population that extend beyond the data available.

What is a variable?

In programming, a variable is a value that can change, depending on conditions or on information passed to the program. Typically, a program consists of instructions that tell the computer what to do and data that the program uses when it is running. The data consists of constants, or fixed values that never change, and variable values (which are usually initialized to 0 or some default value, because the actual values will be supplied by the program's user). Usually, both constants and variables are defined as certain data types. Each data type prescribes and limits the form of the data. Examples of data types include an integer expressed as a decimal number, or a string of text characters, usually limited in length.

What Is a Constant in Math?

A “constant” simply means a fixed value or a value that does not change. A constant has a known
value.

What Is a Frequency Distribution?

A frequency distribution is a representation, either in a graphical or tabular format, that displays the
number of observations within a given interval. The frequency is how often a value occurs in an interval
while the distribution is the pattern of frequency of the variable.

The interval size depends on the data being analyzed and the goals of the analyst. The intervals must
be mutually exclusive and exhaustive. Frequency distributions are typically used within a statistical
context. Generally, frequency distributions can be associated with the charting of a normal distribution.

Cumulative Frequency

Cumulative frequency is the total of a frequency and all frequencies in a frequency distribution until a
certain defined class interval. Cumulative frequency is used to determine the number of observations that
lie above (or below) a particular value in a data set. The cumulative frequency is calculated using a
frequency distribution table, which can be constructed from stem and leaf plots or directly from the data.

The cumulative frequency is calculated by adding each frequency from a frequency distribution table to
the sum of its predecessors. The last value will always be equal to the total for all observations, since all
frequencies will already have been added to the previous total.
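The running-total construction just described can be sketched with a hypothetical frequency table:

```python
from itertools import accumulate

# Hypothetical frequency distribution: class intervals and their frequencies
intervals = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [5, 12, 8, 3]

# Each cumulative frequency is a frequency plus the sum of its predecessors
cumulative = list(accumulate(frequencies))   # [5, 17, 25, 28]

# The last value always equals the total number of observations
print(cumulative[-1] == sum(frequencies))    # True

for interval, cf in zip(intervals, cumulative):
    print(interval, cf)
```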
What is relative frequency distribution?

A related distribution is known as a relative frequency distribution, which shows the relative frequency of
each value in a dataset as a percentage of all frequencies.

A relative frequency distribution shows the proportion of the total number of observations associated with
each value or class of values and is related to a probability distribution, which is extensively used in
statistics.

For example, in the previous table we saw that there were 400 total households. To find the relative frequency of each value in the distribution, we simply divide each individual frequency by 400.
Why Are Relative Frequency Distributions Useful?

Relative frequency distributions are useful because they allow us to understand how common a value is in
a dataset relative to all other values.

In the previous example we saw that 150 households had just one pet. But this number by itself isn’t
particularly useful.

Instead, knowing that 37.5% of all households in the sample had just one pet is more useful to know.
This helps us understand that a little more than 1 in 3 households had just one pet, which gives us some
perspective on how “common” it is to own just one pet.
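The household example can be reproduced in code. Only the "1 pet, 150 households" figure comes from the text; the other frequencies below are invented so the table sums to 400:

```python
# Hypothetical pet-ownership frequency table summing to 400 households;
# only the 1-pet count of 150 is taken from the text
frequencies = {0: 120, 1: 150, 2: 90, 3: 40}
total = sum(frequencies.values())            # 400

# Relative frequency: each count divided by the total number of observations
relative = {pets: count / total for pets, count in frequencies.items()}
print(relative[1])                           # 0.375, i.e. 37.5%
```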

What is a contingency table?

In statistics, a contingency table (also known as a cross tabulation or crosstab) is a type of table in a matrix format that displays the multivariate frequency distribution of the variables. Contingency tables are heavily used in survey research, business intelligence, engineering, and scientific research. They provide a basic picture of the interrelation between two variables and can help find interactions between them. The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and Its Relation to Association and Normal Correlation".[1]

What is a histogram?

In statistics, a histogram is a graphical representation of the distribution of data. The histogram is represented by a set of rectangles, adjacent to each other, where each bar represents a class of data. Statistics is a stream of mathematics that is applied in various fields. When values are repeated in statistical data, this repetition is known as frequency, and it can be written in the form of a table called a frequency distribution. A frequency distribution can be shown graphically using different types of graphs, and a histogram is one of them. In this article, let us discuss in detail what a histogram is, how to create a histogram for given data, the different types of histograms, and the difference between a histogram and a bar graph.

A histogram is a graphical presentation of data using rectangular bars of different heights. In a histogram, there is no space between the rectangular bars.
What is Pie chart?

A pie chart, sometimes called a circle chart, is a way of summarizing a set of nominal data or displaying the different values of a given variable (e.g. a percentage distribution). This type of chart is a circle divided into a series of segments. Each segment represents a particular category. The area of each segment is the same proportion of the circle as the category is of the total data set.

A pie chart usually shows the component parts of a whole. Sometimes you will see a segment of the drawing separated from the rest of the pie in order to emphasize an important piece of information. This is called an exploded pie chart. Chart 5.4.1 is an example of an exploded pie chart.

A pie chart is a type of graph that visually displays data in a circular chart. It records data in a circular manner, and the circle is further divided into sectors that each show a particular part of the data out of the whole.

What are Bar Graphs?

Bar graphs represent data using rectangular bars of uniform width along with equal spacing between
the rectangular bars.

What is sampling?
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset or
a statistical sample (termed sample for short) of individuals from within a statistical population to
estimate characteristics of the whole population. Statisticians attempt to collect samples that are
representative of the population. Sampling has lower costs and faster data collection compared
to recording data from the entire population, and thus, it can provide insights in cases where it is
infeasible to measure an entire population.
What are Levels of Measurement?

Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded. In scientific research, a variable is anything that can take on different values across your data set (e.g., height or test scores).
There are 4 levels of measurement:

• Nominal: the data can only be categorized.
• Ordinal: the data can be categorized and ranked.
• Interval: the data can be categorized, ranked, and evenly spaced.
• Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero.

Depending on the level of measurement of the variable, what you can do to analyze your data
may be limited. There is a hierarchy in the complexity and precision of the level of
measurement, from low (nominal) to high (ratio).

Why are levels of measurement important?

Level of measurement is important, as it determines the type of statistical analysis you can carry out. As a result, it affects both the nature and the depth of insights you're able to glean from your data. Certain statistical tests can only be performed where more precise levels of measurement have been used, so it's essential to plan in advance how you'll gather and measure your data.

What are the four levels of measurement? Nominal, ordinal, interval, and ratio scales explained

There are four types of measurement (or scales) to be aware of: nominal, ordinal, interval, and ratio. Each scale builds on the previous, meaning that each scale not only "ticks the same boxes" as the previous scale, but also adds another level of precision.


[Table: the four levels of measurement: nominal, ordinal, interval, and ratio]

Let's go through each in turn to give you an idea of what they are, and how they interact.

Nominal

The nominal scale simply categorizes variables according to qualitative labels (or names). These labels and groupings don't have any order or hierarchy to them, nor do they convey any numerical value. For example, the variable "hair color" could be measured on a nominal scale according to the following categories: blonde hair, brown hair, gray hair, and so on.

Ordinal

The ordinal scale also categorizes variables into labeled groups, and these categories have an order or hierarchy to them. For example, you could measure the variable "income" on an ordinal scale as follows:

• low income
• medium income
• high income

Another example could be level of education, classified as follows:

• high school
• master's degree
• doctorate

These are still qualitative labels (as with the nominal scale), but you can see that they follow a hierarchical order.

Interval

The interval scale is a numerical scale which labels and orders variables, with a known, evenly spaced interval between each of the values. A commonly cited example of interval data is temperature in Fahrenheit, where the difference between 10 and 20 degrees Fahrenheit is exactly the same as the difference between, say, 50 and 60 degrees Fahrenheit.

Ratio

The ratio scale is exactly the same as the interval scale, with one key difference: the ratio scale has what's known as a "true zero."

What is Mean?

Mean is an essential concept in mathematics and statistics. The mean is the average value in a collection of numbers.
In statistics, it is a measure of the central tendency of a probability distribution, along with the median and mode. It is also referred to as an expected value.

It is a statistical concept that carries a major significance in finance. The concept is used in various
financial fields, including but not limited to portfolio management and business valuation.

How to Calculate Mean?

There are different ways of measuring the central tendency of a set of values, and multiple ways to calculate the mean. Two of the most popular are the arithmetic mean and the geometric mean (covered below).

The arithmetic mean is the sum of all values in a collection of numbers divided by the count of numbers in the collection. It is calculated in the following way:

Arithmetic Mean = (x1 + x2 + … + xn) / n

In finance, the arithmetic mean may be misleading in the calculations of returns, as it does not consider
the effects of volatility and compounding, producing an inflated value for the central point of the
distribution.
Mode Definition in Statistics

A mode is defined as the value that has the highest frequency in a given set of values. It is the value that appears the most number of times.

Example: In the given data set 2, 4, 5, 5, 6, 7, the mode is 5, since it appears in the set twice.
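The example can be checked with the standard library:

```python
import statistics

data = [2, 4, 5, 5, 6, 7]
print(statistics.mode(data))                   # 5, the value appearing most often

# multimode returns every value tied for the highest frequency
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
```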

Statistics deals with the presentation, collection, and analysis of data and information for a particular purpose. To present data we use tables, graphs, pie charts, bar graphs, pictorial representations, etc. After the proper organization of the data, it must be further analyzed to infer helpful information.

For this purpose, frequently in statistics, we tend to represent a set of data by a representative value that
roughly defines the entire data collection. This representative value is known as the measure of central
tendency. By the name itself, it suggests that it is a value around which the data is centred. These
measures of central tendency allow us to create a statistical summary of the vast, organized data. One
such measure of central tendency is the mode of data.

Weighted Mean Formula

Weighted Mean is an average computed by giving different weights to some of the individual values. If all
the weights are equal, then the weighted mean is the same as the arithmetic mean.

It represents the average of the given data. The weighted mean is similar to the arithmetic or sample mean, but it is calculated when the data points do not all contribute equally, unlike the arithmetic or sample mean.

Although weighted means generally behave in a similar way to arithmetic means, they do have a few counter-intuitive properties. Data elements with a high weight contribute more to the weighted mean than elements with a low weight.

The weights cannot be negative. Some may be zero, but not all of them, since division by zero is not allowed. Weighted means play an important role in systems of data analysis and in weighted differential and integral calculus.
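A minimal sketch of the weighted mean, including the weight checks described above (the function name and example values are my own):

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w).

    Weights cannot be negative, and at least one must be non-zero,
    since dividing by a zero total is not allowed."""
    if any(w < 0 for w in weights):
        raise ValueError("weights cannot be negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# With equal weights the weighted mean equals the arithmetic mean
print(weighted_mean([70, 80, 90], [1, 1, 1]))   # 80.0
# A heavier weight pulls the mean toward that value
print(weighted_mean([70, 80, 90], [1, 1, 2]))   # 82.5
```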

What is the Geometric Mean?

The geometric mean is a measure of central tendency that averages a set of products. Its formula takes the
nth root of the product of n numbers.

Like the arithmetic mean, the geometric mean finds the center of a dataset. While the arithmetic mean
finds the center by summing the values and dividing by the number of observations, the geometric mean
finds the center by multiplying and then taking a root of the product.

Based on the calculation methods, the arithmetic mean is the better statistic when adding data is
appropriate, while the geometric mean is better when you need to multiply the data.

Geometric Mean Formula for Ungrouped Data:

GM = (x1 × x2 × x3 × … × xn)^(1/n)

That is, the geometric mean is the nth root of the product of the n values.
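The formula translates directly to code; the growth-factor example is my own illustration of when multiplying the data is appropriate:

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Growth factors of +100% then -50%: the arithmetic mean of (2.0, 0.5)
# is 1.25, but the geometric mean correctly reports no net growth
print(geometric_mean([2.0, 0.5]))   # 1.0
print(geometric_mean([4, 9]))       # 6.0, the square root of 36
```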

Measures of Dispersion

In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how homogeneous or heterogeneous the data is. In simple terms, they show how squeezed or scattered the variable is.

Types of Measures of Dispersion

There are two main types of dispersion methods in statistics which are:

• Absolute Measure of Dispersion
• Relative Measure of Dispersion

Absolute Measure of Dispersion

An absolute measure of dispersion is expressed in the same unit as the original data set. The absolute dispersion method expresses the variation in terms of the average of the deviations of observations, as in the standard deviation or mean deviation. It includes the range, standard deviation, quartile deviation, etc.

The types of absolute measures of dispersion are:

1. Range: It is simply the difference between the maximum value and the minimum value in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6.
2. Variance: Subtract the mean from each value in the set, square each difference, add the squares, and finally divide by the total number of values in the data set to get the variance. Variance (σ²) = ∑(X − μ)²/N.
3. Standard Deviation: The square root of the variance is known as the standard deviation, i.e. S.D. = √σ².
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers into quarters. The quartile deviation is half of the distance between the third and the first quartile.
5. Mean and Mean Deviation: The average of the numbers is known as the mean, and the arithmetic mean of the absolute deviations of the observations from a measure of central tendency is known as the mean deviation (also called the mean absolute deviation).
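The absolute measures listed above can be computed for the data set 1, 3, 5, 6, 7 used in the range example:

```python
import statistics

data = [1, 3, 5, 6, 7]
mu = statistics.mean(data)                             # 4.4

data_range = max(data) - min(data)                     # 7 - 1 = 6
variance = statistics.pvariance(data)                  # sum((x - mu)^2) / N = 4.64
std_dev = statistics.pstdev(data)                      # sqrt(variance), ~2.154
mean_dev = sum(abs(x - mu) for x in data) / len(data)  # mean absolute deviation, ~1.92

print(data_range, variance, round(std_dev, 3), round(mean_dev, 2))
```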

Relative Measure of Dispersion

The relative measures of dispersion are used to compare the distribution of two or more data sets. This
measure compares values without units. Common relative dispersion methods include:

1. Co-efficient of Range
2. Co-efficient of Variation
3. Co-efficient of Standard Deviation
4. Co-efficient of Quartile Deviation
5. Co-efficient of Mean Deviation

What is range?

The range in statistics for a given data set is the difference between the highest and lowest values. For
example, if the given data set is {2,5,8,10,3}, then the range will be 10 – 2 = 8.

Thus, the range could also be defined as the difference between the highest observation and the lowest observation. The obtained result is called the range of the observations. The range in statistics represents the spread of the observations.

Range Formula

The formula for the range in statistics is simply the difference between the highest and lowest values:

Range = Highest Value (Maximum) − Lowest Value (Minimum)

What Is Variance?

The term variance refers to a statistical measurement of the spread between numbers in a data set. More
specifically, variance measures how far each number in the set is from the mean (average), and thus from
every other number in the set. Variance is often depicted by this symbol: σ2. It is used by both analysts
and traders to determine volatility and market security.

The square root of the variance is the standard deviation (SD or σ), which helps determine the consistency
of an investment’s returns over a period of time.

What is Standard deviation?

Standard deviation is a measure which shows how much variation (spread, dispersion) from the mean exists. The standard deviation indicates a "typical" deviation from the mean. It is a popular measure of variability because it is expressed in the original units of measure of the data set. As with the variance, if the data points are close to the mean there is little variation, whereas if the data points are highly spread out from the mean there is high variation. Standard deviation calculates the extent to which the values differ from the average. Standard deviation, the most widely used measure of dispersion, is based on all values; therefore a change in even one value affects the value of the standard deviation. It is independent of origin but not of scale. It is also useful in certain advanced statistical problems.

Absolute Measures of Dispersion

If the dispersion of data within an experiment has to be determined then absolute measures of dispersion
should be used. These measures usually express variations in a data set with respect to the average of the
deviations of the observations. The most commonly used absolute measures of deviation are listed below.

Range: Given a data set, the range can be defined as the difference between the maximum value and the
minimum value.

Variance: The average squared deviation from the mean of the given data set is known as the variance.
This measure of dispersion checks the spread of the data about the mean.

Standard Deviation: The square root of the variance gives the standard deviation. Thus, the standard
deviation also measures the variation of the data about the mean.

Mean Deviation: The mean deviation gives the average of the data's absolute deviation about the central
points. These central points could be the mean, median, or mode.

Quartile Deviation: Quartile deviation can be defined as half of the difference between the third quartile
and the first quartile in a given data set.

What Is a Percentile in Statistics?

In statistics, a percentile is a term that describes how a score compares to other scores from the same set.
While there is no universal definition of percentile, it is commonly expressed as the percentage of values
in a set of data scores that fall below a given value.

How to Calculate Percentile?

You can calculate a percentile position in statistics using the following formula:

n = (P/100) × N

where P is the desired percentile, N is the total number of values, and n is the ordinal rank of the value in the sorted data.

For example, imagine you have the marks of 20 students and want to calculate the 90th percentile.

Step 1: Arrange the scores in ascending order.

Step 2: Plug the values into the formula to find n: n = (90/100) × 20 = 18, so the 90th percentile corresponds to the 18th score in the ordered list.
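The original notes do not reproduce the formula itself, so the sketch below assumes one common convention, the ordinal-rank rule n = ceil(P/100 × N); the marks are invented:

```python
import math

def percentile_value(scores, p):
    """Value at the p-th percentile using the ordinal-rank convention:
    n = ceil(p/100 * N), then take the n-th value of the sorted data.
    (One common convention; other definitions interpolate instead.)"""
    ordered = sorted(scores)
    n = math.ceil(p / 100 * len(ordered))
    return ordered[n - 1]

# Hypothetical marks of 20 students
marks = [41, 45, 48, 52, 55, 57, 60, 61, 63, 66,
         68, 70, 72, 74, 77, 80, 83, 86, 91, 95]
print(percentile_value(marks, 90))   # n = 18, so the 18th score: 86
```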

What is Decile?

Decile, percentile, quartile, and quintile are different types of quantiles in statistics. A quantile refers to a value that divides the observations in a sample into equal subsections. There is always one fewer quantile than the number of subsections created.
Decile Formula

The position of the i-th decile in an ordered data set of n values is given by D(i) = i(n + 1)/10, i.e. the i(n + 1)/10 th data point.
Decile Example

Suppose a data set consists of the following numbers: 24, 32, 27, 32, 23, 62, 45, 80, 59, 63, 36, 54, 57, 36,
72, 55, 51, 32, 56, 33, 42, 55, 30. The value of the first two deciles has to be calculated. The steps
required are as follows:

• Step 1: Arrange the data in increasing order. This gives 23, 24, 27, 30, 32, 32, 32, 33, 36, 36, 42,
45, 51, 54, 55, 55, 56, 57, 59, 62, 63, 72, 80.
• Step 2: Identify the total number of points. Here, n = 23
• Step 3: Apply the decile formula to calculate the position of the required data point. D(1) = (n + 1)/10 = 2.4. This implies the value of the 2.4th data point has to be determined. It will lie between the scores in the 2nd and 3rd positions. In other words, the 2.4th data point is 0.4 of the way between the scores 24 and 27.
• Step 4: The value of the decile can be determined as [lower score + (distance) × (higher score − lower score)]. This is given as 24 + 0.4 × (27 − 24) = 25.2.
• Step 5: Apply steps 3 and 4 to determine the rest of the deciles. D(2) = 2(n + 1)/10 = 4.8, i.e. the 4.8th data point, which lies between the 4th and 5th values. Thus, 30 + 0.8 × (32 − 30) = 31.6.
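The five steps can be implemented directly, using the same position formula and linear interpolation as the worked example:

```python
def decile(data, i):
    """Position formula D(i) = i * (n + 1) / 10, interpolating
    linearly between the two neighbouring sorted values."""
    ordered = sorted(data)
    position = i * (len(ordered) + 1) / 10   # e.g. 2.4 for D(1) when n = 23
    lower_index = int(position)              # whole part: 1-based rank of lower neighbour
    fraction = position - lower_index        # how far toward the next value
    lower = ordered[lower_index - 1]
    higher = ordered[lower_index]
    return lower + fraction * (higher - lower)

data = [24, 32, 27, 32, 23, 62, 45, 80, 59, 63, 36, 54,
        57, 36, 72, 55, 51, 32, 56, 33, 42, 55, 30]
print(round(decile(data, 1), 4))   # 24 + 0.4 * (27 - 24) = 25.2
print(round(decile(data, 2), 4))   # 30 + 0.8 * (32 - 30) = 31.6
```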

What Is a Quartile?

A quartile is a statistical term that describes a division of observations into four defined intervals based on
the values of the data and how they compare to the entire set of observations. Quartiles are organized into
lower quartiles, median quartiles, and upper quartiles.

When the data points are arranged in increasing order, data are divided into four sections of 25% of the
data each

There are three quartile values—a lower quartile, median, and upper quartile—which divide the data set
into four ranges, each containing 25% of the data points:

• First quartile: The set of data points between the minimum value and the first quartile.
• Second quartile: The set of data points between the lower quartile and the median.
• Third quartile: The set of data between the median and the upper quartile.
• Fourth quartile: The set of data points between the upper quartile and the maximum value of the
data set

Calculating Quartiles Manually

Quartile manual calculation requires more effort as there are formulas involved. Using the same values as
in the spreadsheet example:

• 59, 60, 65, 65, 68, 69, 70, 72, 75, 75, 76, 77, 81, 82, 84, 87, 90, 95, 98

Using the following formulas, you calculate each quartile:

• First Quartile (Q1) = (n + 1) × 1/4
• Second Quartile (Q2), or the median = (n + 1) × 2/4
• Third Quartile (Q3) = (n + 1) × 3/4

Where n is the number of integers in your dataset, and the result is the position of the number in the
sequence dataset. So:

• First Quartile (Q1) = 20 × 1/4 = 5
• Second Quartile (Q2) = 20 × 2/4 = 10
• Third Quartile (Q3) = 20 × 3/4 = 15

Here, we have the Q1 (fifth) value of 68, the Q2 (tenth and the median) value of 75, and the Q3 (fifteenth)
value of 84. The results differ slightly from the spreadsheet results because the spreadsheet calculates
them differently. Your graph would then look like this:
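The manual position calculations above can be checked in code:

```python
data = [59, 60, 65, 65, 68, 69, 70, 72, 75, 75,
        76, 77, 81, 82, 84, 87, 90, 95, 98]   # 19 sorted values, so n + 1 = 20

# Position formulas from the text: (n + 1) * k / 4 for k = 1, 2, 3
q1_pos = (len(data) + 1) * 1 // 4   # 5
q2_pos = (len(data) + 1) * 2 // 4   # 10
q3_pos = (len(data) + 1) * 3 // 4   # 15

# Convert the 1-based positions to list indices
q1, q2, q3 = data[q1_pos - 1], data[q2_pos - 1], data[q3_pos - 1]
print(q1, q2, q3)   # 68 75 84
```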

Interquartile Range Formula

The difference between the upper and lower quartile is known as the interquartile range. The formula for
the interquartile range is given below

Interquartile range = Upper Quartile – Lower Quartile = Q3 – Q1

where Q1 is the first quartile and Q3 is the third quartile of the series.

The below figure shows the occurrence of the median and interquartile range for the data set.

Semi Interquartile Range

The semi-interquartile range is a measure of dispersion, defined as half of the interquartile range. It is computed as one half the difference between the 75th percentile (Q3) and the 25th percentile (Q1). The formula for the semi-interquartile range is:

Semi Interquartile Range = (Q3– Q1) / 2

Median and Interquartile Range

The median is the middle value of the distribution of the given data. The interquartile range (IQR) is the
range of values that resides in the middle of the scores. When a distribution is skewed, and the median is
used instead of the mean to show a central tendency, the appropriate measure of variability is the
Interquartile range.

Q1 – Lower Quartile Part

Q2 – Median

Q3 – Upper Quartile Part

It is a measure of dispersion based on the lower and upper quartiles. Quartile deviation is obtained by dividing the interquartile range by 2, hence it is also known as the semi-interquartile range.

Interquartile Range Example

Question:
Determine the interquartile range value for the first ten prime numbers.

Solution:

Given: The first ten prime numbers are:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29

This is already in increasing order.

Here the number of values = 10

10 is an even number, so the median is the mean of the two middle values, 11 and 13.

That is Q2 = (11 + 13)/2 = 24/2 = 12.

Now we have to get two parts i.e. lower half to find Q1 and the upper half to find Q3.

Q1 part : 2, 3, 5,7,11

Here the number of values = 5

5 is an odd number, so the quartile is the centre (third) value, 5. That is, Q1 = 5.

Q3 part : 13, 17, 19, 23, 29

Here the number of values = 5

5 is an odd number, so the quartile is the centre (third) value, 19. That is, Q3 = 19.

The difference between Q3 and Q1 is 19 – 5 = 14.

Therefore, 14 is the interquartile range value.
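The worked example above can be double-checked with a short Python sketch using the same median-split method as the solution (the helper names are invented for illustration):

```python
# Median-split quartile method: split the sorted data at the median,
# then take the median of each half.

def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def interquartile_range(data):
    """Return (Q1, Q3, IQR) using the median-split method."""
    data = sorted(data)
    half = len(data) // 2
    lower = data[:half]       # lower half (median excluded for odd n)
    upper = data[-half:]      # upper half
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
q1, q3, iqr = interquartile_range(primes)
print(q1, q3, iqr)  # 5 19 14
```

Note that spreadsheet functions often interpolate between observations, so their quartiles can differ slightly from this median-split method.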


What is Probability?

Probability denotes the possibility of the outcome of any random event. The term measures the extent to
which an event is likely to happen. For example, when we flip a coin in the air, what is the possibility
of getting a head? The answer depends on the number of possible outcomes: here the outcome will be
either a head or a tail, and only one of those two outcomes is favourable, so the probability of getting
a head is 1/2.

Probability is the measure of the likelihood that an event will happen; it measures the certainty of the
event. The formula for probability is given by:

P(E) = Number of Favourable Outcomes / Total Number of Outcomes

P(E) = n(E)/n(S)

Here,

n(E) = Number of outcomes favourable to event E

n(S) = Total number of outcomes
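The formula P(E) = n(E)/n(S) can be sketched directly in Python (the `probability` helper is a hypothetical illustration, not from the source):

```python
from fractions import Fraction

def probability(favourable, sample_space):
    """P(E) = n(E) / n(S), returned as an exact fraction."""
    return Fraction(len(favourable), len(sample_space))

# Rolling one fair six-sided die: what is P(even number)?
sample_space = {1, 2, 3, 4, 5, 6}
event = {n for n in sample_space if n % 2 == 0}   # favourable outcomes: 2, 4, 6

p = probability(event, sample_space)
print(p)  # 1/2
```

Using `Fraction` keeps the result exact, matching the hand-calculated value 3/6 = 1/2.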

Types of Probability

Depending on the nature of the outcome or the method used to calculate the chance of an event occurring,
several views or types of probabilities may exist. There are four major different types of probabilities:

• Classical Probability
• Axiomatic Probability
• Subjective Probability
• Empirical Probability

Let us understand each type of probability one by one,

Classical Probability

Classical probability, also known as theoretical probability, states that if there are B equally likely
outcomes in an experiment and event X consists of exactly A of them,

then the probability of X is A/B, or P(X) = A/B.

It applies to activities such as tossing a fair coin or rolling a fair die. The probability is computed
by listing all the equally likely outcomes of the activity and counting how many of them are favourable
to the event. When tossing a coin, for example, the possible outcomes are heads and tails, so each
outcome has probability 1/2.

Axiomatic Probability

A series of rules, or axioms, formulated by Kolmogorov underlies axiomatic probability. The
probability of any event occurring or not occurring can be calculated using these axioms, which state
that:

• The smallest and greatest probabilities are 0 and 1, respectively.
• The probability of a certain event (the entire sample space) is equal to 1.
• For mutually exclusive events, which cannot occur at the same time, the probability that at least
one of them occurs (their union) is the sum of their individual probabilities.

Subjective Probability
Subjective probability takes into account a person’s personal belief on the probability of an event
occurring. For example, a fan’s opinion on the probability of a specific side winning a football match is
based on their personal beliefs and feelings rather than a rigorous quantitative calculation.

Empirical Probability

The empirical probability, or experimental perspective, evaluates probability through repeated trials.
If a weighted die is rolled and we don't know which side has the weight, we can gain an idea of the
probability of each outcome by rolling the die a large number of times and counting the proportion of
rolls on which the die produces that outcome, and thereby estimate the probability of that outcome.
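This experimental approach can be sketched as a simulation; the die weighting below is an invented assumption purely for illustration:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# A hypothetical die weighted toward 6. The empirical probability of a 6
# is estimated by rolling many times and taking the observed proportion.
faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 1, 5]   # assumed hidden weighting; true P(6) = 5/10

rolls = random.choices(faces, weights=weights, k=10_000)
empirical_p6 = rolls.count(6) / len(rolls)
print(round(empirical_p6, 2))  # close to the true value 0.5
```

With more rolls the observed proportion tends to settle nearer the true probability, which is the idea behind the empirical view.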

Conclusion:

• Probability is a metric for determining the likelihood of an event occurring.
• Probability always lies between 0 and 1 and is often expressed as a fraction.
• Probability is of 4 major types: Classical Probability, Empirical Probability,
Subjective Probability, and Axiomatic Probability.
• The probability of an occurrence is the chance that it will happen. Any event’s probability is a
number between (and including) “0” and “1.”

What is Intersection of Sets?

The intersection of sets A and B is the set of all elements which are common to both A and B.

Suppose A is the set of even numbers less than 10 and B is the set of the first five multiples of 4, then the
intersection of these two can be identified as given below:

A = {2, 4, 6, 8}

B = {4, 8, 12, 16, 20}

The elements common to A and B are 4 and 8.

Therefore, the set of elements in the intersection of A and B = {4, 8}

What is union?

The union of two or more sets is the set containing all the elements of the given sets. A union is
written using the symbol “⋃”; for example, the union of two sets X and Y is represented as X ⋃ Y.
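Both operations map directly onto Python's built-in set type; here is a quick sketch reusing the sets from the intersection example above:

```python
A = {2, 4, 6, 8}         # even numbers less than 10
B = {4, 8, 12, 16, 20}   # first five multiples of 4

intersection = A & B     # elements common to both sets
union = A | B            # all elements of either set

print(sorted(intersection))  # [4, 8]
print(sorted(union))         # [2, 4, 6, 8, 12, 16, 20]
```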

What is collectively exhaustive?

In probability theory and logic, a set of events is jointly or collectively exhaustive if at least one of
the events must occur. For example, when rolling a six-sided die, the single-outcome events 1, 2, 3, 4,
5, and 6 are collectively exhaustive, because together they encompass the entire range of possible
outcomes.

Examples of Collectively Exhaustive Events


If you are rolling a six-sided die, the set of events {1, 2, 3, 4, 5, 6} is collectively exhaustive. Any roll
must be represented by one member of the set.

Sometimes a small change can make a set that is not collectively exhaustive into one that is. A random
integer generated by a computer may be greater than or less than 5, but those are not collectively
exhaustive options. Changing one option to “greater than or equal to five” or adding five as an option
makes the set fit our criteria.
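This check can be sketched in Python by modelling events as sets of outcomes and testing that their union covers the sample space (the helper name is hypothetical):

```python
def collectively_exhaustive(events, sample_space):
    """True if the union of the events covers the whole sample space."""
    union = set().union(*events)
    return union >= sample_space

die = {1, 2, 3, 4, 5, 6}

# The six single-outcome events {1}..{6} are collectively exhaustive.
print(collectively_exhaustive([{n} for n in die], die))        # True

# "Greater than 5" and "less than 5" miss the outcome 5 itself.
print(collectively_exhaustive([{6}, {1, 2, 3, 4}], die))       # False

# Changing one event to "greater than or equal to 5" fixes it.
print(collectively_exhaustive([{5, 6}, {1, 2, 3, 4}], die))    # True
```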

What is Independent events?

Independent events are those events whose occurrence is not dependent on any other event. For
example, if we flip a coin in the air and get the outcome as Head, then again if we flip the coin but this
time we get the outcome as Tail. In both cases, the occurrence of both events is independent of each
other. It is one of the types of events in probability. Let us learn here the complete definition of
independent events along with its Venn diagram, examples and how it is different from mutually
exclusive events.

Difference between Mutually exclusive and independent events


Mutually exclusive events:
• Two events that cannot occur simultaneously are termed mutually exclusive events.
• The occurrence of one event rules out the occurrence of the other.
• The mathematical formula for mutually exclusive events is P(X and Y) = 0.
• In a Venn diagram, the sets do not overlap.

Independent events:
• When the occurrence of one event does not control the happening of the other event, the events are
termed independent.
• Neither event influences the other; they are independent of each other.
• The mathematical formula for independent events is P(X and Y) = P(X) P(Y).
• In a Venn diagram, the sets may overlap.
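The two formulas can be verified on the sample space of two coin flips; this is only a sketch, and the helper functions are invented for illustration:

```python
from fractions import Fraction

# Sample space for two coin flips: HH, HT, TH, TT.
S = [(c1, c2) for c1 in "HT" for c2 in "HT"]

def p(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def first_head(o):  return o[0] == "H"
def first_tail(o):  return o[0] == "T"
def second_head(o): return o[1] == "H"

# Independent events: P(X and Y) = P(X) * P(Y).
p_both = p(lambda o: first_head(o) and second_head(o))
print(p_both == p(first_head) * p(second_head))      # True

# Mutually exclusive events: P(X and Y) = 0.
print(p(lambda o: first_head(o) and first_tail(o)))  # 0
```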
What is complement?
The complement of an event A is the set of all outcomes in the sample space that are not in A. The
complement of A is denoted by A^C and is read “not A.”
EXAMPLE

Suppose a coin is flipped two times. Previously, we found the sample space for this
experiment: S = {HH, HT, TH, TT}, where H is heads and T is tails.

1. What is the complement of the event “exactly one head”?

2. What is the complement of the event “at least one tail”?

Solution:

1. The event “exactly one head” consists of the outcomes HT and TH. The complement of “exactly one
head” consists of the outcomes HH and TT. These are the outcomes in the sample space S that are NOT
in the original event “exactly one head.”

2. The event “at least one tail” consists of the outcomes HT, TH, and TT. The complement of “at least
one tail” consists of the single outcome HH. This is the outcome in the sample space S that is NOT in
the original event “at least one tail.”
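The same complements can be computed with Python's set difference operator:

```python
S = {"HH", "HT", "TH", "TT"}   # sample space for two coin flips

exactly_one_head = {"HT", "TH"}
at_least_one_tail = {"HT", "TH", "TT"}

# Complement: everything in the sample space NOT in the event.
print(sorted(S - exactly_one_head))    # ['HH', 'TT']
print(sorted(S - at_least_one_tail))   # ['HH']
```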



Contingency Table

A contingency table is a tabular representation of categorical data. A contingency table usually shows
frequencies for particular combinations of values of two discrete random variables X and Y. Each cell in
the table represents a mutually exclusive combination of X-Y values.

For example, consider a sample of N=200 beer-drinkers. For each drinker we have information on sex
(variable X, taking on 2 possible values: “Male” and “Female”) and preferred category of beer (variable
Y, taking on 3 possible values: “Light”, “Regular”, “Dark”). A contingency table for these data might
look like the following

Light Regular Dark Total

Male 20 40 50 110

Female 50 20 20 90

Total: 70 60 70 200

This is a two-way 2×3 contingency table (i.e. two rows and three columns).

Sometimes three-way (and more) contingency tables are used. Suppose the beer-drinkers data, besides sex
and preference, are also stratified by age group, with a third discrete variable Z (“Age”) taking on,
for example, 4 age-group values.

In this case we would have a three-way 2x3x4 contingency table, equivalent to 4 two-way 2×3
contingency tables (one 2×3 table for each of the 4 age-groups).

Regression and correlation

Regression and correlation are two of the most powerful and versatile statistical tools we can use to
solve common business problems. They are based on the belief that we can identify and quantify some
functional relationship between two or more variables. One variable is said to depend on another. We
might say Y depends on X where Y and X are any two variables.

Since Y depends on X, Y is the dependent variable and X is the independent variable. It is important to
identify which is the dependent variable and which is the independent variable in the regression model.
This depends on logic and what the statistician is trying to measure. The dean of the college wishes to
examine the relationship between
students’ grades and the time they spend studying. Data are collected on both variables. It is only
logical to presume that grades depend on the amount of quality time students spend with the books!
Thus, “grades” is the dependent variable and “time” is the independent variable.

What is Regression?
Regression is a statistical method that tries to determine the strength and character of the
relationship between one dependent variable and a series of other variables. It is used in
finance, investing, and other disciplines.

What Are the Assumptions That Must Hold for Regression Models?

To properly interpret the output of a regression model, the following main assumptions about
the underlying data process of what you are analyzing must hold:

• The relationship between variables is linear;


• There must be homoskedasticity, meaning the variance of the error term must
remain constant;
• All explanatory variables are independent of one another;
• All variables are normally distributed.

What is correlation?
Correlation is a statistical measure that expresses the extent to which two variables are linearly
related (meaning they change together at a constant rate). It’s a common tool for describing simple
relationships without making a statement about cause and effect.
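The usual numerical summary of linear association is Pearson's correlation coefficient; here is a minimal sketch with made-up data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]       # y changes with x at a constant rate
print(pearson_r(x, y))     # 1.0 up to floating-point rounding
```

A value near +1 indicates a strong positive linear relationship, near -1 a strong negative one, and near 0 little or no linear relationship.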

What is a variable?

A variable is any kind of attribute or characteristic that you are trying to measure,
manipulate and control in statistics and research. All studies analyze a variable, which can
describe a person, place, thing or idea. A variable's value can change between groups or
over time.

Independent vs. dependent variables


Independent variables:
• Definition: A variable that stands alone and isn’t changed by the other variables or factors that
are measured.
• Example: Age. Other variables, such as where someone lives, what they eat or how much they
exercise, are not going to change their age.

Dependent variables:
• Definition: A variable that relies on, and can be changed by, other factors that are measured.
• Example: The grade someone gets on an exam depends on factors such as how much sleep they got and
how long they studied.

What is Regression Analysis?

Regression analysis is a set of statistical methods used for the estimation of relationships
between a dependent variable and one or more independent variables. It can be utilized to
assess the strength of the relationship between variables and for modeling the future
relationship between them.
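A minimal least-squares fit for simple regression, using invented grades/study-time numbers in the spirit of the dean example above:

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

hours = [1, 2, 3, 4, 5]          # independent variable X: study time
grades = [52, 58, 65, 70, 75]    # dependent variable Y: exam grade (made up)

a, b = least_squares(hours, grades)
print(round(a, 2), round(b, 2))  # intercept 46.6, slope 5.8
print(round(a + b * 6, 1))       # predicted grade for 6 hours: 81.4
```

The positive slope quantifies the presumption in the text: more study time is associated with higher grades in this (fabricated) data.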

Distinguish between simple regression and multiple regression?

We should distinguish between simple regression and multiple regression. In simple regression, Y is said
to be a function of only one independent variable. Often referred to as bivariate regression because
there are only two variables, one dependent and one independent, simple regression is represented by
Formula (11.1). In a multiple regression model, Y is a function of two or more independent variables. A
regression model with k independent variables can be expressed, in the usual notation, as

Y = b0 + b1X1 + b2X2 + … + bkXk

where b0 is the intercept and b1, …, bk are the coefficients of the k independent variables.
Linear Correlation vs Curvilinear Correlation, aspect by aspect:

Pattern
• Linear: follows a straight-line relationship between variables.
• Curvilinear: follows a curved or nonlinear relationship between variables.

Direction
• Linear: can be positive or negative, indicating the strength and direction of the relationship.
• Curvilinear: varies based on the shape of the curve; can be positive, negative, or mixed.

Strength measurement
• Linear: the correlation coefficient ranges from -1 to +1, where +1 indicates a perfect positive
correlation, -1 indicates a perfect negative correlation, and 0 indicates no correlation.
• Curvilinear: difficult to measure precisely due to the nonlinear nature; various techniques are
used to assess the strength.

Predictability
• Linear: predicts changes in one variable based on changes in another with relative accuracy
within a linear framework.
• Curvilinear: predicts change, but the relationship is more intricate, making predictions complex
and context-dependent.

Common examples
• Linear: height and weight often exhibit linear correlation; as height increases, weight tends to
increase proportionally.
• Curvilinear: the relationship between economic growth and investment may be curvilinear, with
diminishing returns on investment at higher levels of economic growth.

Graphical representation
• Linear: graphed as a straight line on a scatter plot.
• Curvilinear: graphed as a curve or wave-like pattern on a scatter plot.
