Statistics 02
Definition
A statistical series of two variables consists of pairs of observations corresponding to each individual
or object under study. Each pair of observations includes a value for the first variable and a value
for the second variable. For example, in a study of the relationship between height and weight in a
group of individuals, each individual's height and weight would constitute a pair of observations.
Types of Relationships
There are three main types of relationships that can exist between two variables:
1. Positive Relationship: As the value of one variable increases, the value of the other variable also
increases.
2. Negative Relationship: As the value of one variable increases, the value of the other variable
decreases.
3. No Relationship: There is no apparent pattern between the values of the two variables.
Scatter Plots (Point Clouds)
A scatter plot, also known as a point cloud, is a graphical representation of two-variable data. Each
point on the plot represents one pair of observations.
Example
Consider a dataset giving the heights and weights of five individuals. A scatter plot of this data
would place each individual's (height, weight) pair as a point on a two-dimensional graph.
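As a sketch, the plot below uses hypothetical height and weight values (the original data table is not
reproduced here) to show how such a scatter plot can be drawn with matplotlib:

```python
# A minimal sketch using hypothetical height/weight values,
# since the original data table is not reproduced in the notes.
import matplotlib.pyplot as plt

heights = [160, 165, 170, 175, 180]  # cm (hypothetical)
weights = [55, 62, 68, 74, 80]       # kg (hypothetical)

plt.scatter(heights, weights)
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Scatter plot of height vs. weight")
plt.show()
```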
Exercise
Given a dataset of students' scores in Mathematics and English, plot a scatter plot to visualize the
relationship between the two scores.
Average Point
The average point of a two-variable dataset is the point whose coordinates are the means of the
respective variables: G = (x̄, ȳ), where x̄ = (1/n)∑xᵢ and ȳ = (1/n)∑yᵢ.
Example
Using the same data from the previous example, the average point is (mean height, mean weight).
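A minimal sketch, reusing the same hypothetical height and weight values as above:

```python
# Average (mean) point of a two-variable dataset,
# computed on the hypothetical height/weight values used earlier.
heights = [160, 165, 170, 175, 180]  # cm (hypothetical)
weights = [55, 62, 68, 74, 80]       # kg (hypothetical)

mean_height = sum(heights) / len(heights)
mean_weight = sum(weights) / len(weights)

print((mean_height, mean_weight))  # (170.0, 67.8)
```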
Exercise
Calculate the average point for a dataset representing the ages and incomes of a group of
individuals.
Covariance Matrix
The covariance matrix of two variables is a matrix that contains the variances of the variables along
the main diagonal, and the covariances between each pair of variables in the other positions. The
sample covariance of X and Y is:
𝐶𝑜𝑣(𝑋, 𝑌) = ∑(𝑥ᵢ − x̄)(𝑦ᵢ − ȳ) / (𝑛 − 1)
Note that n is the number of observations. Dividing by (n − 1) means we are using sample data (for
population data, divide by n).
Example
Let's consider two variables, X and Y, with the following data:
X=[2,4,6,8,10]
Y=[1,3,5,7,9]
The sample means are x̄ = 6 and ȳ = 5, and Var(X) = Var(Y) = Cov(X, Y) = 40/4 = 10, so:
| Cov(X, X)  Cov(X, Y) |   | 10  10 |
| Cov(Y, X)  Cov(Y, Y) | = | 10  10 |
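This result can be checked numerically; note that numpy's cov divides by (n − 1) by default, matching
the sample convention used here:

```python
import numpy as np

X = [2, 4, 6, 8, 10]
Y = [1, 3, 5, 7, 9]

# np.cov uses ddof=1 by default, i.e. the sample (n - 1) convention.
C = np.cov(X, Y)
print(C)
# [[10. 10.]
#  [10. 10.]]
```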
Correlation Coefficient
Example
Using the same data for X and Y from the previous example, the correlation coefficient can be
calculated using the formula:
𝑟 = 𝐶𝑜𝑣(𝑋, 𝑌) / (𝜎𝑋 𝜎𝑌)
where 𝜎𝑋 and 𝜎𝑌 are the standard deviations of X and Y. Here r = 10 / (√10 × √10) = 1, indicating a
perfect positive linear relationship.
The corresponding regression line has the form:
𝑦 = 𝑎 + 𝑏𝑥
where a is the y-intercept (a = ȳ − b x̄), b is the slope of the line (which can be calculated as
𝐶𝑜𝑣(𝑋, 𝑌) / 𝑉𝑎𝑟(𝑋)), and x is the independent variable.
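A short numerical check of these values for the example data:

```python
import numpy as np

X = np.array([2, 4, 6, 8, 10])
Y = np.array([1, 3, 5, 7, 9])

cov_xy = np.cov(X, Y)[0, 1]               # sample covariance
r = cov_xy / (X.std(ddof=1) * Y.std(ddof=1))
b = cov_xy / X.var(ddof=1)                # slope = Cov(X, Y) / Var(X)
a = Y.mean() - b * X.mean()               # intercept = ȳ - b·x̄

print(round(r, 4))        # 1.0: perfect positive linear relationship
print(round(a, 4), round(b, 4))  # -1.0 1.0, i.e. the line y = x - 1
```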
Least-Squares Method
The least-squares method is a statistical method used to find the line of best fit, of the form
y = mx + b, for a given set of data. The fitted curve is called the regression line. The main
objective of this method is to make the sum of the squared errors as small as possible; this is why
it is called the least-squares method. The method is often used in data fitting, where the best fit
is taken to be the one that minimizes the sum of squared errors, each error being the difference
between an observed value and the corresponding fitted value. The sum of squared errors measures the
variation in the observed data around the fitted line. For example, given 4 data points, the method
produces the single line that minimizes the total squared error to those points.
The two basic categories of least-squares problems are ordinary (linear) least squares and nonlinear
least squares.
Even though the least-squares method is considered the best method to find the line of best fit, it
has a few limitations:
• It captures only the relationship between the two variables; all other causes and effects are not
taken into consideration.
• It is unreliable when the data are not evenly distributed.
• It is very sensitive to outliers, which can skew the results of the analysis.
In the graph below, the straight line shows the potential relationship between the independent
variable and the dependent variable. The ultimate goal of this method is to reduce the difference
between the observed responses and the responses predicted by the regression line: smaller residuals
mean that the model fits better. The residuals of the points from the line are what the method
minimizes. Residuals can be measured vertically or perpendicularly; vertical residuals are mostly
used in polynomial and hyperplane problems, while perpendicular residuals are used in the general
case, as seen in the image below.
Figure 2: Least square method graph
The least-squares method finds the line that best fits a set of observations with a minimum sum of
squared residuals, or errors. Let us assume that the given data points are (x1, y1), (x2, y2),
(x3, y3), …, (xn, yn), in which all x's are independent variables, while all y's are dependent ones.
The method finds a line of the form y = mx + b, where y and x are variables, m is the slope, and b
is the y-intercept. The formulas for the slope m and the intercept b are:
𝑚 = (𝑛∑𝑥𝑦 − ∑𝑥 ∑𝑦) / (𝑛∑𝑥² − (∑𝑥)²)
𝑏 = (∑𝑦 − 𝑚∑𝑥)/𝑛
Following are the steps to calculate the least-squares line using the above formulas.
• Step 1: Draw a table with 4 columns, where the first two columns are for the x and y values.
• Step 2: In the next two columns, compute xy and x².
• Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
• Step 4: Find the value of the slope m using the above formula.
• Step 5: Calculate the value of b using the above formula.
• Step 6: Substitute the values of m and b into the equation y = mx + b.
For example, for a dataset of n = 5 points with ∑x = 15, ∑y = 25, ∑xy = 88, and ∑x² = 55:
m = (5 × 88 − 15 × 25) / (5 × 55 − 15²) = 65/50 = 1.3
b = (∑y − m∑x)/n = (25 − 1.3 × 15)/5 = (25 − 19.5)/5 = 5.5/5 = 1.1
so the line of best fit is y = 1.3x + 1.1.
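The steps above can be carried out in a few lines of Python. The data points below are hypothetical,
chosen so that their sums match the worked values (∑x = 15, ∑y = 25, ∑xy = 88, ∑x² = 55):

```python
# Least-squares fit following the tabular steps above.
# The data points are hypothetical, chosen so that their sums
# match the worked example (Σx = 15, Σy = 25, Σxy = 88, Σx² = 55).
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 5, 8]

n = len(x)
sum_x = sum(x)                                  # Σx  = 15
sum_y = sum(y)                                  # Σy  = 25
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # Σxy = 88
sum_x2 = sum(xi ** 2 for xi in x)               # Σx² = 55

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

print(round(m, 2), round(b, 2))  # 1.3 1.1 -> y = 1.3x + 1.1
```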
Important Notes
• The least-squares method is used to predict the behaviour of the dependent variable with respect to
the independent variable.
• The sum of the squared errors measures the variation of the observed values around the fitted line.
• The main aim of the least-squares method is to minimize the sum of the squared errors.