Python Tutorial - W3school2 PDF
W3schools.com/python
Machine Learning
Mean, Median, and Mode
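In Machine Learning (and in mathematics) there are often three values that interest us: the mean (the average value), the median (the midpoint value), and the mode (the most common value). Take the speeds of 13 cars as an example: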
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
What is the average, the middle, or the most common speed value?
Mean
To calculate the mean, find the sum of all values, and divide the sum by the number of values:
(99+86+87+88+111+86+103+87+94+78+77+85+86) / 13 = 89.77
Example
Use the NumPy mean() method to find the average speed:
Python Program
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.mean(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_mean.py
89.76923076923077
Median
The median value is the value in the middle, after you have sorted all the values:
77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103, 111
It is important that the numbers are sorted before you can find the median.
Example
Use the NumPy median() method to find the middle value:
Python Program
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = numpy.median(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_median.py
87.0
If there are two numbers in the middle, divide the sum of those numbers by two.
77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103
(86 + 87) / 2 = 86.5
Example
Using the NumPy module:
Python Program
import numpy
speed = [99,86,87,88,86,103,87,94,78,77,85,86]
x = numpy.median(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_median2.py
86.5
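The sort-then-pick-the-middle rule described above can be sketched in plain Python. This is only an illustration of the rule, not how NumPy implements median(); the helper name median_by_hand is made up for this sketch.

```python
def median_by_hand(values):
    """Return the median of a list of numbers (illustrative helper)."""
    ordered = sorted(values)      # the values must be sorted first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value
        return float(ordered[middle])
    # Even number of values: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_speeds = [99,86,87,88,111,86,103,87,94,78,77,85,86]   # 13 values
even_speeds = [99,86,87,88,86,103,87,94,78,77,85,86]      # 12 values

print(median_by_hand(odd_speeds))    # 87.0
print(median_by_hand(even_speeds))   # 86.5
```

Both results match the NumPy output shown in the examples above.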
Mode
The mode is the value that appears most often:
99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86 = 86
Example
Use the SciPy mode() method to find the number that appears the most:
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)
print(x)
# The mode() method returns a ModeResult object that contains the mode number (86)
# and the count (how many times the mode number appeared, here 3).
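To see where the mode and its count come from, the counting can be sketched with the standard library instead of SciPy. This is an illustrative alternative, not the SciPy implementation; when several values are tied, different libraries may pick different winners.

```python
from collections import Counter

speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Count how many times each speed occurs
counts = Counter(speed)

# most_common(1) returns a list with the single most frequent (value, count) pair
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)   # 86 3
```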
Chapter Summary
The Mean, Median, and Mode are techniques that are often used in Machine Learning, so it is
important to understand the concept behind them.
Machine Learning - Standard Deviation
Standard deviation is a number that describes how spread out the values are.
A low standard deviation means that most of the numbers are close to the mean (average)
value.
A high standard deviation means that the values are spread out over a wider range.
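Example: this time we have registered the speed of 7 cars:
speed = [86,87,88,86,87,85,86]
The standard deviation is 0.9, meaning that most of the values are within 0.9 of the mean value, which is 86.4.
Let us do the same with a selection of numbers with a wider range:
speed = [32,111,138,28,59,77,97]
The standard deviation is 37.85, meaning that most of the values are within 37.85 of the mean value, which is 77.4.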
As you can see, a higher standard deviation indicates that the values are spread out over a
wider range.
Example
Use the NumPy std() method to find the standard deviation:
Python Program
import numpy
speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_numpy_std.py
0.9035079029052513
Example
Python Program
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_numpy_std2.py
37.84501153334721
Variance
Variance is another number that indicates how spread out the values are.
In fact, if you take the square root of the variance, you get the standard deviation.
Or the other way around: if you multiply the standard deviation by itself, you get the variance.
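To calculate the variance, first find the mean:
(32+111+138+28+59+77+97) / 7 = 77.4
Then, for each value, find the difference from the mean:
32 - 77.4 = -45.4
111 - 77.4 = 33.6
138 - 77.4 = 60.6
28 - 77.4 = -49.4
59 - 77.4 = -18.4
77 - 77.4 = -0.4
97 - 77.4 = 19.6
Then, for each difference, find the square value:
(-45.4)² = 2061.16
(33.6)² = 1128.96
(60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² = 338.56
(-0.4)² = 0.16
(19.6)² = 384.16
The variance is the average of these squared differences:
(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 = 1432.2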
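The manual calculation above can be sketched in plain Python. This is an illustration of the arithmetic only (dividing by the number of values, as in the steps above), not NumPy's implementation.

```python
import math

speed = [32,111,138,28,59,77,97]

# Find the mean
mean = sum(speed) / len(speed)

# Difference of each value from the mean, then squared
differences = [value - mean for value in speed]
squared = [d ** 2 for d in differences]

# The variance is the average of the squared differences
variance = sum(squared) / len(speed)

# The standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print(round(mean, 1))       # 77.4
print(round(variance, 1))   # 1432.2
print(round(std_dev, 2))    # 37.85
```

Note that numpy.var() and numpy.std() also divide by the number of values by default. If the data is a sample from a larger population, passing ddof=1 makes NumPy divide by one less than the number of values instead.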
Luckily, NumPy has a method to calculate the variance:
Example
Use the NumPy var() method to find the variance:
Python Program
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.var(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_numpy_var.py
1432.2448979591834
Standard Deviation
As we have learned, the formula to find the standard deviation is the square root of the
variance:
√1432.25 = 37.85
Or, as in the example from before, use NumPy to calculate the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
Python Program
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
Output
C:\Users\My Name>python demo_ml_numpy_std2.py
37.84501153334721
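Symbols
Standard deviation is often represented by the symbol sigma: σ
Variance is often represented by the symbol sigma squared: σ²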
Chapter Summary
The Standard Deviation and Variance are terms that are often used in Machine Learning, so it
is important to understand how to get them, and the concept behind them.
Machine Learning - Percentiles
What are Percentiles?
Percentiles are used in statistics to give you a number that describes the value that a given
percent of the values are lower than.
Example: Let's say we have an array of the ages of all the people that live on a street.
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
What is the 75th percentile? The answer is 43, meaning that 75% of the people are 43 or younger.
The NumPy module has a method for finding the specified percentile:
Example
Use the NumPy percentile() method to find the percentiles:
Python Program
import numpy
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 75)
print(x)
Output
C:\Users\My Name>python demo_ml_percentile1.py
43.0
Example
What is the age that 90% of the people are younger than?
Python Program
import numpy
ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]
x = numpy.percentile(ages, 90)
print(x)
Output
C:\Users\My Name>python demo_ml_percentile2.py
61.0
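To see how a percentile can be found by hand, here is a minimal sketch in plain Python. It assumes linear interpolation between the two closest sorted values, which is NumPy's default behavior; the helper name percentile_by_hand is made up for this sketch.

```python
def percentile_by_hand(values, percent):
    """Percentile with linear interpolation between closest ranks (illustrative helper)."""
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list
    rank = (len(ordered) - 1) * percent / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two neighbouring values
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

ages = [5,31,43,48,50,41,7,11,15,39,80,82,32,2,8,6,25,36,27,61,31]

print(percentile_by_hand(ages, 75))   # 43.0
print(percentile_by_hand(ages, 90))   # 61.0
```

With 21 values, the 75th and 90th percentiles land exactly on positions 15 and 18 of the sorted list, so no interpolation is needed here.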
Machine Learning - Data Distribution
Data Distribution
Earlier in this tutorial we have worked with very small amounts of data in our examples, just
to understand the different concepts.
In the real world, the data sets are much bigger, but it can be difficult to gather real world data,
at least at an early stage of a project.
To create big data sets for testing, we use the Python module NumPy, which comes with a
number of methods to create random data sets, of any size.
Example
Create an array containing 250 random floats between 0 and 5:
Python Program
import numpy
x = numpy.random.uniform(0.0, 5.0, 250)
print(x)
Output
C:\Users\My Name>python demo_ml_numpy_uniform.py
[4.16457941 2.53336934 0.76094645 2.19728824 0.26461522 2.47763846
1.01861707 1.81286031 3.31170377 1.82227842 0.9851678 3.39704211
2.80936846 3.5178459 2.43532755 2.16588249 0.51356737 2.13931298
3.29456667 3.9949609 3.55884565 3.25152112 4.10826858 4.59093062
1.10645521 2.00119659 1.35298074 3.19715447 3.9095812 4.49572829
0.19396857 1.98504038 3.4434233 1.4264503 2.5929941 1.93930881
1.40465862 0.68521082 3.13884087 0.19739132 3.7006942 3.03040889
0.44557704 4.93506348 0.01016715 4.49707411 0.0250856 1.6161289
4.0614196 0.07539926 0.14178923 3.53735644 2.92626772 4.24309409
2.93614483 4.19271678 2.11085992 0.89565608 2.91128253 2.03085369
0.25994798 1.52378501 4.62784889 0.88462656 4.34725502 1.90010131
2.70673256 4.7833187 3.90638155 2.21866015 3.22971977 4.23391232
1.34365916 4.09616657 1.90472694 2.40922049 0.17677846 4.69405223
3.37608853 4.21936793 2.32993106 3.2160566 4.29338672 3.8085986
0.03532943 0.1336674 3.29150384 3.487193 4.83582545 0.77023456
2.9306055 3.45004702 0.95169535 1.59869558 1.99953255 4.45373969
0.46106712 0.77225608 2.5982888 2.41894188 4.7730061 2.49255828
2.67640541 4.81062781 0.18381472 3.8635721 0.72421463 3.29070047
3.21078948 1.97673306 2.23160857 2.14947338 1.57228967 4.03231597
1.93193546 4.83317115 4.57366509 2.72148306 2.76236854 2.45620923
3.27250864 3.27347015 0.20148648 2.74118186 3.00158603 3.50135538
2.75769371 3.21769774 3.76810699 2.05289646 1.41288639 4.97371182
1.87598207 0.17278485 4.27510981 0.31476547 0.00893708 1.04915326
1.54613005 1.91131455 4.12173165 0.64393556 1.49024513 0.35966727
2.38830249 3.59406423 0.67916137 1.18438456 4.4451865 3.99320972
1.53586504 4.86559434 4.867244 4.92217506 3.78949487 1.66934268
4.0403024 3.61716084 4.0901871 1.48687033 1.10239527 0.37455416
2.89031213 3.02845543 2.85232673 2.7275596 4.02031037 2.69293241
2.73244605 3.24139436 4.93317182 3.33097023 1.06817254 0.72802594
0.47194159 4.71601616 0.91228598 0.53578222 4.6864055 1.82696259
2.97684839 4.51509617 2.32623158 4.65218818 0.92864795 2.92965377
1.05175105 4.92930102 1.34231746 3.58343988 2.06728736 2.39001083
1.68120088 3.73902319 0.96690738 2.60878368 4.20396981 1.49623894
2.87431876 4.36249686 0.9025258 3.76298156 3.55854602 4.56100202
4.01188567 3.83115035 4.11706811 2.06614667 1.41638643 2.89719905
2.06946139 1.52044048 3.54159028 3.95656091 0.42960599 1.09079623
2.46292254 4.95074464 3.87291033 2.1211344 3.80070747 0.00888656
4.16287847 2.94661859 3.1512899 2.96793599 2.61313196 3.34480097
4.8391801 0.74660596 3.55424576 4.63494792 2.34374201 4.51295525
4.60275672 2.97788828 3.30910678 1.37742008 0.09007784 4.0066061
3.85646881 0.55971376 0.07674231 1.0299027 3.77871601 3.86643305
3.06371385 4.01894688 2.00470197 2.14495597]
Histogram
To visualize the data set we can draw a histogram with the data we collected.
Example
Draw a histogram:
Python Program
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 250)
plt.hist(x, 5)
plt.show()
Result:
Histogram Explained
We use the array from the example above to draw a histogram with 5 bars.
The first bar represents how many values in the array are between 0 and 1.
The second bar represents how many values are between 1 and 2.
Etc.
Note: The array values are random numbers and will not show the exact same result on your
computer.
An array containing 250 values is not considered very big, but now you know how to create a
random set of values, and by changing the parameters, you can create the data set as big as you
want.
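Because the values are random, each run gives a different array. If repeatable results are wanted, a seeded generator can be used. The sketch below is one way to do that; the seed value 42 is arbitrary, and numpy.histogram() is used here only to print the bar heights without drawing anything.

```python
import numpy

# A seeded generator returns the same "random" values on every run
rng = numpy.random.default_rng(42)   # 42 is an arbitrary seed chosen for this sketch
x = rng.uniform(0.0, 5.0, 250)

# Count how many values fall in each of 5 equal-width bins from 0 to 5
counts, edges = numpy.histogram(x, bins=5, range=(0.0, 5.0))

print(counts)   # number of values per bar
print(edges)    # bin boundaries: 0, 1, 2, 3, 4, 5
```

Note that plt.hist(x, 5) without a range derives the bin boundaries from the smallest and largest value in the array, so the bars only approximately cover 0 to 1, 1 to 2, and so on.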
Example
Create an array with 100000 random numbers, and display them using a histogram with 100
bars:
Python Program
import numpy
import matplotlib.pyplot as plt
x = numpy.random.uniform(0.0, 5.0, 100000)
plt.hist(x, 100)
plt.show()
Output
Machine Learning - Normal Data Distribution
In the previous chapter we learned how to create a completely random array, of a given size,
and between two given values.
In this chapter we will learn how to create an array where the values are concentrated around a
given value.
In probability theory this kind of data distribution is known as the normal data distribution, or
the Gaussian data distribution, after the mathematician Carl Friedrich Gauss who came up
with the formula of this data distribution.
Example
A typical normal data distribution:
Python Program
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 100000)
plt.hist(x, 100)
plt.show()
Output
Note: A normal distribution graph is also known as the bell curve because of its characteristic bell shape.
Histogram Explained
We use the array from the numpy.random.normal() method, with 100000 values, to draw a
histogram with 100 bars.
We specify that the mean value is 5.0, and the standard deviation is 1.0.
Meaning that the values should be concentrated around 5.0, with roughly two thirds of them no further away than 1.0 from the mean.
And as you can see from the histogram, most values are between 4.0 and 6.0, with a top at
approximately 5.0.
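The claim that most values fall between 4.0 and 6.0 can be checked numerically. The sketch below uses a seeded generator (the seed 0 is arbitrary) and measures the share of values within one and two standard deviations of the mean; for a normal distribution these shares are about 68% and 95%.

```python
import numpy

rng = numpy.random.default_rng(0)    # arbitrary seed, only for repeatability
x = rng.normal(5.0, 1.0, 100000)     # mean 5.0, standard deviation 1.0

# Share of values within one standard deviation of the mean (4.0 to 6.0)
within_one = numpy.mean((x >= 4.0) & (x <= 6.0))

# Share of values within two standard deviations of the mean (3.0 to 7.0)
within_two = numpy.mean((x >= 3.0) & (x <= 7.0))

print(within_one)   # close to 0.68
print(within_two)   # close to 0.95
```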
Machine Learning - Scatter Plot
Scatter Plot
A scatter plot is a diagram where each value in the data set is represented by a dot.
The Matplotlib module has a method for drawing scatter plots. It needs two arrays of the same length, one for the values of the x-axis, and one for the values of the y-axis.
The x array represents the age of each car, and the y array represents the speed of each car:
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Result:
What we can read from the diagram is that the two fastest cars were both 2 years old, and the
slowest car was 12 years old.
Note: It seems that the newer the car, the faster it drives, but that could be a coincidence, after
all we only registered 13 cars.
Random Data Distributions
In Machine Learning the data sets can contain thousands, or even millions, of values.
You might not have real world data when you are testing an algorithm, so you might have to use randomly generated values.
As we have learned in the previous chapter, the NumPy module can help us with that!
Let us create two arrays that are both filled with 1000 random numbers from a normal data
distribution.
The first array will have the mean set to 5.0 with a standard deviation of 1.0.
The second array will have the mean set to 10.0 with a standard deviation of 2.0:
Example
A scatter plot with 1000 dots:
Python Program
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)
plt.scatter(x, y)
plt.show()
Result:
Scatter Plot Explained
We can see that the dots are concentrated around the value 5 on the x-axis, and 10 on the y-axis.
We can also see that the spread is wider on the y-axis than on the x-axis.
Machine Learning - Linear Regression
Regression
The term regression is used when you try to find the relationship between variables.
In Machine Learning, and in statistical modeling, that relationship is used to predict the
outcome of future events.
Linear Regression
Linear regression uses the relationship between the data-points to draw the straight line that best fits them.
Python has methods for finding a relationship between data-points and for drawing a line of linear regression. We will show you how to use these methods instead of going through the mathematical formula.
In the example below, the x-axis represents age, and the y-axis represents speed. We have
registered the age and speed of 13 cars as they were passing a tollbooth. Let us see if the data
we collected could be used in a linear regression:
Example
import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y)
plt.show()
Result:
Example
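import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()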
Result:
Example Explained
Import the modules you need:
import matplotlib.pyplot as plt
from scipy import stats
Create the arrays that represent the values of the x and y axis:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
Execute a method that returns some important key values of Linear Regression:
slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed:
def myfunc(x):
  return slope * x + intercept
Run each value of the x array through the function. This will result in a new array with new values for the y-axis:
mymodel = list(map(myfunc, x))
Draw the original scatter plot, draw the line of linear regression, and display the diagram:
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
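For readers who want to see what lies behind the slope and intercept values, the least-squares calculation can be sketched in plain Python. This is an illustration of the standard formula, not SciPy's actual code.

```python
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# How x and y vary together, and how x varies on its own
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

# Least-squares slope and intercept of the best-fitting straight line
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def myfunc(value):
    return slope * value + intercept

print(round(slope, 2), round(intercept, 2))   # -1.75 103.11
print(round(myfunc(10), 2))                   # 85.59
```

The negative slope says that, in this data set, each extra year of age goes with a speed that is about 1.75 lower.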
R-Squared
It is important to know how strong the relationship between the values of the x-axis and the values of the y-axis is. If there is no relationship, linear regression cannot be used to predict anything.
This relationship, the coefficient of correlation, is called r. The r value ranges from -1 to 1, where 0 means no relationship, and 1 (and -1) means 100% related. Squaring r gives the r-squared value, which ranges from 0 to 1.
Python and the SciPy module will compute this value for you; all you have to do is feed it with the x and y values:
Example
How well does my data fit in a linear regression?
Python Program
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
print(r)
Output
C:\Users\My Name>python demo_ml_r_squared.py
-0.758591524376155
Note: The result -0.76 shows that there is a relationship, not perfect, but it indicates that we could use linear regression in future predictions.
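The printed value is the correlation coefficient r, which can be negative. As a sketch of where it comes from, the standard formula can be written in plain Python; squaring the result gives the r-squared value. This illustrates the formula and is not SciPy's implementation.

```python
import math

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

# Correlation coefficient: between -1 and 1, the sign follows the slope
r = s_xy / math.sqrt(s_xx * s_yy)

# r-squared: between 0 and 1
r_squared = r ** 2

print(round(r, 2))           # -0.76
print(round(r_squared, 2))   # 0.58
```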
Example
Predict the speed of a 10-year-old car:
Python Program
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
speed = myfunc(10)
print(speed)
The example predicted a speed of 85.6, which we could also read from the diagram:
Bad Fit?
Let us create an example where linear regression would not be the best method to predict
future values.
Example
These values for the x- and y-axis should result in a very bad fit for linear regression:
Python Program
import matplotlib.pyplot as plt
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
  return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result:
Example
You should get a very low r value.
Python Program
import numpy
from scipy import stats
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
slope, intercept, r, p, std_err = stats.linregress(x, y)
print(r)
Output
C:\Users\My Name>python demo_ml_r_squared_badfit.py
0.01331814154297491
The result, 0.013, indicates a very weak relationship, and tells us that this data set is not suitable for linear regression.
Machine Learning - Polynomial Regression
Polynomial Regression
If your data points clearly will not fit a linear regression (a straight line that follows the data points), it might be ideal for polynomial regression.
Polynomial regression, like linear regression, uses the relationship between the variables x and
y to find the best way to draw a line through the data points.
Python has methods for finding a relationship between data-points and to draw a line of
polynomial regression. We will show you how to use these methods instead of going through
the mathematical formula.
In the example below, we have registered 18 cars as they were passing a certain tollbooth.
We have registered the car's speed, and the time of day (hour) the passing occurred.
The x-axis represents the hours of the day and the y-axis represents the speed:
Example
Start by drawing a scatter plot:
import matplotlib.pyplot as plt
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
plt.scatter(x, y)
plt.show()
Result:
Example
Import numpy and matplotlib then draw the line of Polynomial Regression:
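import numpy
import matplotlib.pyplot as plt
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
# Fit a 3rd-degree polynomial to the data
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
# Points from 1 to 22 at which to draw the line
myline = numpy.linspace(1, 22, 100)
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()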
Result:
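Example Explained
Import the modules you need:
import numpy
import matplotlib.pyplot as plt
Create the arrays that represent the values of the x and y axis:
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
NumPy has a method that lets us make a polynomial model:
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
Then specify how the line will display. We start at position 1, and end at position 22:
myline = numpy.linspace(1, 22, 100)
Draw the original scatter plot:
plt.scatter(x, y)
Draw the line of polynomial regression:
plt.plot(myline, mymodel(myline))
Display the diagram:
plt.show()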
R-Squared
It is important to know how strong the relationship between the values of the x- and y-axis is. If there is no relationship, polynomial regression cannot be used to predict anything.
The r-squared value ranges from 0 to 1, where 0 means no relationship, and 1 means 100% related.
Python and the Sklearn module will compute this value for you; all you have to do is feed it with the x and y arrays:
Example
How well does my data fit in a polynomial regression?
Python Program
import numpy
from sklearn.metrics import r2_score
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
print(r2_score(y, mymodel(x)))
Output
C:\Users\My Name>python demo_ml_polynomial_r.py
0.9432150416451027
Note: The result 0.94 shows that there is a very good relationship, and we can use polynomial
regression in future predictions.
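To show what the r-squared score measures, it can be computed by hand as one minus the ratio between the model's squared errors and the squared spread of the data around its mean. The sketch below assumes a 3rd-degree polynomial fit with numpy.polyfit(); it illustrates the usual definition and is not taken from sklearn's source.

```python
import numpy

x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]

# Assumption of this sketch: a 3rd-degree polynomial model
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

observed = numpy.array(y)
predicted = mymodel(numpy.array(x))

# Squared errors left over after the fit
ss_res = numpy.sum((observed - predicted) ** 2)

# Squared spread of the observed values around their mean
ss_tot = numpy.sum((observed - observed.mean()) ** 2)

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

A value close to 1 means the fitted curve leaves little of the variation in y unexplained.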
Now we can use the information we have gathered to predict future values.
Example: Let us try to predict the speed of a car that passes the tollbooth at around 17:00:
To do so, we need the same mymodel object from the example above:
speed = mymodel(17)
print(speed)
The example predicted a speed of 88.87, which we could also read from the diagram:
Bad Fit?
Let us create an example where polynomial regression would not be the best method to predict
future values.
Example
These values for the x- and y-axis should result in a very bad fit for polynomial regression:
Python Program
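import numpy
import matplotlib.pyplot as plt
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
# Fit a 3rd-degree polynomial to the data
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
# Points across the x range at which to draw the line
myline = numpy.linspace(2, 95, 100)
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()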
Result:
Example
You should get a very low r-squared value.
Python Program
import numpy
from sklearn.metrics import r2_score
x = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
print(r2_score(y, mymodel(x)))
Output
C:\Users\My Name>python demo_ml_polynomial_badfit_r.py
0.009952707566680652
The result: 0.00995 indicates a very bad relationship, and tells us that this data set is not
suitable for polynomial regression.
Machine Learning - Multiple Regression
Multiple Regression
Multiple regression is like linear regression, but with more than one independent value,
meaning that we try to predict a value based on two or more variables.
Take a look at the data set below, it contains some information about cars.
We can predict the CO2 emission of a car based on the size of the engine, but with multiple
regression we can throw in more variables, like the weight of the car, to make the prediction
more accurate.
In Python we have modules that will do the work for us. Start by importing the Pandas
module.
import pandas
The Pandas module allows us to read csv files and return a DataFrame object.
df = pandas.read_csv("cars.csv")
Then make a list of the independent values and call this variable X.
X = df[['Weight', 'Volume']]
y = df['CO2']
Tip: It is common to name the list of independent values with an upper case X, and the list of dependent values with a lower case y.
We will use some methods from the sklearn module, so we will have to import that module as well:
from sklearn import linear_model
From the sklearn module we will use the LinearRegression() method to create a linear
regression object.
This object has a method called fit() that takes the independent and dependent values as
parameters and fills the regression object with data that describes the relationship:
regr = linear_model.LinearRegression()
regr.fit(X, y)
Now we have a regression object that is ready to predict CO2 values based on a car's weight and volume:
#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])
Example
See the whole example in action:
Python Program
import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
#predict the CO2 emission of a car where the weight is 2300kg, and the volume is 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Output
C:\Users\My Name>python demo_ml_multiple_regression.py
[107.2087328]
We have predicted that a car with a 1.3 liter engine, and a weight of 2300 kg, will release approximately 107 grams of CO2 for every kilometer it drives.
Coefficient
The coefficient is a factor that describes the relationship with an unknown variable.
Example: if x is a variable, then 2x is x two times. x is the unknown variable, and the number
2 is the coefficient.
In this case, we can ask for the coefficient value of weight against CO2, and for volume
against CO2. The answer(s) we get tells us what would happen if we increase, or decrease, one
of the independent values.
Example
Print the coefficient values of the regression object:
Python Program
import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
print(regr.coef_)
Output
C:\Users\My Name>python demo_ml_coef.py
[0.00755095 0.00780526]
Result Explained
The result array represents the coefficient values of weight and volume.
Weight: 0.00755095
Volume: 0.00780526
These values tell us that if the weight increases by 1kg, the CO2 emission increases by 0.00755095g.
And if the engine size (Volume) increases by 1 ccm, the CO2 emission increases by 0.00780526 g.
We have already predicted that if a car with a 1300ccm engine weighs 2300kg, the CO2 emission will be approximately 107g.
What if we increase the weight by 1000kg?
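import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)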
Result:
[114.75968007]
We have predicted that a car with a 1.3 liter engine, and a weight of 3300 kg, will release approximately 115 grams of CO2 for every kilometer it drives.
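The two predictions can be checked against the weight coefficient with simple arithmetic: adding 1000 times the coefficient to the first prediction should give the second one. The numbers below are copied from the results above; the small difference in the last digits comes from the rounded coefficient.

```python
base_prediction = 107.2087328     # predicted CO2 for the lighter car
weight_coefficient = 0.00755095   # CO2 increase per unit of extra weight
extra_weight = 1000

# Expected prediction after adding 1000 to the weight
expected = base_prediction + extra_weight * weight_coefficient

print(round(expected, 2))   # 114.76
```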
Scale Features
When your data has different values, and even different measurement units, it can be difficult to compare them. How do kilograms compare to meters? Or altitude to time?
The answer to this problem is scaling. We can scale data into new values that are easier to
compare.
Take a look at the table below, it is the same data set that we used in the multiple regression
chapter, but this time the volume column contains values in liters instead of ccm (1.0 instead
of 1000).
It can be difficult to compare the volume 1.0 with the weight 790, but if we scale them both
into comparable values, we can easily see how much one value is compared to the other.
There are different methods for scaling data, in this tutorial we will use a method called
standardization.
z = (x - u) / s
Where z is the new value, x is the original value, u is the mean and s is the standard deviation.
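The standardization formula can be tried out on a small list. The sketch below reuses the seven speed values from the Standard Deviation chapter rather than the car data set, and uses the population standard deviation (dividing by the number of values), which is also what StandardScaler uses.

```python
import statistics

speed = [86,87,88,86,87,85,86]

u = statistics.mean(speed)     # the mean
s = statistics.pstdev(speed)   # the population standard deviation

# z = (x - u) / s for every value
scaled = [(value - u) / s for value in speed]

print([round(z, 2) for z in scaled])
```

After scaling, the values have a mean of 0 and a standard deviation of 1, whatever unit the original values were measured in.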
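If you take the weight column from the data set above, the first value is 790, and the scaled value (790 minus the column mean, divided by the column standard deviation) will be about -2.1.
If you take the volume column from the data set above, the first value is 1.0, and the scaled value, calculated the same way, will be about -1.59.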
You do not have to do this manually, the Python sklearn module has a method called
StandardScaler() which returns a Scaler object with methods for transforming data sets.
Example
Scale all values in the Weight and Volume columns:
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pandas.read_csv("cars2.csv")
X = df[['Weight', 'Volume']]
scaledX = scale.fit_transform(X)
print(scaledX)
Result:
Note that the first two values are -2.1 and -1.59, which corresponds to our calculations:
[[-2.10389253 -1.59336644]
[-0.55407235 -1.07190106]
[-1.52166278 -1.59336644]
[-1.78973979 -1.85409913]
[-0.63784641 -0.28970299]
[-1.52166278 -1.59336644]
[-0.76769621 -0.55043568]
[ 0.3046118 -0.28970299]
[-0.7551301 -0.28970299]
[-0.59595938 -0.0289703 ]
[-1.30803892 -1.33263375]
[-1.26615189 -0.81116837]
[-0.7551301 -1.59336644]
[-0.16871166 -0.0289703 ]
[ 0.14125238 -0.0289703 ]
[ 0.15800719 -0.0289703 ]
[ 0.3046118 -0.0289703 ]
[-0.05142797 1.53542584]
[-0.72580918 -0.0289703 ]
[ 0.14962979 1.01396046]
[ 1.2219378 -0.0289703 ]
[ 0.5685001 1.01396046]
[ 0.3046118 1.27469315]
[ 0.51404696 -0.0289703 ]
[ 0.51404696 1.01396046]
[ 0.72348212 -0.28970299]
[ 0.8281997 1.01396046]
[ 1.81254495 1.01396046]
[ 0.96642691 -0.0289703 ]
[ 1.72877089 1.01396046]
[ 1.30990057 1.27469315]
[ 1.90050772 1.01396046]
[-0.23991961 -0.0289703 ]
[ 0.40932938 -0.0289703 ]
[ 0.47215993 -0.0289703 ]
[ 0.4302729 2.31762392]]
The task in the Multiple Regression chapter was to predict the CO2 emission from a car when
you only knew its weight and volume.
When the data set is scaled, you will have to use the scale when you predict values:
Example
Predict the CO2 emission from a 1.3 liter car that weighs 2300 kilograms:
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
df = pandas.read_csv("cars2.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
scaledX = scale.fit_transform(X)
regr = linear_model.LinearRegression()
regr.fit(scaledX, y)
scaled = scale.transform([[2300, 1.3]])
predictedCO2 = regr.predict([scaled[0]])
print(predictedCO2)
Result:
[107.2087328]
Machine Learning - Train/Test
In Machine Learning we create models to predict the outcome of certain events, like in the
previous chapter where we predicted the CO2 emission of a car when we knew the weight and
engine size.
To measure if the model is good enough, we can use a method called Train/Test.
What is Train/Test
It is called Train/Test because you split the data set into two sets: a training set and a testing set.
Our data set illustrates 100 customers in a shop, and their shopping habits.
Example
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
plt.scatter(x, y)
plt.show()
Result:
The x axis represents the number of minutes before making a purchase.
The y axis represents the amount of money spent on the purchase.
The training set should be a random selection of 80% of the original data, and the testing set should be the remaining 20%:
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
Display the training set as a scatter plot:
plt.scatter(train_x, train_y)
plt.show()
Result:
To make sure the testing set is not completely different, we will take a look at the testing set as
well.
Example
plt.scatter(test_x, test_y)
plt.show()
Result:
The testing set also looks like the original data set:
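The split above takes the first 80 values for training, which works here because the values were generated in random order. When the rows of a data set are ordered in some way, the selection has to be shuffled first. The sketch below shows one way to do that with NumPy; the seed for the shuffle is arbitrary.

```python
import numpy

numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x

# Shuffle the row positions 0..99 (separate generator, arbitrary seed)
rng = numpy.random.default_rng(2)
order = rng.permutation(len(x))

# First 80% of the shuffled positions for training, the rest for testing
split = int(len(x) * 0.8)
train_idx, test_idx = order[:split], order[split:]

train_x, train_y = x[train_idx], y[train_idx]
test_x, test_y = x[test_idx], y[test_idx]

print(len(train_x), len(test_x))   # 80 20
```

Using the same index arrays for x and y keeps each customer's minutes and money paired together.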
Fit the Data Set
What does the data set look like? In my opinion, the best fit would be a polynomial regression, so let us draw a line of polynomial regression.
To draw a line through the data points, we use the plot() method of the matplotlib module:
Example
Draw a polynomial regression line through the data points:
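import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
# Fit a 4th-degree polynomial to the training data
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
# Points from 0 to 6 at which to draw the regression line
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()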
Result:
The result backs up my suggestion that the data set fits a polynomial regression, even though it would give us some weird results if we tried to predict values outside of the data set. Example: the line indicates that a customer spending 6 minutes in the shop would make a purchase worth 200. That is probably a sign of overfitting.
But what about the R-squared score? The R-squared score is a good indicator of how well my
data set is fitting the model.
R2
It measures the relationship between the x axis and the y axis, and the value ranges from 0 to
1, where 0 means no relationship, and 1 means totally related.
The sklearn module has a method called r2_score() that will help us find this relationship.
In this case we would like to measure the relationship between the minutes a customer stays in
the shop and how much money they spend.
Example
How well does my training data fit in a polynomial regression?
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(train_y, mymodel(train_x))
print(r2)
Now we have made a model that is OK, at least when it comes to training data.
Now we want to test the model with the testing data as well, to see if it gives us the same result.
Example
Let us find the R2 score when using testing data:
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)
Note: The result 0.809 shows that the model fits the testing set as well, and we are confident
that we can use the model to predict future values.
Predict Values
Now that we have established that our model is OK, we can start predicting new values.
Example
How much money will a buying customer spend, if she or he stays in the shop for 5 minutes?
print(mymodel(5))
Output
The example predicted the customer to spend 22.88 dollars, which seems to correspond to the diagram:
Machine Learning - Decision Tree
Decision Tree
In this chapter we will show you how to make a "Decision Tree". A Decision Tree is a Flow
Chart, and can help you make decisions based on previous experience.
In the example, a person will try to decide if he/she should go to a comedy show or not.
Luckily our example person has registered every time there was a comedy show in town, and
registered some information about the comedian, and also registered if he/she went or not.
Now, based on this data set, Python can create a decision tree that can be used to decide if any new shows are worth attending.
First, import the modules you need, and read the dataset with pandas:
Example
Read and print the data set:
import pandas
from sklearn import tree
import pydotplus
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import matplotlib.image as pltimg
df = pandas.read_csv("shows.csv")
print(df)
Output
We have to convert the non-numerical columns 'Nationality' and 'Go' into numerical values.
Pandas has a map() method that takes a dictionary with information on how to convert the
values.
Example
Change string values into numerical values:
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
print(df)
Output
Then we have to separate the feature columns from the target column.
The feature columns are the columns that we try to predict from, and the target column is the
column with the values we try to predict.
Example
X is the feature columns, y is the target column:
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']
print(X)
print(y)
Output
0 0
1 0
2 0
3 0
4 1
5 0
6 1
7 1
8 1
9 1
10 0
11 1
12 1
Name: Go, dtype: int64
Now we can create the actual decision tree, fit it with our details, and save a .png file on the
computer:
Example
Create a Decision Tree, save it as an image, and show the image:
dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)
data = tree.export_graphviz(dtree, out_file=None, feature_names=features)
graph = pydotplus.graph_from_dot_data(data)
graph.write_png('mydecisiontree.png')
img=pltimg.imread('mydecisiontree.png')
imgplot = plt.imshow(img)
plt.show()
Output
Result Explained
The decision tree uses your earlier decisions to calculate the odds of you wanting to go see
a comedian or not.
Rank <= 6.5 means that every comedian with a rank of 6.5 or lower will follow the True
arrow (to the left), and the rest will follow the False arrow (to the right).
gini = 0.497 refers to the quality of the split. With two possible results it is always a number
between 0.0 and 0.5, where 0.0 would mean all of the samples got the same result, and 0.5 would
mean that the samples are divided evenly between the two results.
samples = 13 means that there are 13 comedians left at this point in the decision, which is all
of them since this is the first step.
value = [6, 7] means that of these 13 comedians, 6 will get a "NO", and 7 will get a "GO".
Gini
There are many ways to split the samples. We use the GINI method in this tutorial.
The Gini method uses this formula:
Gini = 1 - (x/n)^2 - (y/n)^2
Where x is the number of positive answers ("GO"), n is the number of samples, and y is the
number of negative answers ("NO"), which gives us this calculation:
1 - (7/13)^2 - (6/13)^2 = 0.497
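The gini values shown in the tree can be checked by hand. The following is a minimal sketch, assuming the standard Gini impurity definition (1 minus the sum of the squared class fractions); the function name gini is made up for illustration. The value lists are the ones reported for the nodes in this chapter.

```python
def gini(value):
    """Gini impurity of a node, given its class counts, e.g. [6, 7]."""
    n = sum(value)
    return 1 - sum((count / n) ** 2 for count in value)

# value lists taken from the nodes described in this chapter
for value in ([6, 7], [5, 0], [1, 7], [1, 3], [1, 1]):
    print(value, gini(value))
```

The printed numbers agree with the gini values in the tree (0.497, 0.0, 0.219, 0.375 and 0.5) when rounded to about three decimals.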
gini = 0.0 means all of the samples got the same result.
samples = 5 means that there are 5 comedians left in this branch (5 comedians with a Rank of
6.5 or lower).
value = [5, 0] means that 5 will get a "NO" and 0 will get a "GO".
Nationality
Nationality <= 0.5 means that the comedians with a nationality value of less than 0.5 will
follow the arrow to the left (which means everyone from the UK), and the rest will follow the
arrow to the right.
gini = 0.219 is the Gini impurity of this node. It is low because 7 of the 8 samples got the
same result.
samples = 8 means that there are 8 comedians left in this branch (8 comedians with a Rank
higher than 6.5).
value = [1, 7] means that of these 8 comedians, 1 will get a "NO" and 7 will get a "GO".
True - 4 Comedians Continue:
Age
Age <= 35.5 means that comedians at the age of 35.5 or younger will follow the arrow to the
left, and the rest will follow the arrow to the right.
gini = 0.375 is the Gini impurity of this node: 3 of the 4 samples got the same result.
samples = 4 means that there are 4 comedians left in this branch (4 comedians from the UK).
value = [1, 3] means that of these 4 comedians, 1 will get a "NO" and 3 will get a "GO".
gini = 0.0 means all of the samples got the same result.
samples = 4 means that there are 4 comedians left in this branch (4 comedians not from the
UK).
value = [0, 4] means that of these 4 comedians, 0 will get a "NO" and 4 will get a "GO".
gini = 0.0 means all of the samples got the same result.
samples = 2 means that there are 2 comedians left in this branch (2 comedians at the age 35.5
or younger).
value = [0, 2] means that of these 2 comedians, 0 will get a "NO" and 2 will get a "GO".
False - 2 Comedians Continue:
Experience
Experience <= 9.5 means that comedians with 9.5 years of experience, or less, will follow
the arrow to the left, and the rest will follow the arrow to the right.
gini = 0.5 means that the samples are divided evenly between the two results.
samples = 2 means that there are 2 comedians left in this branch (2 comedians older than
35.5).
value = [1, 1] means that of these 2 comedians, 1 will get a "NO" and 1 will get a "GO".
gini = 0.0 means all of the samples got the same result.
samples = 1 means that there is 1 comedian left in this branch (1 comedian with 9.5 years of
experience or less).
value = [0, 1] means that 0 will get a "NO" and 1 will get a "GO".
gini = 0.0 means all of the samples got the same result.
samples = 1 means that there is 1 comedian left in this branch (1 comedian with more than
9.5 years of experience).
value = [1, 0] means that 1 will get a "NO" and 0 will get a "GO".
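The walk through the tree above can also be written as plain conditionals. This is only a sketch of the particular tree described in this chapter, not of how scikit-learn stores a tree internally; the function name go_to_show is made up for illustration, and nationality uses the numeric encoding from earlier (UK = 0, USA = 1, N = 2).

```python
def go_to_show(age, experience, rank, nationality):
    """Follow the tree described above. Returns 1 for "GO" and 0 for "NO"."""
    if rank <= 6.5:
        return 0              # leaf with value = [5, 0]
    if nationality <= 0.5:    # comedians from the UK
        if age <= 35.5:
            return 1          # leaf with value = [0, 2]
        if experience <= 9.5:
            return 1          # leaf with value = [0, 1]
        return 0              # leaf with value = [1, 0]
    return 1                  # leaf with value = [0, 4]

print(go_to_show(30, 5, 8, 0))   # young UK comedian with rank 8
print(go_to_show(40, 10, 8, 0))  # older, experienced UK comedian with rank 8
```

Each return statement corresponds to one leaf of the tree, and each if statement to one of the split conditions.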
Predict Values
Example: Should I go see a show starring a 40-year-old American comedian, with 10 years of
experience, and a comedy ranking of 7?
Example
Use predict() method to predict new values:
print(dtree.predict([[40, 10, 7, 1]]))
Example
What would the answer be if the comedy rank was 6?
print(dtree.predict([[40, 10, 6, 1]]))
Different Results
You will see that the Decision Tree can give you different results if you run it enough times,
even if you feed it with the same data.
That is because scikit-learn shuffles the features at each split. When several splits are equally
good, the one that gets chosen can differ between runs, and so can the tree and its answers.
Pass a fixed random_state to DecisionTreeClassifier() to get the same tree every time.
Python MySQL
Python can be used in database applications.
One of the most popular databases is MySQL.
MySQL Database
To be able to experiment with the code examples in this tutorial, you should have MySQL
installed on your computer.
You can download a free MySQL database at https://www.mysql.com/downloads/.
To test if the installation was successful, or if you already have "MySQL Connector" installed,
create a Python page with the following content:
demo_mysql_test.py:
import mysql.connector
#if this page is executed with no errors, you have the "mysql.connector" module installed.
Output
C:\Users\My Name>python demo_mysql_test.py
If the above code was executed with no errors, "MySQL Connector" is installed and ready to
be used.
Create Connection
Start by creating a connection to the database.
Use the username and password from your MySQL database:
demo_mysql_connection.py:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword"
)
print(mydb)
Output
Now you can start querying the database using SQL statements.
Python MySQL Create Database
Creating a Database
Example
Create a database named "mydatabase":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword"
)
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE mydatabase")
Output
C:\Users\My Name>python demo_mysql_create_db.py
If the above code was executed with no errors, you have successfully created a database.
You can check if a database exists by listing all databases in your system with the "SHOW
DATABASES" statement:
Example
Return a list of your system's databases:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword"
)
mycursor = mydb.cursor()
mycursor.execute("SHOW DATABASES")
for x in mycursor:
    print(x)
Output
C:\Users\My Name>python demo_mysql_show_databases.py
('information_schema',)
('mydatabase',)
('performance_schema',)
('sys',)
Or you can try to access the database when making the connection:
Example
Try connecting to the database "mydatabase":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
Output
C:\Users\My Name>python demo_mysql_db_exist.py
#If this page is executed with no error, the database "mydatabase" exists in your system
Creating a Table
To create a table, use the "CREATE TABLE" statement. Make sure you define the name of the database when you create the connection.
Example
Create a table named "customers":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE customers (name VARCHAR(255), address VARCHAR(255))")
If the above code was executed with no errors, you have now successfully created a table.
Output
C:\Users\My Name>python demo_mysql_create_table.py
You can check if a table exists by listing all tables in your database with the "SHOW TABLES"
statement:
Example
Return a list of the tables in your database:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SHOW TABLES")
for x in mycursor:
    print(x)
Output
C:\Users\My Name>python demo_mysql_show_tables.py
('customers',)
Primary Key
When creating a table, you should also create a column with a unique key for each record.
We use the statement "INT AUTO_INCREMENT PRIMARY KEY", which will insert a
unique number for each record, starting at 1 and increasing by one for each record.
Example
Create primary key when creating the table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
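mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")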
Output
#If this page is executed with no error, the table "customers" now has a primary key
Example
Create primary key on an existing table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("ALTER TABLE customers ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY")
Output
#If this page is executed with no error, the table "customers" now has a primary key
Python MySQL Insert Into Table
Example
Insert a record in the "customers" table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
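mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)
# commit() is required to save the change to the table
mydb.commit()
print(mycursor.rowcount, "record inserted.")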
Output
C:\Users\My Name>python demo_mysql_insert.py
1 record inserted.
The second parameter of the executemany() method is a list of tuples, containing the data
you want to insert:
Example
Fill the "customers" table with data:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
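mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = [
    ('Peter', 'Lowstreet 27'), ('Amy', 'Apple st 652'), ('Hannah', 'Mountain 21'),
    ('Michael', 'Valley 345'), ('Sandy', 'Ocean blvd 2'), ('Betty', 'Green Grass 1'),
    ('Richard', 'Sky st 331'), ('Susan', 'One way 98'), ('Vicky', 'Yellow Garden 2'),
    ('Ben', 'Park Lane 38'), ('William', 'Central st 954'), ('Chuck', 'Main Road 989'),
    ('Viola', 'Sideway 1633')
]
mycursor.executemany(sql, val)
mydb.commit()
print(mycursor.rowcount, "record was inserted.")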
Output
C:\Users\My Name>python demo_mysql_insert_many.py
13 record was inserted.
Get Inserted ID
You can get the id of the row you just inserted by asking the cursor object.
Note: If you insert more than one row, the id of the last inserted row is returned.
Example
Insert one row, and return the ID:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
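mycursor = mydb.cursor()
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("Michelle", "Blue Village")
mycursor.execute(sql, val)
mydb.commit()
# lastrowid holds the id that was generated for the inserted row
print("1 record inserted, ID:", mycursor.lastrowid)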
Output
C:\Users\My Name>python demo_mysql_insert_id.py
1 record inserted, ID: 15
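The insert steps above can also be tried without a MySQL server. The following is a minimal sketch that swaps in Python's built-in sqlite3 module and an in-memory database as a stand-in for mysql.connector. That swap is an assumption made only so the code runs anywhere. Note that sqlite3 uses ? as its placeholder where mysql.connector uses %s, and INTEGER PRIMARY KEY AUTOINCREMENT where MySQL uses INT AUTO_INCREMENT PRIMARY KEY.

```python
import sqlite3

# Stand-in for the MySQL connection: an in-memory SQLite database.
mydb = sqlite3.connect(":memory:")
mycursor = mydb.cursor()
mycursor.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, address TEXT)"
)

sql = "INSERT INTO customers (name, address) VALUES (?, ?)"

# Insert several rows at once; rowcount reports how many rows were inserted.
val = [("John", "Highway 21"), ("Peter", "Lowstreet 27")]
mycursor.executemany(sql, val)
inserted_many = mycursor.rowcount

# Insert a single row with execute(); lastrowid holds its generated id.
mycursor.execute(sql, ("Amy", "Apple st 652"))
last_id = mycursor.lastrowid
mydb.commit()

print(inserted_many, "records inserted.")
print("1 record inserted, ID:", last_id)
```

Because the table starts empty, the two rows from executemany() get the ids 1 and 2, and the single row gets the id 3.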
Python MySQL Select From
Example
Select all records from the "customers" table, and display the result:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Note: We use the fetchall() method, which fetches all rows from the last executed
statement.
Output
Example
Select only the name and address columns:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT name, address FROM customers")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_select_columns.py
('John', 'Highway 21')
('Peter', 'Lowstreet 27')
('Amy', 'Apple st 652')
('Hannah', 'Mountain 21')
('Michael', 'Valley 345')
('Sandy', 'Ocean blvd 2')
('Betty', 'Green Grass 1')
('Richard', 'Sky st 331')
('Susan', 'One way 98')
('Vicky', 'Yellow Garden 2')
('Ben', 'Park Lane 38')
('William', 'Central st 954')
('Chuck', 'Main Road 989')
('Viola', 'Sideway 1633')
('Michelle', 'Blue Village')
Example
Fetch only one row:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers")
# fetchone() returns only the first row of the result
myresult = mycursor.fetchone()
print(myresult)
Output
When selecting records from a table, you can filter the selection by using the "WHERE"
statement:
Example
Select record(s) where the address is "Park Lane 38":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address = 'Park Lane 38'"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_where.py
(11, 'Ben', 'Park Lane 38')
Wildcard Characters
You can also select the records that start with, include, or end with a given letter or phrase.
Use the % to represent wildcard characters:
Example
Select records where the address contains the word "way":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address LIKE '%way%'"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_where_wildcard.py
(1, 'John', 'Highway 21')
(9, 'Susan', 'One way 98')
(14, 'Viola', 'Sideway 1633')
Example
Escape query values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers WHERE address = %s"
adr = ("Yellow Garden 2", )
mycursor.execute(sql, adr)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_where_escape.py
(10, 'Vicky', 'Yellow Garden 2')
Python MySQL Order By
Use the ORDER BY statement to sort the result in ascending or descending order.
The ORDER BY keyword sorts the result ascending by default. To sort the result in
descending order, use the DESC keyword.
Example
Sort the result alphabetically by name:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers ORDER BY name"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
Example
Sort the result reverse alphabetically by name:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT * FROM customers ORDER BY name DESC"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
Delete Record
You can delete records from an existing table by using the "DELETE FROM" statement:
Example
Delete any record where the address is "Mountain 21":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
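mycursor = mydb.cursor()
sql = "DELETE FROM customers WHERE address = 'Mountain 21'"
mycursor.execute(sql)
mydb.commit()
print(mycursor.rowcount, "record(s) deleted")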
Output
Notice the WHERE clause in the DELETE syntax: The WHERE clause specifies which
record(s) should be deleted. If you omit the WHERE clause, all records will be deleted!
It is considered a good practice to escape the values of any query, also in delete statements.
This is to prevent SQL injections, which is a common web hacking technique to destroy or
misuse your database.
The mysql.connector module uses the placeholder %s to escape values in the delete statement:
Example
Escape values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
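mycursor = mydb.cursor()
sql = "DELETE FROM customers WHERE address = %s"
adr = ("Yellow Garden 2", )
mycursor.execute(sql, adr)
mydb.commit()
print(mycursor.rowcount, "record(s) deleted")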
Output
C:\Users\My Name>python demo_mysql_delete_escape.py
1 record(s) deleted
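To see why escaping matters, here is a minimal sketch that compares a query built by string concatenation with a query that uses a placeholder. It uses Python's built-in sqlite3 module and an in-memory database as a stand-in for mysql.connector, which is an assumption made only so the code runs without a MySQL server; sqlite3 writes the placeholder as ? where mysql.connector uses %s. The hostile input string is hypothetical.

```python
import sqlite3

mydb = sqlite3.connect(":memory:")
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE customers (name TEXT, address TEXT)")
mycursor.executemany(
    "INSERT INTO customers (name, address) VALUES (?, ?)",
    [("Hannah", "Mountain 21"), ("Vicky", "Yellow Garden 2")],
)

# Hypothetical hostile input that tries to widen the WHERE clause.
user_input = "nowhere' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and matches every row.
unsafe_sql = "SELECT COUNT(*) FROM customers WHERE address = '" + user_input + "'"
unsafe_matches = mycursor.execute(unsafe_sql).fetchone()[0]

# Safe: the input is passed separately and is compared as a plain string.
mycursor.execute("DELETE FROM customers WHERE address = ?", (user_input,))
deleted = mycursor.rowcount
mydb.commit()

print(unsafe_matches, "record(s) matched by the concatenated query")
print(deleted, "record(s) deleted by the escaped query")
```

The concatenated query matches both rows, so a DELETE built the same way would have emptied the table. The escaped query deletes nothing, because no address is equal to the input string.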
Python MySQL Drop Table
Delete a Table
You can delete an existing table by using the "DROP TABLE" statement:
Example
Delete the table "customers":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DROP TABLE customers"
mycursor.execute(sql)
Output
If the table you want to delete is already deleted, or for any other reason does not exist, you
can use the IF EXISTS keyword to avoid getting an error.
Example
Delete the table "customers" if it exists:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "DROP TABLE IF EXISTS customers"
mycursor.execute(sql)
Output
C:\Users\My Name>python demo_mysql_drop_table2.py
Python MySQL Update Table
Update Table
You can update existing records in a table by using the "UPDATE" statement:
Example
Overwrite the address column from "Valley 345" to "Canyon 123":
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "UPDATE customers SET address = 'Canyon 123' WHERE address = 'Valley 345'"
mycursor.execute(sql)
mydb.commit()
print(mycursor.rowcount, "record(s) affected")
Output
C:\Users\My Name>python demo_mysql_update.py
1 record(s) affected
Notice the WHERE clause in the UPDATE syntax: The WHERE clause specifies which
record or records should be updated. If you omit the WHERE clause, all records will be
updated!
It is considered a good practice to escape the values of any query, also in update statements.
This is to prevent SQL injections, which is a common web hacking technique to destroy or
misuse your database.
The mysql.connector module uses the placeholder %s to escape values in the update statement:
Example
Escape values by using the placeholder %s method:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
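mycursor = mydb.cursor()
sql = "UPDATE customers SET address = %s WHERE address = %s"
# first value: the new address, second value: the address to look for
val = ("Canyon 123", "Valley 345")
mycursor.execute(sql, val)
mydb.commit()
print(mycursor.rowcount, "record(s) affected")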
Output
You can limit the number of records returned from the query, by using the "LIMIT" statement:
Example
Select the 5 first records in the "customers" table:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM customers LIMIT 5")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_limit.py
(1, 'John', 'Highway 21')
(2, 'Peter', 'Lowstreet 27')
(3, 'Amy', 'Apple st 652')
(4, 'Hannah', 'Mountain 21')
(5, 'Michael', 'Valley 345')
If you want to return five records, starting from the third record, you can use the "OFFSET"
keyword:
Example
Start from position 3, and return 5 records:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
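mycursor = mydb.cursor()
# OFFSET 2 skips the first two records, so the result starts at the third
mycursor.execute("SELECT * FROM customers LIMIT 5 OFFSET 2")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)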
Output
products
{ id: 154, name: 'Chocolate Heaven' },
{ id: 155, name: 'Tasty Lemons' },
{ id: 156, name: 'Vanilla Dreams' }
These two tables can be combined by using users' fav field and products' id field.
Example
Join users and products to see the name of the users favorite product:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
passwd="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
INNER JOIN products ON users.fav = products.id"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
Note: You can use JOIN instead of INNER JOIN. They will both give you the same result.
LEFT JOIN
In the example above, Hannah and Michael were excluded from the result. That is because
INNER JOIN only shows the records where there is a match.
If you want to show all users, even if they do not have a favorite product, use the LEFT JOIN
statement:
Example
Select all users and their favorite product:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="myusername",
passwd="mypassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
LEFT JOIN products ON users.fav = products.id"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mysql_left_join.py
('John', 'Chocolate Heaven')
('Peter', 'Chocolate Heaven')
('Amy', 'Tasty Lemons')
('Hannah', None)
('Michael', None)
RIGHT JOIN
If you want to return all products, and the users who have them as their favorite, even if no
user has them as their favorite, use the RIGHT JOIN statement:
Example
Select all products, and the user(s) who have them as their favorite:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="myusername",
passwd="mypassword",
database="mydatabase"
)
mycursor = mydb.cursor()
sql = "SELECT \
users.name AS user, \
products.name AS favorite \
FROM users \
RIGHT JOIN products ON users.fav = products.id"
mycursor.execute(sql)
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
Output
Note: Hannah and Michael, who have no favorite product, are not included in the result.
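The difference between the join types can be tried without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database as a stand-in for mysql.connector. The user ids and fav values are assumptions reconstructed from the outputs in this chapter. Since older SQLite versions do not support RIGHT JOIN, the sketch gets the same rows by swapping the two tables in a LEFT JOIN.

```python
import sqlite3

mydb = sqlite3.connect(":memory:")
cur = mydb.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT, fav INTEGER)")
cur.execute("CREATE TABLE products (id INTEGER, name TEXT)")
# Assumed user rows: Hannah and Michael have no favorite product.
cur.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "John", 154), (2, "Peter", 154), (3, "Amy", 155),
    (4, "Hannah", None), (5, "Michael", None),
])
cur.executemany("INSERT INTO products VALUES (?, ?)", [
    (154, "Chocolate Heaven"), (155, "Tasty Lemons"), (156, "Vanilla Dreams"),
])

query = ("SELECT users.name, products.name FROM users "
         "{} JOIN products ON users.fav = products.id ORDER BY users.id")
inner_rows = cur.execute(query.format("INNER")).fetchall()
left_rows = cur.execute(query.format("LEFT")).fetchall()

# Same rows as users RIGHT JOIN products: every product, matched or not.
right_rows = cur.execute(
    "SELECT users.name, products.name FROM products "
    "LEFT JOIN users ON users.fav = products.id "
    "ORDER BY products.id, users.id"
).fetchall()

print(len(inner_rows), "rows from INNER JOIN")
print(len(left_rows), "rows from LEFT JOIN")
print(len(right_rows), "rows from the RIGHT JOIN equivalent")
```

INNER JOIN returns the three users with a favorite, LEFT JOIN adds Hannah and Michael with None, and the RIGHT JOIN equivalent adds 'Vanilla Dreams' with None as the user.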
Python MongoDB
MongoDB
MongoDB stores data in JSON-like documents, which makes the database very flexible and
scalable.
To be able to experiment with the code examples in this tutorial, you will need access to a
MongoDB database.
PyMongo
Navigate your command line to the location of PIP, and type the following:
python -m pip install pymongo
Test PyMongo
To test if the installation was successful, or if you already have "pymongo" installed, create a
Python page with the following content:
demo_mongodb_test.py:
import pymongo
If the above code was executed with no errors, "pymongo" is installed and ready to be used.
Output
Creating a Database
MongoDB will create the database if it does not exist, and make a connection to it.
Example
Create a database called "mydatabase":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
# database created!
Output
MongoDB waits until you have created a collection (table), with at least one document
(record) before it actually creates the database (and collection).
Example
Return a list of your system's databases:
print(myclient.list_database_names())
Output
C:\Users\My Name>python demo_mongodb_check_db.py
['admin', 'local', 'mydatabase']
Example
Check if "mydatabase" exists:
dblist = myclient.list_database_names()
if "mydatabase" in dblist:
    print("The database exists.")
Output
Creating a Collection
To create a collection in MongoDB, use the database object and specify the name of the
collection you want to create.
Example
Create a collection called "customers":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
Output
MongoDB waits until you have inserted a document before it actually creates the collection.
print(mydb.list_collection_names())
Output
collist = mydb.list_collection_names()
if "customers" in collist:
    print("The collection exists.")
Output
The first parameter of the insert_one() method is a dictionary containing the name(s) and
value(s) of each field in the document you want to insert.
Example
Insert a record in the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydict = { "name": "John", "address": "Highway 37" }
x = mycol.insert_one(mydict)
Output
Example
Insert another record in the "customers" collection, and return the value of the _id field:
mydict = { "name": "Peter", "address": "Lowstreet 27" }
x = mycol.insert_one(mydict)
print(x.inserted_id)
Output
If you do not specify an _id field, then MongoDB will add one for you and assign a unique id
for each document.
In the example above no _id field was specified, so MongoDB assigned a unique _id for the
record (document).
The first parameter of the insert_many() method is a list containing dictionaries with the
data you want to insert:
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mylist = [
{ "name": "Amy", "address": "Apple st 652"},
{ "name": "Hannah", "address": "Mountain 21"},
{ "name": "Michael", "address": "Valley 345"},
{ "name": "Sandy", "address": "Ocean blvd 2"},
{ "name": "Betty", "address": "Green Grass 1"},
{ "name": "Richard", "address": "Sky st 331"},
{ "name": "Susan", "address": "One way 98"},
{ "name": "Vicky", "address": "Yellow Garden 2"},
{ "name": "Ben", "address": "Park Lane 38"},
{ "name": "William", "address": "Central st 954"},
{ "name": "Chuck", "address": "Main Road 989"},
{ "name": "Viola", "address": "Sideway 1633"}
]
x = mycol.insert_many(mylist)
If you do not want MongoDB to assign unique ids for your documents, you can specify the _id
field when you insert the document(s).
Remember that the values have to be unique. Two documents cannot have the same _id.
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mylist = [
{ "_id": 1, "name": "John", "address": "Highway 37"},
{ "_id": 2, "name": "Peter", "address": "Lowstreet 27"},
{ "_id": 3, "name": "Amy", "address": "Apple st 652"},
{ "_id": 4, "name": "Hannah", "address": "Mountain 21"},
{ "_id": 5, "name": "Michael", "address": "Valley 345"},
{ "_id": 6, "name": "Sandy", "address": "Ocean blvd 2"},
{ "_id": 7, "name": "Betty", "address": "Green Grass 1"},
{ "_id": 8, "name": "Richard", "address": "Sky st 331"},
{ "_id": 9, "name": "Susan", "address": "One way 98"},
{ "_id": 10, "name": "Vicky", "address": "Yellow Garden 2"},
{ "_id": 11, "name": "Ben", "address": "Park Lane 38"},
{ "_id": 12, "name": "William", "address": "Central st 954"},
{ "_id": 13, "name": "Chuck", "address": "Main Road 989"},
{ "_id": 14, "name": "Viola", "address": "Sideway 1633"}
]
x = mycol.insert_many(mylist)
# print a list of the _id values of the inserted documents:
print(x.inserted_ids)
Output
C:\Users\My Name>python demo_mongodb_insert_many2.py
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Python MongoDB Find
In MongoDB we use the find() and find_one() methods to find data in a collection,
just like the SELECT statement is used to find data in a table in a MySQL database.
Find One
To select data from a collection in MongoDB, we can use the find_one() method.
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
x = mycol.find_one()
print(x)
Output
Find All
To select data from a collection in MongoDB, we can also use the find() method.
The first parameter of the find() method is a query object. In this example we use an empty
query object, which selects all documents in the collection.
No parameters in the find() method gives you the same result as SELECT * in MySQL.
Example
Return all documents in the "customers" collection, and print each document:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find():
    print(x)
Output
The second parameter of the find() method is an object describing which fields to include in
the result.
This parameter is optional, and if omitted, all fields will be included in the result.
Example
Return only the names and addresses, not the _ids:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "_id": 0, "name": 1, "address": 1 }):
    print(x)
Output
You are not allowed to specify both 0 and 1 values in the same object (except if one of the
fields is the _id field). If you specify a field with the value 0, all other fields get the value 1,
and vice versa:
Example
This example will exclude "address" from the result:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "address": 0 }):
    print(x)
Example
You get an error if you specify both 0 and 1 values in the same object (except if one of the
fields is the _id field):
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
for x in mycol.find({},{ "name": 1, "address": 0 }):
    print(x)
Python MongoDB Query
When finding documents in a collection, you can filter the result by using a query object.
The first argument of the find() method is a query object, and is used to limit the search.
Example
Find document(s) with the address "Park Lane 38":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Park Lane 38" }
mydoc = mycol.find(myquery)
for x in mydoc:
    print(x)
Output
Advanced Query
To make advanced queries you can use modifiers as values in the query object.
E.g. to find the documents where the "address" field starts with the letter "S" or higher
(alphabetically), use the greater than modifier: {"$gt": "S"}:
Example
Find documents where the address starts with the letter "S" or higher:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": { "$gt": "S" } }
mydoc = mycol.find(myquery)
for x in mydoc:
    print(x)
Output
C:\Users\My Name>python demo_mongodb_query_modifier.py
{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
{'_id': 8, 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': 10, 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': 14, 'name': 'Viola', 'address': 'Sideway 1633'}
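The phrase "starts with the letter S or higher" describes an ordinary string comparison. The following is a minimal sketch in plain Python, with no database involved, assuming MongoDB compares the strings character by character as it does with its default settings. The (id, address) pairs are taken from the "customers" documents inserted earlier in this chapter.

```python
# (_id, address) pairs from the "customers" collection used in this chapter
customers = [
    (1, "Highway 37"), (2, "Lowstreet 27"), (3, "Apple st 652"),
    (4, "Mountain 21"), (5, "Valley 345"), (6, "Ocean blvd 2"),
    (7, "Green Grass 1"), (8, "Sky st 331"), (9, "One way 98"),
    (10, "Yellow Garden 2"), (11, "Park Lane 38"), (12, "Central st 954"),
    (13, "Main Road 989"), (14, "Sideway 1633"),
]

# Python's > on strings compares character by character, like {"$gt": "S"}
matches = [(doc_id, address) for doc_id, address in customers if address > "S"]

for doc_id, address in matches:
    print(doc_id, address)
```

Note that "Sky st 331" and "Sideway 1633" are greater than "S" because a longer string that begins with "S" sorts after the single letter "S".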
Use the sort() method to sort the result in ascending or descending order.
The sort() method takes one parameter for "fieldname" and one parameter for "direction"
(ascending is the default direction).
Example
Sort the result alphabetically by name:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydoc = mycol.find().sort("name")
for x in mydoc:
    print(x)
Output
Sort Descending
sort("name", 1) #ascending
sort("name", -1) #descending
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mydoc = mycol.find().sort("name", -1)
for x in mydoc:
    print(x)
Output
Delete Document
The first parameter of the delete_one() method is a query object defining which document
to delete.
Note: If the query finds more than one document, only the first occurrence is deleted.
Example
Delete the document with the address "Mountain 21":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Mountain 21" }
mycol.delete_one(myquery)
Output
The first parameter of the delete_many() method is a query object defining which documents
to delete.
Example
Delete all documents where the address starts with the letter S:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": { "$regex": "^S" } }
x = mycol.delete_many(myquery)
print(x.deleted_count, "documents deleted.")
Output
To delete all documents in a collection, pass an empty query object to the delete_many()
method:
Example
Delete all documents in the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
x = mycol.delete_many({})
print(x.deleted_count, "documents deleted.")
Output
C:\Users\My Name>python demo_mongodb_delete_all.py
11 documents deleted.
Python MongoDB Drop Collection
Delete Collection
You can delete a table, or collection as it is called in MongoDB, by using the drop() method.
Example
Delete the "customers" collection:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
mycol.drop()
In PyMongo, the drop() method returns None. It does not raise an error if the
collection does not exist.
The first parameter of the update_one() method is a query object defining which document
to update.
Note: If the query finds more than one record, only the first occurrence is updated.
The second parameter is an object defining the new values of the document.
Example
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myquery = { "address": "Valley 345" }
newvalues = { "$set": { "address": "Canyon 123" } }
mycol.update_one(myquery, newvalues)
Update Many
To update all documents that meet the criteria of the query, use the update_many() method.
Example
Update all documents where the address starts with the letter "S":
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
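myquery = { "address": { "$regex": "^S" } }
newvalues = { "$set": { "name": "Minnie" } }  # example value, chosen for illustration
x = mycol.update_many(myquery, newvalues)
print(x.modified_count, "documents updated.")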
Output
C:\Users\My Name>python demo_mongodb_update_many.py
2 documents updated.
Python MongoDB Limit
Example
Limit the result to only return 5 documents:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["customers"]
myresult = mycol.find().limit(5)
for x in myresult:
    print(x)
Output
C:\Users\My Name>python demo_mongodb_limit.py
{'_id': 1, 'name': 'John', 'address': 'Highway37'}
{'_id': 2, 'name': 'Peter', 'address': 'Lowstreet 27'}
{'_id': 3, 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': 4, 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': 5, 'name': 'Michael', 'address': 'Valley 345'}
Python Reference
Python Built in Functions
Function Description
delattr() Deletes the specified attribute (property or method) from the specified object
divmod() Returns the quotient and the remainder when argument1 is divided by argument2
hasattr() Returns True if the specified object has the specified attribute (property/method)
map() Returns the specified iterator with the specified function applied to each item
range() Returns a sequence of numbers, starting from 0 and increments by 1 (by default)
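Example
A short sketch of the functions listed above. The Car class and its values are made up for this illustration only:

```python
class Car:
    """Made-up class, used only to have an object with attributes."""
    def __init__(self):
        self.brand = "Ford"
        self.year = 1964

car = Car()

# hasattr() checks for an attribute, delattr() removes it.
has_year_before = hasattr(car, "year")
delattr(car, "year")  # same effect as: del car.year
has_year_after = hasattr(car, "year")

# divmod() returns the quotient and the remainder as a tuple.
quotient, remainder = divmod(17, 5)

# map() returns an iterator; list() collects its items.
lengths = list(map(len, ["apple", "kiwi", "banana"]))

# range() starts at 0 and increments by 1 by default.
numbers = list(range(5))

print(has_year_before, has_year_after)
print(quotient, remainder)
print(lengths)
print(numbers)
```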
Note: All string methods return new values. They do not change the original string.
Method Description
endswith() Returns true if the string ends with the specified value
find() Searches the string for a specified value and returns the position of where it was found (returns -1 if the value is not found)
index() Searches the string for a specified value and returns the position of where it was found (raises a ValueError if the value is not found)
isalpha() Returns True if all characters in the string are in the alphabet
islower() Returns True if all characters in the string are lower case
partition() Returns a tuple where the string is parted into three parts
replace() Returns a string where a specified value is replaced with a specified value
rfind() Searches the string for a specified value and returns the last position of where it was found
rindex() Searches the string for a specified value and returns the last position of where it was found
rpartition() Returns a tuple where the string is parted into three parts
rsplit() Splits the string at the specified separator, and returns a list
split() Splits the string at the specified separator, and returns a list
startswith() Returns true if the string starts with the specified value
swapcase() Swaps cases, lower case becomes upper case and vice versa
zfill() Fills the string with a specified number of 0 values at the beginning
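Example
A small sketch of some of the string methods above. The sample strings are made up for this illustration:

```python
txt = "Hello, welcome to my world."

# find() returns -1 when the value is missing; index() raises ValueError.
found_at = txt.find("welcome")
not_found = txt.find("xyz")
try:
    txt.index("xyz")
    index_raised = False
except ValueError:
    index_raised = True

# rfind() reports the last occurrence instead of the first.
last_o = txt.rfind("o")

# partition() always returns three parts: before, separator, after.
parts = "I could eat bananas all day".partition("bananas")

# With a maxsplit of 1, split() cuts from the left and rsplit() from the right.
left_split = "apple, banana, cherry".split(", ", 1)
right_split = "apple, banana, cherry".rsplit(", ", 1)

padded = "50".zfill(5)
swapped = "Hello World".swapcase()

print(found_at, not_found, index_raised, last_o)
print(parts)
print(left_split, right_split)
print(padded, swapped)
```

None of these calls change txt. Each one returns a new value.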
Python List/Array Methods
Python has a set of built-in methods that you can use on lists/arrays.
Method Description
extend() Add the elements of a list (or any iterable), to the end of the current list
index() Returns the index of the first element with the specified value
Note: Python does not have built-in support for Arrays, but Python Lists can be used instead.
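Example
A minimal sketch of extend() and index(), using a made-up list of fruits:

```python
fruits = ["apple", "banana", "cherry"]

# extend() accepts any iterable, here a tuple.
fruits.extend(("banana", "kiwi"))

# index() returns the position of the first matching element only.
first_banana = fruits.index("banana")

print(fruits)
print(first_banana)
```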
Python has a set of built-in methods that you can use on dictionaries.
Method Description
items() Returns a view object containing a tuple for each key-value pair
setdefault() Returns the value of the specified key. If the key does not exist: insert the key, with the specified value
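Example
A minimal sketch of setdefault() and items(). The car dictionary is made up for this illustration:

```python
car = {"brand": "Ford", "model": "Mustang"}

# The key exists: setdefault() returns the stored value and changes nothing.
existing = car.setdefault("model", "Bronco")

# The key is missing: setdefault() inserts it and returns the new value.
added = car.setdefault("color", "white")

# items() yields one (key, value) tuple per entry.
pairs = list(car.items())

print(existing, added)
print(pairs)
```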
Python has two built-in methods that you can use on tuples.
Method Description
index() Searches the tuple for a specified value and returns the position of where it was found
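Example
A minimal sketch of the two tuple methods, index() and count(). The tuple values are made up for this illustration:

```python
values = (1, 3, 7, 8, 7, 5, 4, 6, 8, 5)

# index() returns the position of the first occurrence.
first_eight = values.index(8)

# count() returns how many times the value occurs.
eights = values.count(8)

print(first_eight, eights)
```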
Python has a set of built-in methods that you can use on sets.
Method Description
difference() Returns a set containing the difference between two or more sets
difference_update() Removes the items in this set that are also included in another, specified set
intersection_update() Removes the items in this set that are not present in other, specified set(s)
symmetric_difference_update() Inserts the symmetric differences from this set and another
update() Update the set with the union of this set and others
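Example
A small sketch comparing the set methods above. The two sets are made up for this illustration. Note that difference() returns a new set, while the _update methods change the set they are called on:

```python
x = {"apple", "banana", "cherry"}
y = {"google", "microsoft", "apple"}

# difference() returns a new set and leaves x unchanged.
diff = x.difference(y)

# Work on copies so that x itself stays unchanged.
a = set(x)
a.difference_update(y)            # keep only items that are not in y

b = set(x)
b.intersection_update(y)          # keep only items that are also in y

c = set(x)
c.symmetric_difference_update(y)  # keep items that are in exactly one set

d = set(x)
d.update(y)                       # union of both sets

print(diff, a, b, c, d, sep="\n")
```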
Method Description
fileno() Returns a number that represents the stream, from the operating system's perspective
seekable() Returns whether the file allows us to change the file position
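Example
A minimal sketch of fileno() and seekable(). It uses a temporary file from the tempfile module, so no existing file is needed:

```python
import tempfile

# A temporary text file that is deleted automatically when closed.
with tempfile.TemporaryFile(mode="w+") as f:
    descriptor = f.fileno()   # integer handle assigned by the operating system
    can_seek = f.seekable()   # True when the file position can be changed

    f.write("Hello")
    f.seek(0)                 # move back to the start before reading
    content = f.read()

print(descriptor, can_seek, content)
```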
Python Keywords
Python has a set of keywords that are reserved words that cannot be used as variable names,
function names, or any other identifiers:
Keyword Description
as To create an alias
or A logical operator
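Example
A minimal sketch of the keywords "as" and "or". It also uses the standard keyword module to check whether a word is reserved, and compile() to show that a keyword is rejected as a variable name:

```python
import keyword
import math as m   # "as" creates the alias m for the math module

root = m.sqrt(16)

# "or" is True when at least one of the conditions is True.
either = (5 > 8) or (3 < 4)

# keyword.iskeyword() reports whether a word is reserved.
is_reserved = keyword.iskeyword("as")

# Using a keyword as a variable name is a syntax error.
try:
    compile("or = 1", "<demo>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

print(root, either, is_reserved, rejected)
```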
Python Glossary
Feature Description
Global Variables Global variables are variables that belong to the global scope
Negative Indexing on a String How to use negative indexing when accessing a string
Evaluate Booleans Evaluate a value or statement and return either True or False
Identity Operators Identity operators are used to see if two objects are in fact the same object
Loop Through List Items How to loop through the items in a list
Check if List Item Exists How to check if a specified item is present in a list
Check if Tuple Item Exists How to check if a specified item is present in a tuple
Tuple With One Item How to create a tuple with only one item
Check if Dictionary Item Exists How to check if a specified item is present in a dictionary
The pass Keyword in If Use the pass keyword inside empty if statements
While Continue How to stop the current iteration and continue with the next
For Continue How to stop the current iteration and continue with the next
For pass Use the pass keyword inside empty for loops
The pass Statement in Functions Use the pass statement in empty functions
Function Recursion A function that can call itself is called a recursive function
Why Use Lambda Functions Learn when to use a lambda function or not
What is an Array Arrays are variables that can hold more than one value
The Class __init__() Function The __init__() function is executed when the class is initiated
Object Methods Methods in objects are functions that belong to the object
self The self parameter refers to the current instance of the class
super Function The super() function makes the child class inherit from the parent class
Using the dir() Function List all variable names and function names in a module
The strftime Method How to format a date object into a readable string
Date Format Codes The datetime module has a set of legal format codes
Format JSON How to format JSON output with indentations and line breaks
Python has a built-in module that you can use to make random numbers.
Method Description
getstate() Returns the current internal state of the random number generator
choices() Returns a list with a random selection from the given sequence
triangular() Returns a random float number between two given parameters; you can also set a mode parameter to specify the most likely value (the peak) between the two other parameters
betavariate() Returns a random float number between 0 and 1 based on the Beta distribution (used in statistics)
gammavariate() Returns a random positive float number based on the Gamma distribution (used in statistics)
gauss() Returns a random float number based on the Gaussian distribution with a given mean and standard deviation (used in probability theories)
normalvariate() Returns a random float number based on the normal distribution with a given mean and standard deviation (used in probability theories)
vonmisesvariate() Returns a random angle between 0 and 2*pi based on the von Mises distribution (used in directional statistics)
paretovariate() Returns a random float number of at least 1 based on the Pareto distribution (used in probability theories)
weibullvariate() Returns a random non-negative float number based on the Weibull distribution (used in statistics)
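Example
A small sketch of some of the random methods above. The seed and all parameter values are arbitrary choices for this illustration. Note that gauss() takes a mean and a standard deviation, so its result is not limited to the range 0 to 1:

```python
import random

random.seed(10)  # arbitrary seed, only to make the run repeatable

# getstate() captures the generator state; setstate() restores it.
state = random.getstate()
first = random.random()
random.setstate(state)
second = random.random()   # same value as first

# choices() picks k items, with replacement.
picks = random.choices(["apple", "banana", "cherry"], k=5)

# triangular(low, high, mode): values near the mode are most likely.
tri = random.triangular(20, 60, 30)

# gauss(mu, sigma): mean 100, standard deviation 50.
g = random.gauss(100, 50)

beta = random.betavariate(5, 10)   # always between 0 and 1
pareto = random.paretovariate(3)   # always at least 1

print(first == second)
print(picks)
print(tri, g, beta, pareto)
```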
Example
import requests
x = requests.get('https://w3schools.com/python/demopage.htm')
print(x.text)
The requests module allows you to send HTTP requests using Python.
The HTTP request returns a Response Object with all the response data (content, encoding,
status, etc).
Navigate your command line to the location of PIP, and type the following:
Methods
Method Description
post(url, data, json, args) Sends a POST request to the specified url
request(method, url, args) Sends a request of the specified method to the specified url
How to Remove Duplicates From a Python List
Learn how to remove duplicates from a List in Python.
Example
Example Explained
Create a dictionary, using the List items as keys. This will automatically remove any
duplicates because dictionaries cannot have duplicate keys.
Create a Dictionary
mylist = ["a", "b", "a", "c", "c"]
mylist = list( dict.fromkeys(mylist) )
print(mylist)
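To see why this works, it can help to look at the dictionary in between. The sketch below splits the one-liner into two steps. The names as_dict and unique are chosen for this illustration only:

```python
mylist = ["a", "b", "a", "c", "c"]

# Step 1: each list item becomes a key. Repeated keys are stored only once.
as_dict = dict.fromkeys(mylist)

# Step 2: list() collects the keys again, in their original order.
unique = list(as_dict)

print(as_dict)
print(unique)
```

Dictionaries keep their insertion order in Python 3.7 and later, so the first occurrence of each item decides its position in the result.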
If you like to have a function where you can send your lists, and get them back without
duplicates, you can create a function and insert the code from the example above.
Example
def my_function(x):
    return list(dict.fromkeys(x))
mylist = my_function(["a", "b", "a", "c", "c"])
print(mylist)
Output
Example Explained
Create a Function
def my_function(x):
    return list(dict.fromkeys(x))
Create a Dictionary
def my_function(x):
    return list(dict.fromkeys(x))
Return List
def my_function(x):
    return list(dict.fromkeys(x))
print(mylist)
dlroW olleH
Example Explained
We have a string, "Hello World", which we want to reverse:
The String to Reverse
txt = "Hello World" [::-1]
print(txt)
Create a slice that starts at the end of the string, and moves backwards.
In this particular example, the slice statement [::-1] means start at the end of the string and
end at position 0, move with the step -1, negative one, which means one step backwards.
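The slice can also be compared with other ways of writing the same thing. The sketch below is for illustration only. It shows the slice with an explicit start position, an alternative using the built-in reversed() function, and a different negative step:

```python
txt = "Hello World"

# The short form used above.
reversed_slice = txt[::-1]

# The same slice with the start position written out.
explicit = txt[len(txt) - 1::-1]

# An alternative: reversed() returns an iterator, join() builds the string.
joined = "".join(reversed(txt))

# A step of -2 moves backwards two positions at a time.
every_second = txt[::-2]

print(reversed_slice)
print(explicit)
print(joined)
print(every_second)
```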
Create a Function
If you like to have a function where you can send your strings, and return them backwards,
you can create a function and insert the code from the example above.
Example
def my_function(x):
    return x[::-1]
mytxt = my_function("Hello World")
print(mytxt)
Output
print(mytxt)
Slice the string starting at the end of the string and move backwards.