Python Programming For Beginners - 3 Books in 1 - Beginner's Guide, Data Science and Machine Learning
3 books in 1
William Wizner
© Copyright 2020 - All rights reserved.
The content contained within this book may not be reproduced, duplicated or transmitted without direct
written permission from the author or the publisher.
Under no circumstances will any blame or legal responsibility be held against the publisher, or author,
for any damages, reparation, or monetary loss due to the information contained within this book. Either
directly or indirectly.
Legal Notice:
This book is copyright protected. This book is only for personal use. You cannot amend, distribute, sell,
use, quote or paraphrase any part, or the content within this book, without the consent of the author or
publisher.
Disclaimer Notice:
Please note the information contained within this document is for educational and entertainment
purposes only. All effort has been executed to present accurate, up to date, and reliable, complete
information. No warranties of any kind are declared or implied. Readers acknowledge that the author is
not engaging in the rendering of legal, financial, medical or professional advice. The content within this
book has been derived from various sources. Please consult a licensed professional before attempting
any techniques outlined in this book.
By reading this document, the reader agrees that under no circumstances is the author responsible for
any losses, direct or indirect, which are incurred as a result of the use of the information contained
within this document, including, but not limited to, — errors, omissions, or inaccuracies.
Python for beginners
Introduction
Chapter 1: Installing Python
Local Environment Setup
Getting Python
Installing Python
Here Is A Quick Overview Of Installing Python On Various Platforms:
Unix And Linux Installation
Windows Installation
Macintosh Installation
Setting Up PATH
Setting Path at Unix/Linux
Setting Path At Windows
Python Environment Variables
Running Python
Interactive Interpreter
Script from The Command-Line
Integrated Development Environment
IDLE
A File Editor
Editing A File
How to Improve Your Workflow
Chapter 2: Python Loops and Numbers
Loops
Numbers
Chapter 3: Data Types
String Manipulation
String Formatting
Type Casting
Assignment and Formatting Exercise
Chapter 4: Variable in Python
Variable Vs. Constants
Variables Vs. Literals
Variables Vs. Arrays
Classifications of Python Arrays Essential for Variables
Naming Variables
Learning Python Strings, Numbers and Tuple
Types of Data Variables
Chapter 5: Inputs, Printing, And Formatting Outputs
Inputs
Printing and Formatting Outputs
Input and Formatting Exercise
Chapter 6: Mathematical Notation, Basic Terminology, and Building
Machine Learning Systems
Mathematical Notation for Machine Learning
Terminologies Used for Machine Learning
Chapter 7: Lists and Sets Python
Lists
Sets
Chapter 8: Conditions Statements
“if” statements
Else Statements
Code Blocks
While
For Loop
Break
Infinite Loop
Continue
Practice Exercise
Chapter 9: Iteration
While Statement
Definite and Indefinite Loops
The for Statement
Chapter 10: Functions and Control Flow Statements in Python
What is a Function?
Defining Functions
Call Function
Parameters of Function
Default Parameters
What is the control flow statements?
break statement
continue statement
pass statement
else statement
Conclusion:
Python for data science
Introduction:
Chapter 1: What is Data Analysis?
Chapter 2: The Basics of the Python Language
The Statements
The Python Operators
The Keywords
Working with Comments
The Python Class
How to Name Your Identifiers
Python Functions
Chapter 3: Using Pandas
Pandas
Chapter 4: Working with Python for Data Science
Why Python Is Important?
What Is Python?
Python's Position in Data Science
Data Cleaning
Data Visualization
Feature Extraction
Model Building
Python Installation
Installation Under Windows
Conda
Spyder
Installation Under MAC
Installation Under Linux
Install Python
Chapter 5: Indexing and Selecting Arrays
Conditional selection
NumPy Array Operations
Array – Array Operations
Array – Scalar operations
Chapter 6: K-Nearest Neighbors Algorithm
Splitting the Dataset
Feature Scaling
Training the Algorithm
Evaluating the Accuracy
K Means Clustering
Data Preparation
Visualizing the Data
Creating Clusters
Chapter 7: Big Data
The Challenge
Applications in the Real World
Chapter 8: Reading Data in your Script
Reading data from a file
Dealing with corrupt data
Chapter 9: The Basics of Machine Learning
The Learning Framework
PAC Learning Strategies
The Generalization Models
Chapter 10: Using Scikit-Learn
Uses of Scikit-Learn
Representing Data in Scikit-Learn
Tabular Data
Features Matrix
Target Arrays
Understanding the API
Conclusion:
Machine learning with Python
Introduction:
Chapter 1: Python Installation
Anaconda Python Installation
Jupyter Notebook
Fundamentals of Python programming
Chapter 2: Python for Machine Learning
Chapter 3: Data Scrubbing
What is Data Scrubbing?
Removing Variables
One-hot Encoding
Drop Missing Values
Chapter 4: Data Mining Categories
Predictive Modeling
Analysis of Associations
Group Analysis
Anomaly Detection
Chapter 5: Difference Between Machine Learning and AI
What is artificial intelligence?
How is machine learning different?
Chapter 6: K-Means Clustering
Data Preparation
Visualizing the Data
Creating Clusters
Chapter 7: Linear Regression with Python
Chapter 8: Feature Engineering
Rescaling Techniques
Creating Derived Variables
Non-Numeric Features
Chapter 9: How Do Convolutional Neural Networks Work?
Pixels and Neurons
The Pre-Processing
Convolutions
Filter: Kernel Set
Activation Function
Subsampling
Subsampling with Max-Pooling
Now, More Convolutions!
Connect With a "Traditional" Neural Network
Chapter 10: Top AI Frameworks and Machine Learning Libraries
TensorFlow
Scikit-learn
AI as a Data Analyst
Theano
Caffe
Keras
Microsoft Cognitive Toolkit
PyTorch
Torch
Chapter 11: The Future of Machine Learning
Conclusion:
Python for beginners:
LEARN CODING, PROGRAMMING, DATA ANALYSIS, AND
ALGORITHMIC THINKING WITH THE LATEST PYTHON
CRASH COURSE. A STARTER GUIDE WITH TIPS AND TRICKS
FOR THE APPRENTICE PROGRAMMER.
William Wizner
Introduction
So, you have heard about this programming language that everyone considers
amazing, easy and fast…. the language of the future. You sit with your
friends, and all they have to talk about is essentially gibberish to you, and yet
it seems interesting to the rest of them. Perhaps you plan to lead a business,
and a little research into things reveals that a specific language is quite a lot
in demand these days. Sure enough, you can hire someone to do the job for
you, but how would you know if the job is being done the way you want it to
be, top-notch in quality and original in nature?
Whether you aim to build a career out of the journey you are about to
embark on, or to set up your own business serving hundreds of thousands of
clients who are looking for someone like you, you need to learn Python.
When it comes to Python, there are many videos and tutorials which you
can find online. The problem is that each seems to be heading in a different
direction. There is no way to tell which structure you need to follow, or
where you should begin and where you should end. There is a good possibility
you might come across a video that seemingly answers your call, only to find
out that the narrator is not explaining much, and you are left to guess what
almost everything on the screen does.
I have seen quite a few tutorials like that myself. They can be annoying,
and some are even misleading. Some programmers will tell you that you are
already too late to learn Python and that you will not achieve the kind of
success you seek for yourself. Let me put such rumors and discouraging
messages to rest.
● Age – It is just a number. What truly matters is the desire you have to
learn. You do not need to be X years old to learn this effectively.
Similarly, there is no upper limit of Y years for the learning process. You
can be 60 and still be able to learn the language and execute brilliant
commands. All it requires is a mind that is ready to learn and a working
knowledge of how to operate a computer, open and close programs,
and download stuff from the internet. That’s it!
● Language – Whether you are a native English speaker or a non-native
one, the language is open for all. As long as you can form basic sentences
and make sense out of them, you should easily be able to understand the
language of Python itself. It follows something called the “clean-code”
concept, which effectively promotes the readability of code.
● Python is nearly three decades old already – If you are worried that you
are decades late, let me remind you that Python is a progressive language in
nature. That means, every year, we find new additions to the language of
Python, and some obsolete components are removed as well. Therefore,
the concept of “being too late” already stands void. You can learn today,
and you will already be familiar with the core commands by the end of a
year. Whatever has existed so far, you will already know. What follows
after that, you will eventually pick up. There is no such thing as being
too late to learn Python.
Of course, some people are successful and some are not. Everything boils down
to how effectively and creatively you use the language to solve problems.
The more original your program is, the better you fare.
“I vow that I will give my best to learn the language of Python and master the
basics. I also promise to practice writing code and programs after I am done
with this book.”
Bravo! You just took the first step. Now, we are ready to turn the clock back
a little and see exactly where Python came from. If you went through the
introduction, I gave you a brief on how Python came into existence, but I left
out quite a few parts. Let us look into those and see why Python was the need
of the hour.
Before the inception of Python, and the famous language that it has gone on
to become, things were quite different. Imagine a world where programmers
gathered from across the globe in a huge computer lab. You have some of the
finest minds from the planet, working together towards a common goal,
whatever that might be. Naturally, even the finest intellectuals can end up
making mistakes.
Suppose one such programmer ended up creating a program, and he is not too
sure of what went wrong. The room is full of other programmers, and sure
enough, approaching someone for assistance would be the first thought of the
day. The programmer approaches another busy person who gladly decides to
help out a fellow intellectual programmer. Within that brief walk from one
station to the other, the programmer quickly exchanges the information,
which seems to be a common error. It is only when the programmer views the
code that they are caught off-guard. This fellow member has no idea what
any of the code does. The variables are labeled with what can only be defined
as encryptions. The words do not make any sense, nor is there any way to
find out where the error lies.
The compiler continues to throw in error after error. Remember, this was well
before 1991 when people did not have IDEs, which would help them see
where the error is and what needs to be done. The entire exercise would end
up wasting hours upon hours just to figure out that a semi-colon was missing.
Embarrassing and time-wasting!
This was just a small example; imagine the entire thing on a global scale.
The programming community struggled to find ways to write code that
could be understood easily by others. Some languages supported some
syntaxes, while others did not. These languages would not necessarily work
in harmony with each other, either. The world of programming was a mess.
Had Python not come at the opportune moment that it did, things would have
been so much more difficult for us to handle.
Guido van Rossum, a Dutch programmer, decided to work on a pet project.
Yes, you read that right! Mr. Van Rossum wanted to keep himself occupied
during the holiday season and, hence, decided to write a new interpreter for a
language he had been thinking of lately. He decided to call the language
Python, and contrary to popular belief, it has nothing to do with the reptile
itself. Tracing its roots to its predecessor, ABC, Python came into
existence just when it was needed.
For our non-programming friends, ABC is the name of an old programming
language. Funny as it may sound, naming conventions weren't exactly the
strongest here.
Python was quickly accepted by the programming community, although
programmers were a lot less numerous back then. Its
revolutionary user-friendliness, responsive nature and adaptability
immediately caught the attention of everyone around. The more people
invested their time in this new language, the more Mr. Van Rossum started
investing his resources and knowledge to enhance the experience further.
Within a short period, Python was competing against the then leading
languages of the world. It soon went on to outlive quite a few of them owing
to the core concept it brought to the table: ease of readability. Unlike any
other programming language of that time, Python delivered code that was
phenomenally easy to read and understand right away.
Remember our friend, the programmer, who asked for assistance? If he were
to do that now, the other fellow would immediately understand what was
going on.
Python also acquired fame for being a language that had an object-oriented
approach. This opened more usability of the language to the programmers
who required an effective way to manipulate objects. Think of a simple game.
Anything you see within it is an object that behaves in a certain way. Giving
that object that ‘sense’ is object-oriented programming (OOP). Python was
able to pull that off rather easily. Python is considered as a multi-paradigm
language, with OOP being a part of that as well.
Fast forward to the world we live in, and Python continues to dominate some
of the cutting-edge technologies in existence. With real-world applications
and a goliath of a contribution to aspects like machine learning, data sciences,
and analytics, Python is leading the charge with full force.
An entire community of programmers has dedicated their careers to maintaining
Python and developing it as time goes by. As for the founder, Mr. Van Rossum
held the title of Benevolent Dictator for Life (BDFL) until he stepped down
from that role on 12 July 2018. This title was bestowed upon Mr. Van Rossum
by the Python community.
Today, Python 3 is the leading version of the language. Python 2 reached its
official end of life on 1 January 2020, so you do not need to learn both to
succeed. We will begin with the latest version of Python, as almost everything
that was involved in the previous version was carried forward, except for
components that were either dull or useless.
I know, right about now you are rather eager to dive into the concepts and get
done with history. It is vital for us to learn a few things about the language
and why it came into existence in the first place. This information might be
useful at some point in time, especially if you were to look at various codes
and identify which one of those was written in Python and which one was
not.
For anyone who may have used languages like C, C++, C#, or JavaScript, you
might find quite a few similarities within Python, and some major
improvements too. Unlike in most of these languages, where you need to use
a semicolon to let the compiler know that the statement has ended, Python
needs none of that. Just press enter and the interpreter immediately
understands that the statement has ended.
Before we do jump ahead, remember how some skeptics would have you
believe it is too late to learn Python? It is because of Python that self-driving
cars are coming into existence. Has the world seen too much of them
already? When was the last time you saw one of these vehicles on the road?
This is just one of a gazillion possibilities that lie ahead for us to conquer.
All it needs is for us to learn the language, brush up our skills, and get started.
“A journey of a thousand miles begins with the first step. After that, you are
already one step closer to your destination.”
Chapter 1: Installing Python
Windows Installation
Here are the steps to install Python on a Windows machine.
● Open a Web browser and go to
https://www.python.org/downloads/.
● Follow the link for the Windows installer python-XYZ.msi
file, where XYZ is the version you need to install.
● To use this installer python-XYZ.msi, the Windows system
must support Microsoft Installer 2.0. Save the installer file
to your local machine and then run it to find out if your
machine supports MSI.
● Run the downloaded file. This brings up the Python install
wizard, which is really easy to use. Just accept the default
settings, wait until the install is finished, and you are done.
Macintosh Installation
Recent Macs come with Python installed, but it may be several years
out of date. See http://www.python.org/download/mac/ for instructions on
getting the current version along with extra tools to support development on
the Mac. For older Mac OS's before Mac OS X 10.3 (released in 2003),
MacPython is available.
Setting Up PATH
Programs and other executable files can be in many directories, so operating
systems provide a search path that lists the directories that the OS searches
for executables.
The path is stored in an environment variable, which is a named string
maintained by the operating system. This variable contains information
available to the command shell and other programs. The path variable is
named PATH in Unix or Path in Windows (Unix is case sensitive;
Windows is not).
In Mac OS, the installer handles the path details. To invoke the Python
interpreter from any particular directory, you must add the Python directory
to your path.
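As a small illustration of the search path described above, Python itself can list the directories stored in PATH and report where an executable would be found. This is only a minimal sketch using the standard library; the variable names are chosen for illustration, and the output will differ from machine to machine.

```python
import os
import shutil

# PATH is one long string; os.pathsep is the separator the operating
# system uses between directories (":" on Unix, ";" on Windows).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

for directory in path_dirs:
    print(directory)

# shutil.which searches the PATH directories for an executable and
# returns its location, or None if nothing is found.
python_location = shutil.which("python3") or shutil.which("python")
print("Python found at:", python_location)
```

If the last line prints None, the Python directory is probably not on the search path yet.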
Setting Path at Unix/Linux
To add the Python directory to the path for a particular session in Unix:
Running Python
There are three different ways to start Python:
Interactive Interpreter
You can start Python from Unix, DOS, or any other system that provides you
a command-line interpreter or shell window.
Enter python at the command line.
Start coding right away in the interactive interpreter.
$python # Unix/Linux
or
python% # Unix/Linux
or
C:> python # Windows/DOS
Here is the list of all the available command line options:
Sr.No.  Option  Description
1       -d      It provides debug output.
2       -O      It generates optimized bytecode (resulting in .pyo files).
3       -S      Do not run import site to look for Python paths on startup.
4       -v      Verbose output (detailed trace on import statements).
7       File    Run Python script from given file.
A File Editor
Every programmer needs to be able to edit and save text files. Python
programs are files with the .py extension that contain lines of Python code.
Python IDLE gives you the ability to create and edit these files with ease.
Python IDLE also provides several useful features that you’ll see in
professional IDEs, like basic syntax highlighting, code completion, and
auto-indentation. Professional IDEs are more robust pieces of software and they
have a steep learning curve. If you’re just beginning your Python
programming journey, then Python IDLE is a great alternative!
Editing A File
Once you’ve opened a file in Python IDLE, you can then make changes to it.
When you’re ready to edit a file, you’ll see something like this:
3 Nested loops
You can use one or more loops inside another while or for loop.
(Python has no do...while loop.)
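A short sketch may make nested loops concrete. The example below places one for loop inside another to build a small multiplication table; the names table, row and column are chosen purely for illustration.

```python
# Build a 3 x 3 multiplication table with a for loop inside a for loop.
table = []
for row in range(1, 4):           # outer loop: row takes 1, 2, 3
    line = []
    for column in range(1, 4):    # inner loop runs fully for every row
        line.append(row * column)
    table.append(line)
    print(line)
```

The inner loop starts over each time the outer loop moves to its next value, so the body of the inner loop runs nine times in total.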
Numbers
Number data types store numeric values. They are immutable data types,
which means that changing the value of a number data type results in a newly
allocated object. Number objects are created when you assign a value to
them. For example:
var1 = 1
var2 = 10
You can also delete the reference to a number object by using the del
statement. The syntax of the del statement is:
del var1[, var2[, var3[...., varN]]]]
You can delete a single object or multiple objects by using the del statement.
For example:
del var
del var_a, var_b
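To see what the del statement actually does, here is a minimal sketch. It deletes two names and then tries to use one of them; the variable names follow the example above, and message is a helper name added for illustration.

```python
var_a = 1
var_b = 10

del var_a, var_b   # remove both names in a single statement

try:
    print(var_a)
except NameError as error:
    # The name no longer exists, so Python raises a NameError.
    message = str(error)
    print("After del:", message)
```

Note that del removes the name, not the number itself; Python frees the object on its own once nothing refers to it.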
Python Supports Four Different Numerical Types
1. Int (Signed Integers): Often called just integers or ints, they are
positive or negative whole numbers with no decimal point.
2. Long (Long Integers): Also called longs, they are integers of unlimited
size, written like integers and followed by an uppercase or lowercase L.
This type exists only in Python 2; in Python 3, int itself has unlimited
size and the L suffix is no longer valid.
3. Float (Floating Point Real Values): Also called floats, they represent real
numbers and are written with a decimal point dividing the integer and
fractional parts. Floats may also be in scientific notation, with E or e
indicating the power of 10 (2.5e2 = 2.5 x 10^2 = 250).
4. Complex (Complex Numbers): These are of the form a + bJ, where a and b are
floats and J (or j) represents the square root of -1 (which is an imaginary
number). The real part of the number is a, and the imaginary part is b.
Complex numbers are not used much in Python programming.
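The numeric types listed above can be inspected directly. The sketch below assumes Python 3, where int covers what used to be the separate long type; the variable names are illustrative only.

```python
whole = 42            # int
big = 2 ** 100        # still an int: Python 3 integers grow as needed
real = 2.5e2          # float in scientific notation, equal to 250.0
imaginary = 3 + 4j    # complex: real part 3.0, imaginary part 4.0

for value in (whole, big, real, imaginary):
    print(value, type(value).__name__)

print(imaginary.real, imaginary.imag)
```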
Examples
Here Are Some Examples Of Numbers
In Python 2, you can use a lowercase l with long, but it is recommended that
you use only an uppercase L to avoid confusion with the number 1. Python 2
displays long integers with an uppercase L.
A complex number consists of an ordered pair of real floating point numbers
denoted by a + bj, where a is the real part and b is the imaginary part of the
complex number.
Number Type Conversion
Python converts numbers internally in an expression containing mixed types
to a common type for evaluation. But sometimes, you need to coerce a
number explicitly from one type to another to satisfy the requirements of an
operator or function parameter.
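Both kinds of conversion described above can be tried in a few lines. This is a minimal sketch using the built-in int(), float() and complex() functions; the variable names are illustrative.

```python
# Implicit conversion: in int + float, Python converts 3 to 3.0 first.
mixed = 3 + 2.5
print(mixed, type(mixed).__name__)

# Explicit conversion: you ask for the type you need.
as_int = int(7.9)            # the fractional part is dropped, giving 7
as_float = float(7)          # gives 7.0
as_complex = complex(2, 3)   # gives (2+3j)
print(as_int, as_float, as_complex)
```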
Mathematical Functions
Python includes the following functions that perform mathematical calculations.
Sr.No.  Function  Returns (Description)
1       abs(x)    The absolute value of x: the (positive) distance between x
                  and zero.
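Here is a brief sketch of abs() in use. Since only abs() appears in the table above, the other calls (math.ceil, math.floor and math.sqrt from the standard math module) are included as an assumption about what such a table would typically go on to list.

```python
import math

distance = abs(-7.25)            # 7.25: distance between -7.25 and zero
rounded_up = math.ceil(4.1)      # 5: smallest integer not less than 4.1
rounded_down = math.floor(4.9)   # 4: largest integer not greater than 4.9
root = math.sqrt(16)             # 4.0: square root, always a float

print(distance, rounded_up, rounded_down, root)
```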
Trigonometric Functions
Python includes the following functions that perform trigonometric calculations.
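The list of trigonometric functions is not reproduced here, so the sketch below simply assumes a few well-known ones from the standard math module. These functions work in radians, which is why the angle is converted first.

```python
import math

angle = math.radians(90)     # convert 90 degrees to radians
sine = math.sin(angle)       # approximately 1.0
cosine = math.cos(angle)     # very close to 0, but not exactly 0
hypotenuse = math.hypot(3, 4)    # 5.0: length of the longest side

print(sine, cosine, hypotenuse)
```

The tiny non-zero value printed for the cosine is ordinary floating-point rounding, not a mistake in the function.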
String Manipulation
When it comes to manipulating strings, we can combine strings in more or
less the exact way we add numbers. All you must do is insert the
addition operator (+) in between two strings to combine them. Try replicating
the code below:
Str_1 = "Words "
Str_2 = "and "
Str_3 = "more words."
Str_4 = Str_1 + Str_2 + Str_3
print(Str_4)
What you should get back is: “Words and more words.”
Python provides many easy-to-use, built-in commands you can use to alter
strings. For instance, adding .upper() to a string will make all characters in
the string uppercase, while using .lower() on the string will make all the
characters in the string lowercase. These commands are called “functions”
(or, more precisely, “methods” when they are attached to a value with a dot),
and we’ll go into them in greater detail, but for now know that Python has
already done much of the heavy lifting for you when it comes to
manipulating strings.
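To try the two commands mentioned above, here is a minimal sketch; the variable names are chosen for illustration.

```python
greeting = "Words and more words."

shouted = greeting.upper()     # every letter becomes uppercase
whispered = greeting.lower()   # every letter becomes lowercase

print(shouted)
print(whispered)
print(greeting)   # the original string is left unchanged
```

Both commands return a new string rather than changing the one they are called on, which is why the last line still prints the original text.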
String Formatting
Other methods of manipulating strings include string formatting
accomplished with the “%” operator. The “%” symbol returns
remainders when carrying out mathematical operations, but it has another use
when working with strings. In the context of strings, the % symbol
allows you to specify values/variables you would like to insert into a string
and then have the string filled in with those values in specified areas. You can
think of it like sorting a bunch of labeled items (the values beyond the %
symbol) into bins (the holes in the string you’ve marked with %).
Try running this bit of code to see what happens:
String_to_print = ("With the modulus operator, you can add %s, integers like "
                   "%d, or even floats like %2.1f." % ("strings", 25, 12.34))
print(String_to_print)
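The three placeholders used above can also be looked at on a smaller scale. In this sketch, %s takes a string, %d an integer, and %2.1f a float shown with one digit after the decimal point; the variable names are illustrative.

```python
name = "strings"
count = 25
ratio = 12.34

# Values are matched to placeholders from left to right.
line = "%s, %d, %2.1f" % (name, count, ratio)
print(line)   # strings, 25, 12.3

# A single value does not need to be wrapped in parentheses.
single = "Total: %d items" % count
print(single)   # Total: 25 items
```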
Type Casting
The term “type casting” refers to the act of converting data from one type to
another type. As you program, you will often find that you need to
convert data between types. Python has three helpful commands
which allow quick and easy conversion between data types: int(),
float() and str().
All three of the above commands convert what is placed within the
parentheses to the data type named outside the parentheses. This means that to
convert a float into an integer, you would write the following:
int(float_here)
Because integers are whole numbers, anything after the decimal point in a
float is dropped when it is converted into an integer. (Ex. 3.9324 becomes 3,
4.12 becomes 4.) Note that you cannot convert a non-numerical string into an
integer, so typing int("convert this") would throw an error.
The float() command can convert integers or certain strings into floats.
Providing either an integer or an integer in quotes (a string representation of
an integer) will convert the provided value into a float. Both 5 and "5"
become 5.0.
Finally, the str() function is responsible for the conversion of integers and
floats to strings. Plug any numerical value into the parentheses and get back a
string representation of it.
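The conversions described above can be checked with a short sketch. The variable names are illustrative, and the try/except part simply catches the error that a non-numerical string causes.

```python
truncated = int(3.9324)    # 3: everything after the decimal point is dropped
from_text = float("5")     # 5.0: a string holding an integer becomes a float
as_text = str(12.5)        # "12.5": a number becomes a string

print(truncated, from_text, as_text)

try:
    int("convert this")
except ValueError:
    # A non-numerical string cannot be turned into an integer.
    conversion_failed = True
    print("That string is not a number.")
```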
We’ve covered a fair amount of material so far. Before we go any further,
let’s do an exercise to make sure that we understand it.
Assignment and Formatting Exercise
Here’s an assignment. Write a program that does the following:
● Assigns a numerical value to a variable and changes the value in some
way.
● Assigns a string value to some variable.
● Prints the string and then the value using string formatting.
● Converts the numerical data into a different format and prints the new
data form.
Give it your best shot before looking below for an example of how this could
be done.
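One possible solution is sketched here. It is only an example under assumed names and values (apples, label and so on are made up for illustration); any program that follows the four steps is just as valid.

```python
# 1. Assign a numerical value and change it in some way.
apples = 10
apples = apples * 2 + 1      # apples is now 21

# 2. Assign a string value to a variable.
label = "Number of apples"

# 3. Print the string and then the value using string formatting.
summary = "%s: %d" % (label, apples)
print(summary)

# 4. Convert the number into a different format and print the new form.
apples_as_float = float(apples)
print("%s as a float: %.1f" % (label, apples_as_float))
```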
When writing complex code, your program needs data that it can read and
change as it runs. Variables are, therefore, named locations used to store
values, and they are created when you assign a value during program
development. Python, unlike many other programming languages, has no
command for declaring a variable: a variable comes into existence the moment
you first assign a value to it, and that value can change after being set.
Likewise, a Python variable is not tied to a single data type, as it is in
many other computer languages.
A variable in Python is therefore described as reserved memory used for
storing a data value. As such, Python variables act as storage units, which feed
the computer with the necessary data for processing. Each value has its own
data type in Python, and data are categorized as
Numbers, Tuple, Dictionary, and List, among others. As a programmer, you
need to understand how variables work and how helpful they are in creating an
effective program using Python. As such, this tutorial will enable learners to
understand how to declare, re-declare, and concatenate variables, how local
and global variables differ, and how to delete a variable.
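The ideas listed above (declaring, re-declaring, concatenating, local versus global, and deleting) can be sketched in a few lines. All names here are hypothetical and chosen only for illustration.

```python
counter = 5           # declared simply by assigning a value
counter = "five"      # re-declared: the same name now holds a string

full_name = "Python " + "variables"   # concatenation joins two strings

total = 0             # a global variable


def add_local():
    total = 10        # a local variable; the global total is untouched
    return total


local_result = add_local()
print(counter, full_name, total, local_result)

del counter           # the name counter no longer exists after this line
```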
Variable Vs. Constants
Variables and constants are two components used in Python programming,
but they perform separate functions. Both hold values that your code uses
when the program runs. Variables act as storage locations for data in memory
whose values may change, while constants are variables whose value is meant
to remain unchanged. By convention, the names of constants are written in
capital letters with words separated by underscores. Python does not enforce
this; it is a naming convention only.
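As a small sketch of the naming convention, the example below uses two made-up constants next to ordinary variables. The names and values are assumptions for illustration.

```python
# Constants: capital letters, words separated by underscores.
MAX_LOGIN_ATTEMPTS = 3
SECONDS_PER_MINUTE = 60

# Variables: their values are expected to change as the program runs.
attempts = 0
attempts += 1

minutes = 5
seconds = minutes * SECONDS_PER_MINUTE
remaining = MAX_LOGIN_ATTEMPTS - attempts

print(seconds, remaining)
```

The capital letters are a signal to other programmers that the value should be left alone.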
Variables Vs. Literals
Literals are the raw data assigned to either a variable or a
constant, and several kinds of literals are used in Python programming. Some
of the common types include Numeric, String, Boolean, and Special literals,
and literal collections such as Tuple, Dict, List, and Set. Both variables
and literals deal with unprocessed data; the difference is that variables
store the data, while literals supply the data to both constants and
variables.
Variables Vs. Arrays
Python variables have a unique feature: they only name values, which are
stored in memory for quick retrieval whenever they are
needed. On the other hand, Python arrays or collections are data types that
group several values together and are categorized into list, tuple, set, and
dictionary. Compared to a variable holding a single value, a collection
provides functions that work on the whole group of items. When choosing a
collection, ensure you select the one that fits your requirements, since the
right choice helps to retain meaning and enhances data security and
efficiency.
Classifications of Python Arrays Essential for Variables
Lists
Python lists hold changeable, ordered data and are written with
square brackets, for example, ["apple", "cherry"]. You access an item in
an existing list by referring to its index number, and you can
use negative indexes such as -1 or -2 to count from the end. You can also
select a range of items from your list by specifying where the range
starts and where it ends. The return value will then be a new list with the
specified items. You can also specify a range of negative indexes, alter the
value of an existing item, loop through the items of the list, add or remove
items, and check whether an item is present.
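The list operations described above can be sketched as follows. The fruit names are sample data made up for this illustration.

```python
fruits = ["apple", "cherry", "banana", "kiwi", "mango"]

first = fruits[0]        # index 0 is the first item: "apple"
last = fruits[-1]        # negative indexes count from the end: "mango"
middle = fruits[1:3]     # start at index 1, stop before index 3
tail = fruits[-3:-1]     # a range given with negative indexes

fruits[1] = "blackcurrant"     # alter the value of an existing item
fruits.append("orange")        # add an item at the end
fruits.remove("kiwi")          # remove an item by its value
has_apple = "apple" in fruits  # check whether an item is present

for fruit in fruits:           # loop through the items
    print(fruit)
```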
Naming Variables
Naming variables is straightforward, and both beginners and
experienced programmers can readily do it. However,
naming comes with specific rules that ensure a valid name.
Consistency, style, and adherence to the variable
naming rules ensure that you create a clear and reliable name to use
both today and in the future. The rules are:
Unsigned int
Unsigned int also referred to, as unsigned integers are data types for storing
up to 2 bytes of values but do not include negative numbers. The numbers are
all positive with a range of 0 to 65,535 with Duo stores of up to 4 bytes for
32-byte values, which range from 0 to 4,294,967,195. In comparison,
unsigned integers comprise positive values and have a much higher bit.
However, ints take mostly negative values and have a lower bit hence store
chapters with fewer values. The syntax for unsigned int is ‘unsigned int var =
val;’ while an example code being ‘unsigned int ledPin = 13;’
Float
Float data types hold floating-point numbers, that is to say, numbers with a
decimal point. Floating-point numbers are often used to approximate analog or
continuous values, as they have a finer resolution than integers. In the
32-bit format described here, the numbers stored may be as large as
3.4028235E+38 and as low as -3.4028235E+38. Each value is stored in 32 bits,
taking 4 bytes.
Unsigned Long
Unsigned long is a data type with an extended size, so it can store larger
values than most other integer types. It stores 32 bits (4 bytes) and does
not include negative numbers; hence, it has a range of 0 to 4,294,967,295.
The syntax for the unsigned long data type is 'unsigned long var = val;',
which is essential for storing much larger numbers.
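The ranges quoted above follow directly from the number of bits in each type.
As a rough sketch, we can check them in Python, which has no fixed-width
unsigned types of its own. The variable names below are our own, and the
standard struct module is used only to show what rounding a value to 32-bit
float precision looks like.

```python
import struct

# Largest value that fits in an unsigned integer of a given width.
max_unsigned_16 = 2**16 - 1   # 2 bytes
max_unsigned_32 = 2**32 - 1   # 4 bytes

# Largest finite value of a 32-bit (single-precision) float.
max_float_32 = (2 - 2**-23) * 2**127

# Python's own float uses 64 bits. Packing with "<f" rounds a value
# to 32-bit precision, so the result is slightly different from 0.1.
as_32_bit = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print(max_unsigned_16)
print(max_unsigned_32)
print(max_float_32)
print(as_32_bit == 0.1)
```

Python integers grow as needed, so these limits only matter when you exchange
data with languages or devices that use fixed-width types.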
Chapter 5: Inputs, Printing, And Formatting Outputs
Inputs
So far, we’ve written programs that only use data we have explicitly defined
in the script. However, your programs can also take in input from the user
and utilize it. Python lets us solicit input from the user with a very
intuitively named function - the input() function. Calling input() enables us
to prompt the user for information, which we can further manipulate. We can
take the user input and save it as a variable, print it straight to the
terminal, or do anything else we might like.
When we use the input function, we can pass in a string. The user will see
this string as a prompt, and their response to the prompt will be saved as the
input value. For instance, if we wanted to query the user for their favorite
food, we could write the following:
favorite_food = input("What is your favorite food? ")
If you ran this code example, you would be prompted for your favorite food.
You could save multiple variables this way and print them all at once using
the print () function along with print formatting, as we covered earlier. To be
clear, the text that you write in the input function is what the user will see as a
prompt; it isn’t what you are inputting into the system as a value.
When you run the code above, you’ll be prompted for an input. After you
type in some text and hit the return key, the text you wrote will be stored as
the variable favorite_food. The input command can be used along with string
formatting to inject variable values into the text that the user will see. For
instance, if we had a variable called user_name that stored the name of the
user, we could structure the input statement like this:
favorite_food = input("What is {}'s favorite food? ".format(user_name))
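To see the prompt formatting at work without typing at the keyboard, here is
a small sketch. The function ask_favorite_food and its read_input parameter
are hypothetical names introduced for this illustration; read_input defaults
to the real input() function, but we pass in a stand-in so the example can
run on its own.

```python
def ask_favorite_food(user_name, read_input=input):
    """Build the prompt from user_name, ask it, and describe the answer."""
    # .format() is applied to the prompt string BEFORE input() is called.
    prompt = "What is {}'s favorite food? ".format(user_name)
    answer = read_input(prompt)
    return "{} likes {}.".format(user_name, answer.strip())

# Simulated run: the lambda plays the role of a user typing "pizza ".
simulated = ask_favorite_food("Ana", read_input=lambda prompt: "pizza ")
print(simulated)
```

In a real program, you would simply call ask_favorite_food("Ana") and type
the answer when prompted.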
Printing and Formatting Outputs
We’ve already dealt with the print () function quite a bit, but let’s take some
time to address it again here and learn a bit more about some of the more
advanced things you can do with it.
By now, you’ve gathered that it prints whatever is in the parentheses to the
terminal. In addition, you’ve learned that you can format the printing of
statements with either the modulus operator (%) or the format function
(.format()). However, what should we do if we are in the process of printing
a very long message?
In order to prevent a long string from running across the screen, we can use
triple quotes that surround our string. Printing with triple quotes allows us to
separate our print statements onto multiple lines. For example, we could print
like this:
print('''By using triple quotes we can
divide our print statement onto multiple
lines, making it easier to read.''')
Formatting the print statement like that will give us:
By using triple quotes we can
divide our print statement onto multiple
lines, making it easier to read.
What if we need to print characters that Python would otherwise treat as
special instructions? For example, if we ever needed to print out the
characters “%s” or “%d” in a string that is also used with the % operator,
we would run into trouble. If you recall, these are string formatting
commands, and the interpreter will interpret them as formatting commands
rather than as text. The same problem comes up with backslash sequences.
Here’s a practical example. As mentioned, typing “\t” in our string will put
a tab in the middle of our string. Assume we type the following:
print("We want a \t here, not a tab.")
We’d get back this:
We want a        here, not a tab.
By marking the string as a “raw string”, we can tell Python to include the
backslash and the character that comes next as part of the string’s value.
To do this, we put an “r” before the first quote in a string, like this:
print(r"We want a \t here, not a tab.")
So, if we used the raw string, we’d get the format we want back:
We want a \t here, not a tab.
The “raw string” prefix enables you to put almost any combination of
characters you’d like within the string and have it be considered part of
the string’s value.
However, what if we did want the tab in the middle of our string? In that
case, using special formatting characters in our string is referred to as using
“escape characters.” “Escaping” a string is a method of reducing the
ambiguity in how characters are interpreted. When we use an escape
character, we escape the typical method that Python uses to interpret certain
characters, and the characters we type are understood to be part of the string’s
value. The escape primarily used in Python is the backslash (\). The
backslash prompts Python to listen for a unique character to follow that will
translate to a specific string formatting command.
We already saw that using the “\t” escape character puts a tab in the middle
of our string, but there are other escape characters we can use as well.
\n - Starts a new line
\\ - Prints a backslash itself
\" - Prints out a double quote instead of marking the end of a string
\' - Like above but prints out a single quote
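The escape characters listed above can be tried out together in one short
sketch. The sample strings and variable names are our own; the last line also
shows %%, which is how a literal percent sign is written in a string used
with the % operator.

```python
tabbed = "name:\tAna"                      # \t inserts a tab
raw = r"name:\tAna"                        # raw string keeps the backslash and the t
two_lines = "first line\nsecond line"      # \n starts a new line
path = "C:\\temp"                          # \\ is a single backslash
quoted = "She said \"hi\" and it's fine."  # \" is a double quote
percent = "%d%% done" % 50                 # %% is a literal percent sign

for text in (tabbed, raw, two_lines, path, quoted, percent):
    print(text)
```

Printing each string is a quick way to confirm how Python interpreted it.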
Input and Formatting Exercise
Let’s do another exercise that applies what we’ve covered in this section.
You should try to write a program that does the following:
5. Set theory
To describe a list of distinct elements: Set
6. Statistics
To describe the median value of variable x: Median
To describe the correlation between variables X and Y: Correlation
To describe the standard deviation of a sample set: Sample standard deviation
To describe the population standard deviation: Standard deviation
To describe the variance of a subset of a population: Sample variance
To describe the variance of a population value: Population variance
To describe the mean of a subset of a population: Sample mean
To describe the mean of population values: Population mean
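Several of the statistics named above can be computed with Python's standard
statistics module. The sketch below uses a small made-up list of numbers; the
variable names are our own. Note the difference between the sample versions
(variance, stdev) and the population versions (pvariance, pstdev).

```python
import statistics

# Made-up data for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# Population versions divide by n.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample versions divide by n - 1, so they come out slightly larger.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(mean_value, median_value)
print(population_variance, population_std)
print(sample_variance, sample_std)
```

Use the sample versions when your data is only a subset of the population you
want to describe.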
Terminologies Used for Machine Learning
The following terms are the ones you will encounter most often in machine
learning. You may be getting into machine learning for professional purposes
or as an artificial intelligence (AI) enthusiast. Whatever your reasons, the
following categories and subcategories of terms are ones that you will need
to know and understand to get along with your colleagues. In this section,
you will see the big-picture explanation of each term and then delve into its
subcategories. Here are the machine-learning terms that you need to know:
1. Natural language processing (NLP)
Natural language is what you, as a human, use, i.e., human language. By
definition, NLP is a branch of machine learning in which the machine learns
your human form of communication. NLP is the basis for most, if not all,
systems that allow your device to make use of human (natural) language. This
ability enables your machine to receive your natural (human) input,
understand it, act on it, and then give an output. The device can recognize
human language and interact appropriately, or as close to appropriately as
possible.
There are five primary stages in NLP: machine translation, information
retrieval, sentiment analysis, information extraction, and finally question
answering. It begins with the human query, which leads straight to machine
translation, then passes through the other stages, and finally ends in
question answering. You can break these five stages down into the
subcategories suggested earlier:
Text classification and ranking - This step is a filtering mechanism that
assigns text to classes and orders it by relevance, for example, by filtering
out unwanted items such as spam or junk mail. It determines what takes
precedence and the order of execution up to the final task.
Sentiment analysis - This analysis predicts the emotional reaction of a human
towards the feedback provided by the machine. Customer relations and
satisfaction are factors that may benefit from sentiment analysis.
Document summarization - As the phrase suggests, this is a means of producing
short and precise summaries of long or complicated documents. The overall
purpose is to make the content easy to understand.
Named-Entity Recognition (NER) - This activity involves getting structured
and identifiable data from an unstructured set of words. The model learns to
identify the most relevant entities, applies them to the context of the text
or speech, and tries to come up with the most appropriate response. Entities
are things like company names, employee names, calendar dates, and times.
Speech recognition - Appliances such as Alexa are an easy example of this
mechanism. The device identifies audio signals from human speech and other
vocal sources, and the machine learns to associate the spoken words with the
speaker.
Natural language understanding and generation - As opposed to Named-Entity
Recognition, these two concepts deal with human-to-computer conversion and
vice versa. Natural language understanding allows the machine to interpret
human text or speech and convert it into a coherent format that the computer
can work with. On the other hand, natural language generation does the
reverse, i.e., it transforms the computer's internal format into text or
audio that a human can understand.
Machine translation - This is an automated system for converting one written
human language into another human language. The conversion enables people
from different backgrounds who use different languages to understand each
other. A model that has gone through the process of machine learning carries
out this job.
2. Dataset
A dataset is a collection of data that you can use to train your machine
learning model and to test its viability and progress. Data is an essential
component of your machine learning progress. It gives results that are
indicative of your development and of the areas that need adjusting and
fine-tuning. There are three types of datasets:
Training data - As the name suggests, training data is used to train the
model, which learns to predict patterns from the examples it sees. Because
there are so many factors to train on, some of them will be more important
than others. These features get training priority. Your machine-learning
model will use the more prominent features to predict the most appropriate
patterns. Over time, your model will learn through training.
Validation data - This set is the data used to fine-tune the small details of
the different candidate models that are nearing completion. Validation is not
a training phase; it is a comparison phase. The data obtained from validation
is used to choose your final model. You validate the various aspects of the
models under comparison and then make a final decision based on this
validation data.
Test data - Once you have decided on your final model, the test data gives
you vital information on how the model will behave in real life. Testing is
carried out on a completely different set of data from the sets used during
training and validation. Running the model on this data gives you an
indication of how it will handle other kinds of inputs. You will get answers
to questions such as how the fail-safe mechanism will react. Will the
fail-safe even come online in the first place?
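One common way to obtain the three datasets is to shuffle a single collection
of records and cut it into three non-overlapping parts. The sketch below is
only an illustration: the function name split_dataset, the 70/15/15
percentages, and the use of range(100) as stand-in data are all our own
assumptions, not values prescribed by the text.

```python
import random

def split_dataset(records, train_percent=70, validation_percent=15, seed=0):
    """Shuffle the records and split them into three non-overlapping parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    train_end = len(shuffled) * train_percent // 100
    validation_end = train_end + len(shuffled) * validation_percent // 100
    training = shuffled[:train_end]
    validation = shuffled[train_end:validation_end]
    test = shuffled[validation_end:]       # whatever remains is the test data
    return training, validation, test

# Stand-in data: the numbers 0 to 99 play the role of 100 records.
training, validation, test = split_dataset(range(100))
print(len(training), len(validation), len(test))
```

Because every record lands in exactly one part, the model is never tested on
data it has already seen during training or validation.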
3. Computer vision
Computer vision covers the tools that provide a high-level analysis of image
and video data. Challenges that you should look out for in computer vision
are:
Image classification - This training allows the model to learn what various
images and pictorial representations show. The model needs to retain what a
familiar-looking image looks like so that it can still identify the image
correctly after minor alterations such as color changes.
Object detection - Unlike image classification, which labels an image as a
whole, object detection allows the model to identify and locate individual
objects within its field of view. The model takes a large set of data and
frames regions of it to recognize patterns. It is akin to facial recognition
since it looks for patterns within a given field of view.
Image segmentation - The model associates each image or video pixel with a
category it has previously encountered. This association depends on the most
likely match, based on how frequently a particular pixel has been associated
with a specific predetermined set.
Saliency detection - In this case, you train your model to find where
visibility is highest. For instance, advertisements do best at locations with
higher human traffic. Hence, your model will learn to pick out the positions
of maximum visibility, the ones that naturally attract human attention and
curiosity.
4. Supervised learning
You achieve supervised learning by having the models learn from targeted
examples. If you wanted to show the models how to perform a given task, you
would label the dataset for that particular supervised task. You then present
the model with the set of labeled examples and monitor its learning through
supervision.
The models learn through constant exposure to the correct patterns. For
example, if you wanted to promote brand awareness, you could apply supervised
learning so that the model learns from labeled product examples and masters
the art of advertising them.
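To make the idea of learning from labeled examples concrete, here is a very
small sketch. It uses a nearest-neighbor rule, which is just one simple
supervised technique that we have chosen for illustration; the measurements,
the labels, and the function names are all made up.

```python
# Hypothetical labeled dataset: (height_cm, weight_kg) paired with a label.
labeled_examples = [
    ((20, 4), "cat"),
    ((25, 5), "cat"),
    ((60, 30), "dog"),
    ((70, 35), "dog"),
]

def squared_distance(a, b):
    """Sum of squared differences between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features, examples):
    """Return the label of the labeled example closest to the features."""
    nearest = min(examples, key=lambda example: squared_distance(features, example[0]))
    return nearest[1]

print(predict((22, 4), labeled_examples))
print(predict((65, 32), labeled_examples))
```

The labels supplied with each example are what make this supervised: the
model's answer for a new input is taken from examples whose correct answer is
already known.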
5. Unsupervised learning
This learning style is the opposite of supervised learning. In this case,
your models learn through observation. There is no supervision involved, and
the datasets are not labeled; hence, there is no correct reference value to
learn from as in the supervised method.
Here, through constant observation, your models work out the patterns for
themselves. Unsupervised models most often learn through associations between
different structures and characteristics common to the datasets. Since
unsupervised learning groups similar, related data together, it is useful in
clustering.
6. Reinforcement learning
Reinforcement learning teaches your model to always strive for the best
result. When the model performs its assigned task correctly, it gets rewarded
with a treat. This learning technique is a form of encouragement for your
model to always deliver the correct action and perform it to the best of its
ability. After some time, your model will learn to expect a reward, and
therefore, the model will always strive for the best outcome.
This example is a form of positive reinforcement. It rewards good behavior.
However, there is another type of reinforcement called negative
reinforcement. Negative reinforcement aims to punish or discourage bad
behavior. The model gets penalized in cases where it did not meet the
expected standards. The model learns as well that bad behavior attracts
penalties, and it will strive to do well consistently.
Chapter 7: Lists and Sets Python
Lists
We create a list in Python by placing items called elements inside square
brackets separated by commas. The items in a list can be of a mixed data
type.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = []  # empty list
list_mine = [2, 5, 8]  # list of integers
list_mine = [5, "Happy", 5.2]  # list having mixed data types
Practice Exercise
Write a program that captures the following in a list: "Best", 26, 89, 3.9
Nested Lists
A nested list is a list as an item in another list.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ["carrot", [9, 3, 6], ['g']]
Practice Exercise
Write a nested list for the following elements: [36, 2, 1], "Writer", 't',
[3.0, 2.5]
Accessing Elements from a List
In programming, and in Python specifically, the first item is always indexed
zero. For a list of five items, we access them from index 0 to index 4.
Trying to access an index outside this range will raise an IndexError. The
index must always be an integer, as using other number types will raise a
TypeError. Items in nested lists are accessed via nested indexing.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['b', 'e', 's', 't']
print(list_mine[0])  # the output will be b
print(list_mine[2])  # the output will be s
print(list_mine[3])  # the output will be t
Practice Exercise
Given the following list:
your_collection = ['t', 'k', 'v', 'w', 'z', 'n', 'f']
✓ Write a Python program to display the second item in the list.
✓ Write a Python program to display the sixth item in the list.
✓ Write a Python program to display the last item in the list.
Nested List Indexing
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
nested_list = ["Best", [4, 7, 2, 9]]
print(nested_list[0][1])  # the output will be e
print(nested_list[1][3])  # the output will be 9
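The two errors mentioned earlier, and nested indexing, can be seen in this
short sketch. The variable names are our own; the try and except blocks
simply catch each error so that the program keeps running and we can look at
the message.

```python
letters = ['b', 'e', 's', 't']

# Index 4 does not exist in a four-item list (valid indexes are 0 to 3).
try:
    letters[4]
except IndexError as error:
    index_problem = str(error)

# A float is not accepted as an index, even if it is a whole number.
try:
    letters[1.0]
except TypeError as error:
    type_problem = str(error)

# Nested indexing: the first index picks the inner list, the second its item.
nested = ["Best", [4, 7, 2, 9]]
inner_value = nested[1][2]

print(index_problem)
print(type_problem)
print(inner_value)
```

If you need an index that comes from a calculation, convert it with int()
before using it.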
Python Negative Indexing
For its sequences, Python allows negative indexing. The last item on the list
is index -1, index -2 is the second-to-last item, and so on.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['c', 'h', 'a', 'n', 'g', 'e', 's']
print(list_mine[-1])  # Output is s
print(list_mine[-4])  # Output is n
Slicing Lists in Python
The slicing operator (colon) is used to access a range of elements in a list.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['c', 'h', 'a', 'n', 'g', 'e', 's']
print(list_mine[3:5])  # Picking the fourth and fifth elements: ['n', 'g']
Example
Picking elements from the start to the fifth.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
print(list_mine[:-2])  # ['c', 'h', 'a', 'n', 'g']
Example
Picking the third element to the last.
print(list_mine[2:])  # ['a', 'n', 'g', 'e', 's']
Practice Exercise
Given class_names = ['John', 'Kelly', 'Yvonne', 'Una', 'Lovy', 'Pius',
'Tracy']
✓ Write a Python program using a slice operator to display from
the second student to the last.
✓ Write a Python program using a slice operator to display the
first student to the third using the negative indexing feature.
✓ Write a Python program using a slice operator to display the
fourth and fifth students only.
Manipulating Elements in a List using the assignment operator
Items in a list can be changed, meaning lists are mutable.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_yours = [4, 8, 5, 2, 1]
list_yours[1] = 6
print(list_yours)  # The output will be [4, 6, 5, 2, 1]
Changing a range of items in a list
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_yours = [4, 6, 5, 2, 1]
list_yours[0:3] = [12, 11, 10]  # Will change the first three items in the list
print(list_yours)  # Output will be: [12, 11, 10, 2, 1]
Appending/Extending items in the List
The append() method adds a single item to the end of a list. The extend()
method adds every item from another list.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_yours = [4, 6, 5]
list_yours.append(3)
print(list_yours)  # The output will be [4, 6, 5, 3]
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_yours = [4, 6, 5]
list_yours.extend([13, 7, 9])
print(list_yours)  # The output will be [4, 6, 5, 13, 7, 9]
The plus operator (+) can also be used to combine two lists. The * operator
can be used to repeat a list a given number of times.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_yours = [4, 6, 5]
print(list_yours + [13, 7, 9])  # Output: [4, 6, 5, 13, 7, 9]
print(['happy'] * 4)  # Output: ['happy', 'happy', 'happy', 'happy']
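A frequent source of confusion is what happens when a list is passed to
append() instead of extend(). The following sketch, with variable names of
our own choosing, puts the four operations side by side.

```python
appended = [4, 6, 5]
appended.append([13, 7])       # the whole list becomes ONE new element

extended = [4, 6, 5]
extended.extend([13, 7])       # each item is added separately

original = [4, 6, 5]
combined = original + [13, 7]  # + builds a new list; original is unchanged
repeated = ['happy'] * 3       # * repeats the items

print(appended)
print(extended)
print(original, combined)
print(repeated)
```

Both append() and extend() change the list in place, while + and * leave the
original list alone and give you a new one.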
Removing or Deleting Items from a List
The keyword del is used to delete elements or the entire list in Python.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['t', 'r', 'o', 'g', 'r', 'a', 'm']
del list_mine[1]
print(list_mine)  # ['t', 'o', 'g', 'r', 'a', 'm']
Deleting Multiple Elements
Example
Continuing with list_mine from the previous example, type the following:
del list_mine[0:3]
print(list_mine)  # ['r', 'a', 'm']
Delete Entire List
Example
Continuing with the same list, type the following:
del list_mine
print(list_mine)  # will raise a NameError because list_mine no longer exists
The remove() method or the pop() method can be used to remove a specified
item. The remove() method deletes the first item with the given value, while
pop() removes and returns the item at the given index. The pop() method will
remove and return the last item if the index is not given, which helps
implement lists as stacks. The clear() method is used to empty a list.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['t', 'k', 'b', 'd', 'w', 'q', 'v']
list_mine.remove('t')
print(list_mine)  # output will be ['k', 'b', 'd', 'w', 'q', 'v']
print(list_mine.pop(1))  # output will be b
print(list_mine.pop())  # output will be v
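The sketch below, using made-up data and names of our own, rounds out the
picture: remove() deletes only the first matching value, it raises a
ValueError when the value is missing, pop() lets a list act as a stack, and
clear() empties the list.

```python
letters = ['t', 'k', 'b', 'd', 'k']

letters.remove('k')               # removes only the FIRST 'k'
after_remove = list(letters)      # keep a copy to look at later

popped_by_index = letters.pop(1)  # removes and returns the item at index 1
popped_last = letters.pop()       # no index: removes and returns the last item

# remove() raises a ValueError if the value is not in the list.
try:
    letters.remove('z')
    missing_value_raises = False
except ValueError:
    missing_value_raises = True

# A list used as a stack: the last item added is the first one taken off.
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()

letters.clear()                   # the list still exists, but it is empty

print(after_remove, popped_by_index, popped_last)
print(missing_value_raises, top, letters)
```

Unlike del list_mine, which deletes the name itself, clear() leaves you with
an empty list that you can keep using.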
Practice Exercise
Given list_yours = ['K', 'N', 'O', 'C', 'K', 'E', 'D']
✓ Pop the third item in the list and save the program as list1.
✓ Remove the fourth item using the remove() method and save the
program as list2.
✓ Delete the second item in the list and save the program as list3.
✓ Pop the list without specifying an index and save the program as
list4.
Using an Empty List to Delete an Entire List or Specific Elements
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
list_mine = ['t', 'k', 'b', 'd', 'w', 'q', 'v']
list_mine[1:4] = []
print(list_mine)  # Output will be ['t', 'w', 'q', 'v']
Practice Exercise
➢ Use list access methods to display the following items in reversed
order list_yours= [4,9,2,1,6,7]
➢ Use list access method to count the elements in a.
➢ Use list access method to sort the items in a. in an ascending
order/default.
Summary
Lists store an ordered collection of items which can be of different types. The
list defined above has items that are all of the same type (int), but all the
items of a list do not need to be of the same type as you can see below.
# Define a list
heterogenousElements = [3, True, 'Michael', 2.0]
Sets
The attributes of a set are that it contains unique elements, the items are
not ordered, and the individual elements must be of an unchangeable
(immutable) type. The set itself can be changed by adding or removing
elements.
Creating a set
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
set_mine = {5, 6, 7}
print(set_mine)
set_yours = {2.1, "Great", (7, 8, 9)}
print(set_yours)
Creating a Set from a List
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
set_mine = set([5, 6, 7, 5])
print(set_mine)  # the duplicate 5 is dropped
Practice Exercise
Start IDLE.
Navigate to the File menu and click New Window.
Create a set in Python from the following values, which contain duplicates:
trial_set = {1, 1, 2, 3, 1, 5, 8, 9}
Note
Writing {} creates an empty dictionary in Python, not an empty set; use
set() to create an empty set. Sets cannot be indexed because they are
unordered.
To add multiple members to a set, we use the update() method. To add a
single element to a set, we use the add() method. Duplicates are ignored
when handling sets.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
your_set = {6, 7}
print(your_set)
your_set.add(4)
print(your_set)
your_set.update([9, 10, 13])
print(your_set)
your_set.update([23, 37], {11, 16, 18})
print(your_set)
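As a quick check of these points, the sketch below (with names and values of
our own) shows the difference between {} and set(), and confirms that adding
a value that is already present leaves the set unchanged.

```python
empty_dict = {}        # curly braces alone give a dictionary
empty_set = set()      # set() gives an empty set

numbers = {6, 7}
numbers.add(4)
numbers.add(4)                     # adding a duplicate changes nothing
numbers.update([9, 10], {11, 16})  # update() accepts several collections

unique_count = len(set([5, 5, 6])) # duplicates collapse to a single item

print(type(empty_dict), type(empty_set))
print(numbers)
print(unique_count)
```

The order in which a set prints its items is not guaranteed, so do not rely
on it in your programs.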
Removing Elements from a Set
The methods discard() and remove() are used to remove an item from a set.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
set_mine = {7, 2, 3, 4, 1}
print(set_mine)
set_mine.discard(2)
print(set_mine)  # Output will be {1, 3, 4, 7} (the order may vary)
set_mine.remove(1)
print(set_mine)  # Output will be {3, 4, 7} (the order may vary)
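The two methods differ in one respect that the example above does not show:
what happens when the item is not in the set. A minimal sketch, using values
and names of our own:

```python
values = {7, 2, 3}

values.discard(99)     # 99 is not in the set: discard() does nothing

try:
    values.remove(99)  # remove() raises a KeyError for a missing item
    remove_raised = False
except KeyError:
    remove_raised = True

print(values)
print(remove_raised)
```

Use discard() when a missing item is not a problem, and remove() when a
missing item should be treated as a mistake.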
Using the pop () Method to Remove an Item from a Set
Since sets are unordered, the order of popping items is arbitrary.
It is also possible to remove all items in a set using the clear () method in
Python.
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
your_set = set("Today")
print(your_set)
print(your_set.pop())
your_set.pop()
print(your_set)
your_set.clear()
print(your_set)
Set Operations in Python
We use sets to compute difference, intersection, and union of sets.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
C = {5, 6, 7, 8, 9, 11}
D = {6, 9, 11, 13, 15}
Set Union
A union of sets C and D will contain both sets’ elements.
In Python, the | operator generates a union of sets. The union() method will
also generate a union of sets.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
C = {5, 6, 7, 8, 9, 11}
D = {6, 9, 11, 13, 15}
print(C | D)  # Output: {5, 6, 7, 8, 9, 11, 13, 15}
Example 2
Using the union() method
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
C = {5, 6, 7, 8, 9, 11}
D = {6, 9, 11, 13, 15}
print(D.union(C))  # Output: {5, 6, 7, 8, 9, 11, 13, 15}
Practice Exercise
Rewrite the following into a set and find the set union.
A= {1,1,2,3,4,4,5,12,14,15}
D= {2,3,3,7,8,9,12,15}
Set Intersection
The intersection of A and D is a new set containing the items shared by both
sets. The & operator is used to perform intersection. The intersection()
method can also be used to intersect sets.
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
A = {11, 12, 13, 14, 15}
D = {14, 15, 16, 17, 18}
print(A & D)  # Will display {14, 15}
Using intersection()
Example
Start IDLE.
Navigate to the File menu and click New Window.
Type the following:
A = {11, 12, 13, 14, 15}
D = {14, 15, 16, 17, 18}
print(A.intersection(D))  # Will display {14, 15}
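Set difference was mentioned at the start of this section but not shown. As a
short sketch using the sets C and D from earlier, the - operator and the
difference() method give the items that are in one set but not in the other.
The result names are our own.

```python
C = {5, 6, 7, 8, 9, 11}
D = {6, 9, 11, 13, 15}

only_in_c = C - D            # items of C that are not in D
only_in_d = D.difference(C)  # items of D that are not in C
shared = C & D               # intersection, for comparison
everything = C | D           # union, for comparison

print(only_in_c)
print(only_in_d)
print(shared)
print(everything)
```

Unlike union and intersection, difference depends on the order of the sets:
C - D and D - C give different results.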
Chapter 8: Conditions Statements
Else Statements
Else statements are used in conjunction with “if” statements. They are used
to execute alternative statements if the preceding “if” statement returns
False.
In the previous example, if the userAge is equal to or greater than 18, the
expression in the “if” statement will return False. And since the expression
returns False on the “if” statement, the statements in the else statement
will be executed.
On the other hand, if the userAge is less than 18, the expression in the
“if” statement will return True. When that happens, the statements within
the “if” statement will be executed while those in the else statement will
be ignored.
Mind you, an else statement has to be preceded by an “if” statement. If
there is none, the program will return an error. Also, you can use another
else statement later in the program as long as it is preceded by its own
“if” statement.
In summary:
1. If the “if” statement returns True, the program will skip the
else statement that follows.
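The behavior described above can be sketched as follows. We assume that the
earlier example compared userAge against 18 with the condition userAge < 18;
the function name describe_age and the messages are our own.

```python
def describe_age(userAge):
    """Return a message chosen by an if/else pair."""
    if userAge < 18:
        # Runs only when the condition is True; the else block is skipped.
        message = "You are a minor."
    else:
        # Runs only when the condition is False.
        message = "You are an adult."
    return message

print(describe_age(16))
print(describe_age(18))
```

Exactly one of the two code blocks runs on each call, never both.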
Code Blocks
Just to jog your memory, code blocks are simply groups of statements or
declarations that follow if and else statements.
Creating code blocks is an excellent way to manage your code and make it
efficient. You will mostly be working with statements and scenarios that will
keep you working on code blocks.
Aside from that, you will learn about variable scope as you progress. For
now, you will mostly be creating code blocks for loops.
Loops are an essential part of programming. Every program that you use and
see uses loops.
Loops are blocks of statements that are executed repeatedly until a
condition is met. A loop also starts only when its condition is satisfied.
By the way, did you know that your monitor refreshes the image itself 60
times a second? Refresh means displaying a new image. The computer itself
has a looping program that creates a new image on the screen.
You may not create a program with a complex loop to handle the display, but
you will definitely use one in one of your programs. A good example is a
small snippet of a program that requires the user to log in using a password.
For example:
>>> password = "secret"
>>> userInput = ""
>>> while (userInput != password):
        userInput = input()
This example will ask for user input. At the text cursor, you need to type
the password and then press the Enter key. The program will keep asking for
input until you type the word secret.
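In practice, a login loop usually gives up after a few wrong attempts. The
following sketch adds such a limit. The function ask_for_password, its
max_attempts parameter, and the read_input stand-in are hypothetical names
used for illustration; read_input defaults to input(), but here we feed it a
prepared list of answers so the example runs without a keyboard.

```python
def ask_for_password(password, read_input=input, max_attempts=3):
    """Keep asking until the password matches or the attempts run out."""
    attempts = 0
    user_input = ""
    while user_input != password and attempts < max_attempts:
        user_input = read_input()
        attempts += 1
    return user_input == password, attempts

# Simulated user: two wrong answers, then the right one.
answers = iter(["Wrongpassword", "Test", "secret"])
logged_in, tries = ask_for_password("secret", read_input=lambda: next(answers))
print(logged_in, tries)
```

The loop now has two ways to end, so it can never run forever, even if the
user never types the right password.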
While
Loops are easy to code. All you need is the correct keyword, a conditional
value, and statements you want to execute repeatedly.
One of the keywords that you can use to loop is while. While is like an “if”
statement. If its condition is met or returns True, it will start the loop. Once
the program executes the last statement in the code block, it will recheck the
while statement and condition again. If the condition still returns True, the
code block will be executed again. If the condition returns False, the code
block will be ignored, and the program will execute the next line of code.
For example:
>>> i = 1
>>> while i < 6:
        print(i)
        i += 1
1
2
3
4
5
>>> _
For Loop
While the while loop statement loops until the condition returns False, the
“for” loop statement will loop a set number of times depending on a string,
tuple, or list. For example:
>>> carBrands = ["Toyota", "Volvo", "Mitsubishi", "Volkswagen"]
>>> for brands in carBrands:
        print(brands)
Toyota
Volvo
Mitsubishi
Volkswagen
>>> _
Break
Break is a keyword that stops a loop. Here is one of the previous examples
combined with break.
For example:
>>> password = "secret"
>>> userInput = ""
>>> while (userInput != password):
        userInput = input()
        break
        print("This will not get printed.")
Wrongpassword
>>> _
As you can see here, the while loop did not execute the print statement and
did not loop again after an input was provided since the break keyword came
after the input assignment.
The break keyword allows you to have better control of your loops. For
example, if you want to loop a code block a set number of times without
using sequences, you can use while and break.
>>> x = 0
>>> while (True):
        x += 1
        print(x)
        if (x == 5):
            break
1
2
3
4
5
>>> _
Using a counter (variable x here, though any variable will do, of course)
that increments on every pass, together with a condition and break, is common
practice in programming. In many programming languages, counters are even
integrated into loop statements. Here is a “for” loop with a counter in
JavaScript.
for (i = 0; i < 10; i++) {
    alert(i);
}
This script will loop ten times. On one line, the counter variable is
declared and assigned an initial value, a conditional expression is set, and
the increments for the counter are already coded.
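Python's “for” loop has no built-in counter of this kind, but the built-in
range() function produces the same sequence of numbers. As a sketch (the
variable names are our own), the loop below visits the values 0 to 9, just
like the JavaScript loop above.

```python
counted = []
for i in range(10):        # i takes the values 0, 1, 2, ... 9
    counted.append(i)

# range() also accepts a start value: here 1 up to, but not including, 6.
start_at_one = list(range(1, 6))

print(counted)
print(start_at_one)
```

Because range() stops before its end value, range(10) loops exactly ten
times.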
Infinite Loop
You should always be aware of the greatest problem with coding loops:
infinite loops. Infinite loops are loops that never stop. And since they
never stop, they can easily make your program become unresponsive, crash, or
hog all your computer’s resources. Here is an example similar to the previous
one but without the counter and the usage of break.
>>> while (True):
        print("This will never end until you close the program")
This will never end until you close the program
This will never end until you close the program
This will never end until you close the program
Whenever possible, always include a counter and break statement in your
loops. Doing this will prevent your program from having infinite loops.
Continue
The continue keyword is like a soft version of break. Instead of breaking out
of the whole loop, “continue” just skips the rest of the current pass and
goes directly back to the loop statement. For example:
>>> password = "secret"
>>> userInput = ""
>>> while (userInput != password):
        userInput = input()
        continue
        print("This will not get printed.")
Wrongpassword
Test
secret
>>> _
In the break version of this example, the program asks for user input only
once and ends the loop regardless of what you enter. This version, on the
other hand, will keep asking for input until you enter the right password.
However, it will always skip the print statement and go directly back to the
while statement.
Here is a practical application to make it easier to know the purpose of the
continue statement.
>>> carBrands = ["Toyota", "Volvo", "Mitsubishi", "Volkswagen"]
>>> for brands in carBrands:
        if (brands == "Volvo"):
            continue
        print("I have a " + brands)
I have a Toyota
I have a Mitsubishi
I have a Volkswagen
>>> _
When you are looping over a sequence, there may be items that you do not
want to process. You can skip the ones you do not want to process by using a
continue statement. In the above example, the program did not print “I have a
Volvo”, because it hit continue when Volvo was selected. This caused it to
go back and process the next car brand in the list.
Practice Exercise
For this chapter, create a choose-your-adventure program. The program
should provide users with two options. It must also have at least five choices
and have at least two different endings.
You must also use dictionaries to create dialogues.
Here is an example:
creepometer = 1
prompt = "\nType 1 or 2 then press enter...\n\n: :> "
clearScreen = ("\n" * 25)
scenario = [
    "You see your crush at the other side of the road on your way to school.",
    "You notice that her handkerchief fell on the ground.",
    "You heard a ring. She reached into her pocket to get her phone and stopped.",
    "Both of you reached the pedestrian crossing, but it's currently a red light.",
    "You got her attention now and you instinctively grabbed your phone."
]
choice1 = [
    "Follow her using your eyes and cross when you reach the intersection.",
    "Pick it up and give it to her.",
    "Walk past her.",
    "Smile and wave at her.",
    "Ask for her number."
]
Chapter 9: Iteration
How can you write code that counts to 10,000? Are you going to copy and paste
10,000 print statements and change each one? You can, but that is going to be
tiresome. Counting is a common task, and computers routinely count to large
values, so there must be a more efficient way. What you need to do is print
the value of a variable, increment the variable, and repeat the process until
it reaches 10,000. This process of executing the same code again and again is
known as looping. In Python, there are two statements, while and for, that
support iteration.
Here is a program that uses the while statement to count to five:
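The listing itself does not appear in this copy of the text. A minimal sketch
of such a program, assuming the variable is named count as in the description
that follows, could look like this:

```python
count = 1              # start counting at 1
while count <= 5:      # keep looping while count is at most 5
    print(count)       # display the current value
    count += 1         # move on to the next number
```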
The while statement in this program repeatedly prints the variable count. The
program executes the block of the loop five times. After every display of the
count variable, the program increases it by one. Finally, after five
repetitions, the condition is no longer true, and the block of code is not
executed anymore.
The word while is a Python reserved word that starts the statement.
The condition determines whether the body will be executed or not. A colon (:)
has to come after the condition.
A block is made up of one or more statements that are executed if the
condition is found to be true. All statements that make up the block must
be indented one level deeper than the first line of the while statement.
Technically, the block belongs to the while statement.
The while statement resembles the if statement, so new programmers may
confuse the two and type if when they mean while. Often, the different
behavior of the two statements reveals the problem instantly. But in nested
or more advanced logic, this error can be hard to notice.
The running program checks the condition before running the while block and
checks it again after each run of the block. As long as the condition remains
true, the program keeps running the code in the while block. If the condition
is false at the start, the block is not run at all. If the condition is true
at the start, the program runs the block repeatedly until the condition
becomes false. This is the point when the loop exits. Below is a program that
counts up from zero for as long as the user wants it to.
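The listing is not reproduced in this copy, so the following is only a sketch
of the idea. To keep it runnable without a keyboard, the user's replies are
simulated with a list named replies (an assumption for this illustration); in
an interactive program each reply would come from input() instead.

```python
# Replies are simulated so the sketch runs on its own.
replies = ["y", "y", "y", "n"]   # assumed sample answers

count = 0
answer = "y"
while answer == "y":             # keep counting while the user says yes
    print(count)
    count += 1
    answer = replies.pop(0)      # stands in for: input("Continue? (y/n) ")
```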
Here is another program that lets the user type a series of non-negative
integers. When the user types a negative value, the program stops accepting
input and prints the total of all the non-negative values. If a negative
number is the first entry, the sum is zero.
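A minimal sketch of such a program is shown below, since the original listing
is missing here. The typed numbers are simulated with a list named typed, and
the running total is stored in a variable named total rather than sum, because
sum is also the name of a built-in Python function. Both names are
assumptions of this sketch.

```python
# Typed numbers are simulated; the last one is negative and ends the loop.
typed = [4, 0, 11, -1]     # assumed sample entries

entry = 0    # start at zero so the condition entry >= 0 is true at first
total = 0    # running total (the text calls this variable sum)

while entry >= 0:
    entry = typed.pop(0)   # stands in for: int(input("Enter number: "))
    if entry >= 0:
        total += entry     # only non-negative entries are added

print("Sum =", total)
```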
Entry
At the start, entry is initialized to zero because we want the condition
entry >= 0 of the while statement to be true. If the variable entry were not
initialized, the program would generate a run-time error when it tries to
compare entry to zero in the while condition. The variable entry stores the
number typed by the user, so its value changes on every pass through the
loop.
Sum
This variable stores the running total of the numbers entered by the user.
It is initialized to zero at the start because a value of zero shows that
nothing has been added yet. If you don’t initialize the variable sum, the
program will also generate a run-time error when it tries to apply the +=
operator to change the variable. Inside the loop, the user’s input values are
added to sum one at a time. When the loop completes, the variable sum holds
the total of all non-negative values typed by the user.
Initializing entry to zero, together with the condition entry >= 0 of the
while, ensures that the program runs the body of the while loop at least
once. The if statement ensures that the program won’t add a negative entry
to the sum.
When the user types a negative value, the program does not update the sum
variable, and the condition of the while is no longer true. The loop exits
and the program executes the print statement.
This program doesn’t keep track of how many values were typed. It simply
accumulates the values entered in the variable sum.
A while block makes up most of this program. The program has a Boolean
variable done that controls the loop. The loop continues to run as long as
done is false. A Boolean variable used this way is called a flag. When the
flag is raised, its value is true; otherwise, its value is false. Don’t
forget that not done is the opposite of the variable done.
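The program being discussed is not shown in this copy. As a small stand-in,
the sketch below uses a flag named done to control a loop that keeps doubling
a number until it passes 1000. The task itself is an assumption chosen only
to show the flag at work.

```python
done = False          # the flag starts lowered
value = 1

while not done:       # loop as long as the flag has not been raised
    value *= 2
    if value > 1000:
        done = True   # raise the flag; the loop ends at the next check

print(value)
```

The loop stops when value reaches 1024, the first doubling that exceeds 1000.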
Definite and Indefinite Loops
Let us look at the following code:
We can examine this code and determine the exact number of iterations the
loop performs. This type of loop is referred to as a definite loop because we
can tell in advance exactly how many times the loop repeats.
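A sketch of a definite loop of this kind follows; the original listing is not
shown here, and the extra variable repetitions is an assumption added only to
count the passes.

```python
n = 1
repetitions = 0
while n <= 10:        # both limits are written in the code
    print(n)
    n += 1
    repetitions += 1  # count each pass through the block

print("The block ran", repetitions, "times")
```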
Now, take a look at the following code:
In this code, we cannot tell from the source alone how many times the loop
will run, because the number of repetitions depends on the input entered by
the user. However, once the user’s input has been entered, and before the
loop begins, it is possible to know how many repetitions the while loop will
make. For that reason, the loop is still said to be a definite loop.
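A sketch of this situation, assuming a variable named stop for the user's
number. The typed value is simulated with the string "7" so the example runs
on its own.

```python
stop = int("7")     # stands in for: int(input("Count up to: "))
n = 1
while n <= stop:    # once stop is known, the number of repetitions is known
    print(n)
    n += 1
```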
Now compare the previous programs with this one:
For this program, you cannot tell at any point during the loop’s execution
how many iterations it will perform. The value 999 is known before and after
the loop, but the value of entry can be anything the user inputs. The user
could keep entering 0 indefinitely, or enter 999 immediately and end it. The
while statement in this program is a great example of an indefinite loop.
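The sketch below shows one way such an indefinite loop might look. The names
entry and done and the simulated input list entries are assumptions; only the
value 999 that ends the loop comes from the text.

```python
entries = ["12", "5", "999"]      # assumed sample input
done = False
while not done:
    entry = int(entries.pop(0))   # stands in for: int(input())
    if entry == 999:
        done = True               # 999 is the value that ends the loop
    else:
        print(entry)
```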
So, the while statement is perfect for indefinite loops. While these examples
have applied the while statements to demonstrate definite loops, Python has a
better option for definite loops. That is none other than the for statement.
The for Statement
The while loop is perfect for indefinite loops. This has been demonstrated in
the previous programs, where it is impossible to tell the number of times the
while loop will run. Previously, the while loop was used to run a definite loop
such as:
In this code snippet, the print statement runs exactly 10 times. The code
needs three important parts to control the loop:
Initialization
Check
Update
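The three parts can be pointed out in a short sketch (the variable name n is
an assumption):

```python
n = 1             # initialization: give the loop variable its first value
while n <= 10:    # check: decide whether to run the block again
    print(n)
    n += 1        # update: change the variable so the check can become false
```

If any of the three parts is left out, the loop either fails to start or
never ends.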
Python has a more efficient way to express a definite loop. The for statement
iterates over a series of values. One way to express a series is to use a
tuple. For example:
This code works the same way as the while loop shown earlier. In this
example, the print statement runs 10 times. The code prints 1 first, then 2,
and so forth. The last value it prints is 10.
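A sketch of a for loop over a tuple that matches this description:

```python
# The tuple lists every value the loop variable n will take, in order.
for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(n)
```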
It is tedious to write out all the elements of a tuple. Imagine going over
all the integers from 1 to 1,000 and writing out every element of the tuple.
That would be impractical. Fortunately, Python has an efficient means of
expressing a series of integers that follow a consistent pattern.
This code uses the range expression to output the integers from 1 to 10.
The range expression range(1, 11) creates a range object that lets the for
loop assign the variable n the values 1, 2, ..., 10.
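A sketch of the loop being described:

```python
for n in range(1, 11):   # 1 is included, 11 is not
    print(n)
```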
The first line of this code snippet is read as “for every integer n in the
range 1 ≤ n < 11.” In the first execution of the loop, the value of n is 1
inside the block. In the next iteration of the loop, the value of n is 2. The
value of n increases by one on each pass, and the block uses each value in
turn until n reaches 10. The general format for the range expression takes a
begin value, an end value, and a step value, written range(begin, end, step).
This means that you can use range to express a variety of sequences.
For range expressions that have a single argument, like range(y), y is the
end of the range, 0 is the begin value, and 1 is the step value.
For expressions with two arguments, like range(m, n), m is the begin value
and n is the end of the range. The step value is 1.
For expressions with three arguments, like range(m, n, y), m is the begin
value, n is the end, and y is the step value.
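The three forms can be compared side by side. In this sketch, list() is used
only to make the values of each range visible, and the particular numbers are
arbitrary.

```python
print(list(range(5)))           # one argument: begins at 0, step 1
print(list(range(3, 8)))        # two arguments: begins at 3, step 1
print(list(range(0, 20, 5)))    # three arguments: step of 5
print(list(range(10, 0, -2)))   # a negative step counts down
```

In every case the end value itself is left out of the sequence.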
In a for loop, the range object is in full control of the value given to the
loop variable on each pass through the loop.
If you look at older Python resources or online Python examples, you are
likely to come across the xrange expression. Python 2 has both range and
xrange. However, Python 3 doesn’t have xrange. The range expression of
Python 3 behaves like the xrange expression of Python 2. In Python 2, the
range expression builds a data structure known as a list, which can cost a
running program time and memory. The Python 2 xrange expression avoids that
extra cost. Hence, it is well suited to a big sequence. When writing loops
with the for statement, Python 2 developers usually prefer xrange to range
for efficiency.
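In Python 3 this behavior can be seen directly. The sketch below creates a
very large range without building a list; the variable name big is an
assumption.

```python
big = range(1_000_000_000)   # no list of a billion numbers is built here
print(len(big))              # the length is worked out from begin, end, step
print(big[10])               # individual values are produced on request
print(list(range(5)))        # list() builds a real list when one is needed
```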
Chapter 10: Functions and Control Flow Statements in Python
For example, the code that calls the Useforprint function in the above section
is as follows:
# After the function is defined, it is not executed automatically and needs to be called
Useforprint()
Parameters of Function
Before introducing the parameters of a function, let's first solve a problem.
Suppose we need to define a function that calculates the sum of two numbers
and prints the calculated result. Let us convert this requirement into code.
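The code for this requirement is not included in this copy. A minimal sketch
follows; the names add_and_print, a, and b are assumptions made for
illustration.

```python
def add_and_print(a, b):
    """Add two numbers, print the result, and return it."""
    result = a + b
    print(a, "+", b, "=", result)
    return result

# The values in the call are passed to the parameters a and b.
add_and_print(11, 22)
```

Because a and b are parameters, the same function can be called again with
any other pair of numbers.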
For every programmer, the beginning is always the biggest hurdle. Once you
set your mind to things and start creating a program, things automatically
start aligning. The needless information is automatically omitted by your
brain through its cognitive powers and understanding of the subject matter.
All that remains then is a grey area that we discover further through various
trials and errors.
There is no shortcut to learn to program in a way that will let you type codes
100% correctly, without a hint of an error, at any given time. Errors and
exceptions appear even for the best programmers on earth. There is no
programmer that I know of personally who can write programs without
running into errors. These errors may be as simple as forgetting to close
quotation marks, misplacing a comma, passing the wrong value, and so on.
Expect yourself to be accompanied by these errors and try to learn how to
avoid them in the long run. It takes practice, but there is a good chance you
will end up being a programmer who runs into these issues only rarely.
We were excited when we began this workbook. Then came some arduously
long tasks which quickly turned into irritating little chores that nagged us as
programmers and made us think more than we normally would. There were
times where some of us even felt like dropping the whole idea of being a
programmer in the first place. But, every one of us who made it to this page,
made it through with success.
Speaking of success, always know that your true success is never measured
properly nor realized until you have hit a few failures along the road. It is a
natural way of learning things. Every programmer, expert, or beginner, is
bound to make mistakes. The difference between a good programmer and a
bad one is that the former would learn and develop the skills while the latter
would just resort to Google and locate an answer.
If you have chosen to be a successful Python programmer, know that there
will be some extremely trying times ahead. The life of a programmer is rarely
socially active either, unless your circle of friends is made up of
programmers only. You will struggle to manage your time at the start, but
once you get the hang of things, you will start to perform exceptionally
well. Everything will then start aligning, and you will begin to lead a more
relaxed lifestyle as a programmer and as a human being.
Until that time comes, keep your spirits high and always be ready to
encounter failures and mistakes. There is nothing to be ashamed of when
going through such things. Instead, look back at your mistakes and learn from
them to ensure they are not repeated in the future. You might be able to make
programs even better or update the ones which are already functioning well
enough.
Lastly, let me say it has been a pleasure to guide you through both these
books and to see you grow from a person who had no idea about Python into a
programmer who can now write, understand, and run code at will.
Congratulations are in order. Here are digital cheers for you!
print("Bravo, my friend!")
I wish you the best of luck for your future and hope that one day, you will
look back on this book and this experience as a life-changing event that led to
a superior success for you as a professional programmer. Do keep an eye out
for updates and ensure you visit the forums and other Python communities to
gain the finest learning experience and knowledge to serve you even better
when stepping into the more advanced parts of Python.
Python for data science:
DATA ANALYSIS AND DEEP LEARNING WITH PYTHON
CODING AND PROGRAMMING. THE LATEST BEGINNER’S
GUIDE WITH PRACTICAL APPLICATIONS ON MACHINE
LEARNING AND ARTIFICIAL INTELLIGENCE.
William Wizner
© Copyright 2020 - All rights reserved.
Introduction:
In this book, we will lay down the foundational concepts of data science,
starting with the term ‘big data.’ As we move along, we will steer the
discussion toward what exactly data science is and the various types of data
we normally deal with in this field. By doing so, readers will gain
much-needed insight into the processes surrounding data science and,
consequently, more easily understand the concepts we put forward regarding
data science and big data. After the theoretical sections, the book concludes
by working through some basic and common examples of Hadoop.
When handling data, the most common, traditional, and widely used
management technique is the ‘Relational Database Management System,’
also known as ‘RDBMS.’ This technique applies to almost every dataset, as it
easily meets the dataset’s processing demands; however, this is not the case
for ‘Big Data.’ Before we can understand why such management techniques
fail to process big data, we first need to understand what the term ‘Big
Data’ refers to. The name itself gives away a lot about the nature of the
data: big data is a term used to define a collection of datasets that are
very large and complex. Such datasets are difficult to process using
traditional data management techniques and thus demand a new approach, as is
evident from the fact that the commonly used RDBMS technique is poorly suited
to big data.
The core of data science is to employ the methods and techniques that are
most suitable for analyzing the dataset at hand so that we can extract the
essential information contained in it. In other words, big data is like a
raw mineral ore containing a variety of useful materials. Still, in its
current form, its contents are unusable and of no use to us. Data science is
the refinery that uses effective techniques to analyze this ore and then
employs corresponding methods to extract its contents for us to use.
The world of big data is vast, and the use of data science with big data can
be seen in almost every sector of the modern age, be it commercial,
non-commercial, business, or industrial. For instance, in a commercial
setting, companies use data science and big data chiefly to gain better
insight into the demands of their customers and into the efficiency of their
products, staff, manufacturing processes, and so on. Consider Google’s
advertising service AdSense; it employs data science to analyze big data (a
collection of user internet data) and extract information to ensure that the
person browsing the internet sees relevant advertisements. The uses of data
science extend far beyond what we can imagine, and it is not possible to list
all of the ways it is being employed today. However, what we do know is that
the majority of the datasets gathered by big companies all around the world
are big data. Data science is essential for these companies to analyze this
data and benefit from the information it contains. Not only that, big
educational institutions such as universities, as well as research work,
also benefit from data science.
While venturing across the field of data science, you will soon come to
realize that there is not one defined type of data. Instead, there are multiple
categories under which data is classified, and each category of data requires
an entirely different toolset to be processed.
Following are the seven major categories of data:
1. Structured Data
2. Unstructured Data
5. Graph-based Data
We must not confuse the terms ‘graph’ and ‘graph theory.’ The first refers to
a drawn chart of data; any data can be drawn as a chart, but that does not
change the nature of the data. The latter refers to a mathematical
structure: a model that connects objects in pairs based on their
relationships with each other. Hence, we can also call this category of data
network data. This type of data emphasizes the adjacency and relationships
of objects, and the common structures found in graph-based data are:
Nodes
Edges
Properties
Graph-based data is most commonly seen on social media websites. Here’s
an example of graph-based data representing friends on a social network.
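The original figure is not available in this copy. As a rough stand-in, the
sketch below stores a tiny friendship network in Python dictionaries. All of
the names, cities, and friendships are made up for illustration.

```python
# Each key is a node (a person); each list holds that person's friends (edges).
friends = {
    "Ana": ["Ben", "Cara"],
    "Ben": ["Ana", "Dev"],
    "Cara": ["Ana"],
    "Dev": ["Ben"],
}

# Properties are extra facts attached to a node.
properties = {"Ana": {"city": "Lisbon"}, "Ben": {"city": "Oslo"}}

for person, friend_list in friends.items():
    print(person, "is friends with", ", ".join(friend_list))

# Two people are adjacent when an edge links them directly.
print("Dev" in friends["Ben"])
```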
Now that we have spent some time looking at Python and what we can do with
the language, it is time to move on to what we can do with that knowledge and
the code we have been writing. Here we take a closer look at data analysis
and how it can help us get good results from our information.
Companies have spent a lot of time looking at data analysis and what it can
do for them. Data are all around us, and each day a great deal of new
information becomes available to work with. Whether you are a business trying
to learn more about your industry and your customers, or just an individual
who has a question about a certain topic, you will be able to find a wealth
of information to help you get started.
Many companies have gotten into the habit of gathering data and learning
how to make it work for their needs. They have found that these data hold
many insights and predictions that can help them in the future. If the data
are used properly, and we have a good handle on them, they can help our
business become more successful.
Once you have gathered the data, there is still work to do. Just because you
can gather all of that data doesn’t mean that you will be able to see what
patterns are inside. This is where data analysis comes into play. It is a
process meant to ensure that we fully understand what is inside our data, so
that we can use the raw data to make informed and smart business decisions.
To take this a bit further, data analysis is a practice in which we take the
raw data that our business has been collecting and then organize and order
it so that it becomes useful. During this process, the most useful
information is extracted from the raw data and put to use.
The one thing we need to be careful about when working with data analysis is
the way we manipulate the data. It is easy to manipulate the data in the
wrong way during the analysis phase and end up pushing conclusions or
agendas that the data do not support. This is why we need to pay close
attention when a data analysis is presented to us and think critically about
the data and the conclusions drawn from it.
If you are worried about the source of an analysis, or if you are not sure
that you can complete the analysis without some bias in it, then it is
important to find someone else to work on it or to choose a different source.
There is a lot of data out there, and it can help your business see results,
but you have to be careful about these biases, or they will lead you to the
wrong decisions in the end.
You will also find that the raw data you work with during data analysis can
take a variety of forms. These include observations, survey responses, and
measurements, to name a few. The sources you use for raw data will vary
based on what you hope to get out of it, what your main question is, and
more.
In its raw form, the data we gather is very useful, but it can also be
overwhelming to work with. This is a problem that many companies run into
with data analysis, and something you will have to spend some time exploring
and learning more about as well.
Over the course of a data analysis, the raw data are ordered in a manner that
makes them as useful to you as possible. For example, we may send out a
survey and then tally the results. Tallying helps us see at a glance how
many people answered the survey at all and how they responded to specific
questions on it.
In the process of organizing the data, a trend is likely to emerge, and
sometimes more than one. We can then highlight these trends, usually in the
write-up of the data. Trends need to be highlighted so that the person
reading the information takes note of them.
We see this in many places. For example, in a casual survey, you may want to
find out which ice cream flavors men and women like the most. In this
survey, maybe we find out that both women and men express a fondness for
chocolate. Depending on who is using this information and what they hope to
get out of it, this could be something the researcher finds very
interesting.
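Tallying survey answers like these takes only a few lines of Python. The
sketch below uses Counter from the standard collections module; the answers
in the responses list are invented for illustration and are not real survey
results.

```python
from collections import Counter

# Made-up survey answers: (group, favorite flavor)
responses = [
    ("women", "chocolate"), ("men", "chocolate"), ("women", "vanilla"),
    ("men", "chocolate"), ("women", "chocolate"), ("men", "strawberry"),
]

# Counter counts how many times each (group, flavor) pair appears.
tally = Counter(responses)

for (group, flavor), votes in tally.items():
    print(group, flavor, votes)
```

With a tally like this, a trend such as a shared fondness for chocolate can
be seen at a glance.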
Modeling the data from the survey, or from another form of data analysis,
with mathematics and other tools can accentuate the points of interest in
our data, such as the ice cream preferences from before. This makes it much
easier for anyone looking over the data, especially the researcher, to see
what is going on.
In addition to looking at all of the data that you have collected and sorted,
you will need to do a few other things as well. These are all meant to help
the person who needs this information to read through it, see what is
inside, and decide what they can do with it. This is how they can use the
information to see what is going on, the complex relationships that are
there, and so much more.
This means that we need to spend time on write-ups of the data, graphs,
charts, and other ways to present the data to those who need it the most.
This forms one of the final steps of data analysis. These methods are
designed to distill and refine the data so that readers can glean the
interesting information from it without having to go back through the raw
data and figure out what is there on their own.
Summarizing the data in these steps is critical, and it needs to be done
carefully. A good summary helps support the arguments made with the data, as
does presenting the data clearly and understandably. During this phase, we
have to remember that the person who needs the summary, and who will use it
to make important decisions for the business, is not always a data
scientist. This is why the information has to be written out in a manner
that is simple and easy to read and understand.
Often this is done with some sort of data visualization. There are many
kinds of visuals to choose from, and a graph or chart is a good option. The
best visual is the one that fits your needs and the data you are working
with.
Information in a graphical format is often easier to take in than the same
information read straight from the data. You could have it all in written
form if you would like, but this is neither as easy to read nor as
efficient. To see complex relationships quickly and efficiently, a visual is
one of the best options to choose.
Even though we work with a visual of the data to make it easier to
understand, it is fine to include some of the raw data as an appendix rather
than throwing it out. This gives the person who works with that data
regularly a chance to check your sources and your specific numbers, and it
can help bolster the results that you are presenting.
If you are the one receiving the results of the data analysis, make sure
that you view the conclusions and the summarized data from your data
scientist critically. Take the time to ask where the data comes from, and
ask about the sampling method that was used when the data was collected.
Knowing the size of the sample is important as well.
Chapter 2: The Basics of the Python Language
Python is one of the best coding languages to start with for your first data
science project. It is a fantastic language that is capable of taking on all
of the work that you want to do with data science and has the power needed
to create some great machine learning algorithms. With that said, it is
still a great option for beginners because it has been designed to work for
those who have never done programming before. While you can choose to work
with the R programming language as well, you will find that Python is one of
the best options because it combines ease of use with power.
Before we dive into how Python can help with the things that you would like
to do with data science, we first need to take some time to look at the
basics of the Python language. Python is a great language to explore, and
you will be able to learn the coding that you need in no time. Some of the
building blocks of Python code that you will work with include:
The Statements
The first thing that we are going to look at in the Python language is the
statement. A statement is a line, or sentence, of code that tells the
interpreter what to do, such as which message to show on your screen. You
will use some of the keywords that we will talk about soon to build your
statements. If you would like to put a message on the screen, as the Hello,
World! program does, you write that message as a string and pass it to the
built-in print function, so the interpreter knows what to display.
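A short sketch of the Hello, World! program follows. The variable name
message is an assumption; in Python 3, print is a built-in function that is
called with parentheses.

```python
# Each line below is one statement: a complete instruction for Python.
message = "Hello, World!"   # an assignment statement stores the text
print(message)              # a call to print shows the text on the screen
```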
The Python Operators
We can also take some time to look at what are known as the Python operators.
These often get overlooked when writing code because they don’t seem that
important. But if you leave them out or use the wrong one, your code will
not work the way that you would like. There are several different types of
Python operators, such as arithmetic, comparison, logical, and assignment
operators, so knowing what each kind does and when to add it to your code
makes a world of difference.
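A few of the most common operators are sketched below. The variable names and
values are arbitrary choices for illustration.

```python
a = 7
b = 2

print(a + b)             # arithmetic: addition
print(a // b)            # arithmetic: floor division
print(a % b)             # arithmetic: remainder
print(a > b)             # comparison: gives True or False
print(a > 5 and b > 5)   # logical: both sides must be True

total = a
total += b               # assignment: same as total = total + b
print(total)
```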
The Keywords
The keywords are another important part of our Python code that we need to
take a look at. These are words that the language reserves for itself because
they give the interpreter its instructions, such as if, for, def, and return.
Because these words are reserved, you cannot use them as names for your own
variables, functions, or classes. If you use a keyword in the wrong place, the
interpreter will not understand what you would like it to do, and it will stop
with a syntax error instead of giving you the results that you want. Make sure
to learn the important keywords that come with the Python language and learn
how to put them in the right spot of your code to get the best results.
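One way to see the reserved words for yourself is the standard library's keyword module. The short sketch below assumes Python 3; the name "puppies" is just an example of an ordinary, non-reserved name.

```python
import keyword

# keyword.kwlist holds every reserved word for the running Python version.
print(keyword.kwlist)

# keyword.iskeyword() checks a single name.
print(keyword.iskeyword("for"))      # a reserved word
print(keyword.iskeyword("puppies"))  # an ordinary name
```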
Working with Comments
As we work with Python code, there are going to be times when we need to work
with something that is known as a comment. A comment is the best way to label
a part of the code or to leave a little note for yourself or another
programmer.
Comments allow you to leave a message in the code, and the interpreter will
know that it should just skip over that part and not run it at all. It is as
simple as that, and it can save you a lot of hassle inside of any code you are
writing.
Any time that you would like to write a comment inside of your Python code,
you just need to use the # symbol, and the interpreter will ignore everything
from that symbol to the end of the line. We can add as many of these comments
as we would like. Just remember to keep them to the number that is necessary,
rather than going overboard, because this keeps the code looking as clean as
possible.
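The following sketch shows the two common places a comment can go. The variable names and values are invented for the illustration.

```python
# A comment on its own line describes the code that follows.
price = 4.50
quantity = 3  # A comment can also follow code on the same line.

# The interpreter ignores both comments and only runs the code.
total = price * quantity
print(total)
```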
The Python Class
One thing that is extremely important when it comes to working with Python,
and other similar languages, is the idea that code is organized into classes
and objects. A class describes a kind of thing, and each object is one
instance of that class, which gives your code more organization and ensures
that the different parts fit together the way that you would like. In some of
the older types of programming languages, this organization was not there,
and this caused a lot of confusion and frustration for those who were just
starting.
A class works like a container, or blueprint, for your objects. The objects
that you create from it are often based on actual items in the real world or
on other parts of the code. You will need to name these classes clearly and
define them in the code before you use them, so that they work and can create
the objects that you need. Placing the right kinds of objects into the right
class is going to be important as well.
You can store anything that you want inside a class that you design, but you
must ensure that things that are similar end up in the same class. The items
don’t have to be identical to each other, but when someone takes a look at the
class that you worked on, they need to be able to see that those objects belong
together and make sense to be together.
For example, you don’t have to put only cars into the same class; you could
have different vehicles in the same class. You could have items that are
considered food. You can even have items that are all the same color. You
get some freedom when creating the classes and storing objects in those
classes, but when another programmer looks at the code, they should be able
to figure out what the objects inside that class are about and those objects
should share something in common.
Classes are very important when it comes to writing out your code. These are
going to hold onto the various objects that you write in the code and can
ensure that everything is stored properly. They will also make it easier for
you to call out the different parts of your code when you need them for
execution.
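To make the vehicle example concrete, here is a minimal sketch of a class and two objects created from it. The class name Vehicle, its attributes, and the describe method are all hypothetical choices for this illustration.

```python
class Vehicle:
    """A class that groups objects sharing the traits of a vehicle."""

    def __init__(self, kind, wheels):
        self.kind = kind      # for example "car" or "bicycle"
        self.wheels = wheels  # number of wheels

    def describe(self):
        return "A {} has {} wheels.".format(self.kind, self.wheels)


# Different objects, one class: they are not identical but belong together.
car = Vehicle("car", 4)
bicycle = Vehicle("bicycle", 2)

print(car.describe())
print(bicycle.describe())
```

Anyone reading this code can see at a glance that both objects are vehicles, which is the kind of clarity the class is meant to provide.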
How to Name Your Identifiers
Inside the Python language, an identifier is the name that you give to a
variable, function, class, or other object. Identifiers go by a lot of
different names depending on what they label, but they all follow the same
naming rules, which makes them a lot easier for a beginner to work with.
To start with, you can use a lot of different types of characters in order to
handle the naming of the identifiers that you would like to work with. You
can use any letter of the alphabet that you would like, including uppercase
and lowercase, and any combination of the two that you would like. Using
numbers and the underscore symbol is just fine in this process as well.
With this in mind, there are going to be a few rules that you have to
remember when it comes to naming your identifiers. For example, you are not
able to start a name with a number, so writing something like 3puppies would
not work. A name may start with an underscore, as in _threepuppies, but by
convention a leading underscore marks a name as private, so it is best to
avoid it for ordinary names. You can use something like threepuppies for the
name. A programmer also cannot add spaces inside a name. You can write
threepuppies or three_puppies if you would like, but do not add a space
between the two words. Finally, a name cannot be one of the reserved keywords.
In addition to some of these rules, we need to spend some time looking at one
other rule that is important to remember. Pick out a name for your identifier
that is easy to remember and makes sense for that part of the code. This is
going to ensure that you can understand the name and that you will be able to
remember it later on when you need to call it up again.
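The naming rules can be checked directly. The sketch below assumes Python 3, where strings have an isidentifier() method; the candidate names are taken from the examples above, plus one reserved word.

```python
import keyword

candidates = ["threepuppies", "three_puppies", "_threepuppies",
              "3puppies", "three puppies", "for"]

for name in candidates:
    # str.isidentifier() checks the character rules;
    # keyword.iskeyword() rules out reserved words.
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print("{!r}: {}".format(name, "valid" if valid else "not valid"))
```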
Python Functions
Another topic that we are going to take a quick look at here is the idea of
Python functions. A function is a set of statements and expressions grouped
together so that you can run them whenever you need. You have the choice to
give a function a name with def or let it remain anonymous with lambda.
Functions are also first-class objects, meaning that there are few
restrictions on how you work with them: you can store them in variables, pass
them to other functions, and return them as values.
Functions also carry many attributes that you can inspect once they are
created. Some of the attributes available on these functions include:
· __doc__: This returns the docstring of the function.
· __defaults__ (func_defaults in Python 2): This returns a tuple of the
values of the default arguments.
· __globals__ (func_globals in Python 2): This returns a reference to the
dictionary holding the global variables for that function.
· __dict__ (func_dict in Python 2): This returns the namespace that
supports arbitrary attributes set on the function.
· __closure__ (func_closure in Python 2): This returns a tuple of the cells
that hold the bindings for the free variables inside of the function.
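The attributes in this list can be inspected on any function. The sketch below uses the Python 3 attribute names; the functions greet and make_counter and the author attribute are hypothetical examples.

```python
def greet(name, greeting="Hello"):
    """Return a short greeting for the given name."""
    return "{}, {}!".format(greeting, name)


def make_counter(start):
    # 'start' is a free variable captured by the inner function.
    def counter(step=1):
        return start + step
    return counter


greet.author = "reader"          # arbitrary attribute, stored in __dict__
count_from_ten = make_counter(10)

print(greet.__doc__)                                 # the docstring
print(greet.__defaults__)                            # default argument values
print(greet.__dict__)                                # attributes set on the function
print(count_from_ten.__closure__[0].cell_contents)   # captured value of 'start'
```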
Chapter 3: Using Pandas
It would be difficult to delve deeper into the technical aspect of data science
and analysis without a refresher course on the basics of data analysis. Come
to think of it, data science, new as it is, is still a generally broad topic of
study. Many books have tried to specifically define what data science and
being a data scientist means. After all, it was voted one of the most highly
coveted jobs this decade, according to surveys done by Google.
Unfortunately, the sheer breadth of data science topics, ranging from
Artificial Intelligence to Machine Learning, means that it is difficult to
place data science under one large umbrella. Clearly defining data science
remains a daunting task and one that shouldn’t be taken lightly.
However, one fact about data science holds consistently across its various
practices: the use of software and programming basics is just as integral as
the analysis of the data. Having the ability to use and create models and
artificially intelligent programs is integral to the success of having clean,
understandable, and readable data.
The discussions you will find in this book will cover the latest and more
advanced topics of interest in data science, as well as a refresher course on
the basics.
Pandas
The core of Data Science lies in Python. Python is one of the easiest and most
intuitive languages out there. For more than a decade, Python has absolutely
dominated the market when it comes to programming. Python is one of the
most flexible programming languages to date. It is extremely common, and
honestly, it is also one of the more readable languages. As one of the more
popular languages right now, Python comes with a supportive community and a
deep and extensive set of support modules. If you were to open GitHub right
now, you’d find thousands of repositories filled with millions of lines of
Python code. As a flexible programming language, Python is used for machine
learning, deep learning applications, 2D imagery, and 3D animation.
If you have no experience in Python, then it is best to learn it before
progressing through further sections of this book.
Assuming that you do have a basic understanding of Python and that coding
in this language has almost become natural to you, the following sections will
make more sense. If you have experience in Python, you should at least have
heard about the Pandas and Scikit-Learn libraries.
Essentially, Pandas is a data analysis tool used to manipulate and analyze
data. It is particularly useful as it offers methods to build and create data
structures as well as methods used to manipulate numerical tables and time
series. Pandas is an open-source library built on top of NumPy, which means
that Pandas requires NumPy to be installed in order to operate.
Pandas makes use of data frames, which are essentially two-dimensional arrays
with labeled axes. Data frames are popular because they provide methods to
handle missing data easily, efficient methods to slice, reshape, merge, and
concatenate data, and powerful time series tools to work with.
Learning to write in Pandas and NumPy is essential in the beginning steps of
becoming a Data Scientist.
A Pandas data frame looks like the sample photo below:
Now, the data frame doesn’t look too difficult to understand, does it? It’s
similar to the product lists you see when you check out at the grocery store.
This tiny 2x2 data frame is a perfect encapsulation of one of the things that
this book has been trying to show. Data Science isn’t as tricky, nor is it as
difficult, as some people make it seem, because Data Science is simply the
process of making sense of the data tables given to you. This process of
analyzing and making sense is something that we’ve been unconsciously
practicing for our whole lives, from trying to make sense of our personal
finances to looking at data tables of products that we’re trying to sell.
Let’s dive in further as to how to use this powerful library. As one of the
most popular tools for data manipulation and analysis, Pandas has data
structures that were designed to make data analysis in the real world
significantly easier. There are many ways to use Pandas, and often, the
choices in the functionality of the library may be overwhelming. In this
section, we’ll begin to shed some light on the subject matter and, hopefully,
begin to learn some Pandas functionality.
Pandas has two primary components that you will be manipulating and seeing a
lot of: the Series and the DataFrame. The two are closely related, as a Series
is essentially a single column of a DataFrame. A DataFrame, in turn, is a
two-dimensional table of rows and columns that is made up of a collection of
Series. We can create these DataFrames from many sources, such as lists or
tuples, but for this tutorial, we’ll just be using a simple dictionary.
Let’s create a dictionary that records the fruit that customers bought. Each
key is a fruit, and its value is a list of the amounts that each customer
purchased.
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}
Great! We now have the data for our first DataFrame. However, it is still a
plain dictionary and not a DataFrame yet. To turn it into one, we need to
import Pandas and pass the dictionary into the Pandas DataFrame constructor.
We simply type in:
import pandas as pd
purchases = pd.DataFrame(data)
print(purchases)
And it should output something like this:
   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2
Basically, what happened here was that each (key, value) item in the
dictionary “data” corresponds to a column in the data frame. Reading the data
that we placed here, the first customer bought three apples and no oranges,
the second customer bought two apples and three oranges, the third customer
bought no apples and seven oranges, and so on.
The column on the left is the index, which gives each row’s position in the
sequence. In programming, counting an index doesn’t begin with one; it begins
with 0. This means that the first item has an index of zero, the second has an
index of one, the third has an index of two, and so on and so forth. We can
now call the items in a sequence based on their index. So, by calling
purchases['apples'][0], where we use apples as our key and then 0 as our
index, we get back the value 3.
However, we can also replace the values of our index. To do that, we input the
following lines of code.
purchases = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David'])
print(purchases)
Now, instead of using the index positions to locate the item in the sequence,
we can use the customer’s name to find the order. For this, we could use the
loc indexer, which is written in this manner: “DataFrame.loc[x]”, where
DataFrame is the name of the data frame that you would like to access and x is
the index label of the row that you would like to retrieve. Essentially, loc
accesses a group of rows and columns through index labels. For example, we can
now access June’s orders, which sit at index position 0, through the command
purchases.loc['June']. This would return the following:
apples     3
oranges    0
Name: June, dtype: int64
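Putting the steps so far together, the sketch below builds the purchases DataFrame and looks up values by label and by position. It assumes pandas is installed; .iloc is not mentioned in the text and is included only to contrast positional access with .loc.

```python
import pandas as pd

data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Label each row with a customer name instead of 0, 1, 2, 3.
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

# A single column is a Series.
apples = purchases["apples"]
print(type(apples).__name__)

# .loc selects by label, .iloc selects by position.
print(purchases.loc["June"])
print(purchases.iloc[0])

# A row label and a column label together give a single value.
print(purchases.loc["Lily", "oranges"])
```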
We can learn more about locating, accessing and extracting DataFrames later,
but for now, we should move on to loading files for you to use.
Honestly, the process of loading data into DataFrames is quite simple.
Assuming you already have a dataset that you would like to use from an outside
source, creating a DataFrame out of it takes only a single line of code. Here,
we will keep using the purchases dataset, saved as a CSV file, as an example.
CSV files are comma-separated value files that store data in a tabular format.
CSV files are basically plain-text spreadsheets with the file extension .csv.
They can be opened with almost any spreadsheet program, such as Microsoft
Excel or Google Sheets. In Pandas, we can access CSV files like this:
df = pd.read_csv('purchases.csv')
df
If you input it right, the output should look similar to this:
  Unnamed: 0  apples  oranges
0       June       3        0
1     Robert       2        3
2       Lily       0        7
3      David       1        2
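The "Unnamed: 0" column appears because the customer names were saved as an unnamed index. The sketch below shows one way to get them back as the index with the index_col parameter of read_csv. The temporary file location is an assumption made so the example does not touch any real file.

```python
import os
import tempfile

import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "Robert", "Lily", "David"],
)

# Write to a temporary folder so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "purchases.csv")
purchases.to_csv(path)

# Read without options: the saved index comes back as "Unnamed: 0".
raw = pd.read_csv(path)
print(raw.columns.tolist())

# index_col=0 restores the customer names as the index.
df = pd.read_csv(path, index_col=0)
print(df.loc["June"])
```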
What Is Python?
Python is an object-oriented, interpreted programming language. Its syntax is
simple, and it comes with a set of standard libraries with complete functions,
which can easily accomplish many common tasks. Speaking of Python, its birth
is also quite interesting. During the Christmas holidays in 1989, Dutch
programmer Guido van Rossum stayed at home and found himself with nothing to
do. So, to pass the "boring" time, he wrote the first version of Python.
Python is widely used. According to statistics from GitHub, an open-source
community, it has been one of the most popular programming languages in the
past 10 years and is more popular than the traditional C and C++ languages and
C#, which is very commonly used on Windows systems. After using Python for
some time, Estella thinks it is a programming language specially designed for
non-professional programmers.
Its grammatical structure is very concise, encouraging everyone to write code
that is easy to understand and to write as little code as possible.
Functionally speaking, Python has a large number of standard libraries and
third-party libraries. Estella develops her applications on top of these
existing libraries, which lets her get twice the result with half the effort
and speeds up development.
Feature Extraction
In this step, Estella usually associates relevant data stored in different
places, for example, integrating basic customer information and customer
shopping information through the customer ID. She then transforms the data
and extracts the variables useful for modeling. These variables are called
features. In this process, Estella will use Python's NumPy, SciPy, pandas, and
PySpark.
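As a minimal sketch of this step, the example below joins two small tables through a customer ID using pandas. The table contents and the column names (customer_id, age, amount, total_spent, order_count) are hypothetical.

```python
import pandas as pd

# Hypothetical customer records stored in two separate places.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 27, 45],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 15.0, 50.0, 10.0, 40.0],
})

# Transform: summarize each customer's shopping history.
spending = (
    orders.groupby("customer_id")["amount"]
    .agg(total_spent="sum", order_count="count")
    .reset_index()
)

# Associate the two sources through the shared customer ID.
features = customers.merge(spending, on="customer_id", how="left")
print(features)
```

Each row of the result describes one customer, and the columns total_spent and order_count are features that a model could use.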
Model Building
The open-source libraries scikit-learn, StatsModels, Spark ML, and
TensorFlow cover almost all the commonly used basic algorithms. Based on
these algorithm libraries and according to the data characteristics and
algorithm assumptions, Estella can easily combine the basic algorithms and
create the model she wants.
The above four things are also the four core steps in Data Science. No
wonder Estella, like most other data scientists, chose Python as a tool to
complete her work.
Python Installation
After introducing so many advantages of Python, let's quickly install it and
feel it for ourselves.
Python has two major versions: Python 2 and Python 3. Python 3 is the newer
version, with new features that Python 2 does not have. However, because
Python 3 was not designed with backward compatibility in mind, Python 2 was
still the main version in actual production at the time of writing (although
Python 3 had been released for almost 10 years by then). Therefore, readers
are advised to still install Python 2. The code accompanying this book is
compatible with both Python 2 and Python 3.
The following describes how to install Python and the libraries listed in
section
It should be noted that the distributed Machine Learning library Spark ML
involves the installation of Java and Scala, and will not be introduced here for
the time being.
Installation Under Windows
The author does not recommend developing under the Windows system. There are
many reasons, the most important of which is that in the era of big data, as
mentioned by Estella earlier, data is stored under the Linux system.
Therefore, in production, the programs developed by data scientists will
eventually run in the Linux environment. However, the compatibility between
Windows and Linux is not good, so a program that is developed and debugged
successfully under Windows may fail to operate normally in the actual
production environment.
If the computer the reader uses runs Windows, they can choose to install a
Linux virtual machine and then develop on the virtual machine. If readers
insist on using Windows, then due to the limitations of TensorFlow under
Windows, they can only choose to install Python 3. Therefore, the tutorial in
this section, unlike the other sections, uses Python 3.
Anaconda installs several applications under Windows, such as IPython,
Jupyter, Conda, and Spyder. Below we will explain some of them in detail.
Conda
It is a management system for the Python development environment and open
source libraries. If readers are familiar with Linux, Conda is equivalent to
pip+virtualenv under Linux. Readers can list installed Python libraries by
entering "conda list" on the command line.
Spyder
It is an integrated development environment (IDE) specially designed for
scientific computing with Python. Readers who are familiar with the
mathematical analysis software MATLAB will find that Spyder and MATLAB are
very similar in layout and interface.
Installation Under MAC
Like Anaconda's Windows version, Anaconda's Mac version does not contain the
deep learning library TensorFlow, which needs to be installed using pip (the
Python package management system). Although using pip requires a command
line, it is very simple to operate and even easier than installing Anaconda.
Moreover, pip is more widely used, so it is suggested that readers try to
install the required libraries with pip from the beginning. The installation
method without Anaconda is described below.
Starting with Mac OS X 10.2, Python is preinstalled on Macs. For learning
purposes, you can choose to use the pre-installed version of Python directly.
For development purposes, however, the pre-installed Python easily runs into
problems when installing third-party libraries, so the latest version of
Python needs to be installed. The reader is recommended to reinstall Python
here.
Installation Under Linux
Similar to Mac, Anaconda also offers Linux versions. Please refer to the
instructions under Windows and the accompanying code for specific
installation steps.
There are many versions of Linux, but due to space limitations, only the
installation on Ubuntu is described here. The following installation guide
may also work on other versions of Linux, but we have only tested these
installation steps on Ubuntu 14.04 or later.
Although Ubuntu has pre-installed Python, the version is older, and it is
recommended to install a newer version of Python.
Install Python
install [insert command here]
Pip is a Python software package management system that makes it easy for us
to install the required third-party libraries. The steps for installing pip
are as follows.
1) Open the terminal
2) Enter and run the following code
Python shell
Python, as a dynamic language, is usually used in two ways: it can be used as
a script interpreter to run edited program scripts, and it also provides a
real-time interactive command window (the Python shell) in which any Python
statement can be entered and run. This makes it easy to learn, debug, and
test Python statements.
Enter "python" in the terminal (Linux or Mac) or command prompt (Windows) to
start the Python shell.
1) You can assign values to variables in the Python shell and then calculate
with those variables, and you can keep using them as long as you don't close
the shell, as shown in lines 1 to 3 of the code. It is worth noting that
Python is a so-called dynamically typed language, so there is no need to
declare the type of a variable when assigning a value to it.
2) Any Python statement can be run in the Python shell, as shown in the
code, so some people even use it as a calculator.
3) You can also import and use a third-party library in the shell, as shown.
Note that, as shown in the code, the third-party library "numpy" can be given
an alias, such as "np", while being imported. When "numpy" is needed later,
"np" is typed in its place to reduce the amount of character input.
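The three points above can be tried with statements like the ones below. This is an illustrative sketch rather than the book's own listing; the variable names and numbers are made up, and it assumes NumPy is installed. In the shell, each line would be typed at the >>> prompt.

```python
# 1) Assign values to variables and reuse them; no type declaration needed.
a = 3
b = 4
c = a * b

# 2) Any expression can be evaluated, so the shell doubles as a calculator.
result = (17 + 5) / 2

# 3) Import a third-party library under a shorter alias.
import numpy as np
root = np.sqrt(c + result + 2)

print(c, result, root)
```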
Chapter 5: Indexing and Selecting Arrays
Array indexing is very similar to list indexing, with the same techniques of
item selection and slicing (using square brackets). The methods are even more
similar when the array is a vector.
Example:
In []: # Indexing a vector array (values)
values
values[0] # grabbing 1st item
values[-1] # grabbing last item
values[1:3] # grabbing 2nd & 3rd item
values[3:8] # items 4 to 8
Out []: 35
Out []: 45
Tip: It is recommended to use the array_name[row, col] method, as it
saves typing and is more compact. This will be the convention for the rest
of this section.
To grab columns, we specify a slice of the rows and of the columns. Let us try
to grab the second column in the matrix and assign it to a variable
column_slice.
In []: # Grabbing the second column
column_slice = matrix[:, 1:2] # Assigning to variable
column_slice
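The sketch below runs the same column selection on a small stand-in matrix. The values in the matrix are hypothetical, since the matrix used in the text is not shown here.

```python
import numpy as np

# A hypothetical 3x3 matrix standing in for the one used in the text.
matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

# array_name[row, col]: a single element.
print(matrix[1, 2])

# All rows, column slice 1:2: the result stays two-dimensional.
column_slice = matrix[:, 1:2]
print(column_slice.shape)

# All rows, integer column index: the result is a flat vector instead.
column_vector = matrix[:, 1]
print(column_vector)
```

Using the slice 1:2 keeps the column as a 3-row, 1-column matrix, while the plain index 1 flattens it into a vector, so the choice depends on which shape is needed afterwards.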
Notice how the bool_array evaluates to True at all instances where the
elements of the odd_array meet the Boolean criterion.
The Boolean array by itself is not usually so useful. To return the values
that we need, we pass the bool_array into the original array as an index to
get our results.
In []: useful_Array = odd_array[bool_array] # The values we want
useful_Array
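Here is a self-contained sketch of conditional selection. The contents of odd_array and the criterion (greater than 10) are assumptions made for the illustration, as the original definitions are not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for odd_array; the criterion (> 10) is also assumed.
odd_array = np.arange(1, 20, 2)   # 1, 3, 5, ..., 19

bool_array = odd_array > 10       # True where the criterion is met
print(bool_array)

useful_array = odd_array[bool_array]
print(useful_array)

# The two steps are often combined into one expression.
print(odd_array[odd_array > 10])
```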
Exercise: The conditional selection works on all arrays (vectors and matrices
alike). Create a 3x3 array of elements greater than 80 from the
‘large_array’ given in the last exercise.
Hint: use the reshape method to convert the resulting array into a 3x3
matrix.
NumPy Array Operations
Finally, we will be exploring basic arithmetical operations with NumPy
arrays. These operations work like ordinary arithmetic on Python integers and
floats, except that they are applied to every element of the array.
Array – Array Operations
In NumPy, arrays can operate with and on each other using various arithmetic
operators, for example the addition or division of two arrays. The operation
is carried out element by element, so the arrays need compatible shapes.
Example 65:
In []: # Array - Array Operations
# Addition
Array_sum = Array1 + Array2
Array_sum # show result array
#Subtraction
Array_minus = Array1 - Array2
Array_minus # Show array
# Multiplication
Array_product = Array1 * Array2
Array_product # Show
# Division
Array_divide = Array1 / Array2
Array_divide # Show
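To see the results of these operations, the sketch below defines two small arrays. The values of Array1 and Array2 are hypothetical, since the arrays used in the text are not shown here. The last line contrasts the behavior with plain Python lists.

```python
import numpy as np

# Hypothetical arrays; the text's Array1 and Array2 are not shown.
Array1 = np.array([10, 20, 30, 40])
Array2 = np.array([1, 2, 5, 8])

print(Array1 + Array2)   # element-wise addition
print(Array1 - Array2)   # element-wise subtraction
print(Array1 * Array2)   # element-wise multiplication
print(Array1 / Array2)   # element-wise (true) division, gives floats

# Contrast: + on plain Python lists concatenates instead of adding.
print([10, 20] + [1, 2])
```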
The KNN algorithm is widely used as a building block for more complex
classifiers. It is a simple algorithm, but it has outperformed many more
powerful classifiers. That is why it is used in numerous applications, such as
data compression, economic forecasting, and genetics. KNN is a supervised
learning algorithm, which means that we are given a labeled dataset made up of
training observations (x, y), and our goal is to determine the relationship
between x and y. This means that we should find a function that maps x to y
such that when we are given an input value for x, we can predict the
corresponding value for y. The concept behind the KNN algorithm is very
simple: a new point is given the label that is most common among its K nearest
neighbors in the training data. We will use a dataset named Iris, which we
explored previously, to demonstrate how to implement the KNN algorithm.
First, import all the libraries that are needed:
Note that we have created an instance of the classifier class and named the
instance knn_classifier. We have used one parameter in the instantiation,
that is, n_neighbors. We have used 5 as the value of this parameter, which
denotes the value of K. Note that there is no single correct value for K; it
is chosen after testing and evaluation. However, 5 is a common starting value
in many KNN applications. We can then use the test data to make predictions.
This can be done by running the script given below:
pred_y = knn_classifier.predict(X_test)
Evaluating the Accuracy
Evaluation of the KNN algorithm is not done in the same way as evaluating
the accuracy of the linear regression algorithm. There, we were using metrics
like RMSE, MAE, etc. In this case, we will use metrics like the confusion
matrix, precision, recall, and f1 score. We can use the classification_report
and confusion_matrix functions to calculate these metrics. Let us first import
these from the Scikit-Learn library:
from sklearn.metrics import confusion_matrix, classification_report
Run the following script:
The results given above show that the KNN algorithm did a good job of
classifying the 30 records that we have in the test dataset. The results show
that the average accuracy of the algorithm on the dataset was about 90%. This
is not a bad percentage.
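For readers who want to run the whole workflow in one place, here is a minimal sketch using Scikit-Learn's bundled copy of the Iris data. The 80/20 split and the random_state value are assumptions, so the exact scores will not necessarily match the figures quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris has 150 records; holding out 20% leaves 30 for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0  # random_state is an arbitrary choice
)

knn_classifier = KNeighborsClassifier(n_neighbors=5)  # K = 5
knn_classifier.fit(X_train, y_train)
pred_y = knn_classifier.predict(X_test)

print(confusion_matrix(y_test, pred_y))
print(classification_report(y_test, pred_y))
```

In the confusion matrix, each row is a true class and each column a predicted class, so the values on the diagonal count the correctly classified records.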
K Means Clustering
Let us manually demonstrate how this algorithm works before implementing
it on Scikit-Learn:
Suppose we have two-dimensional data instances given below and by the
name D:
Our objective is to group the data into clusters based on the similarity
between the data points.
We should first initialize the values for the centroids of both clusters, and
this should be done randomly. The centroids will be named c1 and c2 for
clusters C1 and C2 respectively, and here we will initialize them with the
values of the first two data points, that is, (5,3) and (10,15). It is after
this that you should begin the iterations. In each iteration, you calculate
the Euclidean distance from every data point to each centroid and assign the
data point to the cluster whose centroid is closest. Let us take the example
of the data point (5,3):
The Euclidean distance for the data point from centroid c1 is shorter
compared to the distance of the same data point from centroid c2. This means
that this data point will be assigned to the cluster C1. For a data point
whose distance to the centroid c2 is shorter, the assignment goes the other
way, and it will be assigned to the cluster C2. Now that the data points have
been assigned to the right clusters, the next step should involve the
calculation of the new centroid values. The values should be calculated by
determining the means of the coordinates for the data points belonging to a
certain cluster. Suppose, for example, that for C1 we had allocated the
following two data points to the cluster:
(5, 3) and (24, 10). The new value for x coordinate will be the mean of the
two:
x = (5 + 24) / 2
x = 14.5
The new value for y will be:
y = (3 + 10) / 2
y = 13/2
y = 6.5
The new centroid value for the c1 will be (14.5, 6.5).
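The assign-then-update cycle described above can be sketched in a few lines of NumPy, repeating the two steps until the centroids stop moving. The data set D below is a hypothetical stand-in, because the original table is not reproduced here; only the starting centroids (5, 3) and (10, 15) come from the text.

```python
import numpy as np

# Hypothetical data set D (the original table is not reproduced here).
D = np.array([[5, 3], [10, 15], [15, 12], [24, 10], [30, 45],
              [85, 70], [71, 80], [60, 78], [55, 52], [80, 91]], dtype=float)

# Start from the first two data points, as in the text.
centroids = D[:2].copy()

while True:
    # Euclidean distance from every point to every centroid: shape (10, 2).
    distances = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)          # index of the closest centroid

    # New centroid = mean of the points assigned to each cluster.
    new_centroids = np.array([D[labels == k].mean(axis=0) for k in range(2)])

    if np.allclose(new_centroids, centroids):  # nothing moved: stop
        break
    centroids = new_centroids

print(centroids)
print(labels)
```

This sketch does not handle the case of a cluster that ends up with no points, which does not occur with this data.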
This should be done for c2 as well, and the entire process is then repeated.
The iterations should be repeated until the centroid values do not update
anymore. For example, after three iterations you may find that the updated
values for centroids c1 and c2 in the fourth iteration are equal to what we
had in iteration 3. This means that your data cannot be clustered any further.
You are now familiar with how the K-Means algorithm works. Let us discuss how
you can implement it in the Scikit-Learn library. Let us first import all the
libraries that we need to use:
Data Preparation
We should now prepare the data that is to be used. We will be creating a
numpy array with a total of 10 rows and 2 columns. So, why have we chosen
to work with a numpy array? It is because the Scikit-Learn library can work
with the numpy array data inputs without the need for preprocessing.
If we use our eyes, we will probably make two clusters from the above data,
one at the bottom with five points and another one at the top with five points.
We now need to investigate whether this is what the K-Means clustering
algorithm will do.
Creating Clusters
We have seen that we can form two clusters from the data points, hence the
value of K is now 2. These two clusters can be created by running the
following code:
kmeans_clusters = KMeans(n_clusters=2)
kmeans_clusters.fit(X)
We have created an object named kmeans_clusters, and 2 has been used as the
value for the parameter n_clusters. We have then called the fit() method on
this object and passed the data we have in our numpy array as the parameter
to the method. We can now have a look at the centroid values that the
algorithm has created for the final clusters:
print(kmeans_clusters.cluster_centers_)
This returns the following:
The first row above gives us the coordinates for the first centroid, which is
(16.8, 17). The second row gives us the coordinates of the second centroid,
which is (70.2, 74.2). If you followed the manual process of calculating
these values, they should be the same. This will be an indication that the
K-Means algorithm worked well.
The following script will help us see the data point labels:
print(kmeans_clusters.labels_)
This returns the following:
The above output shows a one-dimensional array of 10 elements that
correspond to the clusters that are assigned to the 10 data points. Note that the
0 and 1 have no mathematical significance, but they have simply been used to
represent the cluster IDs. If we had three clusters, then the last one would
have been represented using 2’s.
We can now plot the data points and see how they have been clustered. We
need to plot the data points alongside their assigned labels to be able to
distinguish the clusters. Just execute the script given below:
plt.scatter(X[:, 0], X[:, 1], c=kmeans_clusters.labels_, cmap='rainbow')
plt.show()
As you spend more time on machine learning, you will find that there are many
different learning algorithms to work with, and you may be surprised at what
they can do.
But before we give these learning algorithms the time and attention that they
need, we first need to look at some of the building blocks that make machine
learning work. This chapter explains how these building blocks fit together so
that you are prepared to get the most out of your learning algorithms.
The Learning Framework
It is time to take a closer look at the framework that underlies machine
learning. It is based partly on statistics and partly on the model that you
plan to use (more on that in a moment). Let’s go through the parts of the
learning framework that you need to know to get the most out of your machine
learning process.
Let’s say that you decide that it is time to go on vacation to a new island. The
natives that you meet on this island are really interested in eating papaya, but
you have very limited experience with this kind of food. But you decide that
it is good to give it a try and head on down to the marketplace, hoping to
figure out which papaya is the best and will taste good to you.
Now, you have a few options as to how you would figure out which papaya is
the best for you. You could start by asking some people at the marketplace
which papayas are the best. But since everyone is going to have their own
opinion about it, you are going to end up with lots of answers. You can also
use some of your past experiences to do it.
At some point or another, you have worked with fresh fruit, and you could use
that experience to help you make a good choice. You may look at the color of
the papaya and its softness to help you decide. As you look through the
papayas, you will notice that they come in many colors, from dark browns to
reds, and in different degrees of softness, so it is hard to know what will
work best.
After you look through the papayas a bit, you will want to come up with a
model that helps you pick the best papaya next time. We are going to call this
model a formal statistical learning framework, and it has four main
components:
Learner’s input
Learner’s output
A measure of success
Simple data generalization
The first thing to explore in the learning framework is the learner’s input.
To begin, we need a domain set, which is the set of objects that we want to
label; here, it is the set of all papayas. The domain set can be arbitrary,
and its members are known as domain points. Each point is described by the
features you observe, such as the color and softness of a papaya.
Once you have settled on the domain points, you need a label set: one label
for the papayas you would choose, and one for those you would like to avoid.
A collection of points that already carry labels is what you learn from, and
it also lets you test how good your predictions are.
Next comes the learner’s output. Once you know what the inputs are, it is
time to produce a good output, and that output is a prediction rule. This rule
also goes by the names hypothesis, classifier, and predictor. Whatever it is
called, its job is to take any of your points and give it a label.
In the beginning, with any kind of program that you do, you are going to
make guesses because you aren’t sure what is going to work the best. You, or
the program, will be able to go through and use past experience to help you
make some predictions. But often, it is going to be a lot of trial and error to
see what is going to work the best.
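The pieces described so far can be put together in a small sketch. Everything in it is hypothetical: the 0-to-1 feature scores, the six labeled papayas, and the 0.5 threshold are made up for illustration, and the rule is written by hand rather than learned.

```python
# Learner's input: labeled domain points. Each papaya is described by two
# hypothetical features scored from 0 to 1: color and softness.
training_data = [
    ((0.9, 0.8), "tasty"),
    ((0.8, 0.6), "tasty"),
    ((0.7, 0.9), "tasty"),
    ((0.2, 0.3), "not tasty"),
    ((0.4, 0.1), "not tasty"),
    ((0.1, 0.7), "not tasty"),
]


def predictor(papaya, threshold=0.5):
    """Learner's output: a rule that maps a domain point to a label."""
    color, softness = papaya
    if color > threshold and softness > threshold:
        return "tasty"
    return "not tasty"


def error_rate(rule, labeled_points):
    """Measure of success: the fraction of points the rule labels wrongly."""
    mistakes = sum(1 for x, y in labeled_points if rule(x) != y)
    return mistakes / len(labeled_points)


print(error_rate(predictor, training_data))
print(predictor((0.9, 0.2)))
```

Trial and error in this setting means trying different rules or thresholds and keeping the one with the lowest error rate.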
Next is the data generalization model. Once the learner’s input and output
are in place, we need to say where the data comes from. The model assumes that
the domain points are drawn from some probability distribution over the
domain set.
When you start out, you will usually not know what this distribution is. The
framework is designed to work anyway: the learner does not need to know the
distribution in advance. As you see more examples, you find out more about the
distribution, which helps you make better predictions along the way.
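A rough way to picture this is to simulate it. In the sketch below, the distribution (two uniform scores), the "true" labeling rule, and the learner's rule are all assumptions made up for illustration. The learner never sees the true rule; it only sees samples drawn from the distribution.

```python
import random


def sample_papaya(rng):
    """Draw one labeled papaya from a made-up distribution."""
    color, softness = rng.random(), rng.random()
    # Assumed "true" labeling rule, unknown to the learner.
    label = "tasty" if color + softness > 1.0 else "not tasty"
    return (color, softness), label


def predictor(papaya):
    """A simple rule the learner might have settled on."""
    color, softness = papaya
    return "tasty" if color > 0.5 and softness > 0.5 else "not tasty"


def error_rate(rule, labeled_points):
    """Fraction of points that the rule labels wrongly."""
    mistakes = sum(1 for x, y in labeled_points if rule(x) != y)
    return mistakes / len(labeled_points)


rng = random.Random(0)  # fixed seed so the run is repeatable
small_sample = [sample_papaya(rng) for _ in range(20)]
large_sample = [sample_papaya(rng) for _ in range(10000)]

print("error on 20 papayas:   ", error_rate(predictor, small_sample))
print("error on 10000 papayas:", error_rate(predictor, large_sample))
```

The estimate from the large sample is the more trustworthy of the two, which is the sense in which more data tells you more about the distribution.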
Features Matrix
From the data we obtained from the Iris dataset, we can interpret our records
as a matrix or a two-dimensional array, as shown in the table above. This
matrix is what we call a features matrix.
By convention, the features matrix in Scikit-Learn is stored in a variable
named X. Using the data from the table above to create a features matrix, we
get a two-dimensional matrix of shape [n_samples, n_features]. In most cases
this matrix is a NumPy array. Alternatively, you can use a Pandas DataFrame
to represent the features matrix.
Rows in the features matrix (samples) refer to the individual objects
contained in the dataset under observation. If, for example, we are dealing
with data about flowers as in the Iris dataset, each sample is a flower. If
you are dealing with students, the samples are individual students. A sample
can be any object under observation that can be described by a set of
measurements.
Columns in the features matrix (features) refer to the distinct observations
we use to describe each sample in a quantitative way. The values used in
features are usually real numbers, though in some cases you might come across
data with discrete or Boolean values.
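The [n_samples, n_features] layout can be sketched with a plain list of rows. The measurements and column names below are illustrative values in the style of the Iris dataset; in practice the same layout would usually be held in a NumPy array or a Pandas DataFrame.

```python
# Features matrix as a list of rows: one row per sample, one column per feature.
# The measurements are illustrative values in the style of the Iris dataset.
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

n_samples = len(X)         # number of rows
n_features = len(X[0])     # number of columns

# Every sample must have one value for every feature.
assert all(len(row) == n_features for row in X)

print((n_samples, n_features))
```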
Target Arrays
Now that we understand what the features matrix (X) is, and its composition,
we can take a step further and look at target arrays. Target arrays are also
referred to as labels in Scikit-Learn. By convention, they are named y.
Target arrays are usually one-dimensional, with a length of n_samples. You
will generally find them stored as a Pandas Series or a NumPy array. A target
array can hold discrete labels or classes, or continuous numerical values.
For a start, it is wise to learn how to work with one-dimensional target
arrays. However, this should not limit your imagination. As you advance into
data analysis with Scikit-Learn, you will come across estimators that can
handle more than one target value per sample. In that case the target is a
two-dimensional array of shape [n_samples, n_targets].
Remember that there is a clear distinction between the target array and the
feature columns. The target array holds the quantity we want to predict from
the dataset; in statistical terms, it is the dependent variable. For example,
if you build a model from the Iris dataset that uses the measurements to
identify the flower species, the target array in this model would be the
species column.
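As a minimal sketch of that split, the rows below stand in for a small table with a species column. The values and column names are illustrative assumptions, not the full dataset.

```python
# A small table with one record per flower (illustrative values).
table = [
    {"sepal_length": 5.1, "petal_length": 1.4, "species": "setosa"},
    {"sepal_length": 7.0, "petal_length": 4.7, "species": "versicolor"},
    {"sepal_length": 6.3, "petal_length": 6.0, "species": "virginica"},
]

feature_columns = ["sepal_length", "petal_length"]

# Features matrix: the measurement columns, shape [n_samples, n_features].
X = [[row[column] for column in feature_columns] for row in table]

# Target array: the column we want to predict, length n_samples.
y = [row["species"] for row in table]

print(X)
print(y)
```

The target has exactly one entry per row of the features matrix, which is why its length is n_samples.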
The diagrams below give you a better distinction between the target vector
and the features matrix:
Diagram of a Target vector
Thank you for making it through to the end of Python for Data Science. We
hope it was informative and provided you with the tools you need to achieve
your goals, whatever they may be.
The next step is to start putting the information and examples that we talked
about in this guidebook to good use. There is a lot of information inside all
that data that we have been collecting for some time now. But all of that data
is worthless if we are not able to analyze it and find out what predictions and
insights are in there. This is part of what the process of data science is all
about, and when it is combined with the Python language, we are going to see
some amazing results in the process as well.
This guidebook took some time to explore data science and what it entails.
It is an in-depth and complex process, one that often includes more steps
than data scientists are aware of when they first get started. But if a
business wants to learn the insights that are in its data and gain a
competitive edge, it needs to be willing to take on these steps of data
science and make them work for its needs.
This guidebook went through all of the steps that you need to know in order
to get started with data science and some of the basic parts of the Python
code. We can then put all of this together in order to create the right
analytical algorithm that, once it is trained properly and tested with the right
kinds of data, will work to make predictions, provide information, and even
show us insights that were never possible before. And all that you need to do
to get this information is to use the steps that we outline and discuss in this
guidebook.
There are so many great ways that you can use the data you have been
collecting for some time now, and being able to complete the process of data
visualization will ensure that you get it all done. When you are ready to get
started with Python data science, make sure to check out this guidebook to
learn how.
Many programmers worry that they will not be able to work with neural
networks because they feel that these networks are too difficult to handle.
Neural networks are more advanced than some of the other machine learning
algorithms that you may want to work with. But with the coding work we did
above, they are not so bad, and the tasks that they can take on, and the way
they work, can improve the model that you are writing and what you can do
when you bring Python into your data science project.
Machine learning with Python:
William Wizner
© Copyright 2020 - All rights reserved.
Introduction:
Step 4: Now, the user can open a new Python editor by clicking on the
“New” pull-down list and choosing “Python 3” (Figure 1.13)
Step 6: Now users can write and execute Python code. In Figure 1.15 the
famous “Hello World” program is written and executed
Fundamentals of Python programming
Now that Python is installed, this section describes the fundamentals of
Python programming that you need in order to write basic Python programs.
Data Types
In Python, data types are divided into the following categories:
1) Numbers: Includes integers, floating-point numbers, and complex numbers.
Integers can be of any length and are limited only by available machine
memory. Floating-point numbers are accurate to about 15 significant digits.
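A short sketch of the three numeric types follows. The variable names are chosen here for illustration; sys.float_info.dig is the standard-library value that reports how many decimal digits a float can always represent faithfully.

```python
import sys

big = 2 ** 100            # integers grow as large as memory allows
ratio = 1 / 3             # a floating-point number
z = complex(3, 4)         # a complex number, also written 3 + 4j

print(big)
print(ratio)
print(sys.float_info.dig)   # decimal digits a float can reliably hold
print(abs(z))               # magnitude of the complex number

print(type(big).__name__, type(ratio).__name__, type(z).__name__)
```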
Data Structures
A data structure is a way of organizing data so that it can be used
efficiently. Python has four built-in collection types. Let’s go over them
one by one.
1) Lists: Collections that are ordered, changeable, indexed, and allow
duplicate members.
2) Tuples: Collections that are ordered, unchangeable, indexed, and allow
duplicate members.
3) Sets: Collections that are unordered, unindexed, and don’t allow
duplicate members.
4) Dicts (Dictionaries): Collections of key-value pairs that are changeable,
indexed by key, and don’t allow duplicate keys. Since Python 3.7, dicts keep
their items in insertion order.
List
Python lists can be identified through their use of square brackets.
The idea is to put the items in an orderly fashion separating each item with a
comma. Items can contain different data types or even other lists (resulting in
nested lists). After creation, you may modify the list by adding or removing
items. It is also possible to search through the list. You may access the
contents of lists by referring to the index number.
Example
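A minimal sketch of these list operations, with names and values chosen purely for illustration:

```python
fruits = ["apple", "banana", "cherry"]   # hypothetical sample data

fruits.append("banana")     # duplicates are allowed
fruits[0] = "avocado"       # items can be changed in place
fruits.remove("cherry")     # removes the first matching item

nested = [1, "two", [3.0, 4.0]]   # mixed types and a nested list

print(fruits)
print(fruits[1])            # access by index, counting from 0
print("banana" in fruits)   # search through the list
print(nested[2][0])         # index into the nested list
```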
Tuple
Tuples use parentheses to enclose the items. Other than that, tuples
are structured the same way as lists and you can still bring them up by
referring to the bracketed index number. The main difference is that you can’t
change the values once you create the tuple.
Example
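A minimal sketch, with values chosen for illustration, showing that a tuple can be read by index but not modified:

```python
point = (3, 4, 3)           # hypothetical sample data

print(point[0])             # access by index works as with lists
print(point.count(3))       # duplicates are allowed

error_message = None
try:
    point[0] = 10           # tuples cannot be changed after creation
except TypeError as err:
    error_message = str(err)

print(error_message)
```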
Set
When you use curly braces to surround a collection of elements,
you are creating a set. Unlike a list (which is something you naturally go
through from top to bottom), a set is unordered, which means there is no index
you can refer to. However, you can use a “for loop” to look through the set or
use the “in” keyword to check if a value can be found in that set. Sets let
you add and remove items but not change an item in place.
Example
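A minimal sketch, with values chosen for illustration:

```python
colors = {"red", "green", "blue", "red"}   # the duplicate "red" is dropped

colors.add("yellow")          # new items can be added

print(len(colors))
print("green" in colors)      # membership test with the "in" keyword

# A set has no index, so we loop over it instead. sorted() is used only to
# print the items in a predictable order.
for color in sorted(colors):
    print(color)
```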
Dicts (Dictionaries)
Dictionaries or dicts use the same curly braces as sets. However, dicts are
indexed by key names, so you define each item by separating the key name and
the value with a colon. You may also alter the values in the dict by
referring to their corresponding key names. Since Python 3.7, dicts preserve
the order in which items were inserted.
Example:
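A minimal sketch, with keys and values chosen for illustration:

```python
student = {"name": "Ada", "age": 20}   # hypothetical sample data

student["age"] = 21          # change a value through its key
student["major"] = "math"    # add a new key-value pair

print(student["name"])       # look up a value by key

for key, value in student.items():
    print(key, "->", value)
```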
Variable names or identifiers
In Python, variable names or identifiers (i.e. names given to variables,
functions, modules, …) can include lowercase or uppercase letters, digits,
and underscores. However, Python names and identifiers cannot start with a
digit.
Example: In the first example given in Figure 1.16, a variable called “test” is
assigned a value of 2. In the second example, a variable called “1test” is
defined and assigned a value of 2. However, as mentioned above, Python does
not accept a variable name starting with a digit, so here it gives an error.
Some predefined keywords are reserved by Python and cannot be used as
variable names and identifiers. The list of these keywords is given in
Table 1.1
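These naming rules can be checked from Python itself. The sketch below uses the built-in str.isidentifier() method and the standard keyword module; the helper name is_valid_name and the sample names are chosen here for illustration.

```python
import keyword


def is_valid_name(name):
    """True if `name` can be used as a variable name in Python."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["test", "_total2", "1test", "my(var)", "for"]:
    verdict = "valid" if is_valid_name(name) else "invalid"
    print(f"{name!r}: {verdict}")

# The full list of reserved keywords for the running Python version:
print(keyword.kwlist)
```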
Comparison Operators
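As a brief illustration, Python's comparison operators are ==, !=, <, >, <=, and >=, and each comparison evaluates to True or False. The values below are chosen for demonstration only.

```python
a, b = 3, 5

print(a == b)    # equal to
print(a != b)    # not equal to
print(a < b)     # less than
print(a > b)     # greater than
print(a <= 3)    # less than or equal to
print(b >= 6)    # greater than or equal to

# Comparisons can be chained, and strings compare alphabetically.
print(1 < a < b)
print("apple" < "banana")
```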
Within these two main groups, there are four major tasks of data mining,
which are outlined below.
Predictive Modeling
Predictive modeling refers to the creation of a model that predicts the value
of a target variable as a function of independent variables. There are two
types of predictive modeling.
Classification is the process of finding a model or function that can
differentiate between data classes, so that objects whose class is unknown
can be assigned to one. The model is derived from an analysis of a training
data set containing objects of a known class.
The resulting model can be presented in various forms, such as:
Classification rules (if-then),
Decision tree,
Mathematical formulas,
Naive Bayesian classification,
Support vector machines (from now on referred to as SVM),
Nearest neighbor.
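As one concrete instance from the list above, a nearest neighbor classifier can be sketched in a few lines. The training points and the class names "A" and "B" are hypothetical; the function simply gives a new object the class of the closest training object.

```python
# Training set: objects of a known class (hypothetical 2-D measurements).
training_set = [
    ((1.0, 1.2), "A"),
    ((0.8, 0.9), "A"),
    ((4.0, 4.2), "B"),
    ((4.5, 3.8), "B"),
]


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbor(query, labeled_points):
    """Return the class of the training object closest to `query`."""
    _, label = min(labeled_points,
                   key=lambda item: squared_distance(item[0], query))
    return label


print(nearest_neighbor((1.1, 1.0), training_set))
print(nearest_neighbor((4.2, 4.0), training_set))
```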
If we use our eyes, we will probably make two clusters from the above data,
one at the bottom with five points and another one at the top with five points.
We now need to investigate whether this is what the K-Means clustering
algorithm will do.
Creating Clusters
We have seen that we can form two clusters from the data points, hence the
value of K is now 2. These two clusters can be created by running the
following code:
kmeans_clusters = KMeans(n_clusters=2)
kmeans_clusters.fit(X)
We have created an object named kmeans_clusters and passed 2 as the value
of the n_clusters parameter. We then called the fit() method on this object,
passing the data in our numpy array as the argument.
We can now have a look at the centroid values that the algorithm has created
for the final clusters:
print(kmeans_clusters.cluster_centers_)
This returns the following:
The first row above gives us the coordinates for the first centroid, which is,
(16.8, 17). The second row gives us the coordinates of the second centroid,
which is, (70.2, 74.2). If you followed the manual process of calculating the
values of these, they should be the same. This will be an indication that the
K-Means algorithm worked well.
The following script will help us see the data point labels:
print(kmeans_clusters.labels_)
This returns the following:
We have simply plotted the first column of the array named X against the
second column. At the same time, we have passed kmeans_clusters.labels_ as
the value for the parameter c, which corresponds to the labels. Note the use
of the parameter cmap='rainbow'. This parameter selects the color map used to
color the different data points.
As expected, the first five points have been clustered together at the
bottom left and assigned the same color. The remaining five points have been
clustered together at the top right and assigned a different color.
We can choose to plot the points together with the centroid coordinates for
every cluster to see how the positioning of the centroid affects clustering. Let
us use three clusters to see how they affect the centroids. The following script
will help you to create the plot:
plt.scatter(X[:, 0], X[:, 1], c=kmeans_clusters.labels_, cmap='rainbow')
plt.scatter(kmeans_clusters.cluster_centers_[:, 0],
            kmeans_clusters.cluster_centers_[:, 1], color='black')
plt.show()
The script returns the following plot:
Here we see the kernel being multiplied element by element with the patch of
the input image that it covers, and the products being summed. The kernel
moves 1 pixel at a time, from left to right and from top to bottom, and
generates a new matrix that makes up the feature map.
As we move the kernel, we get a "new image" filtered by the kernel. In this
first convolution, following the previous example, it is as if we obtained 32
"new filtered images." These new images pick out certain characteristics of
the original image, which will later help to distinguish one object from
another (e.g., cat or dog).
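The sliding of the kernel can be sketched in plain Python. This is an illustration under stated assumptions: the 4 × 4 image and the 3 × 3 vertical-edge kernel are made up, the border is padded with zeros so that the feature map keeps the size of the image, and, as is usual in convolutional networks, the kernel is not flipped.

```python
def convolve2d_same(image, kernel):
    """Slide `kernel` over `image` one pixel at a time with zero padding,
    so the feature map has the same height and width as the image."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    pad_h, pad_w = kh // 2, kw // 2
    feature_map = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - pad_h, j + dj - pad_w
                    if 0 <= r < h and 0 <= c < w:  # outside the image counts as 0
                        total += image[r][c] * kernel[di][dj]
            feature_map[i][j] = total
    return feature_map


# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel that responds to vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_same(image, vertical_edge)
for row in feature_map:
    print(row)
```

The largest values appear where the image changes from dark to bright, which is the characteristic this particular kernel picks out.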
The image is convolved with a kernel, and then the activation function, in
this case ReLU, is applied.
Activation Function
The most commonly used activation function for this type of neural network
is ReLU (Rectified Linear Unit), defined as f(x) = max(0, x).
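Written as code, the function is a one-liner. The sample inputs are chosen for illustration.

```python
def relu(x):
    """Rectified Linear Unit: negative values become 0, the rest pass through."""
    return max(0, x)


activations = [relu(v) for v in [-3, -0.5, 0, 2, 7.5]]
print(activations)   # [0, 0, 0, 2, 7.5]
```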
Subsampling
Now comes a step in which we will reduce the number of neurons before
making a new convolution. Why? As we saw, from our 28x28px black and
white image, we have a first input layer of 784 neurons, and after the first
convolution, we get a hidden layer of 25,088 neurons, which are really our
32 feature maps of 28 × 28.
If we made a new convolution from this layer, the number of neurons in the
next layer would go through the roof (and that implies more processing)!
To reduce the size of the next layer of neurons, we apply a subsampling
process that reduces the size of our filtered images. There are a few
subsampling methods available; we will look at the most widely used one:
Max-Pooling.
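A minimal sketch of max-pooling follows, assuming a 2 × 2 window moved in steps of 2, which is the common choice and the one that halves each side of the feature map. The 4 × 4 input values are made up for illustration.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each 2x2 block, halving height and width.
    A leftover row or column on an odd-sized map is dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]

pooled = max_pool_2x2(feature_map)
print(pooled)   # [[6, 2], [7, 9]]
```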
The 3rd convolution will start with 7 × 7 pixels, and after max-pooling it
will be down to 3 × 3, which leaves room for only one more convolution. In
this example, we started with a 28x28px image and made three convolutions.
If the initial image had been larger (224x224px), we would still have been
able to continue making convolutions.
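The sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes that each convolution keeps the image size and that each max-pooling step halves it, rounding down.

```python
size = 28          # side of the input image in pixels
n_filters = 32     # feature maps produced by the first convolution

input_neurons = size * size                 # 28 * 28
hidden_neurons = n_filters * size * size    # 32 * 28 * 28
print(input_neurons, hidden_neurons)

sizes = [size]
for step in range(1, 4):
    pooled = sizes[-1] // 2     # max-pooling halves each side, rounding down
    print(f"convolution {step}: {sizes[-1]}x{sizes[-1]} -> pooled {pooled}x{pooled}")
    sizes.append(pooled)
```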
TensorFlow
“An open source machine learning framework for everyone”
TensorFlow is Google’s open source AI framework for machine learning and
high performance numerical computation.
TensorFlow is a Python library that invokes C++ to construct and execute
dataflow graphs. It supports many classification and regression algorithms,
and more generally, deep learning and neural networks.
One of the more popular AI libraries, TensorFlow serves clients like
Airbnb, eBay, Dropbox, and Coca-Cola.
Plus, being backed by Google has its perks. TensorFlow can be learned and
used on Colaboratory, a Jupyter notebook environment that runs in the cloud,
requires no set-up, and is designed to democratize machine learning
education and research.
Some of TensorFlow’s biggest benefits are its simplifications and
abstractions, which keep code lean and development efficient.
TensorFlow is an AI framework designed to help everyone with machine
learning.
Scikit-learn
Scikit-learn is an open source, commercially usable AI library. Another
Python library, scikit-learn supports both supervised and unsupervised
machine learning. Specifically, it supports classification, regression, and
clustering algorithms, as well as dimensionality reduction, model selection,
and preprocessing.
It’s built on the NumPy, matplotlib, and SciPy libraries, and in fact, the name
“scikit-learn” is a play on “SciPy Toolkit.”
Scikit-learn markets itself as “simple and efficient tools for data mining and
data analysis” that is “accessible to everybody, and reusable in various
contexts.”
To support these claims, scikit-learn offers an extensive user guide so that
data scientists can quickly access resources on anything from multiclass and
multilabel algorithms to covariance estimation.
AI as a Data Analyst
AI, and specifically machine learning, has advanced to a point where it can
perform the day-to-day analysis that most business people require. Does this
mean that data scientists and analysts should fear for their jobs?
We don’t think so. With self-service analytics, machine learning algorithms
can handle the reporting grunt work so that analysts and data scientists can
focus their time on the advanced tasks that leverage their degrees and
skillsets. Plus, business people won’t need to wait around for the answers
they need.
Theano
“A Python library that allows you to define, optimize, and evaluate
mathematical expressions involving multi-dimensional arrays efficiently”
Theano is a Python library and optimizing compiler designed for
manipulating and evaluating expressions. In particular, Theano evaluates
matrix-valued expressions.
Speed is one of Theano’s strongest suits. It can compete toe-to-toe with the
speed of hand-crafted C language implementations that involve a lot of data.
By taking advantage of recent GPUs, Theano has also been able to top C on a
CPU by a significant degree.
By pairing elements of a computer algebra system (CAS) with elements of an
optimizing compiler, Theano provides an ideal environment for tasks where
complicated mathematical expressions require repeated, fast evaluation. It
can minimize extraneous compilation and analysis while providing important
symbolic features.
Even though new development has ceased for Theano, it’s still a powerful
and efficient platform for deep learning.
Theano is a machine learning library that can help you define and optimize
mathematical expressions with ease.
Caffe
Caffe is an open deep learning framework developed by Berkeley AI
Research in collaboration with community contributors, and it offers both
models and worked examples for deep learning.
Caffe prioritizes expression, speed, and modularity in its framework. In fact,
its architecture supports configuration-defined models and optimization
without hard coding, as well as the ability to switch between CPU and GPU.
Plus, Caffe is highly adaptive to research experiments and industry
deployments because it can process over 60M images per day with a single
NVIDIA K40 GPU, one of the fastest convnet implementations available,
according to Caffe.
Caffe’s language is C++ and CUDA with command line, Python, and
MATLAB interfaces. Caffe’s Berkeley Vision and Learning Center models
are licensed for unrestricted use, and their Model Zoo offers an open
collection of deep models designed to share innovation and research.
Caffe is an open deep learning framework and AI library developed by
Berkeley.
Keras
Keras is a high-level neural network API that can run on top of TensorFlow,
Microsoft Cognitive Toolkit, or Theano. This Python deep learning library
facilitates fast experimentation and claims that “being able to go from idea to
result with the least possible delay is key to doing good research.”
Instead of an end-to-end machine learning framework, Keras operates as a
user-friendly, easily extensible interface that supports modularity and total
expressiveness. Standalone modules, such as neural layers, cost functions,
and more, can be combined with few restrictions, and new modules are
easy to add.
With consistent and simple APIs, user actions are minimized for common use
cases. It can run on both CPU and GPU as well.
Keras is a Python deep learning library that runs on top of other prominent
machine learning libraries.
Microsoft Cognitive Toolkit
“A free, easy-to-use, open-source, commercial-grade toolkit that trains deep
learning algorithms to learn like the human brain.”
Previously known as Microsoft CNTK, Microsoft Cognitive Toolkit is an
open source deep learning library designed to support robust,
commercial-grade datasets and algorithms.
With big-name clients like Skype, Cortana, and Bing, Microsoft Cognitive
Toolkit offers efficient scalability from a single CPU to GPUs to multiple
machines without sacrificing a quality degree of speed and accuracy.
Microsoft Cognitive Toolkit supports C++, Python, C#, and BrainScript. It
offers pre-built algorithms for training, all of which can be customized,
though you can always use your own. Customization opportunities extend
to parameters, algorithms, and networks.
Microsoft Cognitive Toolkit is a free and open-source AI library that's
designed to train deep learning algorithms like the human brain.
PyTorch
“An open source deep learning platform that provides a seamless path from
research prototyping to production deployment.”
PyTorch is an open source machine learning library for Python that was
developed mainly by Facebook’s AI research group.
PyTorch supports both CPU and GPU computations and offers scalable
distributed training and performance optimization in research and production.
Its two high-level features include tensor computation (similar to NumPy)
with GPU acceleration and deep neural networks built on a tape-based
autodiff system.
With extensive tools and libraries, PyTorch provides plenty of resources to
support development, including:
AllenNLP, an open source research library designed to evaluate deep learning
models for natural language processing.
ELF, a game research platform that allows developers to train and test
algorithms in different game environments.
Glow, a machine learning compiler that enhances performance for deep
learning frameworks on various hardware platforms.
PyTorch is a deep learning platform and AI library for research prototyping
and production deployment.
Torch
Similar to PyTorch, Torch is a Tensor library that’s similar to NumPy and
also supports GPU (in fact, Torch proclaims that they put GPUs “first”).
Unlike PyTorch, Torch is wrapped in LuaJIT, with an underlying C/CUDA
implementation.
A scientific computing framework, Torch prioritizes speed, flexibility, and
simplicity when it comes to building algorithms.
With popular neural networks and optimization libraries, Torch provides
users with libraries that are easy to use while enabling flexible
implementation of complex neural network topologies. Torch is an AI
framework for computing with LuaJIT.
Chapter 11: The Future of Machine Learning
Now that we have come to the end of the book, I hope you have
gathered a basic understanding of what machine learning is and how you can
build a machine learning model in Python. One of the best ways to begin
building a machine learning model is to practice the code in the book, and
also try to write similar code to solve other problems. Remember that the
more you practice, the better you will get. The best way to go about this is
to begin with simple problem statements and solve them using the different
algorithms. You can also look for new ways to solve the same problems. Once
you get the hang of the basic problems, you can try using some advanced
methods to solve them.
Thanks for reading to the end!
Python Machine Learning may be the answer that you are looking for when it
comes to all of these needs and more. It is a simple process that can teach
your machine how to learn on its own, similar to what the human mind can
do, but much faster and more efficient. It has been a game-changer in many
industries, and this guidebook tried to show you the exact steps that you can
take to make this happen.
There is just so much that a programmer can do when it comes to using
Machine Learning in their coding, and when you add it together with the
Python coding language, you can take it even further, even as a beginner.
The next step is to start putting some of the knowledge that we discussed in
this guidebook to good use. There are a lot of great things that you can do
when it comes to Machine Learning, and when we can combine it with the
Python language, there is nothing that we can’t do when it comes to training
our machine or our computer.
This guidebook took some time to explore a lot of the different things that
you can do when it comes to Python Machine Learning. We looked at what
Machine Learning is all about, how to work with it, and even a crash course
on using the Python language for the first time. Once that was done, we
moved right into combining the two of these to work with a variety of Python
libraries to get the work done.
You should always work towards exploring different functions and features
in Python, and also try to learn more about the different libraries like SciPy,
NumPy, PyRobotics, and Graphical User Interface packages that you will be
using to build different models.
Python is a high-level language that is both interpreted and
object-oriented. This makes it easy for anybody to understand how the language
works. You can also extend the programs that you build in Python onto other
platforms. Most of the inbuilt libraries in Python offer a variety of functions
that make it easier to work with large data sets.
You will now have gathered that machine learning is a complex concept that
can easily be understood. It is not a black box that has undecipherable terms,
incomprehensible graphs, or difficult concepts. Machine learning is easy to
understand, and I hope the book has helped you understand the basics of
machine learning. You can now begin working on programming and building
models in Python. Ensure that you diligently practice since that is the only
way you can improve your skills as a programmer.
If you have ever wanted to learn how to work with the Python coding
language, or you want to see what Machine Learning can do for you, then
this guidebook is the ultimate tool that you need! Take a chance to read
through it and see just how powerful Python Machine Learning can be for
you.