Chapter 1: Introduction to Data Science


INTRODUCTION TO DATA SCIENCE
CHAPTER 1

“Introduction to Data Science: Practical Approach with R and Python”
B. Uma Maheswari and R. Sujatha
Copyright © 2021 Wiley India Pvt. Ltd. All rights reserved.
LEARNING OBJECTIVES
•Understand the concept of data science
•Briefly learn the history of data science
•Learn about the fundamental fields related to data science
•Understand the different terminologies related to data science, such as big data, business intelligence, data mining, artificial intelligence, machine learning and deep learning
•Learn about the different types of analytics: descriptive, diagnostic, predictive and prescriptive
•Learn briefly about the applications of data science
•Comprehend the data science process model
DATA SCIENCE

Data Science is the science of understanding data using processes, tools and techniques which aid in decision making. It involves techniques for identifying, collecting and exploring the data using colorful plots and graphs.
HISTORY OF DATA SCIENCE
John W. Tukey, a mathematician, author of the article “The Future of Data Analysis”.
John Chambers, Consulting Professor at Stanford University, developed the S system. S is the basis for later statistical programming languages, including the R language, which will be discussed in this book.
Jeff Wu, Coca-Cola Chair in Engineering Statistics and Professor at Georgia Tech, coined the term “Data Science” in 1997.
William Cleveland, Distinguished Professor of Statistics and Professor of Computer Science at Purdue University, authored many books on data visualization.
Leo Breiman, distinguished statistician at the University of California, Berkeley, was one of the pioneers in machine learning.
WHY IS DATA SCIENCE RECEIVING SO MUCH ATTENTION
•Increasing usage of the internet, which has generated more data
•Growing usage of smartphones, tablets and digital devices
•Increasing usage of social media
•Increasing computational capability, with both hardware and software becoming more powerful by the day
•Programming languages to work with such data are freely available through open source platforms
•Programmers across the world are creating complex algorithms and contributing to the open source developers’ community
•Easy and speedy access to such data for every individual or organization, irrespective of the size of the concern
•Storage of data becoming cheaper
DATA, DATA AND MORE DATA
According to data captured by the cloud software company Domo, as of April 2020 the internet had reached 59% of the world population.

Every minute on the internet:
•Zoom hosts 208,333 participants in meetings
•Netflix users stream 400,444 hours of video
•Instagram users post 347,222 stories
•YouTube users upload 500 hours of video
•Twitter gains 319 new users
•Facebook users share 150,000 messages
•LinkedIn users apply for 69,444 jobs
•Amazon ships 6,659 packages
•WhatsApp users share 41,666,667 messages
•Consumers spend $1,000,000 online
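To get a feel for the scale of these per-minute figures, they can be converted to daily totals. The sketch below is only illustrative arithmetic: it takes a few of the figures listed above and assumes the per-minute rate holds steadily for a full day, which is a simplification. The names `per_minute` and `per_day` are made up for this example.

```python
# Illustrative arithmetic only: assumes each per-minute figure is a
# steady average over the whole day (a simplification).
MINUTES_PER_DAY = 24 * 60

# A few of the per-minute figures quoted in the list above.
per_minute = {
    "Zoom meeting participants": 208_333,
    "YouTube hours of video uploaded": 500,
    "Amazon packages shipped": 6_659,
    "WhatsApp messages shared": 41_666_667,
}


def per_day(count_per_minute: int) -> int:
    """Scale a per-minute count to a per-day count."""
    return count_per_minute * MINUTES_PER_DAY


for name, count in per_minute.items():
    print(f"{name}: {per_day(count):,} per day")
```

Under this assumption, 500 hours of video uploaded per minute corresponds to 720,000 hours per day.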
FUNDAMENTAL FIELDS OF STUDY RELATING TO DATA SCIENCE
[Figure: diagram relating Data Science to Mathematics, Statistics, Computer Science and Domain Knowledge]
BIG DATA
BUSINESS INTELLIGENCE
Business Intelligence (BI) involves gathering, pre-processing and, most importantly, presenting data using data-visualization tools and techniques through charts, plots, tables and dashboards.
DATA MINING
Data mining is the technology used for processing large volumes of data to generate inferences such as:
•Identifying trends in stock prices
•Categorizing customers on the basis of their preferences
•Ascertaining the purchasing patterns of customers
•Predicting student performance in an educational institution
•Detecting lies when dealing with criminals
Applications of data mining can be seen in the fields of agriculture, education, industrial engineering, marketing, healthcare, etc.
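As a small illustration of one inference mentioned above, identifying a trend in prices, the sketch below smooths a short price series with a simple moving average and compares the first and last smoothed values. This is one elementary technique chosen for illustration, not the method the book prescribes, and the price values and the function name `moving_average` are made up.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical daily closing prices (made-up numbers for illustration).
prices = [100, 102, 101, 105, 107, 110, 108, 112]

smoothed = moving_average(prices, window=3)

# A crude trend call: compare the last smoothed value with the first.
trend = "upward" if smoothed[-1] > smoothed[0] else "downward or flat"
print(smoothed)
print("Trend:", trend)
```

Smoothing reduces day-to-day noise so that the overall direction is easier to see; real data mining work would use far larger datasets and more robust methods.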
ARTIFICIAL INTELLIGENCE - MACHINE LEARNING - DEEP LEARNING
Artificial Intelligence: AI is the design of smart machines or algorithms which can perform functions or tasks that generally require human intelligence.
Machine Learning: Machine Learning (ML) is a subset of artificial intelligence which refers to modelling techniques where the model learns patterns from data on its own, rather than being explicitly programmed with rules.
Deep Learning: Deep learning is a part of machine learning which works more effectively on larger datasets and aims at pattern recognition by imitating the human brain.
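To make the idea of a model that "learns from data" concrete, the sketch below fits a straight line to a handful of examples using ordinary least squares and then predicts an unseen case. It is a minimal illustration, not a description of any specific method from the book; the data (study hours and test scores) are made up, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training examples: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

# "Learning": the parameters are estimated from the examples,
# not written in by hand.
slope, intercept = fit_line(hours, scores)

# Use the learned parameters to predict an unseen case.
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"Predicted score for 6 hours: {predicted:.1f}")
```

The rule relating hours to scores is never typed into the program; it is recovered from the examples, which is the sense in which the model "learns".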
TYPES OF ANALYTICS

•Descriptive Analytics: What has happened?
•Diagnostic Analytics: Why did it happen?
•Predictive Analytics: What will happen?
•Prescriptive Analytics: What should we do?
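The four questions can be walked through on a toy dataset. The sketch below is a deliberately simplified illustration: the monthly sales records, the promotion flag and the promotion cost are all made-up assumptions, and each step uses the simplest calculation that answers its question rather than a realistic analytics method.

```python
from statistics import mean

# Hypothetical monthly records: (units sold, whether a promotion ran).
history = [
    (100, False), (140, True), (105, False),
    (150, True), (110, False), (155, True),
]

# Descriptive: what has happened? Summarize past sales.
average_sales = mean(units for units, _ in history)

# Diagnostic: why did it happen? Compare months with and without promotion.
promo_avg = mean(units for units, promo in history if promo)
no_promo_avg = mean(units for units, promo in history if not promo)
promo_lift = promo_avg - no_promo_avg

# Predictive: what will happen? Naive forecast per scenario, using the
# historical average of that scenario.
forecast = {"run promotion": promo_avg, "no promotion": no_promo_avg}

# Prescriptive: what should we do? Choose the action with the best
# forecast after subtracting an assumed promotion cost (in units).
PROMO_COST_UNITS = 20  # made-up assumption
net_outcome = {
    "run promotion": forecast["run promotion"] - PROMO_COST_UNITS,
    "no promotion": forecast["no promotion"],
}
best_action = max(net_outcome, key=net_outcome.get)

print(f"Average sales: {average_sales:.1f}")
print(f"Promotion lift: {promo_lift:.1f}")
print(f"Recommended action: {best_action}")
```

Each stage builds on the previous one: the summary describes the past, the comparison suggests a cause, the forecast projects it forward, and the final step turns the forecast into a decision.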
DATA SCIENCE PROCESS MODEL
•Objective: The project objective needs to be identified
•Data collection: Collate the data from the different sources
•Exploratory data analysis (Chapter 3)
•Data visualization (Chapter 4)
•Dimensionality reduction (Chapter 5)
•Model building (Chapters 7-14)
