DVP 2

The document discusses data gathering and cleaning. It notes that data is vital for decision making and strategic planning. There are five main steps for data science processing: 1) data acquisition, 2) data cleaning, 3) exploratory analysis, 4) creating an analysis model, and 5) data visualization. Python provides libraries like Pandas, NumPy, SciPy, and Matplotlib to support data gathering, cleaning, processing, and visualization. Cleaning data is important as data collected from various sources may contain errors, missing values, and noisy data that needs to be addressed.

Uploaded by

padma

Data Gathering and Cleaning

In the 21st century,

• data is vital for decision-making and developing long-term strategic plans.
• Python provides numerous libraries and built-in features that make it easy to support data analysis and processing.

Making business decisions, forecasting weather, studying protein structures in biology, and designing a marketing campaign are all examples that require collecting data and then cleaning, processing, and visualizing it.

There are five main steps for data science processing.

1. Data acquisition is where you read data from various sources of unstructured, semi-structured, or fully structured data, which might be stored in a spreadsheet, comma-separated file, web page, database, etc.

2. Data cleaning is where you remove noisy data and perform the operations needed to keep only the relevant data.

3. Exploratory analysis is where you examine the cleaned data and apply statistical processing suited to the specific analysis purpose.

4. Analysis model creation is where an analysis model is built. Advanced tools such as machine learning algorithms can be used in this step.

5. Data visualization is where the results are plotted using the various plotting tools available in Python to help in the decision-making process.
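
The five steps above can be sketched end to end on a tiny data set. This is only an illustrative sketch, not the slides' own example: the sample data, the column names `day` and `temperature`, and the choice of a least-squares straight line as the "analysis model" are all assumptions made for the illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a real comma-separated file.
RAW_CSV = """day,temperature
1,20.0
2,22.0
3,
4,26.0
5,28.0
"""

# 1. Data acquisition: read the comma-separated text into a DataFrame.
df = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Data cleaning: drop rows whose temperature reading is missing.
clean = df.dropna(subset=["temperature"])

# 3. Exploratory analysis: a summary statistic of the cleaned column.
mean_temp = clean["temperature"].mean()

# 4. Analysis model: fit a straight line to the readings (least squares).
slope, intercept = np.polyfit(clean["day"], clean["temperature"], deg=1)

# 5. Data visualization: plot the readings and the fitted line.
fig, ax = plt.subplots()
ax.scatter(clean["day"], clean["temperature"], label="readings")
ax.plot(clean["day"], slope * clean["day"] + intercept, label="linear fit")
ax.set_xlabel("day")
ax.set_ylabel("temperature")
ax.legend()

# Render to an in-memory buffer; pass a file name instead to save the figure.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a real project each step would usually be far larger, but the order stays the same: the model in step 4 only sees rows that survived the cleaning in step 2.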
Python provides several libraries for data gathering, cleaning, integration, processing, and visualization.

• Pandas is an open-source Python library used to load, organize, manipulate, model, and analyze data by offering powerful data structures.

• NumPy is a Python package whose name stands for “Numerical Python”. It is a library consisting of multidimensional array objects and a collection of routines for manipulating arrays. It can be used to perform mathematical, logical, and linear algebra operations on arrays.

• SciPy is another open-source Python library, used for scientific computing tasks such as numerical integration and optimization. Like Pandas and NumPy, it is not part of the standard library and must be installed separately.

• Matplotlib is a Python library used to create 2D graphs and plots. It supports a wide variety of graphs and plots such as histograms, bar charts, power spectra, error charts, and so on, with additional formatting options such as control over line styles, font properties, axis formatting, and more.
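
As a rough illustration of what each of these libraries is typically used for, the sketch below makes one small call to each. The data values and variable names are hypothetical and chosen only for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import integrate, optimize

# NumPy: multidimensional arrays plus mathematical and linear algebra routines.
matrix = np.array([[2.0, 0.0], [0.0, 3.0]])
vector = np.array([1.0, 2.0])
product = matrix @ vector             # matrix-vector product
determinant = np.linalg.det(matrix)   # linear algebra routine

# SciPy: numerical integration and optimization.
area, _abs_error = integrate.quad(lambda x: x ** 2, 0.0, 3.0)  # integral of x^2 on [0, 3]
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)    # minimizer of (x - 2)^2

# Pandas: labeled, tabular data structures.
df = pd.DataFrame({"city": ["A", "B", "C"], "sales": [10, 30, 20]})
top_city = df.loc[df["sales"].idxmax(), "city"]

# Matplotlib: a simple 2D bar chart of the same table.
fig, ax = plt.subplots()
ax.bar(df["city"], df["sales"])
ax.set_ylabel("sales")
```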
Cleaning Data

Data is collected and entered manually or automatically from various sources such as weather sensors, financial stock market data servers, users’ online commercial preferences, etc.

Collected data is rarely error-free: it usually contains missing data points and erroneously entered values. For instance, online users might not want to enter their information because of privacy concerns. Therefore, treating missing values (NA or NaN) and noisy data is important in any data analysis process.
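
A small sketch of why untreated missing values matter: in NumPy, a single NaN propagates through ordinary arithmetic and turns a summary statistic into NaN. The readings below are hypothetical.

```python
import numpy as np

# Hypothetical sensor readings; np.nan marks a reading that was never recorded.
readings = np.array([21.5, np.nan, 22.5, 23.0])

naive_mean = np.mean(readings)        # NaN propagates, so the result is NaN
treated_mean = np.nanmean(readings)   # ignores NaN: (21.5 + 22.5 + 23.0) / 3
missing_count = int(np.isnan(readings).sum())
```

Ignoring the gap, as `np.nanmean` does, is only one possible treatment; dropping the record or filling in a substitute value are others, and the right choice depends on the analysis.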
Checking for Missing Values
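
The slide content under this heading is not included in this excerpt. As a minimal sketch, assuming Pandas is the tool being used, missing values in a DataFrame can be detected with `isna()`. The table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; None and np.nan both mark values the user left out.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", None, "Dee"],
        "age": [34, np.nan, 29, np.nan],
        "city": ["Oslo", "Rome", "Lima", "Pune"],
    }
)

has_missing = df.isna().any().any()             # True if any cell is missing
missing_per_column = df.isna().sum()            # number of missing cells per column
rows_with_missing = df[df.isna().any(axis=1)]   # rows containing at least one gap
```

`isna()` returns a DataFrame of booleans with the same shape as the input, so the usual aggregations (`any`, `sum`) can summarize it by column or by row.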
