Data Preprocessing
Different techniques are used in machine learning to improve the accuracy of a model by preprocessing the data.
There are three common forms of preprocessing for a data matrix X, where we will assume
that X is of size [N × D] (N is the number of samples, D is their dimensionality).
Mean subtraction The most common form of preprocessing. It involves subtracting the mean
across every individual feature in the data, and has the geometric interpretation of centering
the cloud of data around the origin along every dimension. In NumPy, this operation would
be implemented as: X -= np.mean(X, axis = 0). With images specifically, for convenience it
can be common to subtract a single value from all pixels (e.g. X -= np.mean(X)), or to do so
separately across the three color channels.
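A minimal NumPy sketch of the three variants, assuming a hypothetical batch of RGB images X of shape [N, H, W, 3]:

import numpy as np

# Hypothetical batch of RGB images, shape [N, H, W, 3].
X = np.random.rand(128, 32, 32, 3).astype(np.float32)

# Per-feature mean subtraction on the flattened [N, D] view:
# one mean per pixel position and channel.
X_flat = X.reshape(X.shape[0], -1)
X_centered = X_flat - np.mean(X_flat, axis=0)

# Single-value variant: subtract one scalar mean over the whole dataset.
X_single = X - np.mean(X)

# Per-channel variant: one mean for each of the three color channels.
channel_mean = np.mean(X, axis=(0, 1, 2))   # shape (3,)
X_per_channel = X - channel_mean            # broadcasts over N, H, W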
Normalization Refers to normalizing the data dimensions so that they are of approximately
the same scale. There are two common ways of achieving this normalization. One is to divide
each dimension by its standard deviation, once it has been zero-centered: (X /= np.std(X,
axis = 0)). Another form of this preprocessing normalizes each dimension so that the min and
max along the dimension is -1 and 1 respectively. It only makes sense to apply this preprocessing
if you have a reason to believe that different input features have different scales (or units), but
they should be of approximately equal importance to the learning algorithm. In the case of
images, the relative scales of pixels are already approximately equal (and in the range from 0 to
255), so it is not strictly necessary to perform this additional preprocessing step.
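A minimal NumPy sketch of both normalization variants, assuming a hypothetical zero-centered data matrix X of shape [N, D]:

import numpy as np

# Hypothetical data matrix of shape [N, D].
X = np.random.randn(1000, 50)
X = X - np.mean(X, axis=0)          # zero-center first

# Variant 1: divide each dimension by its standard deviation.
X_std = X / np.std(X, axis=0)

# Variant 2: rescale each dimension so its min and max become -1 and 1.
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_minmax = 2.0 * (X - x_min) / (x_max - x_min) - 1.0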
PCA The reason one would want to use PCA is if one expects that many of the features
are in fact dependent. This would be particularly handy for Naive Bayes, where independence
is assumed. Most datasets are far too large to use PCA. Attention: PCA complexity is O(n³),
so more sophisticated methods are required for large data. But if your dataset is small, and you
don’t have the time to investigate more sophisticated methods, then by all means go ahead and
apply an out-of-the-box PCA for feature selection.
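A minimal sketch of an out-of-the-box PCA in NumPy, assuming a hypothetical zero-centered data matrix X of shape [N, D]; the covariance matrix is decomposed with np.linalg.svd and the data is projected onto the top k eigenvectors:

import numpy as np

# Hypothetical zero-centered data matrix, shape [N, D].
X = np.random.randn(500, 100)
X = X - np.mean(X, axis=0)

# Covariance matrix of the data, shape [D, D].
cov = np.dot(X.T, X) / X.shape[0]

# SVD of the covariance matrix: columns of U are the eigenvectors,
# S holds the corresponding eigenvalues in decreasing order.
U, S, Vt = np.linalg.svd(cov)

# Keep only the top k principal components to reduce dimensionality.
k = 10
X_reduced = np.dot(X, U[:, :k])     # shape [N, k]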
Whitening Takes the data in the eigenbasis and divides every dimension by the square root
of the corresponding eigenvalue to normalize the scale. The geometric interpretation of this
transformation is that if the input data is a multivariate Gaussian, then the whitened data will
be a Gaussian with zero mean and identity covariance matrix.
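A minimal NumPy sketch of whitening under the same assumptions (a hypothetical zero-centered X of shape [N, D]); the small epsilon added before the square root avoids division by zero:

import numpy as np

# Hypothetical zero-centered data matrix, shape [N, D].
X = np.random.randn(500, 100)
X = X - np.mean(X, axis=0)

# Eigendecomposition of the covariance matrix via SVD.
cov = np.dot(X.T, X) / X.shape[0]
U, S, Vt = np.linalg.svd(cov)

# Rotate the data into the eigenbasis, then divide each dimension by the
# square root of its eigenvalue.
X_rot = np.dot(X, U)
X_white = X_rot / np.sqrt(S + 1e-5)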
For deep learning on images we will typically only zero-center the data. To do so, for each
pixel, compute its mean across the whole dataset and subtract the resulting mean image from all
the training samples. If you have more than one channel (e.g. RGB), do it for each of the channels
separately.
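A minimal NumPy sketch, assuming hypothetical training and test splits of RGB images of shape [N, H, W, 3]; the mean image is computed on the training set only and then subtracted from every split:

import numpy as np

# Hypothetical splits of RGB images, each of shape [N, H, W, 3].
X_train = np.random.rand(1000, 32, 32, 3).astype(np.float32)
X_test = np.random.rand(200, 32, 32, 3).astype(np.float32)

# Mean image over the training set, shape [H, W, 3]:
# one mean per pixel position and per channel.
mean_image = np.mean(X_train, axis=0)

# Subtract the same mean image from every split.
X_train -= mean_image
X_test -= mean_image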