MDA - 1.module 1 - BI Introduction - Data Prep
The content contained in this module is provided only for educational use in Mastering Data Analytics'
training courses. You may not copy, reproduce, distribute, publish, display, perform, modify,
create derivative works, transmit, or in any way exploit any such content, nor may you distribute
any part of this content over any network, including a local area network, sell or offer it for sale, or
use such content to construct any kind of database.
For permission to use the content, please contact via email: [email protected]
1 Business Intelligence Fundamentals
2 Business Statistics
3 Descriptive Analytics
4 Diagnostic Analytics
5 Data Visualization
01 BI Terminology
1. BI vs BA
2. Technology in BI
3. Data, Analysis, Analytics
4. Understanding Data
02 Business Intelligence in Corporates
1. Self-service BI & Analytics
2. Analytics structure and Coordination Model
3. BI Success (Organization)
4. BI Success (Individual)
5. The Evolution of Business Intelligence
6. How can data analytics help organizations?
7. BI process
8. Decision making with BI
9. Decision Bias
03 Data Preparation
1. Power Query Overview
2. Get Data
3. PQ – Basic Transform Data
4. Profiling Data
5. Data Issues
4a. Bad Shape + Dirty Data
4b. Missing Data + Outliers
5. Combine Data from Folder
6. Blending Data
7. Checklist
Business Intelligence Terminology
Business Intelligence & Business Analytics
Business Intelligence Terminology
Technology in Business Intelligence
Business Intelligence Terminology
Data, Analysis, Analytics
Source: www.getsmarter.com
5 months ago, Bank ABC's loan portfolio decreased by a total of 10.200 bio. VND due to attrition.
(Ending Loan portfolio = Beginning Loan + New loan – Attrition – Maturity)
Top 4 reasons for attrition in the bank:
(1) Dissatisfaction about services (50%)
(2) Lower rates at other banks (30%)
(3) Change to another loan package in the bank (10%)
(4) Death (10%)
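The portfolio formula and the attrition breakdown above can be sketched in a few lines of Python. This is only an illustration: the function names and the input amounts are hypothetical, and only the formula and the percentage shares come from the slide.

```python
def ending_loan_portfolio(beginning, new_loans, attrition, maturity):
    """Ending Loan portfolio = Beginning Loan + New loan - Attrition - Maturity."""
    return beginning + new_loans - attrition - maturity


# Share of attrition by reason, in percent, as listed on the slide.
ATTRITION_SHARES_PCT = {
    "Dissatisfaction about services": 50,
    "Lower rates at other banks": 30,
    "Change to another loan package": 10,
    "Death": 10,
}


def attrition_by_reason(total_attrition):
    """Split a total attrition amount across the four reasons."""
    return {
        reason: total_attrition * pct / 100
        for reason, pct in ATTRITION_SHARES_PCT.items()
    }


# Hypothetical amounts, chosen only to make the arithmetic easy to follow.
ending = ending_loan_portfolio(beginning=1000, new_loans=200, attrition=150, maturity=50)
breakdown = attrition_by_reason(150)
```

Moving from the raw total (data) to the breakdown by reason (analysis) is what makes the number actionable: half of the lost portfolio traces back to service quality.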
Business Intelligence Terminology
Understanding Data – Categories of Data
Business Intelligence Terminology
Understanding Data – Structures of Data
Level of Measurement
Categorical
Numerical
Business Intelligence Terminology
Understanding Data – Data Sources
String data can be declared in a number of different ways depending on the character set required and the anticipated length of the string: any kind of characters, alphanumeric, including symbols.
Numeric data are numbers which can be whole numbers, such as Integers, or numbers with decimal places: Byte, Integer, Fixed Decimal, Float, Double.
Date/time contains a specific date, or a combination of both date and time.
The Boolean type is sometimes also called a logical type and is a conditional flag representing either true or false.
Images, Maps, Report objects, Sound
Business Intelligence Terminology
Understanding Data – Data Types Exercise
Quiz:
Employee, Address, City,
ZIP Code, Distance,
Telecommuter
Data Types ?
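One plausible reading of the quiz, sketched as a Python record. The type choices are a suggestion rather than the official answer, and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    employee: str       # a name is text
    address: str        # free text
    city: str           # text
    zip_code: str       # text, not a number: it is never summed and may start with 0
    distance: float     # numeric with decimal places
    telecommuter: bool  # yes/no flag


# Hypothetical sample row.
record = EmployeeRecord(
    employee="Sample Employee",
    address="12 Example Street",
    city="Sample City",
    zip_code="02134",
    distance=7.5,
    telecommuter=False,
)

# Storing the ZIP code as a number would silently drop the leading zero.
zip_as_number = int(record.zip_code)
```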
Business Intelligence Terminology
Understanding Data – Data Types vs Data Format
4 Types of Analytics
Data Source
Data Face
Business Intelligence in Corporates
Self-service BI & Analytics
ORGANIZATIONS EXIST TO
CREATE VALUE
ORGANIZATIONS HAVE TO BE
Understanding
and turning it into action in order to achieve a desired business outcome.
❖ Data must be relevant
❖ Information must be meaningful
❖ Insight must be actionable
Agile BI (speed-to-value)
Now, more than ever, business leaders need access to the right information at the right time in order to act before decision windows close.
Describing
Defining
Analytical Organizations
How centralized or decentralized should these
organizations be?
Functions:
– Reporting
– Ad-hoc Analytics
– Modeling
Roles:
– Database Analysts
– Data Analysts
– Modelers
– Data Scientists
– Etc.
[Figure: Centralized Functions vs. Collaboration; shading shows where analytics are executed]
Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Analytical Organizations – Centralized
In a Centralized model, a set of analytical activities are accomplished through a central clearinghouse
Example: An enterprise analytics team serves the needs of marketing, finance, operations, customer care, etc. with respect to reporting, ad-hoc analysis, and statistical modeling
Key Advantages:
– Consistency
– Optimal management of bandwidth & focus on enterprise priorities
– Maximum efficiency in low-level tasks
Key Disadvantages:
– Responsiveness
– Lack of context / expertise can limit effectiveness in more complex tasks
– Requires large group and consistent overall loading
Business Intelligence in Corporates
2. Analytics structure and Coordination Model
Key Advantages:
– Extremely responsive
– High degree of context and expertise attained
– Efficient localized use of contracting resources
Key Disadvantages:
– Lack of consistency in methods and sources
– Effort & data duplication very likely
– Requires largest overall resources, and can be expensive, esp. when contractors used
Business Intelligence in Corporates
2. Analytics structure and Coordination Model
[Figure: structure of the BI team. Board of directors at the top; analytics functions for Marketing, Quality Control, Sales, R&D, IT, Planning & Strategy, and Finance. Expanded Team: Marketing Analyst, QC Analyst, Sales Analyst, R&D Specialist, IT Specialist, Business Planning Analyst, Financial Analyst. Core Team: Advanced Analytics Specialist, Data Engineer. Other labels: Analytics Goal, Analytics Result, Insights, Focus (Declutter), Data Storytelling.]
Business Intelligence in Corporates
7. BI process
Business Intelligence in Corporates
8. Decision Making with BI
Business Intelligence in Corporates
9. Decision Making Bias
Confirmation bias is our tendency to search for and favor information that confirms our
beliefs while ignoring or devaluing information that contradicts them.
BI Self-Services in Corporates
Structure of BI Team
DATA PREPARATION
Power Query Overview - Self-Service BI
DATA PREPARATION
Power Query Overview
Power Query is an ETL tool. ETL stands for Extract, Transform and Load.
•Extract – Data can be extracted from a variety of sources: databases, CSV files, text files, Excel workbooks, websites
and even PDFs.
•Transform – After the data has been extracted, it can be cleaned up (e.g., remove spaces, split columns,
change date formats, fill blanks, find and replace) and reshaped (e.g., unpivot, remove columns).
Data extracted from different sources is unlikely to be consistent, so the transform process is
used to make it ready for use.
•Load – Once the data has been extracted and transformed, it needs to be put somewhere so that you
can use it. In Excel, it can be loaded into a worksheet, the data model, or another query.
To summarize, Power Query takes data from different sources and turns it into something usable.
That is useful on its own, but the best part is that once the ETL process has been
created, it can be rerun with a single click, which can save hours of work every week.
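The three ETL stages can be illustrated outside Power Query as well. The sketch below is a Python analogy, not Power Query's own M language; the sample CSV text and the function names are hypothetical.

```python
import csv
import io

# Hypothetical raw source: stray spaces and one blank amount.
RAW_CSV = "name,amount\n  Alice ,10\nBob,\n Carol,30\n"


def extract(text):
    """Extract: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: remove spaces, fill blanks with 0, convert text to numbers."""
    cleaned = []
    for row in rows:
        amount_text = row["amount"].strip()
        cleaned.append({
            "name": row["name"].strip(),
            "amount": int(amount_text) if amount_text else 0,
        })
    return cleaned


def load(rows, destination):
    """Load: put the prepared rows somewhere they can be used."""
    destination.extend(rows)
    return destination


def run_pipeline(text):
    """Once defined, the whole process can be rerun with one call."""
    return load(transform(extract(text)), [])


table = run_pipeline(RAW_CSV)
```

Calling `run_pipeline` again on refreshed source text repeats every step, which mirrors the single-click refresh described above.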
DATA PREPARATION
Get Data – Data Sources
3 BASIC STEPS
In Power Query, you can split a column through different methods. In this case, the column(s)
selected can be split by a delimiter
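Splitting a column by a delimiter can be sketched in Python as follows. This is an analogy to the Power Query step, not its implementation; the helper `split_column` and the sample rows are hypothetical.

```python
def split_column(rows, column, delimiter, new_names):
    """Split one text column into several columns at a delimiter.

    Rows with fewer parts than new_names are padded with None.
    """
    result = []
    for row in rows:
        parts = row[column].split(delimiter, len(new_names) - 1)
        parts += [None] * (len(new_names) - len(parts))
        new_row = {key: value for key, value in row.items() if key != column}
        new_row.update(zip(new_names, parts))
        result.append(new_row)
    return result


# Hypothetical sample data.
rows = [
    {"id": 1, "full_name": "Nguyen, An"},
    {"id": 2, "full_name": "Tran"},
]
split = split_column(rows, "full_name", ", ", ["last", "first"])
```

The second row has no delimiter, so its `first` value is left empty instead of raising an error.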
DATA PREPARATION - Basic Transform Data
Replace
DATA PREPARATION - Basic Transform Data
Trim & Clean
Trim - Removes leading and trailing whitespace from each cell in the selected columns.
Clean - Removes non-printable characters from each cell in the selected columns.
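A minimal Python sketch of the same two operations, assuming that "non-printable" means Unicode control characters (category `Cc`). The function names mirror the menu commands but are otherwise hypothetical.

```python
import unicodedata


def trim(text):
    """Remove leading and trailing whitespace."""
    return text.strip()


def clean(text):
    """Remove control characters (tabs, line feeds, bell, and similar)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# Hypothetical dirty cell: padding spaces plus embedded control characters.
raw = "  Ha\x07noi\t \n"
tidy = trim(clean(raw))
```

Cleaning first and trimming second matters here: removing the control characters exposes a trailing space that the trim step then strips.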
DATA PREPARATION - Basic Transform Data
Filter Table
DATA PREPARATION - Basic Transform Data
Filter
DATA PREPARATION - Basic Transform Data
Advanced Filter
DATA PREPARATION - Basic Transform Data
Add Column from Example
DATA PREPARATION - Basic Transform Data
Add Custom Column
DATA PREPARATION – Basic Transform Data
Writing Power Query Functions
DATA PREPARATION – Basic Transform Data
M-code
DATA PREPARATION – Profile Data
Data profiling
DATA PREPARATION – Profile Data
Query Dependencies
DATA PREPARATION – Data Issue
Cleaning Data
DATA PREPARATION – Data Issue
Bad Shape Data
Dirty Data
Missing Data
Outliers
DATA PREPARATION – Data Issue
Data Shape Formatting
1. Transpose Table
2. Cross Tabulation (Pivot + Unpivot)
3. Aggregation (Group by)
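The first two reshaping operations can be sketched in plain Python. The helpers and the sample tables are hypothetical, and the output column names `attribute` and `value` are an assumption of this sketch.

```python
def transpose(table):
    """Swap rows and columns of a list-of-lists table."""
    return [list(column) for column in zip(*table)]


def unpivot(rows, id_column, value_columns):
    """Wide to long: one output row per (id, column) pair."""
    return [
        {id_column: row[id_column], "attribute": col, "value": row[col]}
        for row in rows
        for col in value_columns
    ]


def pivot(long_rows, id_column):
    """Long to wide: turn attribute values back into columns."""
    wide = {}
    for row in long_rows:
        key = row[id_column]
        wide.setdefault(key, {id_column: key})[row["attribute"]] = row["value"]
    return list(wide.values())


# Hypothetical cross-tab: one row per region, one column per month.
grid = [["Region", "Jan", "Feb"], ["North", 10, 20], ["South", 5, 15]]
wide_rows = [
    {"Region": "North", "Jan": 10, "Feb": 20},
    {"Region": "South", "Jan": 5, "Feb": 15},
]
long_rows = unpivot(wide_rows, "Region", ["Jan", "Feb"])
```

Unpivoting produces the long layout that analysis tools generally expect, and pivoting the result restores the original cross-tab.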
DATA PREPARATION – Data Issue
Data Shape Formatting - Transpose Table
TRANSPOSE TABLE
DATA PREPARATION – Data Issue
Data Shape Formatting – Cross Tabulation
Data Aggregation
Aggregate by Month
DATA PREPARATION – Data Issue
Data Aggregation
Aggregate by Quarter
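Aggregating by month or by quarter is a group-by with a different key. A short Python sketch follows; the sales records and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (date, amount).
sales = [
    (date(2023, 1, 5), 100),
    (date(2023, 1, 20), 50),
    (date(2023, 2, 3), 70),
    (date(2023, 4, 11), 30),
]


def aggregate(records, key):
    """Group records by key(date) and sum the amounts."""
    totals = defaultdict(int)
    for day, amount in records:
        totals[key(day)] += amount
    return dict(totals)


def month_key(day):
    return f"{day.year}-{day.month:02d}"


def quarter_key(day):
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


by_month = aggregate(sales, month_key)
by_quarter = aggregate(sales, quarter_key)
```

Only the grouping key changes between the two results; the summing logic is shared.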
DATA PREPARATION – Data Issue
Dirty Data
Dirty data contains errors of some kind, or is in a format that is unfriendly or unusable.
DATA PREPARATION – Data Issue
Dirty Data
Extra characters can be currency symbols, number signs, and so on. We need to remove these before
changing field types.
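A minimal Python sketch of stripping extra characters before converting text to a number. It assumes "." is the decimal separator and "," a thousands separator; data in other locale formats would need a different rule. The helper name and the samples are hypothetical.

```python
import re


def to_number(text):
    """Drop everything except digits, '.', and '-', then convert to float."""
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    return float(cleaned)


# Hypothetical dirty values: currency symbol, unit suffix, number sign.
amounts = ["$1,250.50", " 300 USD", "#42"]
numbers = [to_number(a) for a in amounts]
```

Converting the original strings directly would fail, which is why the cleanup step has to come before the type change.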
DATA PREPARATION – Data Issue
Dirty Data
No: Yes:
Real Data
Downward BIAS
DATA PREPARATION – Data Issue
Missing Data
SOLUTIONS
1. Deleting Missing Data
2. Imputation
DATA PREPARATION – Data Issue
Missing Data
Imputation
In statistics, imputation is the process of substituting values in the data where values are missing (when we impute values, we are making them up). We are creating fake data in order to develop a model that makes sense and is as close to reality as we can get it.
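The two solutions, deleting missing data and imputation, can be compared in a short Python sketch. Mean imputation is used here only as one simple example of an imputation method; the helper names and sample values are hypothetical.

```python
from statistics import mean


def drop_missing(values):
    """Solution 1: delete the missing observations."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Solution 2: replace each missing value with the mean of the observed ones."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [20, None, 30, 40, None]
deleted = drop_missing(ages)
imputed = impute_mean(ages)
```

Deletion shrinks the dataset from five rows to three, while imputation keeps all five rows at the cost of inventing two values.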
DATA PREPARATION – Data Issue
Missing Data – Select the method
DATA PREPARATION – Data Issue
Outliers
Identifying outliers in the data helps us understand how vulnerable our model would be to a small
set of observations.
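One common way to flag outliers is the 1.5 × IQR rule. The slides do not name a method, so the rule, the helper name, and the sample data below are assumptions made for illustration.

```python
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one extreme observation.
data = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(data)

mean_all = mean(data)
mean_without = mean([v for v in data if v not in flagged])
```

A single extreme observation roughly doubles the mean in this sample, which shows how sensitive a summary or model can be to a small set of observations.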
DATA PREPARATION - Data Blending
Join / Merge Query
DATA PREPARATION - Data Blending
Union Query
Union allows you to take multiple datasets and deal with them as one.
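A union stacks the rows of several tables into one. The Python sketch below fills columns that a table lacks with `None`; the helper name and the monthly tables are hypothetical.

```python
def union(*tables):
    """Stack rows from several tables; missing columns become None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [
        {col: row.get(col) for col in columns}
        for table in tables
        for row in table
    ]


# Hypothetical monthly extracts; February carries an extra column.
jan = [{"store": "A", "sales": 10}]
feb = [{"store": "A", "sales": 12}, {"store": "B", "sales": 7, "note": "new"}]
combined = union(jan, feb)
```

After the union, the monthly extracts can be filtered and aggregated as a single dataset.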
DATA PREPARATION - Data Blending
Merge Query
DATA PREPARATION - Data Blending
Fuzzy Matching
The match threshold is the minimum score the fuzzy matching must achieve for
a pair to be considered a match.
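The role of the threshold can be shown with Python's `difflib`. Its similarity ratio is a stand-in chosen for this sketch and is not the scoring that Power Query uses; the helper names and sample names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(name, candidates, threshold=0.8):
    """Return the best-scoring candidate, or None if it scores below the threshold."""
    best = max(candidates, key=lambda candidate: similarity(name, candidate))
    return best if similarity(name, best) >= threshold else None


# Hypothetical lookup list.
companies = ["Microsoft", "Apple"]
typo_match = fuzzy_match("Microsft", companies)
no_match = fuzzy_match("Google", companies)
strict_match = fuzzy_match("Microsft", companies, threshold=1.0)
```

Raising the threshold to 1.0 turns the fuzzy match into an exact match, so the misspelled name no longer finds a partner.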
DATA PREPARATION - Data Blending
Fuzzy Matching - Example
DATA PREPARATION - Data Blending
Spatial Matching
There aren't fields that can be used to join them together.
Gray area: to find how many customers fall within a store trade area, match customers to the
trade area and assign a store number to them.
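When two datasets share no join field, location can serve as the key. The sketch below assumes each trade area is a circle of fixed radius around the store, which is a simplification; the coordinates, field names, and helper names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def assign_store(customer, stores, radius_km):
    """Return the number of the nearest store within radius_km, or None."""
    best = None
    for store in stores:
        distance = haversine_km(customer["lat"], customer["lon"], store["lat"], store["lon"])
        if distance <= radius_km and (best is None or distance < best[0]):
            best = (distance, store["store_no"])
    return best[1] if best else None


# Hypothetical stores and customers.
stores = [
    {"store_no": 1, "lat": 10.0, "lon": 106.0},
    {"store_no": 2, "lat": 10.5, "lon": 106.5},
]
near_customer = {"lat": 10.01, "lon": 106.01}
far_customer = {"lat": 12.0, "lon": 108.0}

near_store = assign_store(near_customer, stores, radius_km=5)
far_store = assign_store(far_customer, stores, radius_km=5)
```

Once every customer carries a store number, the customer table and the store table can be merged like any other pair of tables.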
DATA PREPARATION - Data Blending
Spatial Matching - Example
Customer Information
Spatial Data
DATA PREPARATION – Transform Data
Why Combine Queries
DATA PREPARATION – Check List
Know the role of Power Query and how to get data into Power Query