Describe The Data Processing Chain: Business Understanding


Describe the business intelligence and data mining cycle.

Any business organization needs to continuously monitor its business environment and its own performance,
and then rapidly adjust its future plans. The organization needs to develop a balanced scorecard to track its own
health and vitality.

Business intelligence is a broad set of information technology solutions that includes tools for gathering,
analyzing and reporting information to users about the performance of the organization and its
environment. These IT solutions are among the most highly prioritized solutions for investment.

It is all about analyzing data that could help identify fast-selling items, regional selling items,
seasonal selling items, fast-growing customer segments and so on. It might also help generate ideas
about which products sell together, which people tend to buy which products and so on.

These insights and intelligence can help design better promotion plans, product bundles and store
layouts, which in turn lead to a better-performing business.

Data Mining Cycle

[Figure: the data mining cycle. Business understanding -> Data understanding -> Data preparation ->
Modeling -> Evaluation -> Deployment, and back to Business understanding]

The data mining life cycle is the sequence of stages that a specific unit of data goes through,
from its initial generation or capture to its eventual archival and/or deletion at the end of its
useful life.

Describe the data processing chain

[Figure: the data processing chain. Data -> Database -> Data warehouse -> Data mining -> Data visualization]

Data is a natural resource. There is a sequence of steps to follow to benefit from data in a systematic way. Data can be
modeled and stored in a database. Relevant data can be extracted from the operational data stores for specific
reporting and analysis purposes and stored in a data warehouse. The data from the warehouse can be combined with other
sources of data and mined using data mining techniques to generate new insights. This is the data processing chain.
1) Data
Anything that is recorded is data. Data can come from any number of sources. It can come from operational
records inside an organization, from records compiled by industry bodies and government
agencies, and from people's interactions in social contexts. It can be either paper reports or files stored on a computer.
There is also data about data, which is called metadata.
Data can be of different types:
- Unordered values
- Ordered values like small, medium and large
- Discrete numeric values defined in a certain range
- Binary Large Objects (BLOBs) data

Datafication is a new term that means that almost every phenomenon is now being observed and stored.

2) Database
A database is a modeled collection of data that is accessible in many ways. A data model can be designed to integrate
the operational data of the organization. Most databases today follow the relational data model and its variants. Each
data modeling technique imposes rigorous rules and constraints to ensure the integrity and consistency of data
over time. Many database management software systems (DBMSs) are available to help store and manage this data.
These include commercial systems such as Oracle and DB2. There are also free, open source DBMSs, such as
MySQL and Postgres. These DBMSs help process and store millions of transactions' worth of data every second.

3) Datawarehouse
A data warehouse is an organized store of data from all over the organization, specially designed to help make
management decisions. Data can be extracted from the operational database to answer a particular set of queries. This
data, combined with other data, can be rolled up to a consistent granularity and uploaded to a separate data store
called the data warehouse. The data warehouse is therefore a simplified version of the operational database, with the
sole purpose of addressing reporting and decision-making needs. The data in the warehouse grows cumulatively as
more operational data becomes available and is extracted and appended to the data warehouse. Unlike in operational
databases, the data values in the warehouse are not updated.
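The roll-up and append step described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a real warehouse design: the table names (`sales_txn`, `daily_sales_fact`), the columns, the helper `load_day` and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical operational table: one row per sales transaction.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_txn ("
    "txn_id INTEGER PRIMARY KEY, sale_date TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_txn (sale_date, region, amount) VALUES (?, ?, ?)",
    [
        ("2024-01-01", "North", 120.0),
        ("2024-01-01", "North", 80.0),
        ("2024-01-01", "South", 50.0),
        ("2024-01-02", "North", 200.0),
    ],
)

# Hypothetical warehouse table at a coarser granularity: one row per day and region.
conn.execute(
    "CREATE TABLE daily_sales_fact ("
    "sale_date TEXT, region TEXT, total_amount REAL, txn_count INTEGER)"
)


def load_day(connection, day):
    """Roll up one day's transactions and append the result to the warehouse table."""
    connection.execute(
        """
        INSERT INTO daily_sales_fact (sale_date, region, total_amount, txn_count)
        SELECT sale_date, region, SUM(amount), COUNT(*)
        FROM sales_txn
        WHERE sale_date = ?
        GROUP BY sale_date, region
        """,
        (day,),
    )


# Each load only appends new rows; existing warehouse rows are never updated.
load_day(conn, "2024-01-01")
load_day(conn, "2024-01-02")

warehouse_rows = conn.execute(
    "SELECT sale_date, region, total_amount, txn_count "
    "FROM daily_sales_fact ORDER BY sale_date, region"
).fetchall()
print(warehouse_rows)
```

The four transactions collapse into three warehouse rows, one per day and region, which is the "consistent granularity" the text refers to.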

4) Data Mining
Data mining is the art and science of discovering useful, innovative patterns in data. A wide variety of
patterns can be found in data, and many techniques, simple or complex, help with finding them.
Data can be analyzed at multiple levels of granularity, which can lead to a large number of interesting combinations of
data and interesting patterns. Some patterns may be more meaningful than others.
Some of the data mining techniques are decision trees, regression, cluster analysis and association rule mining.

5) Data Visualization
As data insights grow in number, a new requirement is the ability of executives and decision makers to absorb
this information in real time. There is a limit to human comprehension and visualization capacity. That is a good reason to
prioritize and manage with fewer but key variables that relate directly to the key result areas of a role. Data
visualization has been an interesting problem across disciplines. Many dimensions of data can be effectively
displayed on a two-dimensional surface to give a rich and more insightful description of the totality of the story.

What is a dashboard? How does it help?


Dashboards are designed to provide information on a select few variables for every executive. They use graphs, dials and lists to
show the status of important parameters. These dashboards have a drill-down capability to enable root cause analysis of
exceptional situations. There are three types of dashboards: operational dashboards, analytical dashboards and
strategic dashboards.

Dashboards take all your data and turn it into an easy-to-understand snapshot of what is going on. You can usually tailor them to
the most important matters of your business, meaning you only see the performance indicators relevant to you. There is also an option
to add features like pie charts, graphs and interactive maps, and using professional dashboard services to help with this can cut a
lot of manual work. When it is up and running, your dashboard will act as your virtual assistant that eliminates tedious tasks and
focuses only on your ROI. Dashboards can also be used to tell a story with data: refer back to your initial performance, set
a realistic plan of action and look at month-on-month trends once you have made some changes. It is awareness like this that
can help transform your business.

Comparing database systems with data warehousing systems


Function: Purpose
- Database: data stored in databases can be used for many purposes, including day-to-day operations.
- Data warehouse: data in the data warehouse is cleansed data that is useful for reporting and analysis.

Function: Granularity
- Database: highly granular data, including all activity and transaction details.
- Data warehouse: low-granularity data, rolled up to certain key dimensions of interest.

Function: Complexity
- Database: highly complex, with dozens or hundreds of data files linked through common data fields.
- Data warehouse: typically organized around large fact tables and many lookup tables.

Function: Size
- Database: grows with growing volumes of activity and transactions; old completed transactions are deleted to reduce the size.
- Data warehouse: grows as data from operational databases is rolled up and appended every day; data is retained for long-term trend analysis.

Function: Architectural choices
- Database: relational and object-oriented databases.
- Data warehouse: star schema and snowflake schema.

Function: Data access mechanism
- Database: primarily through high-level languages such as SQL; traditional programs access the database through Open Database Connectivity (ODBC) interfaces.
- Data warehouse: accessed through SQL; the output is forwarded to reporting tools and data visualization tools.

What are the different data mining techniques?

There are several different data mining techniques:

Decision Trees: These help classify a population into classes. It is said that 70% of all data mining work is
about classification solutions, and that 70% of all classification work uses decision trees. Decision
trees are thus the most popular and important data mining technique. There are many popular algorithms for
building decision trees. They differ in their mechanisms, and each works well in different
situations. It is possible to try multiple algorithms on a data set and compare the predictive accuracy of
each tree.
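As a rough illustration of how a tree classifies and how its predictive accuracy is measured, here is a one-level decision tree (a "decision stump") that picks the single split with the lowest Gini impurity. Gini is only one of several possible split criteria, and the customer data, feature meanings and function names below are assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Return (feature, threshold, left_label, right_label) for the best single split."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put at least one row on each side
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


def predict(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


# Hypothetical customers: (age, income in thousands) and whether they bought.
rows = [(25, 30), (30, 45), (35, 60), (40, 80), (45, 40), (50, 90)]
labels = ["no", "no", "yes", "yes", "no", "yes"]

stump = fit_stump(rows, labels)
accuracy = sum(predict(stump, r) == lab for r, lab in zip(rows, labels)) / len(rows)
print(stump, accuracy)
```

On this toy data the stump splits on income at 45 and classifies every training row correctly. Accuracy on held-out data is the fairer basis for comparing algorithms.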

Regression: This is a well-understood technique from the field of statistics. The goal is to find the best-fitting
curve through many data points. The best-fitting curve is the one that minimizes the distance between the actual
data points and the values predicted by the curve, that is, the errors. Regression models can be projected
into the future for prediction and forecasting purposes.
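A minimal sketch of the idea, assuming the simplest case: a straight line fitted by ordinary least squares, which minimizes the sum of squared vertical errors. The monthly sales figures are invented and deliberately linear so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: sales (in thousands) for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)

# Project the fitted line one month into the future.
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

Here the fitted line is y = 2x + 10, so the forecast for month 6 is 22. Real data will not fall exactly on the line, and the remaining errors are what the fit minimizes.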

Artificial Neural Networks: Originating in the fields of artificial intelligence and machine learning, artificial
neural networks are multilayer, non-linear information processing models that learn from past data
and predict future values. These models predict well, which has led to their popularity. The model's parameters
may not be very intuitive, so neural networks are opaque, like a black box. These systems also require
large amounts of past data to train adequately.

Cluster analysis: This is an important data mining technique for dividing and conquering large data sets.
The data set is divided into a certain number of clusters by discerning similarities and dissimilarities within
the data. There is no one right answer for the number of clusters in the data. The user needs to make a
decision by looking at how well the chosen number of clusters fits the data. This technique is most commonly used
for market segmentation. Unlike decision trees and regression, there is no one right answer for cluster
analysis.
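One common way to judge how well a chosen number of clusters fits the data is to run k-means for several values of k and compare the within-cluster sum of squares. The sketch below does this for one-dimensional data. k-means is only one clustering algorithm, and the spending figures and the deterministic starting rule are assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=20):
    """Very small k-means for one-dimensional data; returns the final centroids."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced points taken from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def within_cluster_ss(values, centroids):
    """Sum of squared distances from each value to its nearest centroid."""
    return sum(min((value - c) ** 2 for c in centroids) for value in values)


# Hypothetical monthly spend of nine customers.
spend = [10, 12, 11, 50, 52, 51, 90, 95, 92]

wcss = {}
for k in (1, 2, 3):
    wcss[k] = within_cluster_ss(spend, kmeans_1d(spend, k))
print(wcss)
```

The sum of squares drops sharply from k = 1 to k = 3, where the three visible spending groups are each captured by one cluster. Adding more clusters always lowers the number further, so the user still has to judge where the improvement stops being worthwhile.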

Association rule mining: Also called market basket analysis, this technique is widely used in the retail industry. It
looks for associations between data values. Analysis of items frequently found together in a market basket
can help cross-sell products and create product bundles.
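A small sketch of the underlying arithmetic, using the two standard measures for an association rule: support (the share of baskets containing both items) and confidence (the share of baskets with the first item that also contain the second). The baskets are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(baskets)


def confidence(antecedent, consequent):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
print(support("bread", "butter"))
print(confidence("butter", "bread"))
print(confidence("bread", "butter"))
```

Bread and butter appear together in three of the five baskets, and every basket with butter also has bread, which makes the pair a candidate for a bundle or a cross-sell prompt.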

Pattern Recognition.

A pattern is a design or a model that helps grasp something.

Patterns help connect things that may not appear to be connected.

Patterns help cut through complexity and reveal simpler, understandable trends.

A perfect pattern or model is one that accurately describes the situation, is broadly applicable and can
be described in a simple manner.

A pattern can be temporal, which means something that occurs regularly over time.

Pattern recognition analyzes incoming data and tries to identify patterns.

Explorative pattern recognition aims to identify data patterns in general.

Descriptive pattern recognition starts by categorizing the detected patterns.

Pattern recognition therefore deals with both of these scenarios, and different pattern recognition
methods are applied depending on the use case and the form of the data.

Consequently, pattern recognition is not one technique but rather a broad collection of often loosely
related knowledge and techniques.

Pattern recognition capability is often applied in intelligent systems. The data inputs for pattern
recognition can be words, text, images or audio files. Hence pattern recognition is broader than
computer vision, which focuses on image recognition. Automatic, machine-based recognition,
description, classification and grouping of patterns are important problems in a variety of engineering and
scientific disciplines, including biology, psychology, medicine and marketing.

Given a pattern, its recognition and classification consists of one of the following two tasks: supervised
classification, which identifies the input pattern as a member of a predefined class, or unsupervised
classification, which assigns the input pattern to a previously undefined class.
