Data Warehousing and Data Mining
TECHNOLOGY
A
PAPER PRESENTATION ON
DATA WAREHOUSING AND DATA MINING
AT
SUBMITTED BY:
AMOL P. NITAVE ABBAS HASHMI
B.E. (C.S.E)
B.E. (C.S.E)
GUIDED BY:
Prof. R. B. Kulkarni (CSE Dept. WIT, Solapur)
INDEX
1. ABSTRACT
2. DATA WAREHOUSING
Introduction
Characteristics
Life cycle
Architecture
Applications
3. DATA MINING
Introduction
Aim of project
Implementation
Working
Advantages
5. CONCLUSION
6. REFERENCE
DATA WAREHOUSING AND DATA MINING
ABSTRACT:
Fast, accurate and scalable data analysis techniques are needed to extract useful
information from huge piles of data. A data warehouse is a single, integrated source of decision
support information formed by collecting data from multiple sources, internal to the
organization as well as external, and transforming and summarizing this information to enable
improved decision making. A data warehouse is designed to give users easy access to large
amounts of information, and data access is typically supported by specialized analytical tools
and applications. Typical applications include decision support systems and executive
information systems.
Data mining is the exploration and analysis of large quantities of data in order to
discover valid, novel, potentially useful, and ultimately understandable patterns in data. It is
“An information extraction activity whose goal is to discover hidden facts contained in
databases”.
Data mining finds patterns and subtle relationships in data and infers rules that allow
the prediction of future results. It produces output values for an assigned set of input values.
Typical applications include market segmentation, customer profiling, fraud detection,
evaluation of retail promotions, and credit risk analysis.
DATA WAREHOUSING
A large amount of the right information is the key to survival in today’s competitive
environment, and this kind of information can be made available only if there is a totally
integrated enterprise data warehouse.
• Many PC-based or small server systems obtain extracts of data but are incapable of presenting a
holistic view of the entire gamut of information.
• The same data is present on different systems and in different departments, and users may be
unaware of this fact.
• Decision makers and policy planners perform less analysis because sophisticated tools and
easily decipherable, timely and comprehensive information are not available.
2) REQUIREMENT ANALYSIS
3) DESIGN
5) DEPLOYMENT
[Figure: data warehouse architecture. Operational data sources 1 to n and the operational data
store feed the load manager. The warehouse manager maintains meta-data, detailed data, lightly
summarized data and highly summarized data in the DBMS. The query manager serves the end-user
access tools: reporting, query, application development, and EIS (executive information system)
tools, and OLAP (online analytical processing) tools.]
Main Components:
• Operational data sources: data for the DW is supplied from mainframe operational data held
in first-generation hierarchical and network databases, departmental data held in
proprietary file systems, private data held on workstations and private servers, and external
systems such as the Internet, commercially available databases, or databases associated with an
organization’s suppliers or customers.
• Operational data store (ODS): a repository of current and integrated operational data
used for analysis. It is often structured and supplied with data in the same way as the data
warehouse, but may in fact simply act as a staging area for data to be moved into the
warehouse.
• Load manager: also called the frontend component, it performs all the operations
associated with the extraction and loading of data into the warehouse. These operations
include simple transformations of the data to prepare it for entry into the warehouse.
• Warehouse manager: performs all the operations associated with the management of the
data in the warehouse. The operations performed by this component include analysis of
data to ensure consistency, transformation and merging of source data, creation of indexes
and views, generation of denormalizations and aggregations, and archiving and backing up
data.
• Query manager: also called the backend component, it performs all the operations
associated with the management of user queries. The operations performed by this
component include directing queries to the appropriate tables and scheduling the execution
of queries.
• End-user access tools: can be categorized into five main groups: data reporting and
query tools, application development tools, executive information system (EIS) tools,
online analytical processing (OLAP) tools, and data mining tools.
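The division of work between the load manager, warehouse manager and query manager can be illustrated with a minimal Java sketch. It is only an illustration under stated assumptions: the class `WarehouseSketch`, the `Sale` row type, the "region,amount" input format and all method names are hypothetical and do not come from any particular warehouse product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the load, warehouse, and query manager roles. */
final class WarehouseSketch {

    /** One detailed row; the fields are illustrative assumptions. */
    static final class Sale {
        final String region;
        final double amount;

        Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    /** Load manager: extract "region,amount" lines and apply simple transformations. */
    static List<Sale> load(List<String> rawLines) {
        List<Sale> detailed = new ArrayList<>();
        for (String line : rawLines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // skip rows that do not have exactly two fields
            }
            // Normalise the region key so that "north" and "NORTH" merge later.
            String region = parts[0].trim().toUpperCase(Locale.ROOT);
            detailed.add(new Sale(region, Double.parseDouble(parts[1].trim())));
        }
        return detailed;
    }

    /** Warehouse manager: generate an aggregation (summarized data) from detailed data. */
    static Map<String, Double> summarize(List<Sale> detailed) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale sale : detailed) {
            totals.merge(sale.region, sale.amount, Double::sum);
        }
        return totals;
    }

    /** Query manager: direct a user query to the appropriate (summarized) table. */
    static double totalFor(Map<String, Double> summary, String region) {
        return summary.getOrDefault(region.toUpperCase(Locale.ROOT), 0.0);
    }

    public static void main(String[] args) {
        List<String> source = List.of("north, 10.5", "South,4", "NORTH,2.5", "bad row");
        Map<String, Double> summary = summarize(load(source));
        System.out.println("Summary table: " + summary);
        System.out.println("Total for north: " + totalFor(summary, "north"));
    }
}
```

Answering the query from the summary table rather than from the detailed rows is the reason a warehouse manager generates aggregations in advance.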
Tools and Technologies:
• After the critical steps, loading the results into the target system can be carried out either
by separate products or by a single integrated product. Such products fall into the following
categories:
• Code generators
• Database data replication tools
• Dynamic transformation engine
Applications:
Examples:
They ideally present information in graphical and tabular form, providing the user
with the ability to drill down on selected information. Note the increased detail and
data manipulation options presented.
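The drill-down ability mentioned above can be illustrated with a minimal Java sketch: a summary view shows one total per region, and drilling down on a selected region reveals the totals per city inside it. The class `DrillDownSketch`, the `Sale` fields and the sample figures are hypothetical and serve only as an illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a drill-down from region totals to city totals. */
final class DrillDownSketch {

    /** One detailed row; the fields are illustrative assumptions. */
    static final class Sale {
        final String region;
        final String city;
        final double amount;

        Sale(String region, String city, double amount) {
            this.region = region;
            this.city = city;
            this.amount = amount;
        }
    }

    /** Summary level: one total per region. */
    static Map<String, Double> byRegion(List<Sale> sales) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale sale : sales) {
            totals.merge(sale.region, sale.amount, Double::sum);
        }
        return totals;
    }

    /** Drill-down level: one total per city within the selected region. */
    static Map<String, Double> byCity(List<Sale> sales, String region) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale sale : sales) {
            if (sale.region.equals(region)) {
                totals.merge(sale.city, sale.amount, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("West", "Pune", 100),
                new Sale("West", "Mumbai", 250),
                new Sale("South", "Chennai", 80),
                new Sale("West", "Pune", 50));
        System.out.println("By region: " + byRegion(sales));
        System.out.println("Drill down on West: " + byCity(sales, "West"));
    }
}
```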
DATA MINING
Data mining refers to the process of analyzing data from different perspectives and
summarizing it into useful information. Data mining software is one of a number of tools
used for analyzing data. It lets users analyze data from many different dimensions or angles,
categorize it, and summarize the relationships identified.
Definition:
Data mining is the process of finding correlations or patterns among fields in large
relational databases. “The process of extracting valid, previously unknown, comprehensible,
and actionable information from large databases and using it to make crucial business
decisions.”
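As a small worked example of finding a correlation between two fields, the following Java sketch computes the Pearson correlation coefficient of two numeric columns. Pearson correlation is chosen here only as a simple, well-known measure; the paper does not name a specific technique. The class name and the sample columns (advertising spend and sales) are hypothetical.

```java
/** Hypothetical sketch: Pearson correlation between two numeric fields. */
final class CorrelationSketch {

    /** Returns the Pearson correlation coefficient of two equally long columns. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Columns must have equal length of at least 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        if (varianceX == 0.0 || varianceY == 0.0) {
            throw new IllegalArgumentException("Correlation is undefined for a constant column");
        }
        // The 1/n factors cancel, so the raw sums can be used directly.
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        double[] adSpend = {1, 2, 3, 4}; // hypothetical field values
        double[] sales = {2, 4, 6, 8};   // hypothetical field values
        System.out.println("Correlation: " + pearson(adSpend, sales));
    }
}
```

A coefficient close to +1 or -1 indicates a strong linear relationship between the two fields, while a value near 0 indicates none.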
Different Types of Data Mining: Business, Scientific and Internet Data Mining
1. Extract, transform, and load transaction data onto the data warehouse system.
We have created an application that works as a data mining tool for website developers.
The project has been implemented successfully on a local server and has received excellent
feedback.
• Implementation:
The data warehouse used for the project consists of information gathered by a
survey. The data has been collected into a database, and this database is used in the project.
The database is large and contains information on many websites. It was built from the
questionnaires submitted by the users of those websites.
The application we created is web based. It creates a particular type of graph, such as
a pie chart, line chart or bar graph. These graphs are generated according to the parameters
selected by the website builder. The parameter selection looks like the figure
below:
The constraints entered by the user determine which charts are generated. The details of
retrieving the data from the database are abstracted away from the user, who sees only the
result. For example, if a website builder wants to know where social networking sites are used
the most according to the database, the output will look as below:
• Working:
JavaServer Pages (JSP) is used to program the application. The database is stored
in Microsoft Access. For implementation purposes, a local Tomcat 6.0 server
is used. To generate the charts in JSP, we made use of the JFreeChart package.
The inputs are collected through page navigation. The traversal is as follows:
index.jsp → genechart.jsp
In index.jsp, the parameters are taken from the user. These parameters are posted to the
genechart.jsp file on the server. The SQL queries are then built to retrieve the appropriate
records, and these records are used to build the charts. An example of the SQL code in JSP is
as follows:
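A minimal sketch of how such a query might be built and run through JDBC is given here for illustration; it is not the project's actual code. The table name `survey`, the columns `country`, `age_group` and `site_category`, and the class `SurveyQuerySketch` are all assumptions, since the survey schema is not described.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a parameter-driven survey query. */
final class SurveyQuerySketch {

    // Assumed column names; the real survey schema is not given in the paper.
    private static final Set<String> GROUPABLE = Set.of("country", "age_group");

    /**
     * Builds the SQL text. Column names cannot be bound as JDBC parameters,
     * so the grouping column is checked against a fixed whitelist instead.
     */
    static String buildCountQuery(String groupBy) {
        if (groupBy == null || !GROUPABLE.contains(groupBy)) {
            throw new IllegalArgumentException("Unsupported column: " + groupBy);
        }
        return "SELECT " + groupBy + ", COUNT(*) FROM survey"
                + " WHERE site_category = ? GROUP BY " + groupBy;
    }

    /** Runs the query on an open connection and returns label -> count. */
    static Map<String, Integer> countBy(Connection connection, String groupBy, String category)
            throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (PreparedStatement statement = connection.prepareStatement(buildCountQuery(groupBy))) {
            statement.setString(1, category); // user input is bound, never concatenated
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    counts.put(rows.getString(1), rows.getInt(2));
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Only the SQL text is printed here; countBy needs a live database connection.
        System.out.println(buildCountQuery("country"));
    }
}
```

Binding the posted value with `setString` instead of concatenating it into the SQL text keeps the parameters taken from index.jsp from altering the query itself.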
Once these records have been retrieved, an algorithm computes the statistics of the
data. These statistics give the figures for the websites that are used to
generate the chart. The charts are generated with the following code:
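// Needs java.io.File and, from JFreeChart 1.0.x, org.jfree.chart.ChartRenderingInfo,
// org.jfree.chart.ChartUtilities and org.jfree.chart.entity.StandardEntityCollection.
// data, cvi, i, chart and rs3 are declared earlier in the page.
// Add one pie slice per record: the label comes from the first column of the
// result set, and the value comes from the precomputed statistics in cvi[].
while (rs3.next()) {
    data.setValue(rs3.getString(1), cvi[i++]);
}
// Collect entity information (for tooltips and image maps) while rendering.
final ChartRenderingInfo info =
        new ChartRenderingInfo(new StandardEntityCollection());
// Save the chart as a 600 x 400 PNG image so that the page can display it.
final File file1 = new File("../piechart3.png");
ChartUtilities.saveChartAsPNG(file1, chart, 600, 400, info);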
The generated chart is saved as a ‘.png’ image file, which is then displayed as
output to the user.
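The statistics step that precedes chart generation can be sketched in plain Java. The paper does not describe the algorithm, so this is only one plausible form, assuming that each chart value is the percentage share of a distinct answer; the class `ChartStatsSketch` and the sample answers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of turning survey answers into chart values. */
final class ChartStatsSketch {

    /** Returns the percentage share of each distinct answer, in first-seen order. */
    static Map<String, Double> percentageShares(List<String> answers) {
        // First pass: count how often each answer occurs.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String answer : answers) {
            counts.merge(answer, 1, Integer::sum);
        }
        // Second pass: convert counts to percentages of the total.
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            shares.put(entry.getKey(), 100.0 * entry.getValue() / answers.size());
        }
        return shares;
    }

    public static void main(String[] args) {
        List<String> answers = List.of("India", "USA", "India", "UK");
        System.out.println(percentageShares(answers));
    }
}
```

Each entry of the resulting map corresponds to one label and value pair of the kind passed to `data.setValue` when the pie chart is built.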
• Advantages:
The website builder can retrieve the factors that he or she wants to know before
creating a site.
Large survey results can be condensed from the records into a simple, understandable chart
that the surveyors can use.
CONCLUSION
Data warehousing provides the means to change raw data into information for
making effective business decisions; the emphasis is on information, not data. The data
warehouse is the hub for decision support data.
Data mining is a useful tool with multiple algorithms that can be tuned for specific
tasks. It can benefit business, medicine, and science. More efficient algorithms are needed to
speed up the data mining process.
REFERENCE