Lecture 1
Data Warehouse & Data Mining
Copyright:
Attribution Non-Commercial (BY-NC)
Introduction
• Data: Facts and information about something.
• Warehouse: A location or facility for storing goods and merchandise.

Data Warehouse
• A large electronic repository of information that is generated and updated in a structured manner by an enterprise.
• Aids business intelligence and supports decision making.
• A relational database designed for query and analysis rather than for transaction processing.
• Contains historical data derived from transaction data.

Data warehousing
• The coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational processing.
• Data is copied (duplicated) in a controlled manner, periodically (batch-oriented processing).

Features
• It provides:
  – centralization of corporate data assets.
  – a well-managed environment.
  – consistent and repeatable processes for loading data from corporate applications.
  – an open and scalable architecture that can handle future expansion.
  – tools that allow its users to effectively process the data into information without a high degree of technical support.

Features
• Allows business leaders to make informed decisions based on previous business data.
• Provides a platform for analytical and informational processing.
• Breaks down the barriers created by non-enterprise, process-focused applications.
• Consolidates information into a single view for users to access.

Data asset
• In an enterprise, data can be managed in three groups.
• Run-the-business data:
  – Produced by corporate applications, such as those a company uses to fill customer orders for its products or to manage financial transactions.
  – The raw materials for a data warehouse.

Data asset
• Integrate-the-business data:
  – Built to improve the quality of and synchronize two or more corporate applications, such as a master list of customers.
  – Integrates applications that weren't designed to work with each other.
• Monitor-the-business data:
  – Presented to end users for reporting and decision support, such as your financial reports.
  – The data is cleansed to enable users to better understand progress and evaluate cause and effect.

Data asset
• A data asset is the result of taking the raw material from the run-the-business data and producing higher-quality data end products to integrate the business and monitor the business.

Guidelines (or principles) For Data Warehouse
• Subject Orientation:
  – Data will be grouped by subject, rather than by author, department, or physical location.
  – So, all manufacturing data goes together, as do the sales data, the promotions data, and so on, regardless of where it came from.

Guidelines (or principles) For Data Warehouse
• Data Integration:
  – Even though data comes from separate applications, departments, etc., differences should be smoothed out so the data has the same look and feel.
  – Form: When two data elements (e.g., phone numbers) have different layouts (e.g., 123-123-1234 and (123) 123-1234), one layout will be superimposed on both of them.
  – Function: When two data elements identify the same thing (e.g., a hammer) with two different names (e.g., part 32G and part B49), these two names will be replaced with one name.

Guidelines (or principles) For Data Warehouse
• Nonvolatility:
  – Unlike the data in operational applications, which is discarded once the company is finished using it, the data in a data warehouse will remain in the warehouse.
• Time Variant:
  – All data has a context at a moment in time.
  – A data warehouse will keep that context.
  – So, all data from 1995 will retain its context within 1995.

Guidelines (or principles) For Data Warehouse
• One Version of the Truth:
  – The proliferation of data in the 1980s and 1990s produced many copies of the same data.
  – Only the one true, gold-standard copy of each data element is included in a data warehouse.
• Long-Term Investment:
  – A data warehouse should be flexible enough to absorb changes in the company and the world, and scalable enough to grow with the company.

Data Mining

Introduction
• Data mining is the process of applying analytical approaches to large data sets to discover implicit, previously unknown, and potentially useful information.
• This process often involves three steps: data preprocessing, data mining, and postprocessing.

Introduction
• The first step transforms the raw data into a format more suitable for subsequent data mining.
• The second step conducts the actual mining, while the last one validates and interprets the mining results.

Purpose
• The main purpose of data mining is to extract knowledge from the data at hand, increasing its intrinsic value and making the data useful.

Business Problems for Data Mining
• Recommendation generation
  – Generating recommendations is an important business challenge for retailers and service providers.
  – Customers who are provided appropriate and timely recommendations are likely to be more valuable (because they purchase more) and more loyal (because they feel a stronger relationship to the vendor).
  – For example, Amazon.com.
  – These recommendations are derived by using data mining to analyze the purchase behavior of all of the retailer's customers and applying the derived rules to your personal information.

Business Problems for Data Mining
• Anomaly detection
  – How do you know whether your data is "good" or not?
  – Analyze your data and pick out those items that don't fit with the rest.
  – Credit card companies use data-mining-driven anomaly detection to determine if a particular transaction is valid.
  – Insurance companies also use anomaly detection to determine if claims are fraudulent.
  – Because these companies process thousands of claims a day, it is impossible to investigate each case, and data mining can identify which claims are likely to be false.
  – Anomaly detection can even be used to validate data entry, checking whether the data entered is correct at the point of entry.

Business Problems for Data Mining
• Churn analysis
  – Which customers are most likely to switch to a competitor?
  – The telecom, banking, and insurance industries face severe competition.
  – Churn analysis can help marketing managers identify the customers who are likely to leave and why, so that they can improve customer relations and retain customers.

Business Problems for Data Mining
• Risk management
  – Should a loan be approved for a particular customer?
  – Data mining techniques are used to determine the risk of a loan application, helping the loan officer make appropriate decisions on the cost and validity of each application.

Business Problems for Data Mining
• Customer segmentation
  – Customer segmentation determines the behavioral and descriptive profiles for your customers.
  – These profiles are then used to provide personalized marketing programs and strategies that are appropriate for each group.

Business Problems for Data Mining
• Targeted ads
  – Web retailers or portal sites like to personalize their content for their Web customers.
  – Using navigation or online purchase patterns, these sites can use data mining solutions to display targeted advertisements to their Web visitors.

Business Problems for Data Mining
• Forecasting
  – How many products will you sell next week?
  – What will the inventory level be in one month?
  – Data mining forecasting techniques can be used to answer these types of time-related questions.

Schema
• The word schema comes from the Greek word "σχήμα" (skhēma), which means shape, or more generally, plan.
• Schema may refer to:
  – Model or Diagram
  – Schematic, a diagram that represents the elements of a system using abstract, graphic symbols

Database Schema
• The schema (pronounced skee-ma) of a database is its structure, described in a formal language supported by the database management system (DBMS).
• In a relational database, the schema defines the tables, the fields, relationships, views, indexes, packages, procedures, functions, queues, triggers, types, sequences, materialized views, synonyms and other elements.
• Schemas are generally stored in a data dictionary.
• A schema is defined in a text-based database language.
• The term is also often used to refer to a graphical depiction of the database structure.

Logical Schema
• A data model of a specific problem domain, expressed without being specific to a particular database management product.
• It is expressed in terms of constructs such as relational tables and columns.
• This is in contrast to a physical data model, which describes the particular physical mechanisms used to capture data in a storage medium.

Physical Schema
• The technical description of a database where all the physical constructs (such as indexes) and parameters (such as page size or buffer management policy) are specified.
• The physical schema of a database is the implementation of its logical schema.

Entity Relationship Diagrams

Entity Relationship Model
• In 1976, Chen developed the entity-relationship (ER) model, a high-level data model that is useful in developing a conceptual design for a database.
• It is a database modeling method, used to produce a logical schema or semantic data model of a system.

Entity
• An entity is a real-world item or concept that exists on its own.
• For example, a student, team, lab section, or experiment is an entity.
• The set of all possible values for an entity, such as all possible students, is the entity type.
• In an ER model, we diagram an entity type as a rectangle containing the type name.

Attribute
• Each entity has attributes, or particular properties that describe the entity.
• For example, a student has a student identification number, a name, and a grade.
• A particular value of an attribute, such as 93 for the grade, is a value of the attribute.

Attribute
• Most of the data in a database consists of values of attributes.
• The set of all possible values of an attribute, such as integers from 0 to 100 for a grade, is the attribute domain.
• In an ER model, an attribute name appears in an oval that has a line to the corresponding entity box.

Attribute Classification
• A simple attribute has a single component that is atomic.
• A composite attribute has multiple components, each of which is atomic or composite.

Attribute Classification
• An entity attribute that holds exactly one value is a single-valued attribute.
• A multi-valued attribute has more than one value for a particular entity.
• A derived attribute can be obtained from other attributes or related entities.

Attribute Classification
• An attribute or set of attributes that uniquely identifies a particular entity is a key.
• A composite key is a key that is a composite of several attributes.

Relationship
• A relationship type is a set of associations among entity types.
• For example, the student entity type is related to the team entity type because each student is a member of a team.
• We use a diamond to illustrate the relationship type in an ER diagram.

Degree of Relation
• The degree of a relationship type is the number of entity types that participate.
  – Unary (one entity type)
  – Binary (two entity types)
  – Ternary (three entity types)
• A role name indicates the purpose of an entity in a relationship.

Degree of Relation
• A relationship type can also have attributes.

Relationship Cardinality
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships
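
One way to carry the student/team example into a relational schema is sketched below in Python with the standard-library sqlite3 module. The table names, column names, and sample rows are hypothetical, chosen only to mirror the slides. Each entity type becomes a table, attributes become columns, the 0 to 100 grade domain becomes a CHECK constraint, the key becomes a primary key, and the one-to-many "member of" relationship becomes a foreign key from student to team.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create an in-memory database holding the two entity types."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE team (
            team_id INTEGER PRIMARY KEY,          -- key attribute
            name    TEXT NOT NULL                 -- simple attribute
        );
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,       -- key attribute
            name       TEXT NOT NULL,
            grade      INTEGER
                       CHECK (grade BETWEEN 0 AND 100),  -- attribute domain
            team_id    INTEGER NOT NULL
                       REFERENCES team(team_id)   -- one team, many students
        );
        """
    )
    return conn


def team_members(conn: sqlite3.Connection, team_name: str) -> list:
    """Follow the relationship from a team to its member students."""
    rows = conn.execute(
        "SELECT s.name FROM student AS s "
        "JOIN team AS t ON s.team_id = t.team_id "
        "WHERE t.name = ? ORDER BY s.student_id",
        (team_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_schema()
conn.execute("INSERT INTO team (team_id, name) VALUES (1, 'Team A')")
conn.executemany(
    "INSERT INTO student (student_id, name, grade, team_id) VALUES (?, ?, ?, ?)",
    [(10, "Ana", 93, 1), (11, "Ben", 78, 1)],
)
print(team_members(conn, "Team A"))
```

A many-to-many relationship would typically need a separate junction table rather than a single foreign key column.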
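
The Form and Function rules from the Data Integration guideline can also be illustrated with a small Python sketch. It assumes 10-digit phone numbers and a hand-written lookup table; the unified part name HAMMER-001 is a made-up placeholder, since the slides only say that the two source names are replaced with one name.

```python
import re

# Hypothetical mapping from source-system part names to one warehouse name.
PART_NAME_MAP = {
    "32G": "HAMMER-001",
    "B49": "HAMMER-001",
}


def normalize_phone(raw: str) -> str:
    """Form: superimpose the single layout NNN-NNN-NNNN on any input layout."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {raw!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def unify_part_name(source_name: str) -> str:
    """Function: replace differing source names for one thing with one name."""
    # Names without a mapping are passed through unchanged.
    return PART_NAME_MAP.get(source_name, source_name)


print(normalize_phone("(123) 123-1234"))  # prints 123-123-1234
print(unify_part_name("32G") == unify_part_name("B49"))  # prints True
```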
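
For the anomaly detection problem ("pick out those items that don't fit with the rest"), a minimal Python sketch follows. The slides do not name a technique, so the z-score rule, the threshold of 2.0, and the sample transaction amounts are all assumptions made for illustration. Real systems generally rely on richer models.

```python
from statistics import mean, pstdev


def flag_anomalies(amounts, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean (a simple z-score rule, assumed for illustration)."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [x for x in amounts if abs(x - mu) / sigma > threshold]


# Hypothetical card transactions: seven ordinary amounts and one outlier.
transactions = [20, 22, 19, 21, 20, 23, 18, 500]
print(flag_anomalies(transactions))  # prints [500]
```

Because a single extreme value also inflates the mean and standard deviation, this rule is only dependable on small samples when the threshold is kept low.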