9781849685481-Chapter-12 Enhancing The Data Sample Chapter


Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology

John Heaton

Chapter No. 12 "Enhancing the Data"

In this package, you will find:


- A biography of the author of the book
- A preview chapter from the book, Chapter No. 12 "Enhancing the Data"
- A synopsis of the book's content
- Information on where to buy this book

About the Author


John Heaton graduated top of his class with a Diploma in Information Technology from Technikon Witwatersrand in South Africa (equivalent to a Bachelor's degree in Computer Science), and worked for more than 10 years with Oracle Corporation, including as a Practice Manager. John co-ran the North Business Intelligence and Warehouse Consulting practice, delivering business intelligence solutions to Fortune 500 clients. During this time, he steadily added business skills and business training to his technical background.

In 2005, John left Oracle to become a founding member of a small business, iSeerix. This allowed him to focus on strategic partnerships with clients to design and build business intelligence and data warehouse solutions.

John's strengths include the ability to communicate the benefits of introducing a business intelligence solution into a client's architecture, and he has consistently become a trusted advisor to his clients. His philosophy is based on responsibility and mutual respect. He relies on the unique abilities of individuals to ensure success in different areas, and strives to foster a team environment of creativity and achievement.

Today, John specializes as a Solution/Technical Architect, assisting customers in designing large, complex data warehouses. Over the years, he has worked in numerous industries with differing technologies, and this broad experience allows him to bring a unique perspective and understanding to designing and developing a data warehouse. His strong business background, technical expertise, and certification in Project Management make John a valued asset to any data warehouse project.

For More Information: www.packtpub.com/oracle-database-11g-data-warehousing-businessintelligence-solutions-cookbook/book

Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology


Business intelligence and data warehousing projects can be challenging and complex. Dealing with new technologies, processes, and different stakeholders presents an array of potential problems. To aid the project manager, there are recipes on project definition, scope, control, and risk management. Recipes on requirements, design, data analysis, security, and data enhancement will help guide technical project members. Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology offers insight and real-world experience to assist you through the business intelligence and data warehouse lifecycle. The recipes in the first six chapters focus on processes and practices that aid in defining and managing the project. From Chapter 7, Architecture and Design, onwards, the book provides more technical recipes for business intelligence and data warehousing projects.

What This Book Covers


Chapter 1, Defining a Program, assesses your current project delivery methodology to identify areas that may need enhancing to support your business intelligence initiative.

Chapter 2, Establishing the Project, reviews and enhances the project delivery phases in order to define a consistent set of work practices for delivering a successful project.

Chapter 3, Controlling the Project, focuses on communication and control, which are essential to a business intelligence project. Developing efficient and effective ways to achieve them is the key aim of this chapter.

Chapter 4, Wrapping Up the Project, focuses on business intelligence projects that continue for numerous iterations, and on understanding the information that needs to flow from project to project. Setting up ways to hand over that information is key to the long-term success of the solution.

Chapter 5, The Blueprint, lays out the roadmap needed to guide you from start to destination when building a business intelligence and data warehouse solution.

Chapter 6, Analyzing the Requirements, covers succinctly capturing and understanding the requirements of a project. Keeping requirements simple and providing transparency is key to demystifying the project for stakeholders.


Chapter 7, Architecture and Design, focuses on creating a successful foundation on which to interactively build your solution, which can save large amounts of time and money. Getting the basics right is the topic of this chapter.

Chapter 8, Analyzing the Sources, covers identifying the source with the most correct information, which is essential to the success of the project. Gaining a deeper understanding of your source systems enables you to make informed decisions about which system contains the most accurate information for each subject area.

Chapter 9, Analyzing the Data, describes how data profiling, or data discovery, can uncover a wealth of information. Identifying efficient ways and methods to interrogate information will unlock some of this wealth.

Chapter 10, Constructing the Data Model, covers the data model, which is the key asset of the project. Understanding how to design and develop this model effectively enables organizations to reuse this asset many times.

Chapter 11, Defining the ETL/ELT, focuses on building an efficient framework and extraction, transformation, and loading routines, which leads to a simpler and easier-to-manage solution.

Chapter 12, Enhancing the Data, covers the data gaps that normally exist within organizations. Once these gaps are identified, effective means to capture information and contribute it to the solution are required.

Chapter 13, Optimizing the Access, gives insight into the key technological capabilities of your reporting tool, allowing you to deliver information to your stakeholders in a meaningful and accurate way.

Chapter 14, Security, covers business intelligence and data warehouse solution security, showing you how to integrate common industry security technology and requirements into your solution.


Enhancing the Data


When building data warehouse/business intelligence solutions, it is normal for data gaps to be present. A gap may be a key piece of information required to match records or perform some analysis, or it may be as simple as missing descriptive information. Enhancing the source application is often not an option because of cost and/or limitations in the application. The data gaps are normally fixed through manual processes, either by submitting requests for data modifications within the data warehouse/business intelligence environment, or by building an application that enables users to manage their own data gaps. The following recipes will help you automate the process of enhancing the data within the data warehouse:


- Creating your application schema
- Creating your application tables
- Developing the journal tables to track changes
- Defining the audit triggers
- Defining the APEX Upload application
- Creating the Upload interface

Introduction
In a business intelligence and data warehousing solution, there is nearly always a need to enhance or add information. A simple but functional user interface is therefore needed to support these requirements. Oracle Application Express (APEX) is a robust, web-based design, development, and runtime environment used to build custom applications within an Oracle database. APEX is shipped as part of the Oracle database and, at the time of writing, is free. Its tight integration with the Oracle database, ease of use, and cost make APEX ideal for building an automated solution to fill the data gaps.


Creating your application schema


The APEX application will connect to a database schema. It is recommended that the application's database objects be placed in their own separate schema. This separation helps ensure that data in other areas of the data warehouse cannot be compromised through the application.

Getting ready
A separate schema for the APEX application segregates its objects from the source table information stored in the staging schemas. This segregation makes the tables easier to maintain and secure.

How to do it...
When building an APEX application, you first need to select the schema to which it will connect. If that schema also contained other tables, such as the data warehouse tables, a screen could be built directly on them; a dedicated schema avoids this.
1. First, create a database user, app, to be the schema owner:
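-- The quota clause is needed on Oracle 12c and later, where the RESOURCE role no longer grants UNLIMITED TABLESPACE
create user app identified by app
  default tablespace users
  temporary tablespace temp
  quota unlimited on users;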

2. Assign roles and privileges to the schema owner:


-- Default roles for the demo
grant connect, resource to app;
-- The next two grants are included to enable the use of autotrace
grant select_catalog_role to app;
grant select any dictionary to app;

How it works...
Creating a separate schema for your application allows you to segregate the information from your business intelligence and data warehousing solution. This allows you to manage and maintain the applications separately and independently from the solution.


Creating your application tables


Application tables will contain the information that is required for the data warehouse solution, but cannot be obtained from any existing source system. The APEX application will provide a means to capture this information and store it within the application tables.

Getting ready
Log in to your database schema created in the previous recipe.

How to do it...
The following steps create the sample table and sequence used by the APEX application for the upload screen:
1. The table we will use is a simple customer table, defined as follows:
CREATE TABLE "APP_CUSTOMER" (
  "CUST_SEQ"       NUMBER NOT NULL ENABLE,
  "CUST_NUM"       VARCHAR2(50 BYTE) NOT NULL ENABLE,
  "CUST_NAME"      VARCHAR2(50 BYTE),
  "CUST_ADDRESS1"  VARCHAR2(50 BYTE),
  "CUST_ADDRESS2"  VARCHAR2(50 BYTE),
  "CUST_CITY"      VARCHAR2(50 BYTE),
  "CUST_PCODE_ZIP" VARCHAR2(20 BYTE),
  "CUST_STATE"     VARCHAR2(50 BYTE),
  "CUST_COUNTRY"   VARCHAR2(50 BYTE),
  "REGION"         VARCHAR2(50 BYTE),
  "CREATE_DATE"    DATE,
  "CREATE_BY"      VARCHAR2(50 BYTE),
  "UPDATE_DATE"    DATE,
  "UPDATE_BY"      VARCHAR2(50 BYTE),
  "APPROVED_DATE"  DATE,
  "APPROVED_BY"    VARCHAR2(50 BYTE),
  CONSTRAINT "CUSTOMER_PK" PRIMARY KEY ("CUST_SEQ") USING INDEX
);

2. Create a sequence to support an automated primary key:


CREATE SEQUENCE CUSTOMER_SEQ START WITH 1 INCREMENT BY 1 NOCACHE NOCYCLE;
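Nothing in this chapter actually draws values from CUSTOMER_SEQ, and the sample rows inserted later in the chapter hard-code CUST_SEQ values 1 to 4. The following Python sketch (an illustration, not Oracle code) models a sequence with START WITH and INCREMENT BY only, plus a hypothetical assign_key helper that fills CUST_SEQ when it is missing, to show why a sequence starting at 1 collides with those hand-coded keys. In Oracle, a comparable approach would be a BEFORE INSERT trigger that assigns CUSTOMER_SEQ.NEXTVAL to :new.CUST_SEQ when it is null; that trigger is an assumption and not part of this recipe.

```python
class Sequence:
    """Toy model of an Oracle sequence: START WITH and INCREMENT BY only."""

    def __init__(self, start_with: int = 1, increment_by: int = 1) -> None:
        self._next = start_with
        self.increment_by = increment_by

    def nextval(self) -> int:
        value = self._next
        self._next += self.increment_by
        return value


def assign_key(row: dict, seq: Sequence, existing_keys: set) -> dict:
    """Hypothetical helper: fill CUST_SEQ from the sequence if it is missing.

    Raises ValueError on a duplicate, standing in for a CUSTOMER_PK violation.
    """
    if row.get("CUST_SEQ") is None:
        row = {**row, "CUST_SEQ": seq.nextval()}
    if row["CUST_SEQ"] in existing_keys:
        raise ValueError(f"duplicate CUST_SEQ {row['CUST_SEQ']}")
    existing_keys.add(row["CUST_SEQ"])
    return row


# The sample rows later in the chapter hard-code CUST_SEQ 1 to 4.
keys = {1, 2, 3, 4}
try:
    assign_key({"CUST_NUM": "5"}, Sequence(start_with=1), keys)
except ValueError as exc:
    print(exc)  # duplicate CUST_SEQ 1
print(assign_key({"CUST_NUM": "5"}, Sequence(start_with=5), keys)["CUST_SEQ"])  # 5
```

Starting the sequence above the highest existing key (for example, START WITH 5 for the sample data) is one way to avoid the clash.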

How it works...
By adding application tables, you are creating a source for the business intelligence and data warehousing solution. These tables will support the application, which will enhance your data.

Developing the journal tables to track changes


The application tables become a source of information for the data warehouse solution. Changes to these tables should be recorded so that a complete audit log can be constructed.

Getting ready
In order to track changes, a journal table is needed. This table has the same definition as the original table, plus a few additional columns that record the action, the date, and the user who made the change.

How to do it...
Journal tables are useful for auditing the changes made to information, and can be used to roll back information should it be required. The following journal table tracks changes to the customer table:
CREATE TABLE "APP_CUSTOMER_JRNL" (
  "CUST_SEQ"       NUMBER NOT NULL ENABLE,
  "CUST_NUM"       VARCHAR2(50 BYTE) NOT NULL ENABLE,
  "CUST_NAME"      VARCHAR2(50 BYTE),
  "CUST_ADDRESS1"  VARCHAR2(50 BYTE),
  "CUST_ADDRESS2"  VARCHAR2(50 BYTE),
  "CUST_CITY"      VARCHAR2(50 BYTE),
  "CUST_PCODE_ZIP" VARCHAR2(20 BYTE),
  "CUST_STATE"     VARCHAR2(50 BYTE),
  "CUST_COUNTRY"   VARCHAR2(50 BYTE),
  "REGION"         VARCHAR2(50 BYTE),
  "CREATE_DATE"    DATE,
  "CREATE_BY"      VARCHAR2(50 BYTE),
  "UPDATE_DATE"    DATE,
  "UPDATE_BY"      VARCHAR2(50 BYTE),
  "JRNL_DATE"      DATE,
  "JRNL_ACTION"    VARCHAR2(50 BYTE),
  "JRNL_BY"        VARCHAR2(50 BYTE),
  "APPROVED_DATE"  DATE,
  "APPROVED_BY"    VARCHAR2(50 BYTE),
  PRIMARY KEY ("CUST_SEQ", "JRNL_DATE") USING INDEX
);
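One consequence of the (CUST_SEQ, JRNL_DATE) primary key is worth checking: an Oracle DATE stores time only to the second, so two changes to the same customer within one second produce the same key. The Python sketch below uses the standard sqlite3 module with a trimmed-down, hypothetical copy of the journal table, storing JRNL_DATE as text truncated to whole seconds to imitate a DATE, and shows the second journal insert being rejected. Because the chapter's triggers catch all errors with WHEN OTHERS, the corresponding Oracle failure would drop the journal row silently rather than block the change.

```python
import sqlite3
from datetime import datetime


def make_journal(conn: sqlite3.Connection) -> None:
    # Trimmed-down stand-in for APP_CUSTOMER_JRNL with the same primary key.
    conn.execute("""
        CREATE TABLE app_customer_jrnl (
            cust_seq    INTEGER NOT NULL,
            jrnl_date   TEXT    NOT NULL,
            jrnl_action TEXT,
            jrnl_by     TEXT,
            PRIMARY KEY (cust_seq, jrnl_date)
        )""")


def journal(conn: sqlite3.Connection, cust_seq: int, when: datetime,
            action: str, user: str) -> None:
    # Truncate to whole seconds, the precision of an Oracle DATE.
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    conn.execute("INSERT INTO app_customer_jrnl VALUES (?, ?, ?, ?)",
                 (cust_seq, stamp, action, user))


conn = sqlite3.connect(":memory:")
make_journal(conn)
first = datetime(2012, 1, 1, 9, 0, 0, 100000)
journal(conn, 1, first, "Update", "ALICE")
try:
    # A second change 0.8 seconds later maps to the same primary key.
    journal(conn, 1, first.replace(microsecond=900000), "Update", "BOB")
except sqlite3.IntegrityError as exc:
    print("journal row rejected:", exc)
```

Switching JRNL_DATE to a TIMESTAMP column, or adding a journal sequence number to the key, are possible ways around this; neither is part of the original design.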

How it works...
Journal tables allow you to track changes within the application. They record who made each change and when, which allows full auditing of all changes and the ability to roll them back.
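Rolling back from the journal is described but not shown. Because each journal row holds the image of a customer as it was just before an update or delete, the newest journal row for a customer is the state to restore. The sketch below is a hypothetical helper written in Python against SQLite (through the standard sqlite3 module) with trimmed-down tables; in Oracle, the same step would more naturally be written as a MERGE from APP_CUSTOMER_JRNL into APP_CUSTOMER.

```python
import sqlite3

# Trimmed-down stand-ins for APP_CUSTOMER and APP_CUSTOMER_JRNL.
SCHEMA = """
CREATE TABLE app_customer (
    cust_seq  INTEGER PRIMARY KEY,
    cust_num  TEXT NOT NULL,
    cust_name TEXT
);
CREATE TABLE app_customer_jrnl (
    cust_seq    INTEGER NOT NULL,
    cust_num    TEXT    NOT NULL,
    cust_name   TEXT,
    jrnl_date   TEXT    NOT NULL,
    jrnl_action TEXT,
    PRIMARY KEY (cust_seq, jrnl_date)
);
"""


def roll_back_latest(conn: sqlite3.Connection, cust_seq: int) -> bool:
    """Restore the newest journaled image of one customer (hypothetical helper).

    Handles both cases the triggers journal: an updated row is overwritten
    and a deleted row is re-inserted. Returns False if nothing was journaled.
    """
    row = conn.execute(
        "SELECT cust_seq, cust_num, cust_name FROM app_customer_jrnl "
        "WHERE cust_seq = ? ORDER BY jrnl_date DESC LIMIT 1",
        (cust_seq,),
    ).fetchone()
    if row is None:
        return False
    # INSERT OR REPLACE is SQLite's shortcut; Oracle would use MERGE.
    conn.execute(
        "INSERT OR REPLACE INTO app_customer (cust_seq, cust_num, cust_name) "
        "VALUES (?, ?, ?)",
        row,
    )
    return True
```

In Oracle, a restore run against APP_CUSTOMER would itself fire the audit triggers, so it is audited like any other change.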

Defining the audit triggers


Changes to the application tables should be recorded in real time, in an automated manner. Oracle database triggers will be used to accomplish this.

Getting ready
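Before creating the triggers to track the changes, first ensure that you have created the application table and the journal table. The update trigger also reads approvers from an APP_VAL_APPR table (with USERNAME, TABLE_NAME, and APPROVE_FLAG columns); this table is not created in this chapter, and it must exist before that trigger will compile successfully.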

How to do it...
The triggers will track any changes to the data in the APP_CUSTOMER table and transfer the information to the journal table.
1. Delete Trigger: This trigger will track all deletes and place the deleted record into the journal table:
CREATE OR REPLACE TRIGGER "TRI_APP_CUSTOMER_DEL"
BEFORE DELETE ON APP_CUSTOMER
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
BEGIN
  -- Copy the row being deleted into the journal
  INSERT INTO APP_CUSTOMER_JRNL
    (APPROVED_BY, APPROVED_DATE, JRNL_ACTION, JRNL_DATE, JRNL_BY,
     CUST_SEQ, CUST_NUM, CUST_NAME, CUST_ADDRESS1, CUST_ADDRESS2,
     CUST_CITY, CUST_PCODE_ZIP, CUST_STATE, CUST_COUNTRY, REGION,
     CREATE_DATE, CREATE_BY, UPDATE_DATE, UPDATE_BY)
  VALUES
    (:old.APPROVED_BY, :old.APPROVED_DATE, 'Delete', SYSDATE,
     upper(nvl(apex_custom_auth.get_username, user)),
     :old.CUST_SEQ, :old.CUST_NUM, :old.CUST_NAME, :old.CUST_ADDRESS1, :old.CUST_ADDRESS2,
     :old.CUST_CITY, :old.CUST_PCODE_ZIP, :old.CUST_STATE, :old.CUST_COUNTRY, :old.REGION,
     :old.CREATE_DATE, :old.CREATE_BY, :old.UPDATE_DATE, :old.UPDATE_BY);
EXCEPTION
  -- Errors are only written to DBMS_OUTPUT, so a failed journal insert does not stop the delete
  WHEN OTHERS THEN
    dbms_output.put_line('Error code:' || sqlcode);
    dbms_output.put_line('Error msg:' || sqlerrm);
END;
/
ALTER TRIGGER "TRI_APP_CUSTOMER_DEL" ENABLE;
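To see the journaling idea run without an Oracle instance, the following sketch recreates a cut-down APP_CUSTOMER, its journal, and a BEFORE DELETE trigger in SQLite through Python's standard sqlite3 module. It is an analogue, not the Oracle trigger: SQLite writes OLD. instead of :old., has no database users (so JRNL_BY is filled with the hypothetical fixed value 'SKETCH_USER'), and there is no APEX call.

```python
import sqlite3

# The trigger copies the OLD row into the journal before it is removed,
# mirroring TRI_APP_CUSTOMER_DEL. 'SKETCH_USER' is a hypothetical fixed user.
SCHEMA = """
CREATE TABLE app_customer (
    cust_seq  INTEGER PRIMARY KEY,
    cust_num  TEXT NOT NULL,
    cust_name TEXT
);
CREATE TABLE app_customer_jrnl (
    cust_seq    INTEGER NOT NULL,
    cust_num    TEXT    NOT NULL,
    cust_name   TEXT,
    jrnl_date   TEXT,
    jrnl_action TEXT,
    jrnl_by     TEXT
);
CREATE TRIGGER tri_app_customer_del
BEFORE DELETE ON app_customer
FOR EACH ROW
BEGIN
    INSERT INTO app_customer_jrnl
        (cust_seq, cust_num, cust_name, jrnl_date, jrnl_action, jrnl_by)
    VALUES
        (OLD.cust_seq, OLD.cust_num, OLD.cust_name,
         datetime('now'), 'Delete', 'SKETCH_USER');
END;
"""


def delete_and_read_journal() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO app_customer VALUES (1, '1', 'Starbucks')")
    conn.execute("DELETE FROM app_customer WHERE cust_seq = 1")
    # The customer row is gone, but its last image survives in the journal.
    return conn.execute(
        "SELECT cust_seq, cust_name, jrnl_action, jrnl_by FROM app_customer_jrnl"
    ).fetchall()


print(delete_and_read_journal())  # [(1, 'Starbucks', 'Delete', 'SKETCH_USER')]
```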

2. Update Trigger: This trigger will track all changes and place the old record into the journal table. It will also check to see if the current user can approve the information; if not, it will set the approved attributes to null:
CREATE OR REPLACE TRIGGER "TRI_APP_CUSTOMER_UPD"
BEFORE UPDATE ON APP_CUSTOMER
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
BEGIN
  DECLARE
    v_APPROVER APP_VAL_APPR.USERNAME%TYPE;
    -- Returns a row only if the current user may approve APP_CUSTOMER changes
    CURSOR c_Get_Approver IS
      SELECT DISTINCT USERNAME
        FROM APP_VAL_APPR
       WHERE TABLE_NAME = 'APP_CUSTOMER'
         AND USERNAME = upper(nvl(apex_custom_auth.get_username, user))
         AND APPROVE_FLAG = 'Y';
  BEGIN
    -- Journal the row as it was before the update
    INSERT INTO APP_CUSTOMER_JRNL
      (APPROVED_BY, APPROVED_DATE, JRNL_ACTION, JRNL_DATE, JRNL_BY,
       CUST_SEQ, CUST_NUM, CUST_NAME, CUST_ADDRESS1, CUST_ADDRESS2,
       CUST_CITY, CUST_PCODE_ZIP, CUST_STATE, CUST_COUNTRY, REGION,
       CREATE_DATE, CREATE_BY, UPDATE_DATE, UPDATE_BY)
    VALUES
      (:old.APPROVED_BY, :old.APPROVED_DATE, 'Update', SYSDATE,
       upper(nvl(apex_custom_auth.get_username, user)),
       :old.CUST_SEQ, :old.CUST_NUM, :old.CUST_NAME, :old.CUST_ADDRESS1, :old.CUST_ADDRESS2,
       :old.CUST_CITY, :old.CUST_PCODE_ZIP, :old.CUST_STATE, :old.CUST_COUNTRY, :old.REGION,
       :old.CREATE_DATE, :old.CREATE_BY, :old.UPDATE_DATE, :old.UPDATE_BY);

    OPEN c_Get_Approver;
    FETCH c_Get_Approver INTO v_APPROVER;
    -- Keep the new approval values only when an approver updates a
    -- previously unapproved row; otherwise clear them
    IF v_APPROVER = upper(nvl(apex_custom_auth.get_username, user))
       AND (NVL(:old.approved_by, '*') = '*'
            AND NVL(:old.approved_date, SYSDATE) = SYSDATE) THEN
      :new.update_by   := upper(nvl(apex_custom_auth.get_username, user));
      :new.update_date := sysdate;
    ELSE
      :new.approved_by   := NULL;
      :new.approved_date := NULL;
      :new.update_by     := upper(nvl(apex_custom_auth.get_username, user));
      :new.update_date   := sysdate;
    END IF;
    CLOSE c_Get_Approver;
  EXCEPTION
    WHEN OTHERS THEN
      dbms_output.put_line('Error code:' || sqlcode);
      dbms_output.put_line('Error msg:' || sqlerrm);
      :new.approved_by   := NULL;
      :new.approved_date := NULL;
      :new.update_by     := upper(nvl(apex_custom_auth.get_username, user));
      :new.update_date   := sysdate;
  END;
END;
/
ALTER TRIGGER "TRI_APP_CUSTOMER_UPD" ENABLE;
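The branch at the end of the update trigger is the part most easily misread. The following Python function is a simplified restatement of that IF/ELSE, assuming is_approver already holds the result of the c_Get_Approver lookup; it ignores the trigger's '*' sentinel and SYSDATE comparison and treats a NULL APPROVED_BY and APPROVED_DATE as meaning "not yet approved". The names ApprovalFields, UNAPPROVED, and resolve_approval are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ApprovalFields:
    approved_by: Optional[str]
    approved_date: Optional[datetime]


UNAPPROVED = ApprovalFields(None, None)


def resolve_approval(is_approver: bool, old: ApprovalFields,
                     new: ApprovalFields) -> ApprovalFields:
    """Simplified restatement of the IF/ELSE in TRI_APP_CUSTOMER_UPD.

    The approval values supplied by the UPDATE survive only when the current
    user is an approver AND the row was not approved before; in every other
    case they are cleared. UPDATE_BY/UPDATE_DATE are set in both branches of
    the trigger and are left out here.
    """
    previously_unapproved = old.approved_by is None and old.approved_date is None
    if is_approver and previously_unapproved:
        return new
    return UNAPPROVED


when = datetime(2012, 6, 1, 10, 30)
mgr_ok = ApprovalFields("MGR", when)
print(resolve_approval(True, UNAPPROVED, mgr_ok) == mgr_ok)       # True: first approval kept
print(resolve_approval(False, UNAPPROVED, mgr_ok) == UNAPPROVED)  # True: non-approver cleared
print(resolve_approval(True, mgr_ok, mgr_ok) == UNAPPROVED)       # True: re-edit clears approval
```

The third case is easy to overlook: because the row must be previously unapproved, any later edit to an approved row, even by an approver, clears the approval.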


3. Insert Trigger: This trigger is for all inserts; it sets all the values within the audit columns:
CREATE OR REPLACE TRIGGER "TRI_APP_CUSTOMER_INS"
BEFORE INSERT ON APP_CUSTOMER
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
BEGIN
  -- New rows always start unapproved, with the creator recorded
  :new.approved_by   := NULL;
  :new.approved_date := NULL;
  :new.create_by     := upper(nvl(apex_custom_auth.get_username, user));
  :new.create_date   := sysdate;
EXCEPTION
  WHEN OTHERS THEN
    dbms_output.put_line('Error code:' || sqlcode);
    dbms_output.put_line('Error msg:' || sqlerrm);
    :new.approved_by   := NULL;
    :new.approved_date := NULL;
    :new.create_by     := upper(nvl(apex_custom_auth.get_username, user));
    :new.create_date   := sysdate;
END;
/
ALTER TRIGGER "TRI_APP_CUSTOMER_INS" ENABLE;
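The same idea can be tried in SQLite via Python's standard sqlite3 module, with one necessary change: SQLite does not let a trigger assign to NEW columns the way Oracle assigns :new values, so this sketch uses an AFTER INSERT trigger that updates the freshly inserted row instead. The fixed user name 'APP' is a hypothetical stand-in for upper(nvl(apex_custom_auth.get_username, user)).

```python
import sqlite3

# 'APP' is a hypothetical fixed user standing in for
# upper(nvl(apex_custom_auth.get_username, user)) in the Oracle trigger.
SCHEMA = """
CREATE TABLE app_customer (
    cust_seq      INTEGER PRIMARY KEY,
    cust_num      TEXT NOT NULL,
    cust_name     TEXT,
    create_date   TEXT,
    create_by     TEXT,
    approved_date TEXT,
    approved_by   TEXT
);
CREATE TRIGGER tri_app_customer_ins
AFTER INSERT ON app_customer
FOR EACH ROW
BEGIN
    UPDATE app_customer
       SET approved_by   = NULL,
           approved_date = NULL,
           create_by     = 'APP',
           create_date   = datetime('now')
     WHERE cust_seq = NEW.cust_seq;
END;
"""


def insert_and_read() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The caller tries to supply its own audit and approval values...
    conn.execute(
        "INSERT INTO app_customer (cust_seq, cust_num, cust_name, create_by, approved_by) "
        "VALUES (1, '1', 'Starbucks', 'someone', 'self-approved')"
    )
    # ...but the trigger overwrites them.
    return conn.execute(
        "SELECT create_by, approved_by, create_date IS NOT NULL FROM app_customer"
    ).fetchone()


print(insert_and_read())  # ('APP', None, 1)
```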

With all the triggers in place, any data changes to the application tables will be audited. Insert the sample data into APP_CUSTOMER, as shown in the following code:
Insert into APP_CUSTOMER (CUST_SEQ,CUST_NUM,CUST_NAME,CUST_ADDRESS1,CUST_ADDRESS2,CUST_CITY,CUST_PCODE_ZIP,CUST_STATE,CUST_COUNTRY,REGION)
  values (1,'1','Starbucks','680 Monroe Avenue',null,'Rochester','14607','NY','USA','EAST');
Insert into APP_CUSTOMER (CUST_SEQ,CUST_NUM,CUST_NAME,CUST_ADDRESS1,CUST_ADDRESS2,CUST_CITY,CUST_PCODE_ZIP,CUST_STATE,CUST_COUNTRY,REGION)
  values (2,'2','Walmart','1902 Empire Boulevard',null,'Webster','14580','NY','USA','EAST');
Insert into APP_CUSTOMER (CUST_SEQ,CUST_NUM,CUST_NAME,CUST_ADDRESS1,CUST_ADDRESS2,CUST_CITY,CUST_PCODE_ZIP,CUST_STATE,CUST_COUNTRY,REGION)
  values (3,'3','Walmart','3838 South Semoran Boulevard',null,'Orlando','32822','FL','USA','SOUTH');
Insert into APP_CUSTOMER (CUST_SEQ,CUST_NUM,CUST_NAME,CUST_ADDRESS1,CUST_ADDRESS2,CUST_CITY,CUST_PCODE_ZIP,CUST_STATE,CUST_COUNTRY,REGION)
  values (4,'4','Starbucks','110 W. Main Street',null,'Visalia','93291','CA','USA','WEST');
COMMIT;
Notice that the audit columns (CREATE_BY and CREATE_DATE) are populated in the stored rows even though the INSERT statements do not supply them; the insert trigger sets them.

How it works...
The triggers fire on every insert, update, and delete against APP_CUSTOMER. Because APEX connects to the database as a single database user, the USER function alone cannot identify the person making the change; instead, the triggers call the apex_custom_auth.get_username function and fall back to USER through NVL when no APEX username is available. The update and delete triggers place the previous version of each changed or deleted row into the journal table, while the insert trigger only sets the audit columns.
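The expression the triggers use to name the user, upper(nvl(apex_custom_auth.get_username, user)), can be restated in Python as below. The parameter apex_username stands for whatever the APEX function returns and db_user for the value of Oracle's USER; both names are illustrative. Oracle treats an empty string as NULL, so the sketch also falls back when given an empty string.

```python
from typing import Optional


def audit_user(apex_username: Optional[str], db_user: str) -> str:
    """Python restatement of upper(nvl(apex_custom_auth.get_username, user)).

    apex_username: value returned by the APEX call (None when there is none).
    db_user: the database account, i.e. what Oracle's USER returns.
    Oracle treats '' as NULL, so an empty string also falls back to db_user.
    """
    return (apex_username or db_user).upper()


print(audit_user("jsmith", "APP"))  # JSMITH: change made through the APEX application
print(audit_user(None, "app"))      # APP: no APEX user name available (assumed for SQL*Plus)
```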

Defining the APEX Upload application


APEX is installed within an Oracle database. The development environment and runtime environment are accessible through a web browser.

Getting ready
When APEX was installed, an APEX workspace should have been created, and your administrator should have given you a username and password for it. You will need these before you begin. In addition, your application schema must be associated with your workspace. If you do not have access to an environment, you can request a workspace on the Oracle-hosted environment at http://apex.oracle.com. Note that this should not be used for production, but it can be used as a training or limited development environment.

How to do it...
Before starting, ensure you have the correct URL for connecting your web browser to the APEX development environment. Open a compatible web browser and navigate to the URL for Oracle Application Express, for example, http://machine_name:port/apex.
1. Log in to Oracle Application Express.


2. On the Workspace home page, click the Application Builder icon.

3. Click Create.

4. For Method, select Database and click Next.

5. Select From Scratch and click Next.

6. For Name, enter CSV Uploader, accept the remaining defaults, and click Next.

7. Add a blank page: under Select Page Type, select Blank. At the bottom, type Interface for the Page Name and click Add Page. Click Create on the top panel to create the application.

8. Continue through the wizard, accepting the default values.

How it works...
The Upload application is initially created with a blank page. This page will be completed at a later stage.

Creating the Upload interface


Before APEX 4.1, uploading CSV files into an Oracle database using APEX was a very manual process. As of release 4.1, APEX has built-in functionality for uploading and parsing CSV files into an Oracle database.

Getting ready
Log on to APEX and open the application.

How to do it...
Creating the upload interface in APEX 4.1 is significantly easier than in previous versions. One of the most common requests in a data warehouse is the ability for end users to upload data from a CSV file, and in APEX 4.1 this is built-in functionality.

1. Click the Create Page > button.

2. Select Data Loading and click Next >.

3. Enter the Data Load Definition Name; select the Owner, Table Name, and the required Unique Column fields. Click Next >.

4. You can add lookups and transformation rules if required. For this example, leave the settings at their defaults and click Next > on the Table Lookups and Transformation Rules pages.

5. Review the data load steps and click Next >.

6. Select Use an existing tab set and create a new tab within the existing tab set. Enter Customer Upload for New Tab Label. Click Next >.

7. Enter 1 for the Branch to Page field. Click Next >.

8. Review the page definition and click Finish.

9. Click Run Page to run the application and test the upload.

10. Enter your username and password, then review the application.

How it works...
This page allows the user to select the CSV file and enter the file specification. Once the file is selected, its contents are uploaded into the upload table and then parsed. Upon parsing, the information is placed into the application table. This is a common request in a data warehouse application, and with APEX 4.1, it is default functionality.
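The Unique Column chosen in step 3 is intended to match uploaded rows against existing ones, so that a row already in the table can be updated rather than inserted a second time. The Python sketch below imitates that insert-or-update decision with the standard csv and sqlite3 modules against a cut-down APP_CUSTOMER table keyed on CUST_NUM. It illustrates the idea only and is not how APEX implements data loading; the column subset, the create_table and load_csv names, and the skipping of rows without a key are all assumptions.

```python
import csv
import io
import sqlite3

# Columns accepted from the CSV file (a hypothetical subset of APP_CUSTOMER).
COLUMNS = ("cust_num", "cust_name", "cust_city")


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE app_customer (
            cust_seq  INTEGER PRIMARY KEY,   -- assigned automatically by SQLite
            cust_num  TEXT NOT NULL UNIQUE,  -- the "unique column" used for matching
            cust_name TEXT,
            cust_city TEXT
        )""")


def load_csv(conn: sqlite3.Connection, csv_text: str) -> dict:
    """Update rows whose cust_num already exists and insert the rest.

    Rows without a cust_num are skipped. Returns a count per action.
    """
    counts = {"insert": 0, "update": 0, "skipped": 0}
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Empty CSV fields become NULL, as Oracle would store '' anyway.
        num, name, city = (record.get(col) or None for col in COLUMNS)
        if num is None:
            counts["skipped"] += 1
            continue
        cur = conn.execute(
            "UPDATE app_customer SET cust_name = ?, cust_city = ? WHERE cust_num = ?",
            (name, city, num),
        )
        if cur.rowcount:
            counts["update"] += 1
        else:
            conn.execute(
                "INSERT INTO app_customer (cust_num, cust_name, cust_city) VALUES (?, ?, ?)",
                (num, name, city),
            )
            counts["insert"] += 1
    conn.commit()
    return counts
```

Loading the same file twice produces only updates (plus any skipped rows) on the second pass, which is the behavior a unique column is meant to give.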


Where to buy this book


You can buy Business Intelligence Cookbook: A Project Lifecycle Approach Using Oracle Technology from the Packt Publishing website: http://www.packtpub.com/oracle-database-11g-data-warehousingbusiness-intelligence-solutions-cookbook/book.
Free shipping to the US, UK, Europe and selected Asian countries. For more information, please read our shipping policy.

Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and most internet book retailers.

www.PacktPub.com
