A Technical Paper: Tupimakadia1@yahoo - Co.in Yamu - 4u1985@yahoo - Co.in


A TECHNICAL PAPER ON

BY:
TRUPTI MAKADIA (5TH IT)
YAMINI PATEL (5TH CE)
C.U.SHAH COLLEGE OF ENGG. & TECH.
WADHAWAN CITY-363030
E-MAIL: [email protected].

[email protected]

INDEX:

Abstract
1. Overview
2. What is data mining?
2.1 Continuous innovations
3. Data warehouses
4. Application of data mining
5. What can data mining do?
6. How it works?
7. Elements
8. Level of analysis
9. Integration with object relational database system
9.1 Significance of data mining
9.2 The object relational perspective
9.3 Database integration
9.4 Problems and difficulties in integration
10. Conclusion

Data mining techniques, based on statistics and machine learning, can significantly boost the ability to analyze large amounts of data. Despite its potential, this technology is destined to be a niche technology unless an effort is made to integrate it with the newly evolving Object-Relational Database Systems. Traditional database systems are not well suited to meet the challenges of the future. Relational models lack support for the complex data needed by today's enterprises, whereas object models suffer from scalability problems. The Object-Relational Model combines the advantages of the traditional models while overcoming their deficiencies. This technical research paper explores the key issues, challenges and methods involved in integrating data mining technology seamlessly within the framework of Object-Relational Database Systems. This integration is the key to making data mining convenient to use, easy to deploy in real applications, and to growing its user base.

: DATA MINING :

1. OVERVIEW:-

Mining has always been associated with dark, bottomless pits and workers who didn't see the light of day for hours at a time. Data mining derives its name from the similarities between searching for valuable business information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides. It is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most major organizations have data warehouses containing information about their clients, competitors and products. These huge data warehouses contain gigabytes of data with "hidden" information that can't be easily found using typical database queries, giving rise to the myth that the more data you have, the less you know. Data mining algorithms change all that by finding interesting patterns that an enterprise didn't even know were there.

Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

2. WHAT IS DATA MINING?

2.1 Continuous Innovation: - Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost. For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could move the beer display closer to the diaper display. It could also make sure beer and diapers were sold at full price on Thursdays.
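A pattern such as the beer-and-diapers finding is usually quantified with three standard association measures: support, confidence, and lift. The sketch below is illustrative only. The baskets are invented, the function name rule_metrics is hypothetical, and it is not a reconstruction of the grocery chain's actual analysis.

```python
# Hypothetical market baskets; items and counts are invented for illustration.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    cons = sum(1 for t in transactions if consequent in t)
    support = both / n                               # share of baskets with both items
    confidence = both / ante if ante else 0.0        # P(consequent | antecedent)
    lift = confidence / (cons / n) if cons else 0.0  # > 1 suggests a positive association
    return support, confidence, lift


support, confidence, lift = rule_metrics(transactions, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
```

A lift above 1 indicates that the two items appear together more often than would be expected if they were bought independently.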

3. DATA WAREHOUSE:

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. Equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

4. APPLICATION OF DATA MINING: -

Applications of data mining include fraud detection, credit card scoring and personal profile marketing. Skillful interpretation of data can enhance customer relations, direct marketing, trend analysis, financial market forecasting and international criminal investigations. Web mining, through which data from the Web is analyzed, helps businesses understand customer "click-stream" behavior online.

5. WHAT CAN DATA MINING DO?


Data mining is used today primarily by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments. For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

6. HOW IT WORKS?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
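The sequential-pattern example can be sketched as a simple conditional frequency: among customers who bought both a sleeping bag and hiking shoes, how many bought a backpack afterwards? This is a minimal illustration under stated assumptions. The purchase histories are invented, the function names follows and sequence_confidence are hypothetical, and real sequential-pattern miners use considerably more elaborate algorithms.

```python
# Hypothetical purchase histories, each ordered by time; the data is invented.
histories = {
    "c1": ["sleeping bag", "hiking shoes", "backpack"],
    "c2": ["hiking shoes", "sleeping bag", "backpack"],
    "c3": ["sleeping bag", "hiking shoes"],
    "c4": ["tent", "backpack"],
    "c5": ["sleeping bag", "stove", "hiking shoes", "backpack"],
}


def follows(history, prerequisites, target):
    """True if target is bought after every prerequisite item has been bought."""
    needed = set(prerequisites)
    for item in history:
        if item == target and not needed:
            return True
        needed.discard(item)
    return False


def sequence_confidence(histories, prerequisites, target):
    """Share of customers holding all prerequisites who later bought target."""
    eligible = [h for h in histories.values() if set(prerequisites) <= set(h)]
    if not eligible:
        return 0.0
    hits = sum(1 for h in eligible if follows(h, prerequisites, target))
    return hits / len(eligible)


likelihood = sequence_confidence(
    histories, ["sleeping bag", "hiking shoes"], "backpack"
)
print(f"estimated likelihood of a backpack purchase: {likelihood:.2f}")
```

The order of purchases matters here: a backpack bought before the other two items would not count as following them.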

7. ELEMENTS:

Data mining consists of five major elements:

Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
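The elements above can be walked through end to end in a small sketch. It assumes an in-memory SQLite database as a stand-in for the data warehouse (a relational store, not a true multidimensional one); the records, table name, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw point-of-sale records: (store, product, amount as text).
raw_sales = [
    ("north", "beer", " 12.50"),
    ("north", "diapers", "8.00 "),
    ("south", "beer", "7.25"),
    ("south", "beer", "n/a"),  # malformed amount, dropped during transform
]


def transform(rows):
    """Transform: trim text fields and convert amounts to numbers."""
    for store, product, amount in rows:
        try:
            yield store.strip(), product.strip(), float(amount)
        except ValueError:
            continue  # skip rows whose amount cannot be parsed


def load(conn, rows):
    """Load: store the cleaned rows in a simple warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def sales_by_product(conn):
    """Analyze: total sales per product, sorted by product name."""
    cur = conn.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(raw_sales))
summary = sales_by_product(conn)

# Present: print the result as a small table.
for product, total in summary:
    print(f"{product:10s} {total:8.2f}")
```

Keeping the transform step separate from the load step makes it easy to reject malformed records before they reach the warehouse.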

8. LEVEL OF ANALYSIS: -

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.

Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
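Of the techniques listed above, the nearest neighbor method is the simplest to show in code. The following is a minimal sketch, assuming Euclidean distance and a majority vote among the k closest historical records; the customer data, labels, and the function name knn_classify are hypothetical.

```python
import math
from collections import Counter

# Hypothetical historical records: (visits per year, average spend) -> segment.
history = [
    ((2, 20.0), "occasional"),
    ((3, 25.0), "occasional"),
    ((4, 22.0), "occasional"),
    ((20, 80.0), "loyal"),
    ((22, 75.0), "loyal"),
    ((25, 90.0), "loyal"),
]


def knn_classify(history, record, k=3):
    """Classify record by majority vote among its k nearest historical records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify(history, (21, 78.0)))
print(knn_classify(history, (3, 21.0)))
```

In practice the features would normally be scaled first, so that a field with a large numeric range (here, spend) does not dominate the distance.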

9. INTEGRATION WITH OBJECT RELATIONAL DATABASE SYSTEM:

9.1 Significance of Data Mining

A recent Gartner Group Advanced Technology Research Note listed data mining and artificial intelligence at the top of the five key technology areas that will clearly have a major impact across a wide range of industries within the next 3 to 5 years. It also observed that, "With the rapid advance in data capture, transmission and storage, large-systems users will increasingly need to implement new and innovative ways to mine the after-market value of their vast stores of detail data, employing MPP [massively parallel processing] systems to create new sources of business advantage." Within the next 2-3 years, at least half of the Fortune 1000 companies worldwide will be using data mining technology. The way in which companies interact with their customers has changed dramatically over the past few years. A customer's continuing business is no longer guaranteed. As a result, companies have found that they need to understand their customers better and to respond quickly to their wants and needs. In addition, the time frame in which these responses need to be made has been shrinking. It is no longer possible to wait

9.2 The Object-Relational Perspective

The complexity and richness of the data to be handled by business applications is constantly increasing. The explosion of the World Wide Web has made it possible to publish content that involves text, image, audio and video data. Intranets and extranets help drive the data workflow both within a company and externally with its partners, suppliers and customers to support its business processes. We are in a period of intensive change and innovation regarding database technology and related products. The pressures of a competitive marketplace are driving corporations to build and evolve their applications in a timely and cost-effective manner. Increasingly, companies need to build applications that closely match their business models and processes. Modern database applications need to store and manipulate objects that are neither small nor simple, and to perform operations on these objects.

Formally defined, a system that includes both object infrastructure and a set of relational extenders that exploit it is called an Object-Relational Database System. Object-Relational Database Systems (ORDBMS) can easily store complex, structured data as well as large, unstructured, domain-specific data such as text, image, audio and video. An ORDBMS allows users to define their own types, thus extending the type set of the database. Complex data like graphics, images, videos and songs can be stored in the database directly. Furthermore, these systems provide features such as encapsulation of data, inheritance between types and polymorphism.
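The idea of extending a database's type set with user-defined types can be imitated in a few lines. The sketch below assumes Python's standard sqlite3 module, whose adapter and converter hooks let an application-defined Point type be stored in and read back from a column. SQLite is not an object-relational system, and in a real ORDBMS the type would be defined inside the database itself; the Point type and the stores table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Point:
    """A user-defined type that the database does not know natively."""
    x: float
    y: float


def adapt_point(p):
    # Serialize a Point to text so it can be stored in a column.
    return f"{p.x};{p.y}"


def convert_point(raw):
    # Rebuild a Point from the bytes stored in the column.
    x, y = map(float, raw.split(b";"))
    return Point(x, y)


# Register the type with the driver in both directions.
sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

# PARSE_DECLTYPES makes the driver apply the converter to columns declared "point".
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE stores (name TEXT, location point)")
conn.execute("INSERT INTO stores VALUES (?, ?)", ("north", Point(4.0, -3.2)))

name, location = conn.execute("SELECT name, location FROM stores").fetchone()
print(name, location)
```

The application reads back a Point object rather than a raw string, which is the convenience that user-defined types are meant to provide.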
9.3 Database Integration

Companies spend millions of dollars to build data warehouses to hold their data, and data mining techniques must take advantage of this. Besides saving significant manual effort and storage space, this integration allows data mining applications to access the most up-to-date information available. Many leading vendors such as IBM and Oracle have taken positive steps in this regard, but there is still room for improvement. The success of data mining as an enterprise technology depends crucially on its seamless integration with enterprise databases, and more specifically with the newly emerging Object-Relational Database Systems. Data mining applications must be smoothly integrated within Object-Relational Database Systems in order to get the maximum benefit out of their inherent object-orientation. The present era is seeing sweeping changes at an unprecedented rate. In such a fast-paced world, technologies need to keep up to date with developments in related fields or they will be rendered obsolete.

9.4 Problems and Difficulties in the Integration

Data mining normally involves operations on very large sets of data. Object-Relational Database Systems suffer a performance loss when dealing with such large amounts of persistent objects. Independently, both data mining and Object-Relational Database Systems are still evolving, and this continues to pose problems in bringing them into a common framework. For Object-Relational Databases, only the relational model should be extended to incorporate user-defined types. Many data mining algorithms use complex mathematical and statistical techniques that are not easily mapped into human terms. We found this to be a strong deterrent to understanding.
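One way to picture the database integration described in Section 9.3 is to run a mining step inside the database instead of exporting the data first. The sketch below assumes an in-memory SQLite database as a stand-in for an enterprise system and uses a SQL self-join to count item pairs that share a basket. The table, its contents, and the threshold of two baskets are hypothetical, and each item is assumed to appear at most once per basket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basket_items (basket_id INTEGER, item TEXT)")

# Hypothetical basket contents, one row per (basket, item).
rows = [
    (1, "diapers"), (1, "beer"), (1, "chips"),
    (2, "diapers"), (2, "beer"),
    (3, "diapers"), (3, "milk"),
    (4, "beer"), (4, "chips"),
]
conn.executemany("INSERT INTO basket_items VALUES (?, ?)", rows)

# The self-join pairs up items from the same basket; a.item < b.item keeps
# each unordered pair once. HAVING keeps pairs found in at least two baskets.
pair_counts = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS baskets
    FROM basket_items AS a
    JOIN basket_items AS b
      ON a.basket_id = b.basket_id AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= 2
    ORDER BY baskets DESC, a.item, b.item
    """
).fetchall()

for first, second, baskets in pair_counts:
    print(f"{first} + {second}: {baskets} baskets")
```

Because the counting happens where the data lives, the query always sees the current contents of the table and no separate extract has to be maintained.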

10. CONCLUSION:

Object-relational database systems are fast gaining popularity in the industry and are replacing the traditional relational databases. Data mining techniques are currently optimized for relational database systems and must evolve to work with object-relational database systems. Independently, both of these technologies are becoming prerequisites of doing business in the new economy, and their integration is the next logical step. Extending the database concept to data warehousing, and mining the warehouse with different techniques, organizes data in a more meaningful, integrated way. It goes beyond drawing reports to extracting vital and valuable information for managing processes effectively and supporting knowledge-driven decision making. Further, the architecture makes it easy to extend the scope to wider areas such as administration, management, teaching, research, student counseling, content creation and delivery systems, and self-learning virtual environments. The system will open doors to optimizing interlinked processes, enhancing the efficiency and effectiveness of working patterns.
