Database Integration


Lecture 10
Instructor: Mehwashma Amir
A database system is composed of two elements:
• DBMS
• Database
• A schema describes the actual data
structures and organization within
the system.
History
• During the 1970s, centralized databases were
predominant, but recent innovations in communications and
database technologies have engendered a revolution in data
processing, giving rise to a new generation of decentralized database
systems, i.e., distributed databases.
• A fundamental distinction must first be drawn between distributed,
heterogeneous, and multidatabase systems.
• Distributed database: A distributed database system is made up of a single
logical database that is physically distributed across a computer network,
together with a distributed database management system that answers
queries and applies updates consistently. It is homogeneous (all its physical
components run the same distributed database management system).
• Heterogeneous database: A heterogeneous database system is a
distributed database system that includes heterogeneous components at
the database level; these may include differing data models, query
languages, schemas, and access methods.
• Multidatabase: A multidatabase system is a collection of loosely coupled
component databases, with no unified schema applied for their integration.
Motivation for Multi Database
• A large organization has several departments each making
autonomous decisions.
• Widespread heterogeneity arises naturally from a free market of
ideas and products, some of which prove better adapted
than others to specific applications.
So database integration is:
• Combining information from multiple autonomous information sources
• Answering queries using the combined information
• Database integration conceptually combines participating databases
to form a single cohesive, interoperable multidatabase. Such a
multidatabase is capable of providing uniform user access interfaces to
the component heterogeneous distributed database systems.
• Multidatabase systems combine autonomous and heterogeneous
component (or local) database systems into a global database system.
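As a rough illustration of combining autonomous sources and answering queries over the combined information, here is a minimal sketch using Python's built-in sqlite3 module. The two in-memory databases, their rows, and the function global_customers are assumptions made for this example only; a real multidatabase system would also have to deal with distribution, conflicts, and autonomy.

```python
import sqlite3

# Two autonomous component databases with different local schemas.
# The rows are made-up sample data.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE Cust (CNo INTEGER, CompName TEXT)")
sales.executemany("INSERT INTO Cust VALUES (?, ?)",
                  [(1, "Acme"), (2, "Globex")])

support = sqlite3.connect(":memory:")
support.execute("CREATE TABLE Customer (CustID INTEGER, Company TEXT)")
support.executemany("INSERT INTO Customer VALUES (?, ?)",
                    [(2, "Globex"), (3, "Initech")])

# Each source is paired with a local query that returns rows in the
# shape of the global view: (customer id, company name).
SOURCES = [
    (sales, "SELECT CNo, CompName FROM Cust"),
    (support, "SELECT CustID, Company FROM Customer"),
]

def global_customers():
    """Answer a global query by asking every source and merging the rows."""
    merged = {}
    for conn, local_query in SOURCES:
        for cust_id, company in conn.execute(local_query):
            merged[cust_id] = company  # a later source overwrites a duplicate id
    return sorted(merged.items())

print(global_customers())
```

The user sees one uniform interface (global_customers) while each component database keeps its own schema.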

Database Integration
• The important task in the integration process is deciding how to
merge two different databases that use different data
models.
Three types
• System integration: enables data to be accessed from more than
one database.
• Schema integration: provides a uniform global conceptual view of
the multidatabase.
• Semantic integration: resolves data conflicts which might exist
between component databases.
Schema matching
• Fundamental problem:
schema matching, which takes two (or more) database schemas and
produces a mapping between elements (or attributes) of the
schemas that correspond semantically to each other.
• Objective: merge the schemas into a single global schema.
Integrating Two Schemas
• Represent the mapping with a similarity relation, ≅, over the power
sets of S1 and S2, where each pair in ≅ represents one element of
the mapping. E.g.,

Cust.CNo ≅ Customer.CustID
Cust.CompName ≅ Customer.Company
{Cust.FirstName, Cust.LastName} ≅ Customer.Contact
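
One possible way to hold such a mapping in code is as a set of pairs, where each side is a subset of one schema's elements. The sketch below uses the element names from the example; the helper functions is_valid and counterparts are hypothetical names introduced only for illustration.

```python
# Elements of the two schemas from the example above.
S1 = {"Cust.CNo", "Cust.CompName", "Cust.FirstName", "Cust.LastName"}
S2 = {"Customer.CustID", "Customer.Company", "Customer.Contact"}

# Each pair relates a subset of S1 to a subset of S2, so the mapping is a
# relation over the power sets of S1 and S2.
mapping = {
    (frozenset({"Cust.CNo"}), frozenset({"Customer.CustID"})),
    (frozenset({"Cust.CompName"}), frozenset({"Customer.Company"})),
    (frozenset({"Cust.FirstName", "Cust.LastName"}),
     frozenset({"Customer.Contact"})),
}

def is_valid(mapping, s1, s2):
    """Check that each pair draws its left side from s1 and right side from s2."""
    return all(left <= s1 and right <= s2 for left, right in mapping)

def counterparts(element, mapping):
    """Return the S2 elements that correspond to a given S1 element."""
    result = set()
    for left, right in mapping:
        if element in left:
            result |= right
    return result

print(is_valid(mapping, S1, S2))
print(counterparts("Cust.LastName", mapping))
```

Using sets on both sides lets the same structure express the pair that combines Cust.FirstName and Cust.LastName into Customer.Contact.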

Different types of matching
• Schema-level only matching: only schema information is considered.
• Domain and instance-level only matching: some instance data (data
records) and possibly the domain of each attribute are used. This
case is quite common on the Web.
• Integrated matching of schema, domain and instance data: both
schema and instance data (and possibly domain information) are
available.
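To make instance-level matching more concrete, the following sketch compares columns by how much their data values overlap, using the Jaccard coefficient. This is only one simple choice of measure, picked here for illustration; the sample records, the threshold, and the function names are assumptions.

```python
def jaccard(values_a, values_b):
    """Share of distinct values that two columns have in common."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Made-up instance data: column name -> list of values.
cust = {"CNo": [1, 2, 3], "CompName": ["Acme", "Globex", "Initech"]}
customer = {"CustID": [2, 3, 4], "Company": ["Globex", "Initech", "Umbrella"]}

def instance_matches(table1, table2, threshold=0.3):
    """Pair up columns whose value overlap reaches the threshold."""
    matches = []
    for col1, values1 in table1.items():
        for col2, values2 in table2.items():
            score = jaccard(values1, values2)
            if score >= threshold:
                matches.append((col1, col2, score))
    return matches

for col1, col2, score in instance_matches(cust, customer):
    print(col1, col2, score)
```

Note that the column names play no role here: the match is inferred from the records alone, which is why this style suits sources that expose data but little schema information.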
Schema level matching
• Schema level matching relies on information such as name,
description, data type, and relationship type.
• Match cardinality:
• 1:1 match: one element in one schema matches one element of another
schema.
• 1:m match: one element in one schema matches m elements of another
schema.
• m:n match: m elements in one schema match n elements of another
schema.

Note: an m:1 match is similar to a 1:m match. An m:n match is complex, and there is little work on it.
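
A minimal sketch of schema-level matching, assuming only element names and data types are available. It scores names with the standard library's difflib.SequenceMatcher (a generic string similarity, not a method prescribed by this lecture) and labels a match by its cardinality. The function names and the tuple layout (name, data_type) are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(name1, name2):
    """Case-insensitive string similarity between two element names (0.0 to 1.0)."""
    return SequenceMatcher(None, name1.lower(), name2.lower()).ratio()

def element_similarity(element1, element2):
    """Compare (name, data_type) pairs; differing types rule a match out."""
    (name1, type1), (name2, type2) = element1, element2
    if type1 != type2:
        return 0.0
    return name_similarity(name1, name2)

def cardinality(left, right):
    """Label a match between two groups of elements as 1:1, 1:m, m:1 or m:n."""
    left_label = "1" if len(left) == 1 else "m"
    if len(right) == 1:
        right_label = "1"
    else:
        right_label = "m" if left_label == "1" else "n"
    return f"{left_label}:{right_label}"

print(element_similarity(("CustID", "int"), ("custid", "int")))
print(cardinality({"CNo"}, {"CustID"}))
print(cardinality({"Contact"}, {"FirstName", "LastName"}))
```

Descriptions and relationship types could be folded into element_similarity in the same way, each contributing its own score.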

What does schema matching do


• Takes two schemas as input
• Returns how each element from each schema is related (=, <=, is-a,
part-of, overlap (set), contain (set), etc.)
• It is impossible to determine all matches fully automatically. At best,
we can infer match candidates, which users can accept,
reject, or change.
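The accept, reject, or change step can be sketched as a small review function. The candidate list, the scores, the element names Cust.City, Cust.Phone and Customer.Tel, and the shape of the decisions dictionary are all assumptions for this example; a real tool would collect the decisions through a user interface.

```python
def review(candidates, decisions):
    """Apply user decisions to match candidates.

    candidates: list of (source, target, score) suggestions from a matcher.
    decisions:  dict mapping a source element to "accept", "reject", or the
                name of a different target (the "change" case).
    Returns the confirmed matches and the candidates not yet reviewed.
    """
    confirmed, pending = {}, []
    for source, target, score in candidates:
        decision = decisions.get(source)
        if decision is None:
            pending.append((source, target, score))  # user has not looked yet
        elif decision == "accept":
            confirmed[source] = target
        elif decision != "reject":
            confirmed[source] = decision  # user chose a different target
    return confirmed, pending

# Hypothetical matcher output and user decisions.
candidates = [
    ("Cust.CNo", "Customer.CustID", 0.9),
    ("Cust.CompName", "Customer.Contact", 0.6),
    ("Cust.City", "Customer.Company", 0.4),
    ("Cust.Phone", "Customer.Tel", 0.7),
]
decisions = {
    "Cust.CNo": "accept",
    "Cust.CompName": "Customer.Company",  # change the suggested target
    "Cust.City": "reject",
}

confirmed, pending = review(candidates, decisions)
print(confirmed)
print(pending)
```

Keeping unreviewed candidates in a separate list reflects the point above: the matcher only proposes, and nothing enters the final mapping without a user decision.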
Issues
• When matching a large number of schemas, statistical approaches
such as data mining can be used, rather than only doing pairwise
matching.
Schema matching tools
• IBM Rational Data Architect
• Microsoft BizTalk
• COMA++
Motivation
• If Microsoft successfully takes over Yahoo!, tons of DB schemas
will have to be mediated. Integration would take several
weeks or months if done manually.
