Database Integration
Lecture 10
Instructor: Mehwashma Amir
A database system is composed of two elements:
• DBMS
• Database
A schema describes the actual data
structures and organization within
the system.
History
• During the 1970s, centralized databases were predominant, but
recent innovations in communications and database technologies
have brought about a revolution in data processing, giving rise to
a new generation of decentralized database systems, i.e.,
distributed databases.
• A fundamental distinction must first be drawn between distributed,
heterogeneous, and multidatabase systems.
• Distributed database: A distributed database system is made up of a single
logical database that is physically distributed across a computer network,
together with a distributed database management system that processes
queries and updates consistently. It is homogeneous (all its physical
components run the same distributed database management system).
• Heterogeneous database: A heterogeneous database system is a
distributed database system that includes heterogeneous components at
the database level; these may include differences in data models, query
languages, schemas, and access methods.
• Multidatabase: A multidatabase system is a collection of loosely coupled
component databases, with no unified schema applied for their integration.
Motivation for Multidatabases
• A large organization has several departments each making
autonomous decisions.
• Widespread heterogeneity arises naturally from a free market of
ideas and products, some of which prove better suited than others
to specific applications.
So database integration is:
• Combining information from multiple autonomous information sources
• And answering queries using the combined information
• Database integration conceptually combines participating databases
to form a single, cohesive, interoperable multidatabase. Such a
multidatabase is capable of providing uniform user access interfaces to
the component heterogeneous distributed database systems.
• Multidatabase systems combine autonomous and heterogeneous
component (or local) database systems into a global database system.
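To make "answering queries using the combined information" concrete, here is a minimal sketch, assuming two in-memory SQLite databases stand in for the autonomous component systems. The table names, columns, sample rows, and the function global_companies are all illustrative assumptions, not part of the lecture.

```python
import sqlite3

# Two autonomous component databases, each with its own local schema
# (hypothetical tables and rows, chosen only for illustration).
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE Cust (CNo INTEGER, CompName TEXT)")
sales.executemany("INSERT INTO Cust VALUES (?, ?)",
                  [(1, "Acme"), (2, "Globex")])

support = sqlite3.connect(":memory:")
support.execute("CREATE TABLE Customer (CustID INTEGER, Company TEXT)")
support.executemany("INSERT INTO Customer VALUES (?, ?)",
                    [(2, "Globex"), (3, "Initech")])


def global_companies():
    """Answer one global query by translating it into each local schema
    and merging the results (duplicates across sources are collapsed)."""
    rows = set(sales.execute("SELECT CNo, CompName FROM Cust"))
    rows |= set(support.execute("SELECT CustID, Company FROM Customer"))
    return sorted(rows)


companies = global_companies()
print(companies)
```

The user sees one uniform interface (global_companies) while each component database keeps its own schema and remains independently usable.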
Database Integration
• The central task in the integration process is merging two
databases that are built on different data models.
Three types
• System integration: enables data to be accessed from more than
one database.
• Schema integration: provides a uniform global conceptual view of
the multidatabase.
• Semantic integration: resolves data conflicts which might exist
between component databases.
Schema matching
• Fundamental problem: schema matching, which takes two (or more)
database schemas and produces a mapping between elements (or
attributes) of the two (or more) schemas that correspond
semantically to each other.
• Objective: merge the schemas into a single global schema.
Integrating Two Schemas
• Represent the mapping with a similarity relation, ≅, over the power
sets of S1 and S2, where each pair in ≅ represents one element of
the mapping. E.g.,
Cust.CNo ≅ Customer.CustID
Cust.CompName ≅ Customer.Company
{Cust.FirstName, Cust.LastName} ≅ Customer.Contact
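One way to hold such a mapping in a program is as a set of pairs of element sets, so that a single pair can relate several elements at once. The sketch below is only an illustration: the variable mapping and the helper targets_of are hypothetical names, and the element names are the ones from the example above.

```python
# Each pair relates a subset of S1's elements to a subset of S2's elements,
# i.e. the relation is defined over the power sets of S1 and S2.
mapping = {
    (frozenset({"Cust.CNo"}), frozenset({"Customer.CustID"})),
    (frozenset({"Cust.CompName"}), frozenset({"Customer.Company"})),
    (frozenset({"Cust.FirstName", "Cust.LastName"}),
     frozenset({"Customer.Contact"})),
}


def targets_of(element, mapping):
    """Return the S2 elements that correspond to the given S1 element."""
    result = set()
    for left, right in mapping:
        if element in left:
            result |= right
    return result


contact = targets_of("Cust.FirstName", mapping)
print(contact)
```

Using frozenset for both sides keeps each pair hashable, which lets the whole mapping be an ordinary Python set.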
Different types of matching
• Schema-level only matching: only schema information is considered.
• Domain and instance-level only matching: some instance data (data
records) and possibly the domain of each attribute are used. This
case is quite common on the Web.
• Integrated matching of schema, domain and instance data: both
schema and instance data (and possibly domain information) are
available.
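As a rough illustration of instance-level matching, the sketch below compares the data values stored in two columns and pairs the columns whose values overlap strongly. The Jaccard measure, the 0.5 threshold, the sample values, and the function names are assumptions made for this example; the lecture does not prescribe a particular similarity measure.

```python
def jaccard(a, b):
    """Overlap of two value collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_by_instances(cols1, cols2, threshold=0.5):
    """Pair columns whose stored values overlap at least `threshold`."""
    matches = []
    for name1, values1 in cols1.items():
        for name2, values2 in cols2.items():
            score = jaccard(values1, values2)
            if score >= threshold:
                matches.append((name1, name2, score))
    return matches


# Hypothetical instance data (column name -> sample of stored values).
cust = {"CNo": [1, 2, 3], "CompName": ["Acme", "Globex", "Initech"]}
customer = {"CustID": [2, 3, 4], "Company": ["Globex", "Initech", "Umbrella"]}

instance_matches = match_by_instances(cust, customer)
print(instance_matches)
```

Here the column names are never compared, so CNo and CustID can be paired even though their names differ; the price is that unrelated columns with overlapping values could also be paired.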
Schema-level matching
• Schema-level matching relies on information such as name,
description, data type, and relationship type.
• Match cardinality:
• 1:1 match: one element in one schema matches one element of another
schema.
• 1:m match: one element in one schema matches m elements of another
schema.
• m:n match: m elements in one schema match n elements of another
schema.
Example:
An m:1 match is similar to a 1:m match. An m:n match is complex, and there is little work on it.
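The two ideas on this slide can be sketched together: a matcher that uses only schema information (attribute name and data type), and a helper that labels the cardinality of a match. This is a minimal sketch under stated assumptions: the string similarity from Python's difflib, the 0.6 threshold, the sample schemas, and all function names are illustrative choices, not a method given in the lecture.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_by_schema(s1, s2, threshold=0.6):
    """Pair attributes whose data types agree and whose names look alike."""
    return [(n1, n2)
            for n1, t1 in s1.items()
            for n2, t2 in s2.items()
            if t1 == t2 and name_similarity(n1, n2) >= threshold]


def cardinality(left, right):
    """Label a match between two groups of elements."""
    if len(left) == 1:
        return "1:1" if len(right) == 1 else "1:m"
    return "m:1" if len(right) == 1 else "m:n"


# Hypothetical schemas (attribute name -> data type).
cust = {"CNo": "int", "CompName": "text"}
customer = {"CustID": "int", "Company": "text"}

schema_pairs = match_by_schema(cust, customer)
print(schema_pairs)
print(cardinality({"Cust.FirstName", "Cust.LastName"}, {"Customer.Contact"}))
```

With these sample schemas the name-based matcher pairs CompName with Company but not CNo with CustID, whose names share little text; this is one reason schema-level information is often combined with domain or instance data.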