
Chapter 4

The document discusses database integration, which involves integrating existing local databases into a single global conceptual schema (GCS). It describes the bottom-up design methodology, where the GCS is defined by integrating parts of the local conceptual schemas (LCSs). The integration process consists of schema translation followed by schema generation, which in turn comprises schema matching, schema integration, and schema mapping. Schema matching determines correspondences between schemas, integration combines schemas into a GCS, and mapping specifies how to translate data between the LCSs and the GCS.

Uploaded by

Anella Sun
Copyright
© All Rights Reserved

Chapter – 4: Database Integration

Content

1. Bottom-Up Design Methodology
2. Database Integration Process
3. Schema Translation
4. Schema Generation
   i. Schema Matching
   ii. Schema Integration
   iii. Schema Mapping
Bottom-Up Design Methodology

• A number of databases already exist, and the design task involves integrating them into one database.
• Given existing databases with their Local Conceptual Schemas (LCSs), the question is how to integrate the LCSs into a Global Conceptual Schema (GCS)
– The GCS is also called the mediated schema
Integration Alternatives
• Physical integration
• Logical integration

1. Physical integration
– Source databases are integrated and the integrated database is materialized
– Data warehouses
– Extract-transform-load (ETL) tools that enable extraction of data from
sources, their transformation to match the GCS, and their materialization.
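
The extract-transform-load flow described above can be sketched roughly as follows. This is a minimal illustration, not a real ETL tool: it assumes two in-memory SQLite databases stand in for a source and a warehouse, the target relation name EMPLOYEE and the name-normalization rule are hypothetical, and the sample rows are made up.

```python
import sqlite3

# Hypothetical source database holding the EMP and PAY relations (made-up rows).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE EMP(ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE PAY(TITLE TEXT PRIMARY KEY, SAL REAL);
    INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'), ('E2', 'M. Smith', 'Analyst');
    INSERT INTO PAY VALUES ('Elect. Eng.', 40000), ('Analyst', 34000);
""")

# Hypothetical warehouse that materializes one GCS relation.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE EMPLOYEE(ENO TEXT PRIMARY KEY, NAME TEXT, TITLE TEXT, SALARY REAL)"
)

def extract(conn):
    """Extract: read the source relations needed for the target relation."""
    return conn.execute(
        "SELECT E.ENO, E.ENAME, E.TITLE, P.SAL "
        "FROM EMP E JOIN PAY P ON E.TITLE = P.TITLE"
    ).fetchall()

def transform(rows):
    """Transform: reshape rows to match the GCS (here, just normalize names)."""
    return [(eno, name.strip().upper(), title, float(sal))
            for eno, name, title, sal in rows]

def load(conn, rows):
    """Load: materialize the transformed rows in the warehouse."""
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)
    conn.commit()

load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM EMPLOYEE ORDER BY ENO").fetchall()
print(loaded)
```

The point of the sketch is that the integrated data physically exists in the warehouse after the load step, in contrast to logical integration.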
Data Warehouse Approach

Figure 4.1 Data warehouse Approach


Integration Alternatives

2. Logical integration
– The global conceptual schema is virtual and not materialized
– The GCS may be defined bottom-up, by “integrating” parts of the LCSs of the local operational databases
– Enterprise Information Integration (EII)
Bottom-up Design Methodology
• Two alternative approaches
1. The GCS (also called the mediated schema) is defined first
– Map the LCSs to this GCS
– As in data warehouses
2. The GCS is defined as an integration of parts of the LCSs
– Generate the GCS and map the LCSs to it
GCS/LCS Relationship
• Local-as-view
– The GCS definition is assumed to exist, and each LCS is
treated as a view definition over it
• Global-as-view
– The GCS is defined as a set of views over the LCSs
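
As a rough illustration of global-as-view, the sketch below defines one global relation as a SQL view over local relations. It assumes a single in-memory SQLite database stands in for two separate local databases; the view name GLOBAL_EMPLOYEE, its column names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- LCS 1: relational source
    CREATE TABLE EMP(ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE PAY(TITLE TEXT, SAL REAL);
    -- LCS 2: source translated from an E-R schema
    CREATE TABLE WORKER(WNUMBER TEXT, NAME TEXT, TITLE TEXT, SALARY REAL);

    -- Made-up sample rows
    INSERT INTO EMP VALUES ('E1', 'Doe', 'Engineer');
    INSERT INTO PAY VALUES ('Engineer', 40000);
    INSERT INTO WORKER VALUES ('W7', 'Lee', 'Analyst', 34000);

    -- Global-as-view: the global relation is a view over the local relations.
    CREATE VIEW GLOBAL_EMPLOYEE AS
        SELECT E.ENO AS ID, E.ENAME AS NAME, E.TITLE AS TITLE, P.SAL AS SALARY
        FROM EMP E JOIN PAY P ON E.TITLE = P.TITLE
        UNION ALL
        SELECT WNUMBER, NAME, TITLE, SALARY FROM WORKER;
""")

# A query against the GCS is answered by unfolding the view definition.
rows = conn.execute(
    "SELECT ID, NAME, TITLE, SALARY FROM GLOBAL_EMPLOYEE ORDER BY ID"
).fetchall()
print(rows)
```

Local-as-view goes the other way: each local relation would be described as a view over the global relations, which a plain SQL view over the sources cannot express directly.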
Database Integration Process
1. Schema translation
– Component database schemas translated to a common intermediate
canonical representation
– Necessary only if the component databases are heterogeneous and local
schemas are defined using different data models.
2. Schema generation
– Intermediate schemas are used to create a global conceptual schema
– Three steps:
1. Schema Matching
2. Schema Integration
3. Schema Mapping
Database Integration Process
• The schema generation process consists of the following steps:
1. Schema matching to determine the syntactic and semantic correspondences among the translated LCS elements or between individual LCS elements and the pre-defined GCS elements.
2. Integration of the common schema elements into a global conceptual (mediated) schema if one has not yet been defined.
3. Schema mapping that determines how to map the elements of each LCS to the elements of the GCS.
Database Integration Process: Example

Figure 4.4: Relational engineering Database Representation

Figure 4.5: Entity-Relationship Database

Figure 4.6: Relational Mapping of ER Schema

Running Example
Relational
EMP(ENO, ENAME, TITLE)
PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
ASG(ENO, PNO, RESP, DUR)
PAY(TITLE, SAL)

E-R Model (mapped to relations)
WORKER(WNUMBER, NAME, TITLE, SALARY)
PROJECT(PNUMBER, PNAME, BUDGET)
CLIENT(CNAME, ADDRESS)
WORKS IN(WNUMBER, PNUMBER, RESPONSIBILITY, DURATION)
CONTRACTED BY(PNUMBER, CNAME, CONTRACTNO)
Recall Access Architecture
1. Schema Translation
• What is the canonical data model?
– Relational
– Entity-relationship
    • DIKE
– Object-oriented
    • ARTEMIS
– Graph-oriented
    • DIPE, TranScm, COMA, Cupid
    • Preferable with the emergence of XML
    • No common graph formalism
• Mapping algorithms
– These are well-known
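
One such mapping, from an E-R schema to a relational canonical model, might be sketched as below. This is a simplification under stated assumptions: every relationship becomes its own relation regardless of cardinality, the class and function names are hypothetical, and the assignment of attributes to entities and relationships is inferred from the relational mapping in the running example, since the E-R figure itself is not reproduced in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    key: str
    attributes: list  # non-key attributes

@dataclass
class Relationship:
    name: str
    entities: list  # participating Entity objects
    attributes: list = field(default_factory=list)

def er_to_relational(entities, relationships):
    """Translate an E-R schema into relation name -> attribute list."""
    relations = {}
    # Each entity becomes a relation: its key followed by its other attributes.
    for e in entities:
        relations[e.name] = [e.key] + e.attributes
    # Each relationship becomes a relation: participant keys plus own attributes.
    for r in relationships:
        relations[r.name] = [e.key for e in r.entities] + r.attributes
    return relations

worker = Entity("WORKER", "WNUMBER", ["NAME", "TITLE", "SALARY"])
project = Entity("PROJECT", "PNUMBER", ["PNAME", "BUDGET"])
client = Entity("CLIENT", "CNAME", ["ADDRESS"])

relational = er_to_relational(
    [worker, project, client],
    [
        Relationship("WORKS IN", [worker, project], ["RESPONSIBILITY", "DURATION"]),
        Relationship("CONTRACTED BY", [project, client], ["CONTRACTNO"]),
    ],
)
for name, attrs in relational.items():
    print(f"{name}({', '.join(attrs)})")
```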
2. Schema Generation
• Schema matching
– Finding the correspondences between multiple schemas
• Schema integration
– Creation of the GCS (or mediated schema) using the correspondences
• Schema mapping
– How to map data from local databases to the GCS
– Involves mapping constraint generation and transformation generation
• Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS
Schema Matching
• Determines which concepts of one schema match those of another
• Schema heterogeneity
– Structural heterogeneity
• Type conflicts
• Dependency conflicts
• Key conflicts
• Behavioral conflicts
– Semantic heterogeneity
• More important and harder to deal with
• Synonyms, homonyms, hypernyms
• Different ontology
• Imprecise wording
Schema Matching (cont’d)

• Other complications
– Insufficient schema and instance information
– Unavailability of schema documentation
– Subjectivity of matching
• Issues that affect schema matching
– Schema versus instance matching
– Element versus structure level matching
– Matching cardinality
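
To make these dimensions concrete, here is a minimal sketch of one point in the design space: a schema-level, element-level matcher with 1:1 cardinality that compares attribute names only. The synonym table, the 0.6 threshold, and the function names are assumptions for illustration; the string similarity comes from Python's standard `difflib`.

```python
from difflib import SequenceMatcher

# Hand-written synonym pairs (an assumption; real systems use thesauri or ontologies).
SYNONYMS = {frozenset({"ENO", "WNUMBER"}), frozenset({"SAL", "SALARY"})}

def name_similarity(a, b):
    """Return a score in [0, 1] for two attribute names."""
    a, b = a.upper(), b.upper()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()

def match(source_attrs, target_attrs, threshold=0.6):
    """For each source attribute, keep its best target if the score clears the threshold."""
    correspondences = []
    for s in source_attrs:
        best = max(target_attrs, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            correspondences.append((s, best, round(score, 2)))
    return correspondences

emp = ["ENO", "ENAME", "TITLE"]
worker = ["WNUMBER", "NAME", "TITLE", "SALARY"]
correspondences = match(emp, worker)
for s, t, score in correspondences:
    print(f"EMP.{s} ~ WORKER.{t} (score {score})")
```

Without the synonym entry, ENO and WNUMBER would score too low on string similarity alone to be matched, which is a small instance of the semantic heterogeneity problem: names that look different can denote the same concept.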
Schema Matching Approaches
Schema Integration
• Use the correspondences to create a GCS
• Mainly a manual process, although rules can help

Fig. 4.9 Example Integrated GCS


Binary Integration Methods
N-ary Integration Methods
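
A single binary integration step might look like the sketch below: two schemas are merged by renaming the elements of the second according to the correspondences and taking the union of attributes. The merge rule, the function name, and the hand-written correspondences are assumptions for illustration, and the result is not meant to reproduce Fig. 4.9.

```python
def integrate_pair(left, right, rel_map, attr_map):
    """Merge schema `right` into schema `left`.

    Schemas map relation name -> attribute list. `rel_map` renames relations of
    `right`; `attr_map` renames (relation, attribute) pairs of `right`.
    """
    merged = {rel: list(attrs) for rel, attrs in left.items()}
    for rel, attrs in right.items():
        target = rel_map.get(rel, rel)          # unmatched relations keep their name
        slot = merged.setdefault(target, [])
        for attr in attrs:
            name = attr_map.get((rel, attr), attr)
            if name not in slot:                # matched attributes are merged once
                slot.append(name)
    return merged

relational = {"EMP": ["ENO", "ENAME", "TITLE"], "PAY": ["TITLE", "SAL"]}
er_mapped = {
    "WORKER": ["WNUMBER", "NAME", "TITLE", "SALARY"],
    "CLIENT": ["CNAME", "ADDRESS"],
}

# Correspondences as they might come out of schema matching (hand-written here).
rel_map = {"WORKER": "EMP"}
attr_map = {
    ("WORKER", "WNUMBER"): "ENO",
    ("WORKER", "NAME"): "ENAME",
    ("WORKER", "SALARY"): "SAL",
}

gcs = integrate_pair(relational, er_mapped, rel_map, attr_map)
print(gcs)
```

A stepwise binary method would apply this step repeatedly, folding each further schema into the intermediate result, whereas an n-ary method would consider more than two schemas in one step.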
Schema Mapping

• Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.
• Data warehouses ⇒ actual translation
• Data integration systems ⇒ discover mappings that can be used in the query processing phase
• Two issues:
– Mapping creation
– Mapping maintenance
Mapping Creation
Given
– A source LCS S (a set of source relations)
– A target GCS T (a set of target relations Tk)
– A set of value correspondences discovered during the schema matching phase
Produce a set of queries that, when executed, will create GCS data instances from the source data.
For each target relation Tk, we are looking for a query Qk, defined on a (possibly proper) subset of the relations in S, that, when executed, will generate data for Tk from the source relations.
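
The idea of deriving a query Qk from value correspondences can be sketched as follows. This is only an illustration under assumptions: the target relation GCS_EMPLOYEE, the helper `build_mapping_query`, and the sample rows are hypothetical, the join condition is supplied by hand rather than discovered, and identifiers are interpolated into SQL without sanitization because they are trusted constants here.

```python
import sqlite3

def build_mapping_query(target, correspondences, sources, join_condition=None):
    """Build Qk: an INSERT ... SELECT that populates `target` from `sources`.

    `correspondences` is a list of (source attribute, target attribute) pairs.
    """
    target_cols = ", ".join(t for _, t in correspondences)
    source_cols = ", ".join(s for s, _ in correspondences)
    where = f" WHERE {join_condition}" if join_condition else ""
    return (f"INSERT INTO {target} ({target_cols}) "
            f"SELECT {source_cols} FROM {', '.join(sources)}{where}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP(ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE PAY(TITLE TEXT, SAL REAL);
    CREATE TABLE GCS_EMPLOYEE(ID TEXT, NAME TEXT, TITLE TEXT, SALARY REAL);
    INSERT INTO EMP VALUES ('E1', 'Doe', 'Engineer'), ('E2', 'Smith', 'Analyst');
    INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
""")

# Value correspondences as they might come out of schema matching.
correspondences = [
    ("EMP.ENO", "ID"),
    ("EMP.ENAME", "NAME"),
    ("EMP.TITLE", "TITLE"),
    ("PAY.SAL", "SALARY"),
]

q = build_mapping_query("GCS_EMPLOYEE", correspondences,
                        ["EMP", "PAY"], "EMP.TITLE = PAY.TITLE")
conn.execute(q)
result = conn.execute(
    "SELECT ID, NAME, TITLE, SALARY FROM GCS_EMPLOYEE ORDER BY ID"
).fetchall()
print(q)
print(result)
```

Here the query is defined on a subset of the source relations (EMP and PAY only). In a data warehouse it would be executed to materialize the target, while in a data integration system the same mapping would be kept for use during query processing.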
Conclusion

• Database integration: the process of creating a GCS (or a mediated schema) and determining how each LCS maps to it
• In data warehouses, the GCS is instantiated and materialized
• In data integration systems, the GCS is merely a virtual view
