Chapter 4
Content
1. Physical integration
– Source databases are integrated and the integrated database is materialized
– Data warehouses
– Extract-transform-load (ETL) tools that extract data from the sources,
transform it to match the GCS, and materialize it.
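The three ETL steps can be sketched roughly as follows. This is a minimal illustration, not the behavior of any particular ETL tool: the two sources, their column names and units, and the target relation `employee(name, salary)` standing in for the GCS are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical source data: two local databases with different conventions.
SOURCE_A = [{"emp_name": "Ann Lee", "salary_usd": 5000}]
SOURCE_B = [{"name": "LEE, BOB", "monthly_pay_cents": 420000}]

def extract():
    """Extract: pull raw rows from every source, tagged with their origin."""
    return [("A", r) for r in SOURCE_A] + [("B", r) for r in SOURCE_B]

def transform(tagged_rows):
    """Transform: rewrite each row to fit the assumed GCS relation employee(name, salary)."""
    out = []
    for origin, row in tagged_rows:
        if origin == "A":
            out.append((row["emp_name"], float(row["salary_usd"])))
        else:
            # Source B stores "LAST, FIRST" in upper case and pay in cents.
            last, first = [p.strip().title() for p in row["name"].split(",")]
            out.append((f"{first} {last}", row["monthly_pay_cents"] / 100))
    return out

def load(conn, rows):
    """Load: materialize the transformed rows in the warehouse."""
    conn.execute("CREATE TABLE IF NOT EXISTS employee (name TEXT, salary REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    conn.commit()

def run_etl():
    conn = sqlite3.connect(":memory:")  # stands in for the warehouse
    load(conn, transform(extract()))
    return conn.execute("SELECT name, salary FROM employee ORDER BY name").fetchall()
```

Calling `run_etl()` returns the integrated rows as stored in the materialized relation.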
Data Warehouse Approach
2. Logical integration
• Other complications
– Insufficient schema and instance information
– Unavailability of schema documentation
– Subjectivity of matching
• Issues that affect schema matching
– Schema versus instance matching
– Element versus structure level matching
– Matching cardinality
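To make these dimensions concrete, here is a small sketch of one point in the design space: a schema-based (not instance-based), element-level matcher that produces 1:1 correspondences. The string similarity measure (Python's `difflib.SequenceMatcher`), the greedy assignment, and the `threshold` value are assumptions chosen for illustration; they are not a method prescribed by the slides.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] between two attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_elements(schema1, schema2, threshold=0.6):
    """Return 1:1 correspondences (attr1, attr2, score) scoring at least `threshold`.

    Greedy strategy: consider candidate pairs from most to least similar and
    accept a pair only if neither attribute has been matched already.
    """
    candidates = sorted(
        ((name_similarity(a, b), a, b) for a in schema1 for b in schema2),
        reverse=True,
    )
    used1, used2, result = set(), set(), []
    for score, a, b in candidates:
        if score >= threshold and a not in used1 and b not in used2:
            result.append((a, b, round(score, 2)))
            used1.add(a)
            used2.add(b)
    return result

# Hypothetical attribute lists from two local schemas.
correspondences = match_elements(["name", "salary"], ["Salary", "emp_name"])
```

A structure-level or instance-based matcher would need more input than attribute names (for example, relation structure or sample data), which is why the lack of schema and instance information complicates matching.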
Schema Matching Approaches
Schema Integration
• Use the correspondences to create a GCS
• Mainly a manual process, although rules can help
Schema Mapping
• Mapping data from each local database (source) to GCS (target) while preserving
semantic consistency as defined in both source and target.
• Data warehouses ⇒ actual translation
• Data integration systems ⇒ discover mappings that can be used in the query processing
phase
• Two issues:
– Mapping creation
– Mapping maintenance
Mapping Creation
Given
– A source LCS
– A target GCS
Produce a set of queries that, when executed, will create GCS data instances from
the source data.
For each target relation Tk, we look for a query Qk, defined on a (possibly proper)
subset of the relations in S, that generates data for Tk from the source relations
when executed.
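As a rough illustration of mapping creation, the sketch below keeps one query per target relation and runs each to populate that relation from the source relations. The source relations `emp` and `pay`, the target relation `employee`, and the helper `materialize` are hypothetical names introduced for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source relations (the set S), assumed for this example.
CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT);
CREATE TABLE pay (eno INTEGER, salary REAL);
INSERT INTO emp VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO pay VALUES (1, 5000), (2, 4200);
-- Target relation in the GCS.
CREATE TABLE employee (name TEXT, salary REAL);
""")

# One mapping query per target relation; here the query for `employee`
# is defined on both source relations.
MAPPING_QUERIES = {
    "employee": "SELECT e.ename, p.salary FROM emp e JOIN pay p ON e.eno = p.eno",
}

def materialize(conn, mappings):
    """Run each mapping query and insert its result into the target relation."""
    for target, query in mappings.items():
        # Target names come from the trusted dictionary above, not user input.
        conn.execute(f"INSERT INTO {target} {query}")
    conn.commit()

materialize(conn, MAPPING_QUERIES)
rows = conn.execute("SELECT name, salary FROM employee ORDER BY name").fetchall()
```

In a data warehouse the queries would be executed as shown; in a data integration system the same queries would instead be kept and used during query processing.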
Conclusion
• Database integration
– The process of creating a GCS (or a mediated schema) and
determining how each LCS maps to it.
• Data warehouses: the GCS is instantiated and materialized
• Data integration systems: the GCS is merely a virtual view
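The practical difference between a materialized and a virtual GCS can be seen in a small sketch. It assumes a single hypothetical source relation `src_emp` and uses an SQLite table copy and an SQLite view as stand-ins for the two approaches; real systems differ considerably in scale and mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_emp (ename TEXT, salary REAL);
INSERT INTO src_emp VALUES ('Ann', 5000);
-- Warehouse style: the GCS relation is populated once and stored.
CREATE TABLE gcs_materialized AS SELECT ename AS name, salary FROM src_emp;
-- Integration-system style: the GCS relation is only a view over the source.
CREATE VIEW gcs_virtual AS SELECT ename AS name, salary FROM src_emp;
""")

# The source changes after integration.
conn.execute("INSERT INTO src_emp VALUES ('Bob', 4200)")

def count(relation):
    """Number of rows currently visible through the given relation."""
    return conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchone()[0]

materialized_count = count("gcs_materialized")  # stale until the next reload
virtual_count = count("gcs_virtual")            # computed from the source at query time
```

The materialized copy answers queries without touching the source but must be refreshed, while the virtual view is always current but pushes work to query time.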