GIS Data Model


DATA MODELS IN GIS

Mahesh K. Jat
Department of Civil Engineering, Malaviya National Institute of Technology, Jaipur



OUTLINE:
Overview of models
Data and levels of measurement
Raster and vector models
Conversion between models
Databases

GIS Analysis
Much of GIS analysis and description consists of investigating the properties of geographic features and determining the relationships between them.

Geographic information
Characteristics of geographic information: location, volume, dimensionality (point, line, area), continuity (feature vs. field)

Building complex features


Simple geographic features can be used to build more complex ones. Areas are made up of lines, which are made up of points represented by their coordinates: Areas = {Lines} = {Points}
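The hierarchy Areas = {Lines} = {Points} maps naturally onto nested types. Below is a minimal Python sketch, assuming hypothetical `Point`, `Line` and `Area` classes (not from any GIS library) and planar coordinates; using the shoelace formula to get an area from the chained boundary lines is our own illustrative choice.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass
class Line:
    points: List[Point]  # ordered; the order gives the line its direction

    def length(self) -> float:
        # Sum of straight segments between consecutive points.
        return sum(math.dist((a.x, a.y), (b.x, b.y))
                   for a, b in zip(self.points, self.points[1:]))


@dataclass
class Area:
    lines: List[Line]  # boundary lines, each starting where the previous ends

    def ring(self) -> List[Point]:
        # Chain the boundary lines into one ring, skipping each later line's
        # first point because it repeats the previous line's last point.
        pts = list(self.lines[0].points)
        for line in self.lines[1:]:
            pts.extend(line.points[1:])
        return pts

    def area(self) -> float:
        # Shoelace formula; the last vertex is paired with the first.
        pts = self.ring()
        total = sum(a.x * b.y - b.x * a.y for a, b in zip(pts, pts[1:] + pts[:1]))
        return abs(total) / 2.0


# A unit square built from two boundary lines, which are built from points.
square = Area([
    Line([Point(0, 0), Point(1, 0), Point(1, 1)]),
    Line([Point(1, 1), Point(0, 1), Point(0, 0)]),
])
print(square.area(), sum(line.length() for line in square.lines))
```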

DIGITAL INFORMATION

GIS requires that both data and maps be represented as numbers, ultimately as 0s and 1s: they are stored on devices that can store only 0s and 1s, processed as 0s and 1s, and sent through a pipe consisting of 0s and 1s.

GIS places data into the computer's memory in a physical data structure (i.e. files and directories).

files can be written in binary or as ASCII text. Binary is faster to read and smaller; ASCII can be read and edited by humans but uses more space.
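To make the binary vs. ASCII trade-off concrete, the sketch below writes the same coordinates both ways using Python's standard `struct` module. The coordinate values are hypothetical, and the size comparison holds for high-precision values like these; very short numbers can be smaller as text.

```python
import struct

# Hypothetical coordinate pairs (x, y) in ground units.
coords = [(623412.5, 4753456.25), (623424.75, 4753436.5), (623478.125, 4753462.0)]
flat = [v for pair in coords for v in pair]

# Binary: each value packed as an 8-byte little-endian IEEE 754 double.
binary = struct.pack(f"<{len(flat)}d", *flat)

# ASCII: one "x y" pair per line, human-readable and editable.
ascii_text = "\n".join(f"{x!r} {y!r}" for x, y in coords).encode("ascii")

print(len(binary), len(ascii_text))

# Reading the binary form back recovers the values exactly.
restored = list(struct.unpack(f"<{len(flat)}d", binary))
print(restored == flat)
```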

DATA

a GIS holds locational and attribute data. Attribute type: discrete vs. continuous

discrete: presumed to occur at distinct locations, with empty locations having a value of zero for the attribute in question
continuous: the feature occurs throughout the geographical region; no locations are empty

Properties of Features
size, distribution, pattern, contiguity, neighborhood, shape, scale and orientation.

Basic properties of geographic features

DATA
Levels of Measurement:

four levels are commonly recognized: nominal, ordinal, interval and ratio. Each subsequent level includes all the characteristics of the preceding levels. Data available at higher levels can be reduced to lower levels; the opposite is not true.

LEVELS OF MEASUREMENT
Nominal Scale

objects are classed into groups; groups possess arbitrary labels (numbers/names)

e.g. religion, land use/cover

discrete variable

Ordinal Scale

categorization plus an ordering/ranking of data, e.g. country road, street, highway

can identify larger/smaller but cannot comment on the degree of difference between values

K = 5, L = 3, M = 1 is equivalent to K = 500, L = 300, M = 10

discrete variables
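The K/L/M example can be checked mechanically: two ordinal codings are equivalent when they produce the same ranks. A minimal sketch, with `ranks` as a hypothetical helper name:

```python
def ranks(values):
    """Map each label to its rank (1 = smallest value)."""
    ordered = sorted(values, key=values.get)
    return {label: rank for rank, label in enumerate(ordered, start=1)}


coding_a = {"K": 5, "L": 3, "M": 1}
coding_b = {"K": 500, "L": 300, "M": 10}

# Same order, so the two codings carry identical ordinal information.
print(ranks(coding_a) == ranks(coding_b))

# Differences are not meaningful on an ordinal scale: they disagree.
print(coding_a["K"] - coding_a["L"], coding_b["K"] - coding_b["L"])
```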

Interval Scale

measurements are ranked and the distance between measurements is known

no true zero point

e.g. elevation/topographic lines, temperature in °C

discrete or continuous

Ratio Scale

like interval scaling: both rank and separation are known, but there is also a true, fixed zero point, e.g. temperature on the Kelvin scale, speed

continuous and discrete
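The interval/ratio distinction shows up as soon as the same two temperatures are expressed on both scales: differences agree, but only the Kelvin ratio is meaningful. A short sketch (the helper name is ours):

```python
import math


def celsius_to_kelvin(c):
    return c + 273.15


a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Interval property: a 10-degree difference has the same size on both scales.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratio property: only Kelvin has a true zero, so only its ratio is meaningful.
ratio_c = b_c / a_c   # 2.0, yet 20 degC is not "twice as hot" as 10 degC
ratio_k = b_k / a_k   # about 1.035
print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 3))
```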

DATA MODELS REPRESENTING DATA


1. Reality: total phenomena as they actually exist
2. Conceptual Data Model: describes and defines the included entities (how they will be represented)
3. Logical Data Model: logical organization of the database elements
4. Physical Data Model or File Structure: how information will be structured for access

DATA MODELS

the logical data model is how data are organized for use by the GIS.
GISs have traditionally used either raster or vector for maps.

raster: based on pixels
vector: based on points, lines and polygons

while most GIS systems can handle both raster and vector data, traditionally only one is used for the internal organization of spatial data.

DATA MODELS

rasters and vectors can be flat files if they are simple


Raster-based line (flat file):
0000000000000000
0001100000100000
1010100001010000
1100100001010000
0000100010001000
0000100010000100
0001000100000010
0010000100000001
0111001000000001
0000111000000000
0000000000000000

Vector-based line (flat file):
4753456 4753436 4753462 4753432 4753405 4753401 4753462 4753398 623412 623424 623478 623482 623429 623508 623555 623634

RASTER DATA MODELS


the basic unit is the cell or pixel; cells are uniformly spaced. Each cell/pixel has spatial and spectral information, e.g. digital elevation data and digital images

spatially exhaustive sampling of the area of interest

every cell has a value, even if that value only records that the data are missing.

each cell has a resolution, given as the cell size in ground units.

higher resolution means smaller cell dimensions

RASTER DATA MODELS


Figure: Generic structure for a grid (grid extent, resolution, rows, columns, grid cell).
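Grid extent, resolution, rows and columns are tied together arithmetically. The sketch below (hypothetical helper names, a 1000 m × 600 m extent chosen for illustration) computes the grid size for a given cell size and locates the cell holding a ground point, with row 0 at the top as is usual for rasters.

```python
import math


def grid_shape(xmin, ymin, xmax, ymax, cell_size):
    """Rows and columns needed to cover the extent with square cells."""
    cols = math.ceil((xmax - xmin) / cell_size)
    rows = math.ceil((ymax - ymin) / cell_size)
    return rows, cols


def cell_of(x, y, xmin, ymax, cell_size):
    """(row, col) of the cell holding ground point (x, y); row 0 is the top row."""
    col = int((x - xmin) // cell_size)
    row = int((ymax - y) // cell_size)
    return row, col


print(grid_shape(0, 0, 1000, 600, 30))   # (20, 34)
print(grid_shape(0, 0, 1000, 600, 15))   # (40, 67): halving the cell size
print(cell_of(45, 590, 0, 600, 30))      # (0, 1)
```

Halving the cell size roughly quadruples the number of cells, which is why finer resolution quickly inflates raster data volumes.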


RASTER DATA MODELS

Refining resolution

RASTER DATA MODELS

Sources of Raster Data


Satellite data (Landsat, SPOT, IRS)
Scanned aerial photography
Digital orthophotography
Scanned maps and documents

From where do we get Raster Data?


Scanned aerial photographs: photographs are NOT raster images, but SCANNED images ARE
Scanned maps
Satellite images

CREATING RASTER DATA MODELS

creating a raster is like laying a grid over a map

code each cell with a value representing the attribute; every cell has a value, even if null or zero (integers, ratios, etc.)

values for each cell are written into a file

the file may be prepared in a spreadsheet, database or word processor and imported into the GIS so it can be reformatted

each pixel presumably has one value; is this correct in reality? This is the mixed pixel issue.

RASTER AND MISSING DATA

Figure: a GIS data layer as a grid with a large section of missing data, in this case the zeros in the ocean off New York and New Jersey.

MIXED PIXEL ISSUE

Water dominates: W W W W W W G G G

Winner takes all: W G W W W G G G G

Edges separate: W E W E E E G G G


Figure: rules for classifying a mixed water/land pixel: largest share, presence/absence, central point and percent occurrence (example percentages: 35%, 70%, 80%, 100%).
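The four rules can give different answers for the same pixel. In this sketch the pixel is represented, as an assumption for illustration, by a 3 × 3 set of sub-samples labelled "W" (water) and "L" (land); the function names are hypothetical.

```python
from collections import Counter

# Hypothetical pixel, sampled as a 3 x 3 grid of sub-cells.
pixel = [
    ["L", "L", "L"],
    ["L", "W", "L"],
    ["W", "W", "L"],
]


def samples(px):
    return [label for row in px for label in row]


def largest_share(px):
    """Class that covers most of the pixel."""
    return Counter(samples(px)).most_common(1)[0][0]


def central_point(px):
    """Class found at the centre of the pixel."""
    return px[len(px) // 2][len(px[0]) // 2]


def presence(px, target):
    """1 if the target class occurs anywhere in the pixel, else 0."""
    return int(target in samples(px))


def percent_occurrence(px, target):
    """Percentage of the pixel covered by the target class."""
    s = samples(px)
    return 100.0 * s.count(target) / len(s)


print(largest_share(pixel), central_point(pixel),
      presence(pixel, "W"), round(percent_occurrence(pixel, "W"), 1))
```

Here largest share says land while the central point says water: the choice of rule changes the map.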

CREATING RASTER DATA MODELS

raster data visualized as map layers

map layer: data describing a single characteristic for each location; multiple items of information require multiple layers

this creates problems: raster databases can become enormous

each map layer has thousands of cells

RASTER DATA MODELS


Advantages

simple data structures
each cell can be owned by only one feature
overlay and combination of maps and remotely sensed images are easy
simulation is easy, because cells have the same size and shape
the technology is cheap

RASTER DATA MODELS


Advantages

some spatial analysis methods simple to perform


local: cell-by-cell calculations
focal: models a cell's value based on its neighbours
zonal: models a cell's value based on geographical areas (zones)
global: models a cell's value based on all cells
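The four operation types differ only in which input cells feed each output cell. A pure-Python sketch on tiny hypothetical grids; the focal window is 3 × 3 and is clipped at the grid edge, and the global example (each cell minus the grid mean) is just one of many possible global operations.

```python
from typing import Dict, List

Grid = List[List[float]]


def local_add(a: Grid, b: Grid) -> Grid:
    """Local: each output cell uses only the same cell in the inputs."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def focal_mean(g: Grid) -> Grid:
    """Focal: mean of the 3 x 3 neighbourhood, clipped at the grid edge."""
    rows, cols = len(g), len(g[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [g[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_mean(values: Grid, zones: List[List[int]]) -> Dict[int, float]:
    """Zonal: mean of the values falling in each zone."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for vrow, zrow in zip(values, zones):
        for v, z in zip(vrow, zrow):
            sums[z] = sums.get(z, 0.0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: sums[z] / counts[z] for z in sums}


def global_deviation(g: Grid) -> Grid:
    """Global: each cell compared with the mean of all cells."""
    cells = [v for row in g for v in row]
    mean = sum(cells) / len(cells)
    return [[v - mean for v in row] for row in g]


elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zones = [[1, 1, 2], [1, 1, 2], [3, 3, 3]]
print(zonal_mean(elevation, zones))
```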

RASTER DATA MODELS


Disadvantages

large volumes of graphic data
use of large cells to reduce data volumes
poor at representing points, lines and areas; good at surfaces
must often include redundant or missing data
network linkages are difficult to establish
projection transformations are time consuming

COMPRESSION TECHNIQUES

two raster compression techniques used in GIS are run-length encoding and quadtrees

Run-length Encoding (more efficient than storing every cell)

values often occur in runs across several cells, a form of spatial autocorrelation; e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 (count, value pairs)

RUN-LENGTH CODING
Row-by-row coding: CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA
Run-length coding: 5C 2B 1D 4C 2B 1D 3C 3B 2D 1C 2B 2A 4D 1B 2A 2D 3B 2A 3D 4A 3D 4A
Legend: A. Mixed Conifer, B. Douglas Fir, C. Oak Savannah, D. Grassland

Row-by-row coding needs 56 entries for the 7 × 8 array; run-length coding needs only 22 pairs (44 entries).
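Both run-length examples can be reproduced with the standard library's `itertools.groupby`, which groups consecutive equal values. A minimal sketch; like the vegetation example, it scans the cells row after row, so a run may continue from the end of one row into the next.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (count, value) pairs."""
    return [(len(list(run)), v) for v, run in groupby(values)]


def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, v in pairs:
        out.extend([v] * count)
    return out


binary_row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
print(rle_encode(binary_row))

vegetation = "CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA"
pairs = rle_encode(vegetation)
print(" ".join(f"{n}{v}" for n, v in pairs))
print(len(vegetation), len(pairs))
```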

COMPRESSION TECHNIQUES
Quadtree Compression

hierarchical data model using variable-sized grid cells
finer subdivisions are used in areas requiring finer detail (higher resolution)
each pixel in a higher layer is derived from the average or majority of 4 pixels in the layer below
not as efficient for more variable or complex data
used primarily as a way to store data for rapid retrieval on display devices

QUAD TREE STRUCTURE
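A region quadtree can be sketched as a recursive split: a square block is stored as one leaf if all its cells share a value, otherwise it is divided into four quadrants. This is a sketch under stated assumptions (a 2^n × 2^n grid, children ordered NW, NE, SW, SE); the averaging/majority pyramid described above is a related but different construction.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value for a uniform block, else a list of 4 children
    in the order NW, NE, SW, SE. Assumes a square 2^n x 2^n grid."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first
    half = size // 2
    return [build_quadtree(grid, row, col, half),
            build_quadtree(grid, row, col + half, half),
            build_quadtree(grid, row + half, col, half),
            build_quadtree(grid, row + half, col + half, half)]


def count_leaves(node):
    """Number of stored leaf blocks."""
    return sum(count_leaves(child) for child in node) if isinstance(node, list) else 1


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the variable south-east quadrant is subdivided, so 7 leaves replace 16 cells; a noisier grid would split much further, which is why quadtrees are less efficient for variable data.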

RASTER DATA FORMAT

most raster formats are digital image formats.


most GISs accept TIFF, GIF, JPEG or Encapsulated PostScript images, which by themselves are not georeferenced. DEMs are true raster data formats.


VECTOR DATA MODELS

think of the world as a space populated by discrete features of various shapes and kinds: points, lines and areas.
any location in space may be empty or occupied by one or more points, lines or areas.

VECTOR DATA MODELS


point

zero-dimensional abstraction of an object, represented by a single X,Y coordinate. Normally represents a geographic feature too small to be displayed as a line or area. Stored by its real (earth) coordinates.

VECTOR DATA MODELS


line

set of ordered coordinates that represent the shape of geographic features too narrow to be displayed as an area at the given scale, or of linear features with no area. Lines and areas are built from sequences of points in order. Lines have a direction given by the ordering of the points.

VECTOR DATA MODELS


polygon

feature used to represent areas. Defined by the lines that make up its boundary and a point inside its boundary for identification. Polygons have attributes that describe the geographic feature they represent.

Areas are lines are points are coordinates

VECTOR DATA MODELS

the arc/node model of vector data evolved in the 1960s.

an area consists of lines and a line consists of points.

points, lines and areas can each be stored in their own files, with links between them. The endpoint of a line (arc) is called a node; arcs join only at nodes. Stored with each arc is its topology (i.e. the connecting arcs and the left and right polygons).

Topology
A set of rules on how objects relate to each other
Major difference in file formats
Higher level objects have special topology rules

Topology Definition
The science and mathematics of relationships, used to validate the geometry of vector entities and for operations such as network tracing and tests of polygon adjacency.
The study of geometric properties that do not change when the forms are bent, stretched or undergo similar geometric transformations.

Why Topology Matters


Error detection:
open polygons, unlabeled polygons, slivers, polygons that cannot exist next to each other, label errors
Network modeling
Arc-node topology attributes: Cover#, Lpoly# and Rpoly# (left and right polygons), Fnode and Tnode (from and to nodes)

Higher Level Objects
Regions
Networks
TIN (triangulated irregular network)
Dynamic segmentation

Regions
Overlapping areas with different attributes, e.g. fire history
Disconnected areas with the same attributes, e.g. Hawaii

Networks
Road systems, power grids, water supply, sewerage systems, drainage networks
Continuous connected networks
Rules for displacement in a network
Attribute value accumulations due to displacements

TIN
Vector surface model
Triangulated Irregular Network: a set of non-overlapping triangles, each with a constant gradient
A TIN can honor the original input elevations

TOPOLOGY

topological data structures dominate GIS software. Storing topology explicitly allows automated error detection and elimination. Maps are rarely topologically clean when digitized or imported, so a GIS has to be able to build topology from unconnected arcs.

TOPOLOGY
Figure: Arc/Node Map Data Structure with Files.
File of Arcs by Polygon: Polygon A: arcs 1, 2; area; attributes
Arcs File: arc 1: points 1, 2, 3, 4, 5, 6, 7; arc 2: points 1, 8, 9, 10, 11, 12, 13, 7
Points File: one x, y pair for each of points 1 to 13

TOPOLOGY

topology is the relationship between nodes, arcs and polygons. A topologically structured database eases retrieval and the implementation of spatial-relational operations. Advantages:
simple, elegant and efficient relational database construction and analysis
complete topology makes map overlay feasible
topology allows many GIS operations to be done without accessing the point files

VECTOR DATABASE CREATION

database creation involves several stages:


input of the spatial data
input of the attribute data
linking spatial and attribute data

spatial data is entered via digitized points and lines, scanned and vectorized lines, or directly from other digital sources. Once the spatial data has been entered, much work is still needed before it can be used.

VECTOR DATABASE CREATION


Building Topology

once points are entered and geometric lines are created, topology must be "built"

this involves calculating and encoding relationships between the points, lines and areas
this information may be automatically coded into tables of information in the database

Topological Model
Topology: a mathematical method to define spatial relationships
Arc-node data model:
Arc: a series of points that start and end at a node
Node: an intersection point where two or more arcs meet

Topological Data Spatial Operations


Contiguity: the spatial relationship of adjacency, e.g. a bus stand adjacent to a railway station
Connectivity: interconnected pathways or networks, e.g. street, trail and stream networks

Basic arc topology


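Figure: arc 1 runs from node n1 to node n2, with polygon A on one side and polygon B on the other.
Topological Arcs File (PL, PR = left and right polygon):
Arc   From   To   PL   PR
1     n1     n2   A    B
Node coordinates: n1 (x, y), n2 (x, y)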

A topological structure for the arcs.
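Because each arc record stores its from/to nodes and left/right polygons, contiguity and connectivity queries need no coordinates at all. A sketch over a hypothetical three-arc map (polygons A and B share arc 1; "O" stands for the outside); the `Arc` record and query names are illustrative.

```python
from typing import Dict, List, NamedTuple, Set


class Arc(NamedTuple):
    from_node: str
    to_node: str
    left: str   # polygon on the left when travelling from -> to
    right: str  # polygon on the right


# Hypothetical topological arcs file.
arcs: Dict[int, Arc] = {
    1: Arc("n1", "n2", "A", "B"),
    2: Arc("n2", "n1", "A", "O"),
    3: Arc("n1", "n2", "B", "O"),
}


def neighbours(polygon: str) -> Set[str]:
    """Contiguity: polygons sharing at least one arc with `polygon`."""
    found = set()
    for arc in arcs.values():
        if polygon in (arc.left, arc.right):
            found.update({arc.left, arc.right})
    found.discard(polygon)
    return found


def arcs_at_node(node: str) -> List[int]:
    """Connectivity: arcs that start or end at `node`."""
    return [aid for aid, arc in arcs.items() if node in (arc.from_node, arc.to_node)]


def boundary(polygon: str) -> List[int]:
    """Arcs that bound `polygon`."""
    return [aid for aid, arc in arcs.items() if polygon in (arc.left, arc.right)]


print(neighbours("A"), arcs_at_node("n1"), boundary("B"))
```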

TOPOLOGY
Topological data structures dominate GIS software.
Topology allows automated error detection and elimination. Rarely are maps topologically clean when digitized or imported. A GIS has to be able to build topology from unconnected arcs. Nodes that are close together are snapped. Slivers due to double digitizing and overlay are eliminated.

Slivers

Figure: a sliver and an unsnapped node.

Topology Matters
The tolerances controlling snapping, elimination, and merging must be considered carefully, because they can move features. Complete topology makes map overlay feasible. Topology allows many GIS operations to be done without accessing the point files.

VECTOR DATABASE CREATION


Editing

during topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically

automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined
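A tolerance-based snap can be sketched greedily: each node is moved onto the first already-kept node within the tolerance distance. This is a simplified, order-dependent sketch (real topology builders are more elaborate), with hypothetical end-point coordinates; it also shows why the tolerance must be chosen carefully.

```python
import math
from typing import List, Tuple

Node = Tuple[float, float]


def snap_nodes(nodes: List[Node], tolerance: float) -> List[Node]:
    """Move each node onto the first kept node within `tolerance`;
    otherwise keep it as a new node. Result depends on input order."""
    kept: List[Node] = []
    snapped: List[Node] = []
    for node in nodes:
        target = next((k for k in kept if math.dist(node, k) <= tolerance), None)
        if target is None:
            kept.append(node)
            target = node
        snapped.append(target)
    return snapped


# Hypothetical arc end points; (10.3, 0.2) is an undershoot near (10, 0).
ends = [(0.0, 0.0), (10.0, 0.0), (10.3, 0.2), (20.0, 5.0)]
print(snap_nodes(ends, 0.5))   # the undershoot is joined to (10, 0)
print(snap_nodes(ends, 12.0))  # too large: distinct nodes are merged as well
```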

VECTOR DATA MODELS


Advantages

good representation of structures (points, lines, polygons)

compact and more efficient

topology can be completely described

accurate graphics
retrieval, updating and generalization of graphics and attributes are possible
works well with pen and light-plotting devices and tablet digitizers

VECTOR DATA MODELS


Disadvantages

complex data structures


combination of several vector polygon maps or polygon and raster maps through overlay creates difficulties

simulation is difficult
display and plotting can be expensive
technology is expensive

not good at continuous coverage or with plotters that fill areas

a TIN must be used to represent volumes

VECTOR DATA FORMATS

vector formats are either page description languages or formats that preserve ground coordinates.
page languages include HPGL, PostScript and AutoCAD DXF.

true vector GIS data formats include ArcView shapefiles and ArcInfo interchange files (E00); of these, only E00 carries topology.

VECTOR DATA MODELS


List of coordinates (spaghetti)

simple
easy to manage
no topology
lots of duplication, hence the need for large storage space
very often used in CAC (computer-assisted cartography)

VECTOR DATA MODELS


Vertex Dictionary

no duplication, but this model still does not use topology
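The difference between spaghetti and a vertex dictionary is easy to see with two adjacent squares that share an edge. The sketch (hypothetical data and function name) replaces repeated coordinates with vertex ids; note that nothing in the result records that A and B are neighbours, which is the missing topology.

```python
# Spaghetti: every polygon stores its own full list of coordinates.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}


def to_vertex_dictionary(polygons):
    """Store each distinct coordinate once; polygons keep only vertex ids."""
    vertices = {}   # coordinate -> vertex id
    by_id = {}
    for name, coords in polygons.items():
        ids = []
        for xy in coords:
            if xy not in vertices:
                vertices[xy] = len(vertices) + 1
            ids.append(vertices[xy])
        by_id[name] = ids
    return vertices, by_id


vertices, polygons = to_vertex_dictionary(spaghetti)
print(sum(len(c) for c in spaghetti.values()), len(vertices))
print(polygons)
```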

VECTOR DATA MODELS


Dual Independent Map Encoding (DIME)

developed by the US Bureau of the Census
nodes (intersections of lines) are identified with codes
each link is assigned a directional code in the form of a "from node" and a "to node"

both street addresses and UTM coordinates are explicitly defined for each link

VECTOR TO RASTER EXCHANGE

data exchange by translation (export and import) can lead to significant errors in attributes and in geometry. efficient data exchange is important for the future of GIS.
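One common way to convert a vector polygon to raster is the central-point rule: a cell takes the polygon's value if its centre falls inside the polygon. The sketch below uses an even-odd ray-casting test and a hypothetical triangle; the sloping edge comes out as a staircase, a simple example of the geometric error that conversion introduces.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):   # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, xmin, ymax, cell_size, rows, cols):
    """1 where the cell centre lies inside the polygon, else 0 (row 0 at top)."""
    grid = []
    for r in range(rows):
        y = ymax - (r + 0.5) * cell_size
        grid.append([int(point_in_polygon(xmin + (c + 0.5) * cell_size, y, ring))
                     for c in range(cols)])
    return grid


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
for row in rasterize(triangle, 0.0, 3.0, 1.0, 3, 4):
    print(row)
```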


ADVANCED DATA MODELS - TIN

a triangulated irregular network is a set of elevation points which have been connected to form a network of triangles.

developed in the early 1970s as a simple way to build a surface
the sample points are connected by lines to form triangles; within each triangle the surface is usually represented by a plane
the triangles fit together in a manner which simulates the face of the land


ADVANCED DATA MODELS - TIN

Irregularly spaced sample points can be adapted to the terrain:

rough terrain: more points
smooth terrain: fewer points
an irregularly spaced sample is therefore more efficient

ADVANCED DATA MODELS - TIN

TIN facets can be seen as polygons with attributes of slope, aspect and area, whose three vertices have elevation attributes

TIN models work best in areas with sharp breaks in slope
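Since each facet is a plane through three elevation points, its area, slope and any interior elevation follow from the cross product of two edge vectors. A sketch with hypothetical function names and a hypothetical facet; aspect is omitted because its angle convention varies between packages.

```python
import math
from typing import Tuple

Vertex = Tuple[float, float, float]  # x, y, elevation


def facet_normal(a: Vertex, b: Vertex, c: Vertex):
    """Cross product (b - a) x (c - a): a vector normal to the facet plane."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)


def facet_area(a, b, c):
    """Planimetric (map) area of the triangle."""
    return abs(facet_normal(a, b, c)[2]) / 2.0


def facet_slope_deg(a, b, c):
    """Slope angle of the facet plane, in degrees."""
    nx, ny, nz = facet_normal(a, b, c)
    return math.degrees(math.atan2(math.hypot(nx, ny), abs(nz)))


def elevation_at(x, y, a, b, c):
    """Elevation of (x, y) on the facet plane; assumes a non-vertical facet."""
    nx, ny, nz = facet_normal(a, b, c)
    return a[2] - (nx * (x - a[0]) + ny * (y - a[1])) / nz


tri = ((0.0, 0.0, 10.0), (10.0, 0.0, 10.0), (0.0, 10.0, 20.0))
print(facet_area(*tri), facet_slope_deg(*tri), elevation_at(2.0, 3.0, *tri))
```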


ADVANCED DATA MODELS - TIN


Advantages
ability to describe the surface at different levels of resolution
efficiency in storing data
allows simple calculation of basin areas, slopes, channels, and many other geometric parameters
Disadvantages
in many cases requires visual inspection and manual control of the network

DATABASES

a spatial database is a collection of spatially referenced data that acts as a model of reality. The selected phenomena are deemed important enough to represent in digital form. The digital representation might be for some past, present or future time period.

DIGITAL DATABASES

scaleless: data can be stored at the level of detail found in the environment
the cartographer is responsible for choosing the content and resolution
scale is a critical factor:

level of resolution set by field instruments

digitizing: resolution of the instrument, plus abstraction and production factors

DIGITAL DATABASES

problems arise when using data sets of different resolutions, e.g. roads may not line up; these can be resolved using ancillary source materials
additional problems arise when using data sets of different themes, e.g. combining elevation and drainage data can produce water running uphill or non-level lakes

DIGITAL DATABASES
Value of databases:

cost of creation: it is cheaper to get data from an existing database
appropriateness of use
lack of alternative data sources
graphic output

METADATA

data about the data; could include data elements that:
identify the data
identify the custodians of, and access conditions to, the data
describe the projection, content and quality of the data
describe the action taken when handling databases of varying scale

Dataset information (example)
Title: Ortofotos'95
Abstract: Ortofotos'95 is a collection of ortho-rectified aerial photographs. These aerial photographs cover Portugal and were obtained in August 1995 on false-color infrared film at a scale of 1:40 000. CNIG, the Directorate General of Forests and the paper mill industry are the owners of the aerial photographs (in paper format).
Type of dataset: Airborne data > Aerial photos
Locations: Portugal
Temporal range: 1995
Dataset scales: 1:25 000 - 1:50 000
Dataset resolution: 1 - 3 meters
Dataset quality remarks: Acquisition of data: aerial photographs; the film is scanned at very high resolution and ortho-rectified using a DTM derived from topographic cartography at scale 1:25 000
Information creation date: 1999-10-29

DATABASES

pre-1970s: command-line based, with reading and writing directly to hard disk, tapes and diskettes
database approach: all reading and writing goes through a simple interface (no need to care about tapes, etc.)

for small GIS projects it is sufficient to store geographic information as simple files.

with large data volumes and many data users it is best to use a database management system (DBMS); relational design has been the most useful (since the 1980s)

DATABASE MANAGEMENT SYSTEMS

contain tables or feature classes in which:


rows: entities, records, observations, features

all information about one occurrence of a feature

columns: attributes, fields, data elements, variables

one type of information for all features

key field: an attribute whose values uniquely identify each row

Parcel Table (each row is an entity; Parcel # is the key field; each column is an attribute)
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

DATABASES - RDBM

tables are related or joined using a common record identifier (column variable) present in both tables

Example:

goal: produce a map of values by district/neighbourhood; problem: no district code is available in the parcel table
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

DATABASES - RDBM

solution: join parcel table containing values with geography table containing location codings, using Block as key field

Geography Table (Block serves as the secondary or foreign key linking it to the Parcel Table)
Block   District   Tract   City
1       A          101     Dallas
2       B          101     Dallas
4       B          105     Dallas
12      E          202     Garland
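The join on Block can be sketched with plain dictionaries holding the parcel and geography tables from this example. The helper names are hypothetical, and summing parcel values per district is only one possible way to aggregate for the district map; the slides do not specify it.

```python
parcels = [
    {"parcel": 8, "address": "501 N Hi", "block": 1, "value": 105450},
    {"parcel": 9, "address": "590 N Hi", "block": 2, "value": 89780},
    {"parcel": 36, "address": "1001 W. Main", "block": 4, "value": 101500},
    {"parcel": 75, "address": "1175 W. 1st", "block": 12, "value": 98000},
]
geography = {
    1: {"district": "A", "tract": 101, "city": "Dallas"},
    2: {"district": "B", "tract": 101, "city": "Dallas"},
    4: {"district": "B", "tract": 105, "city": "Dallas"},
    12: {"district": "E", "tract": 202, "city": "Garland"},
}


def join_on_block(parcel_rows, geo_by_block):
    """Inner join: attach the geography row whose Block matches each parcel."""
    return [{**p, **geo_by_block[p["block"]]}
            for p in parcel_rows if p["block"] in geo_by_block]


def value_by_district(rows):
    """Total parcel value per district."""
    totals = {}
    for row in rows:
        totals[row["district"]] = totals.get(row["district"], 0) + row["value"]
    return totals


joined = join_on_block(parcels, geography)
print(value_by_district(joined))
```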

DATABASES - RDBM
Figure: relational linkages between spatial attributes (water right locations) and descriptive attributes.

DATABASES
Advantages

very flexible; data can easily be exported to another system

enables simple operations, e.g. searching for records that satisfy some condition

Example table:
Description                         Thickness    Code
New Ice                             <10 cm       1
Nilas, Ice Rind                     0-10 cm      2
Young Ice                           10-30 cm     3
Grey Ice                            10-15 cm     4
Grey-White Ice                      15-30 cm     5
First-Year Ice                      30-200 cm    6
Thin First-Year Ice                 30-70 cm     7
Thin First-Year Ice, first stage    30-50 cm     8
Thin First-Year Ice, second stage   50-70 cm     9
Medium First-Year Ice               70-120 cm    1.
Thick First-Year Ice                120-200 cm   4.
Old Ice                                          7.
Second-Year Ice                                  8.
Multi-Year Ice                                   9.
