GIS Data Model
Mahesh K. Jat
Department of Civil Engineering, Malaviya National Institute of Technology, Jaipur
GIS Analysis
Much of GIS analysis and description
consists of investigating the properties of geographic features and determining the relationships between them.
Geographic information
Characteristics of Geographic Information
Location
Volume
Dimensionality: Point, Line, Area
Continuity: Feature, Field
Simple features are used to build more complex ones. Areas are made up of lines, which are made up of points represented by their coordinates: Areas = {Lines} = {Points}
DIGITAL INFORMATION
DATA
discrete: presumed to occur at distinct locations, with empty locations having a value of zero for the attribute in question
continuous: feature occurs throughout geographical region; no locations are empty
Properties of Features
size, distribution, pattern, contiguity, neighborhood, shape, scale, orientation.
DATA
Levels of Measurement:
four levels are commonly recognized: nominal, ordinal, interval and ratio
each subsequent level includes all characteristics of preceding levels
data available at higher levels can be reduced to lower levels; opposite is not true
LEVEL OF MEASUREMENTS
Nominal Scale
objects are classed into groups; groups possess arbitrary labels (numbers/names)
discrete variable
LEVEL OF MEASUREMENTS
Ordinal Scale
can identify larger/smaller but cannot comment on the degree of difference between values
discrete variables
LEVEL OF MEASUREMENTS
Interval Scale
discrete or continuous
LEVEL OF MEASUREMENTS
Ratio Scale
like interval scaling: both rank and separation are known, but there is also a known, fixed starting point, e.g. temperature on the Kelvin scale; speed
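The rule that higher-level data can be reduced to lower levels, but not the reverse, can be illustrated with a minimal sketch. The elevation values, the 400 m class break and the class labels below are assumptions made up for the example, not data from these slides.

```python
# Hypothetical ratio-scale data: elevations in metres (true zero, ratios meaningful).
elevations_m = [120.0, 450.0, 30.0, 980.0]


def to_ordinal(values):
    """Reduce ratio data to ordinal ranks (1 = smallest); separation is lost."""
    order = sorted(set(values))
    return [order.index(v) + 1 for v in values]


def to_nominal(values, break_m=400.0):
    """Reduce ratio data to class labels; rank within a class is lost."""
    return ["upland" if v >= break_m else "lowland" for v in values]


ranks = to_ordinal(elevations_m)
classes = to_nominal(elevations_m)
print(ranks)
print(classes)
```

Going the other way is not possible: the ranks or class labels alone cannot recover the original elevations.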
DATA MODELS
logical data model is how data are organized for use by the GIS.
GISs have traditionally used either raster or vector for maps.
while most GIS can handle both raster and vector data, only one is used for the internal organization of spatial data.
DATA MODELS
Flat File
0000000000000000
0001100000100000
1010100001010000
1100100001010000
0000100010001000
0000100010000100
0001000100000010
0010000100000001
0111001000000001
0000111000000000
0000000000000000
Flat File
4753456 4753436 4753462 4753432 4753405 4753401 4753462 4753398 623412 623424 623478 623482 623429 623508 623555 623634
basic unit is the cell or pixel; cells are uniformly spaced
each cell/pixel has spatial and spectral information, e.g. digital elevation data and digital images
[Figure: raster grid showing columns, rows, a grid cell, resolution, and fining of resolution]
code each cell with a value representing the attribute
every cell has a value, even if null or zero (integers, ratios, etc.)
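A minimal sketch of how a raster ties cell values to ground locations follows. The `Raster` class, its field names, the origin coordinates and the 10-unit cell size are all assumptions for illustration, and the sketch does no bounds checking.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    x_min: float      # easting of the grid's left edge (assumed)
    y_max: float      # northing of the grid's top edge (assumed)
    cell_size: float  # resolution: ground length of one cell side
    cells: list       # list of rows; every cell holds a value (0 used as null here)

    def cell_at(self, x, y):
        """Return (row, col) of the cell containing map coordinate (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        return row, col

    def value_at(self, x, y):
        """Look up the attribute value stored for the cell under (x, y)."""
        row, col = self.cell_at(x, y)
        return self.cells[row][col]


grid = Raster(x_min=623400.0, y_max=4753470.0, cell_size=10.0,
              cells=[[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 1]])
print(grid.cell_at(623415.0, 4753455.0))
print(grid.value_at(623415.0, 4753455.0))
```

Rows are counted from the top edge downward, which is why the row index is computed from `y_max`.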
spreadsheet, data base, word processor imported into GIS so it can be reformatted
each pixel presumably has one value; is this correct in reality? (mixed pixel issue)
GIS data layer as a grid with a large section of missing data, in this case, the zeros in the ocean off of New York and New Jersey.
Mixed-pixel coding rules (figure):
Water dominates: W W W W W W G G G
Edges separate: W E W E E E G G G
Largest share
Presence/Absence
Percent occurrence: 70%, 80%, 100%
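The coding rules for mixed pixels can be sketched as small functions. This is only an illustration under assumptions: the cell is represented by a hypothetical list of class labels sampled at sub-locations inside it, and the function names are invented.

```python
from collections import Counter


def largest_share(samples):
    """Code the cell with the class that covers most of it."""
    return Counter(samples).most_common(1)[0][0]


def presence(samples, target):
    """Presence/absence: 1 if the target class occurs anywhere in the cell."""
    return 1 if target in samples else 0


def percent_occurrence(samples, target):
    """Percentage of the cell covered by the target class."""
    return 100.0 * samples.count(target) / len(samples)


# Hypothetical mixed cell sampled at 10 sub-locations: 7 water (W), 3 grass (G).
cell = ["W"] * 7 + ["G"] * 3
print(largest_share(cell))
print(presence(cell, "G"))
print(percent_occurrence(cell, "W"))
```

The same cell gets a different code under each rule, which is why the rule in use should be recorded with the layer.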
map layer: data describing a single characteristic for a location
multiple items of information require multiple layers
simple data structures
each cell can be owned by only one feature.
overlay and combination of maps and remote sensed images easy
simulation easy, because cells have the same size and shape
technology is cheap
local: cell by cell calculations
focal: models cell value based on neighbours
zonal: models cell value based on geographical areas
global: models cell value based on all cells
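The four classes of raster operation can be sketched in plain Python. The sample grid, the zone layer and the choice of statistics (doubling, 3x3 mean, sum, maximum) are assumptions picked for illustration only.

```python
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "b"]]


def local_op(g, fn):
    """Local: each output cell depends only on the same input cell."""
    return [[fn(v) for v in row] for row in g]


def focal_mean(g):
    """Focal: each output cell is the mean of its 3x3 neighbourhood (clipped at edges)."""
    n_rows, n_cols = len(g), len(g[0])
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            window = [g[rr][cc]
                      for rr in range(max(0, r - 1), min(n_rows, r + 2))
                      for cc in range(max(0, c - 1), min(n_cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_sum(g, z):
    """Zonal: summarise cell values over each zone of a second layer."""
    totals = {}
    for g_row, z_row in zip(g, z):
        for v, zone in zip(g_row, z_row):
            totals[zone] = totals.get(zone, 0) + v
    return totals


def global_max(g):
    """Global: one statistic computed from all cells."""
    return max(v for row in g for v in row)


doubled = local_op(grid, lambda v: v * 2)
smoothed = focal_mean(grid)
by_zone = zonal_sum(grid, zones)
print(doubled, smoothed, by_zone, global_max(grid))
```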
volumes of graphic data
use of large cells to reduce data volumes
poor at representing points, lines and areas; good at surfaces
must often include redundant or missing data
network linkages are difficult to establish
projection transformations are time consuming
COMPRESSION TECHNIQUES
raster compression techniques used in GIS are run-length encoding and quadtrees
values often occur in runs across several cells, a form of spatial autocorrelation
e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1
RUN-LENGTH CODING
Row-by-row coding: CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA
Run-length coding: 5C 2B 1D 4C 2B 1D 3C 3B 2D 1C 2B 2A 4D 1B 2A 2D 3B 2A 3D 4A 3D 4A
A. Mixed Conifer
B. Douglas Fir
C. Oak Savannah
D. Grassland
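Run-length encoding is short enough to sketch directly. The function names below are invented for illustration; the two inputs are the 0/1 array and the vegetation string from the slides, scanned as one continuous sequence so that runs may continue across row ends.

```python
from itertools import groupby


def rle_encode(cells):
    """Collapse consecutive equal values into (run length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (run length, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]


array = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
encoded = rle_encode(array)

vegetation = ("CCCCCBBDCCCCBBDCCCBBBDDCBBA"
              "ADDDDBAADDBBBAADDDAAAADDDA"
              "AAA")
veg_runs = " ".join(f"{count}{value}" for count, value in rle_encode(vegetation))
print(encoded)
print(veg_runs)
```

Decoding restores the input exactly, so the scheme is lossless; it saves space only when runs are long.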
COMPRESSION TECHNIQUES
Quadtree Compression
hierarchical data model using a variable-sized grid cell
finer subdivisions are used in areas requiring finer detail (higher resolution)
pixel in each higher layer is derived from average or majority of 4 pixels from the lower layer
not as efficient for more variable or complex data
used primarily as a way to store data for rapid retrieval on display devices
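A minimal sketch of the variable-sized cell idea follows. It builds a lossless region quadtree that stops subdividing wherever a block is uniform, which is a simpler variant than the average/majority pyramid described above. The function names and the sample grid are assumptions, and the grid is assumed square with a side that is a power of two.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return the cell value if the block is uniform, else a list of four
    subtrees in the order NW, NE, SW, SE."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first
    half = size // 2
    return [build_quadtree(grid, row, col, half),
            build_quadtree(grid, row, col + half, half),
            build_quadtree(grid, row + half, col, half),
            build_quadtree(grid, row + half, col + half, half)]


def count_leaves(node):
    """Number of stored blocks; compare with the number of raw cells."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the south-east quadrant needs subdividing here, so 16 cells are stored as 7 blocks; a highly variable grid would gain little.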
think of the world as a space populated by discrete features of various shapes and kinds: points, lines, areas.
any location in space may be empty or occupied by one or more point, line or area.
Point: zero-dimensional abstraction of an object, represented by a single X,Y co-ordinate.
normally represents a geographic feature too small to be displayed as a line or area
points are stored by their real (earth) coordinates
Line: set of ordered co-ordinates that represent the shape of geographic features too narrow to be displayed as an area at the given scale, or linear features with no area
lines and areas are built from sequences of points in order.
lines have a direction to the ordering of the points.
Area: feature used to represent areas.
defined by the lines that make up its boundary and a point inside its boundary for identification.
areas have attributes that describe the geographic feature they represent.
points, lines, and areas can each be stored in their own files, with links between them.
endpoint of a line (arc) is called a node; arc junctions are only at nodes.
stored with the arc is the topology (i.e. the connecting arcs and left and right polygons).
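An arc record that carries its own topology might look like the sketch below. The `Arc` class, its field names and the three sample arcs (two polygons A and B sharing arc 1) are hypothetical; real GIS formats differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    arc_id: int
    from_node: int   # node where the arc starts
    to_node: int     # node where the arc ends
    left_poly: str   # polygon on the left when travelling from_node -> to_node
    right_poly: str  # polygon on the right


# Hypothetical coverage: polygons A and B share arc 1; "outside" is the exterior.
arcs = [
    Arc(1, from_node=1, to_node=2, left_poly="A", right_poly="B"),
    Arc(2, from_node=2, to_node=1, left_poly="A", right_poly="outside"),
    Arc(3, from_node=2, to_node=1, left_poly="outside", right_poly="B"),
]


def neighbours(arc_list, poly):
    """Polygons adjacent to poly, read from left/right codes only."""
    found = set()
    for arc in arc_list:
        if arc.left_poly == poly:
            found.add(arc.right_poly)
        elif arc.right_poly == poly:
            found.add(arc.left_poly)
    return found


def arcs_at_node(arc_list, node):
    """Arcs that start or end at the given node (connectivity)."""
    return [a.arc_id for a in arc_list if node in (a.from_node, a.to_node)]


print(neighbours(arcs, "A"))
print(arcs_at_node(arcs, 1))
```

Neither query touches point coordinates, which is the practical benefit of storing topology with the arc.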
Topology
A set of rules on how objects relate to each other
Major difference in file formats
topology rules
Topology Definition
The science and mathematics of relationships
used to validate the geometry of vector entities, and for operations such as network tracing and tests of polygon adjacency.
The study of geometric properties that do not change when the forms are bent, stretched or undergo similar geometric transformations.
open polygons
unlabeled polygons
slivers
polygons that cannot exist next to each other
Network Modeling
Arc-Node Topology: Cover#, Lpoly# and Rpoly#, Tnode, Fnode
Label errors
Regions
Overlapping areas with different attributes: fire history
Disconnected areas with the same attributes: Hawaii
Networks
Road systems, power grids, water
displacements
TIN
Vector Surface Model
Triangulated Irregular Network A set of nonoverlapping triangles each
TOPOLOGY
Topological data structures dominate GIS software. Topology stored explicitly allows automated error detection and elimination. Maps are rarely topologically clean when digitized or imported. A GIS has to be able to build topology from unconnected arcs.
TOPOLOGY
[Figure: polygon A bounded by arcs 1 and 2, with points numbered 1 to 13, alongside the points file]
File of Arcs by Polygon: A: 1,2, Area, Attributes
Arcs File: 1: 1,2,3,4,5,6,7; 2: 1,8,9,10,11,12,13,7
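The file layout in the figure (polygon A listing arcs 1 and 2, each arc listing its point numbers) can be sketched as two lookup tables. The dictionary layout and the function name are assumptions; point coordinates are left out because the slides do not give them. The sketch also assumes the arcs are listed in chaining order.

```python
# Arcs file: arc id -> ordered point numbers (taken from the figure).
arcs = {
    1: [1, 2, 3, 4, 5, 6, 7],
    2: [1, 8, 9, 10, 11, 12, 13, 7],
}
# File of arcs by polygon: polygon id -> arc ids.
polygons = {"A": [1, 2]}


def polygon_ring(poly_id):
    """Chain a polygon's arcs end to end, reversing an arc where needed."""
    arc_ids = polygons[poly_id]
    ring = list(arcs[arc_ids[0]])
    for arc_id in arc_ids[1:]:
        pts = arcs[arc_id]
        if pts[0] != ring[-1]:
            pts = pts[::-1]  # arc is stored in the opposite direction
        if pts[0] != ring[-1]:
            raise ValueError(f"arc {arc_id} does not connect to the ring")
        ring.extend(pts[1:])
    if ring[0] != ring[-1]:
        raise ValueError("ring is not closed")
    return ring


ring_a = polygon_ring("A")
print(ring_a)
```

Because both arcs run from point 1 to point 7, arc 2 has to be traversed backwards to close the boundary.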
TOPOLOGY
relationship between nodes, arcs and polygons.
topologically structured database for ease of retrieval and implementation of spatial-relational operations.
advantages:
simple, elegant and efficient relational database construction and analysis
complete topology makes map overlay feasible.
topology allows many GIS operations to be done without accessing the point files.
input of the spatial data
input of the attribute data
linking spatial and attribute data
spatial data is entered via digitized points and lines, scanned and vectorized lines, or directly from other digital sources
once the spatial data has been entered, much work is still needed before it can be used
once points are entered and geometric lines are created, topology must be "built"
this involves calculating and encoding relationships between the points, lines and areas
this information may be automatically coded into tables of information in the database
Topological Model
Topology: mathematical method to
adjacency, e.g., bus stand adjacent to railway station
or networks, e.g., street and trail networks, stream networks
TOPOLOGY
Topological data structures dominate GIS software.
Topology allows automated error detection and
elimination. Rarely are maps topologically clean when digitized or imported. A GIS has to be able to build topology from unconnected arcs. Nodes that are close together are snapped. Slivers due to double digitizing and overlay are eliminated.
[Figure: slivers and an unsnapped node]
Topology Matters
The tolerances controlling snapping,
elimination, and merging must be considered carefully, because they can move features. Complete topology makes map overlay feasible. Topology allows many GIS operations to be done without accessing the point files.
during topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically
automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined
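Tolerance-based snapping can be sketched as follows. This is a simple greedy scheme chosen for illustration, not the method of any particular GIS package; the function name and sample coordinates are assumptions.

```python
import math


def snap_nodes(nodes, tolerance):
    """Greedy snapping: each node moves onto the first previously kept node
    that lies within the tolerance distance; otherwise it is kept as is."""
    kept = []
    snapped = []
    for x, y in nodes:
        for kx, ky in kept:
            if math.hypot(x - kx, y - ky) <= tolerance:
                snapped.append((kx, ky))
                break
        else:
            kept.append((x, y))
            snapped.append((x, y))
    return snapped


# Hypothetical digitized nodes: the first two should have been the same node.
nodes = [(0.0, 0.0), (0.3, 0.3), (10.0, 10.0)]
print(snap_nodes(nodes, 0.5))   # small gap is closed
print(snap_nodes(nodes, 0.1))   # tolerance too small: nothing moves
print(snap_nodes(nodes, 20.0))  # tolerance too large: a distinct node is moved
```

The last call shows why the tolerance has to be chosen carefully: an overly generous value collapses features that were meant to be separate.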
simulation is difficult
display and plotting can be expensive
technology is expensive
vector formats are either page definition languages or preserve ground coordinates.
page languages are HPGL, PostScript, and AutoCAD DXF.
true vector GIS data formats include ArcView Shapefiles and ArcGIS Interchange Files (E00) which has topology.
simple
easy to manage
no topology
lots of duplication, hence need for large storage space
very often used in CAC (computer assisted cartography)
developed by US Bureau of the Census
nodes (intersections of lines) are identified with codes
assigns a directional code in the form of a "from node" and a "to node"
both street addresses and UTM coordinates are explicitly defined for each link
data exchange by translation (export and import) can lead to significant errors in attributes and in geometry. efficient data exchange is important for the future of GIS.
developed in early 1970s as a simple way to build a surface
the sample points are connected by lines to form triangles; within each triangle the surface is usually represented by a plane
triangles fit together in a manner which simulates the face of the land.
rough terrain: more points
smooth terrain: fewer points
an irregularly spaced sample is more efficient
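Representing the surface inside each triangle by a plane amounts to linear interpolation between the three vertices. The sketch below uses barycentric weights for this; the function name and the sample triangle are assumptions for illustration.

```python
def interpolate_in_triangle(p, a, b, c):
    """Elevation at p = (x, y) on the plane through vertices a, b, c = (x, y, z).
    Returns None if p lies outside the triangle."""
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of p with respect to the three vertices.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical TIN facet with elevations 100, 200 and 300 at its corners.
a, b, c = (0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)
print(interpolate_in_triangle((5.0, 0.0), a, b, c))
print(interpolate_in_triangle((2.0, 2.0), a, b, c))
print(interpolate_in_triangle((20.0, 20.0), a, b, c))
```

A full TIN would first locate which triangle contains the query point; that search is omitted here.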
DATABASES
a spatial database is a collection of spatially referenced data that acts as a model of reality
these selected phenomena are deemed important enough to represent in digital form
the digital representation might be for some past, present or future time period
DIGITAL DATABASES
Scaleless - data can be stored at the level of detail found in the environment
cartographer is responsible for choosing the content and resolution
scale is a critical factor
DIGITAL DATABASES
problems when using data sets of different resolutions, e.g. roads may not line up; resolved using ancillary source materials
additional problems when using data sets of different themes, e.g. combining elevation and drainage data can show water running uphill or non-level lakes
DIGITAL DATABASES
Value of databases:
Appropriateness of use
Lack of alternative data sources Graphic output
METADATA
data about the data
could include data elements that:
identify the data
identify the custodians and access conditions to the data
describe projection, content, quality of data
describe the action taken when handling databases of varying scale
Dataset information
Title: Ortofotos'95
Abstract: Ortofotos'95 is a collection of ortho-rectified aerial photographs. These aerial photographs cover Portugal and were obtained in August 1995 on false-color infrared film at a scale of 1:40 000. CNIG, the Directorate General of Forests and the paper mill industry are the owners of the aerial photographs (in paper format).
Type of dataset: Airborne data > Aerial photos
Locations: Portugal
Temporal range: 1995
Dataset scales: 1:25 000-1:50 000
Dataset resolution: 1 - 3 meters
Dataset quality remarks: Acquisition of data: aerial photographs; the film is scanned at very high resolution and ortho-rectified using a DTM derived from topographic cartography at scale 1:25 000
Information creation date: 1999-10-29
DATABASES
pre-1970s: command-line based, with reads and writes to hard disk, tapes, diskettes
database approach: all reading and writing through a simple interface (no need to care about tapes, etc.)
key field is an attribute whose values uniquely identify each row
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
Figure labels: entity, key field, attribute
DATABASES - RDBM
tables are related or joined using a common record identifier (column variable) present in both tables
Example:
goal: produce map of values by district/neighbourhood
problem: no district code available in parcel table
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
DATABASES - RDBM
solution: join parcel table containing values with geography table containing location codings, using Block as key field
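The join can be sketched with an in-memory SQLite database. The parcel rows come from the Parcel Table; the geography table contents (district codes D1 to D3), the table names and the column names are invented for illustration, since the slides do not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel (parcel_no INTEGER PRIMARY KEY, address TEXT, "
    "block INTEGER, parcel_value INTEGER)"
)
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?, ?)", [
    (8, "501 N Hi", 1, 105450),
    (9, "590 N Hi", 2, 89780),
    (36, "1001 W. Main", 4, 101500),
    (75, "1175 W. 1st", 12, 98000),
])

# Hypothetical geography table: these district codes are assumptions.
conn.execute("CREATE TABLE geography (block INTEGER PRIMARY KEY, district TEXT)")
conn.executemany("INSERT INTO geography VALUES (?, ?)",
                 [(1, "D1"), (2, "D1"), (4, "D2"), (12, "D3")])

# Join on Block, the column present in both tables, then total value per district.
rows = conn.execute("""
    SELECT g.district, SUM(p.parcel_value)
    FROM parcel AS p
    JOIN geography AS g ON p.block = g.block
    GROUP BY g.district
    ORDER BY g.district
""").fetchall()
print(rows)
```

The per-district totals are what would then be mapped; the parcel table itself never needs a district column.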
DATABASES - RDBM
Relational Linkages
Spatial Attributes
Descriptive Attributes
DATABASES
Advantage
Description                          Thickness    Code
New Ice                              <10 cm       1
Nilas, Ice Rind                      0-10 cm      2
Young Ice                            10-30 cm     3
Grey Ice                             10-15 cm     4
Grey-White Ice                       15-30 cm     5
First-Year Ice                       30-200 cm    6
Thin First-Year Ice                  30-70 cm     7
Thin First-Year Ice, first stage     30-50 cm     8
Thin First-Year Ice, second stage    50-70 cm     9
Medium First-Year Ice                70-120 cm    1.
Thick First-Year Ice                 120-200 cm   4.
Old Ice                                           7.
Second-Year Ice                                   8.
Multi-Year Ice                                    9.