GIS Unit 3

Uploaded by

dhanushbabu363
Copyright
© All Rights Reserved

Please read this disclaimer before

proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
OCE552
GEOGRAPHIC
INFORMATION SYSTEM
Department: CSE
Batch/Year: 2021-25/ III Yr

Created by:
Dr.M.Hemalatha, Associate Professor, ADS/RMKEC

Ms.ROHINI S, Assistant Professor , CSE/RMKEC

Ms. DEEPA R, Assistant Professor, AD/RMKEC


Date: 24.08.2023
Table of Contents
S.NO Topic
1. Course Objectives
2. Pre-Requisites
3. Syllabus
4. Course outcomes
5. CO- PO/PSO Mapping
6. Unit -III Lecture Plan
7. Activity based learning
8. Lecture notes
9. Assignments
10. Part A Q & A
11. Part B Qs
12. Supportive online Certification courses
13. Real time Applications in day to day life and to Industry
14. Contents beyond the Syllabus


Course Objectives

• To introduce the fundamentals and components of Geographic Information System

• To provide details of spatial data structures and input, management and output
processes.
Pre-Requisites

Semester V

OCE552
Geographic Information System

Semester IV

CS8492
Database Management Systems

Semester III

CS8391
Data Structures

Semester II

GE8291
Environmental Science and
Engineering

Semester I

GE8152
Engineering Graphics
Syllabus

20CE003 GEOGRAPHIC INFORMATION SYSTEM    L T P C: 3 0 0 3

UNIT I INTRODUCTION TO MAPS AND GIS 9


Maps- Definition – Scale – Types of maps – Elements of map –Projection – Purpose –
Types – Coordinate Systems: Geographic, Rectangular and Polar – Transformations –
Types and Application- GIS: Introduction – History – Components – Applications of GIS –
Popular GIS software – Opensource GIS software.
UNIT II DBMS AND GIS DATA MODEL 9
Database Management System –Function – Types – Advantages – Entity Relationship
model – Normalization – GIS Data model – Introduction – Data Encoding – Vector Data
Structure –Raster Data Structures – Network Data Structures - Comparison of Vector and
Raster Data Structure -Open Database Connectivity(ODBC).

UNIT III GIS DATA INPUT 9


Sources for GIS Data – Vector Data Input – Georeferencing – Topology – Topological
Relationship – Raster Data Input – Errors in input – Data Editing – Linking Attribute Data -
Raster File Formats – Vector File Formats – Raster to Vector and Vector to Raster
Conversion – OGC standards.

UNIT IV GIS DATA ANALYSIS 9


Introduction to Spatial analysis – Raster Data Spatial Analysis: Local, Neighbourhood,
Zonal Operations - Vector Operations and Analysis: Topological and Non- Topological
Operations - Network Analysis – DEM – Surface Analysis.

UNIT V GIS OUTPUT DESIGN AND PRESENTATIONS 9


Introduction – Spatial and Non-spatial Data presentation – Map Layout – Charts, Graphs
and Multimedia output – Elements of Spatial Data Quality – Meta Data – Introduction to
Web GIS – Applications in Civil Engineering.

TOTAL :45 PERIODS


Course outcomes

Upon completion of the course, the students will be able to:

CO320.1 - Outline the basic idea about fundamentals of GIS.

CO320.2 - Understand the types of spatial data models.

CO320.3 - Discuss about the data input and topology.

CO320.4 - Understand the data management functions and data output.

CO320.5 - Outline the application of GIS.

CO320.6 - Apply the GIS tools to develop real time applications.


CO- PO/PSO Mapping

Course Outcomes mapped to Program Outcomes (PO-1 to PO-12) and Program Specific Outcomes (PSO-1 to PSO-3)

CO      Level  PO-1  PO-2  PO-3  PO-4  PO-5      PO-6  PO-7  PO-8  PO-9  PO-10  PO-11  PO-12  PSO-1  PSO-2  PSO-3
(level)        K3    K4    K4    K5    K3,K5,K6  A3    A2    A3    A3    A3     A3     A2
C320.1  K2     2     1     1     -     -         -     -     -     -     -      -      -      1      1      1
C320.2  K2     2     1     1     -     -         -     -     -     -     -      -      -      1      1      1
C320.3  K2     2     1     1     -     -         -     -     -     -     -      -      -      1      1      1
C320.4  K2     2     1     1     -     -         -     -     -     -     -      -      -      1      1      1
C320.5  K3     3     -     -     -     1         -     -     -     -     -      -      -      1      1      1
C320.6  K3     3     -     -     -     1         -     -     -     -     -      -      -      1      2      1
UNIT - III
GIS DATA INPUT
Lecture Plan

UNIT – III DATA INPUT AND TOPOLOGY

Columns: S.No | Proposed Lecture Period | Topic | Actual Lecture Period | Pertaining CO(s) | Highest Cognitive Level | Mode of Delivery | Delivery Resource | LU Outcome (after successful completion of the course, the students should be able to) | Remarks

1 | 19.09.23 | Scanner - Raster data input | 19.09.23 | CO3 | K2 | MD2 | T1 | Discuss the various types of scanners |
2 | 20.09.23 | Raster data file formats, Vector data file formats | 20.09.23 | CO3 | K2 | MD2 | T1 | Outline the various raster & vector file formats |
3 | 20.09.23 | Digitiser | 20.09.23 | CO3 | K2 | MD2 | T1 | Discuss the various types of digitisers |
4 | 23.09.23 | Topology - adjacency, connectivity and containment | 23.09.23 | CO3 | K2 | MD2 | T1 | Discuss about the topology concepts |
5 | 25.09.23 | Topological consistency rules - Points | 25.09.23 | CO3 | K2 | MD2 | T1 | Illustrate the various topology consistency rules related to points |
6 | 26.09.23 | Topological consistency rules – Lines & Polygons | 26.09.23 | CO3 | K2 | MD2 | T1 | Illustrate the various topology consistency rules related to lines & polygons |
7 | 27.09.23 | Attribute data linking, ODBC | 27.09.23 | CO3 | K2 | MD2 | T1 | Outline the architecture of ODBC |
8 | 30.09.23 | Global Positioning System | 30.09.23 | CO3 | K2 | MD2 | T1 | Discuss about GPS |
9 | 03.10.23 | Concept of GPS based mapping | 03.10.23 | CO3 | K3 | MD2 | T1 | Illustrate the concept of GPS based mapping |
Activity based learning

1. Quiz - GIS Unit III –


https://quizizz.com/print/quiz/5d5a1ae854e105001a0e4cc1
UNIT – III GIS DATA INPUT

Sources for GIS Data – Vector Data Input – Georeferencing – Topology –


Topological Relationship – Raster Data Input – Errors in input – Data Editing –
Linking Attribute Data - Raster File Formats – Vector File Formats – Raster to
Vector and Vector to Raster Conversion – OGC standards.

INTRODUCTION:
SOURCES FOR GIS DATA:
Data encoding is the process of getting data into the computer. It is a process that is
fundamental to almost every GIS project. For example:
• An archaeologist may encode aerial photographs of ancient remains to
integrate with newly collected field data.
• A planner may digitize outlines of new buildings and plot these on existing
topographical data.
• An ecologist may add new remotely sensed data to a GIS to examine changes
in habitats.
• A historian may scan historical maps to create a virtual city from the past.
• A utility company may encode changes in pipeline data to record changes and
upgrades to their pipe network.
A GIS without data can be likened to a car without fuel – without fuel
you cannot go anywhere; without data a GIS will not produce output. However, this is
perhaps where the similarity ends, as there is only one place to obtain fuel (a petrol
station) and only one method of putting fuel into a car (using a petrol pump).
Spatial data, on the other hand, can be obtained from many different
sources, in different formats, and can be input to GIS using a number of different
methods.
Maps, which may come as paper sheets or digital files, may be input by
digitizing, scanning or direct file transfer; aerial photographs may be scanned into a
GIS; and satellite images may be downloaded from digital media. In addition, data can
be directly input to GIS from field equipment such as GPS, or from sources of ready-
prepared data from data ‘retailers’ or across the Internet.
In a GIS, data almost always need to be corrected and manipulated to ensure that they can
be structured according to the required data model.
Problems that may have to be addressed at this stage of a GIS project include:
• The re-projection of data from different map sources to a common projection;
• The generalization of complex data to provide a simpler data set; or
• The matching and joining of adjacent map sheets once the data are in digital form.

The following methods are available to get data into a GIS:
• Keyboard entry
• Digitizing
• Scanning
• Electronic data transfer

Then, methods of data editing and manipulation are reviewed, including re-projection,
transformation and edge matching.
The whole process of data encoding and editing is often called the ‘data stream’. This is
outlined in the below Figure.
Before further explanation of the stages in the data stream, it is necessary to make a
distinction between analogue (non-digital) and digital sources of spatial data.
Analogue data are normally in paper form, and include paper maps, tables of statistics and
hard-copy (printed) aerial photographs.
These data all need to be converted to digital form before use in a GIS, thus the data
encoding and correction procedures are longer than those for digital data.
If data were all of the same type, format, scale and resolution, then data encoding and
integration would be simple.
However, since the characteristics of spatial data are as varied as their sources, the task is
complex. This variety has implications for the way data are encoded and manipulated
to develop an integrated GIS database.
Much effort has been made in recent years to develop universal GIS data standards and
common data exchange formats.
METHODS OF DATA INPUT
Data in analogue or digital form need to be encoded to be compatible with the GIS being
used. This would be a relatively straightforward task if all GIS packages used the same
spatial and attribute data models.
However, there are many different GIS packages and many different approaches to the
handling of spatial and attribute data.
All data in analogue form need to be converted to digital form before they can be input into
GIS. Four methods are widely used: keyboard entry, manual digitizing, automatic digitizing
and scanning.
Keyboard entry may be appropriate for tabular data, or for small numbers of co-ordinate
pairs read from a paper map source or pocket GPS.
Digitizing is used for the encoding of paper maps and data from interpreted air
photographs.
Scanning represents a faster encoding method for these data sources, although the
resulting digital data may require considerable processing before analysis is possible.
Digital data must be downloaded from their source media (diskette, CD-ROM or the Internet)
and may require reformatting to convert them to an appropriate format for the GIS being
used.
Reformatting or conversion may also be required after analogue data have been converted
to digital form. For example, after scanning a paper map, the file produced by the scanning
equipment may not be compatible with the GIS, so reformatting may be necessary.
Keyboard entry:
Keyboard entry, often referred to as key coding, is the entry of data into a file at a computer
terminal. This technique is used for attribute data that are only available on paper.
If details of the hotels in Happy Valley were obtained from a tourist guide, both spatial data
(the locations of the hotels – probably given as postal codes) and attributes of the
hotels (number of rooms, standard and full address) would be entered at a keyboard.
For a small number of hotels keyboard entry is a manageable task, although typographical
errors are very likely.
The co-ordinates of spatial entities can be encoded by keyboard entry, although this method
is used only when co-ordinates are known and there are not too many of them. If the
locations of the Happy Valley hotels were to be entered as co-ordinates then
these could be read from a paper map and input at the keyboard. Where there are large
numbers of coordinates and features to be encoded it is more common to use manual or
automatic digitizing.
Manual digitizing:
The most common method of encoding spatial features from paper maps is manual
digitizing. It is an appropriate technique when selected features are required from a paper
map.
For example, the Happy Valley road network might be required from a topographical map of
the area. Manual digitizing is also used for map encoding where it is important to reflect the
topology of features, since information about the direction of line features can be included.
Manual digitizing requires a digitizing table that is linked to a computer workstation. The
digitizing table is essentially a large flat tablet, the surface of which is underlain by a very
fine mesh of wires.
A cursor, attached to the digitizer via a cable, can be moved freely over the surface of the
table. Buttons on the cursor allow the user to send instructions to the computer. The position
of the cursor on the table is registered by reference to its position above the wire mesh.
Most manual digitizers may be used in one of two modes: point mode or stream mode.
In point mode, the user begins digitizing each line segment with a start node, records each
change in direction of the line with a digitized point and finishes the segment with an end
node.
Thus, a straight line can be digitized with just two points, the start and end nodes. For more
complex lines, a greater number of points are required between the start and end nodes.
The user must choose a sensible number of points to represent the curve (a form of user
generalization). In addition, the digitizing equipment will have a minimum resolution
governed by the distance between the wires in the digitizing table.
Fig: Point and stream mode digitizing
In stream mode, the digitizer is set up to record points according to a stated time
interval or on a distance basis.
Once the user has recorded the start of a line the digitizer might be set to record a point
automatically every 0.5 seconds and the user must move the cursor along the line to
record its shape.
An end node is required to stop the digitizer recording further points. The speed at which
the cursor is moved along the line determines the number of points recorded.
The choice between point mode and stream mode digitizing is largely a matter of personal
preference. Stream mode digitizing requires more skill than point mode digitizing, and for
an experienced user may be a faster method.
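The dependence of stream-mode output on cursor speed can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `stream_digitize`, the constant cursor speed and the straight test line are all hypothetical, and a real digitizer driver will behave differently in detail.

```python
import math

def stream_digitize(path, speed, interval=0.5):
    """Sample points along a polyline as a stream-mode digitizer might.

    path     -- list of (x, y) vertices of the line being traced
    speed    -- cursor speed in map units per second (assumed constant)
    interval -- seconds between automatically recorded points
    """
    step = speed * interval      # distance travelled between two recordings
    points = [path[0]]           # start node
    carried = 0.0                # distance travelled since the last recorded point
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = step - carried     # where the next point falls on this segment
        while pos <= seg:
            t = pos / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += step
        carried = seg - (pos - step)
    if points[-1] != path[-1]:
        points.append(path[-1])  # end node
    return points

line = [(0, 0), (10, 0)]
slow = stream_digitize(line, speed=2)   # cursor moved slowly
fast = stream_digitize(line, speed=4)   # cursor moved twice as fast
print(len(slow), len(fast))
```

With the same 0.5 second interval, the slower trace records more points than the faster one, which is the effect described above.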
Automatic digitizing:
Manual digitizing is a time-consuming and tedious process. If large numbers of complex
maps need to be digitized, automatic methods are worth considering. Two automatic
digitizing methods are considered here: scanning and automatic line following.
Scanning is the most commonly used method of automatic digitizing. Scanning is an
appropriate method of data encoding when raster data are required, since this is the
automatic output format from most scanning software.
Thus, scanning may be used to input a complete topographic map that will be used as a
background raster data set for the over-plotting of vector infrastructure data such as
pipelines or cables.
A scanner is a piece of hardware for converting an analogue source document into digital
raster format. The accuracy of scanned output data depends on the quality of the scanner,
the quality of the image processing software used to process the scanned data, and the
quality (and complexity) of the source document. The resolution of the scanner used
affects the quality and quantity of output data.
Automatic line follower:
This encoding method might be appropriate where digital versions of clear, distinctive lines
on a map are required (such as country boundaries on a world map, or clearly
distinguished railways on a topographic map).
Whereas scanners are raster devices, the automatic line follower is a vector device and
produces output as (x,y) co-ordinate strings.
The data produced by this method are suitable for vector GIS. Automatic line followers are
not as common as scanners, largely due to their complexity.
Scanning:
Scanning provides a faster means of data entry compared to manual digitizing.
The process of conversion of paper maps into digital format usable by computer is known
as scanning.
It is used to convert an analog map into a scanned file, which is then converted to vector
format through tracing.
Scanning automatically captures map features, text and symbols as individual cells or
pixels.
Electronic data transfer:
If a digital copy of the data required is available in a form compatible with GIS, inputting
these data into GIS is merely a question of electronic data transfer.
However, the data you require may be in a different digital format from that recognized by
your GIS. In that case, the process of digital data transfer has to be followed by data
conversion. During conversion the data are changed to an appropriate format for use in your GIS.
Spatial data may be collected in digital form and transferred from devices such as GPS
receivers, total stations (electronic distance-metering theodolites), and data loggers
attached to all manner of scientific monitoring equipment.
Thus, electronic data transfer is an appropriate method of data encoding where the data
are already available in digital form (from a data collection device or another organization)
in a format compatible with GIS software.
Data Editing Stage:
Detecting and correcting errors:
Errors in input data may derive from three main sources: errors in the source data; errors
introduced during encoding; and errors propagated during data transfer and conversion.
Errors in source data may be difficult to identify. For example, there may be
subtle errors in a paper map source used for digitizing because of the methods used by
particular surveyors, or there may be printing errors in paper based records used as
source data.
During encoding a range of errors can be introduced. During keyboard encoding it is easy
for an operator to make a typing mistake; during digitizing an operator may encode the
wrong line; and folds and stains can easily be scanned and mistaken for real geographical
features.
Errors in attribute data are relatively easy to spot and may be identified using manual
comparison with the original data.

Re-projection, transformation and generalization:


Once spatial and attribute data have been encoded and edited, it may be necessary to
process the data geometrically in order to provide a common framework of reference.
Re-Projection:
Data derived from maps drawn on different projections will need to be converted to a
common projection system before they can be combined or analyzed.
If not re-projected, data derived from a source map drawn using one projection will not
plot in the same location as data derived from another source map using a different
projection system.
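A minimal sketch of why unprojected mixing fails: the same longitude/latitude pair is plotted with two well-known projections on a spherical Earth. The sphere radius, the sample point and the function names are assumptions chosen for illustration; production work would normally rely on a projection library rather than hand-written formulas.

```python
import math

R = 6371000.0  # mean Earth radius in metres (spherical model, an assumption)

def equirectangular(lon_deg, lat_deg):
    """Plate carree: x and y are proportional to longitude and latitude."""
    return R * math.radians(lon_deg), R * math.radians(lat_deg)

def mercator(lon_deg, lat_deg):
    """Spherical Mercator: same x, but y is stretched towards the poles."""
    lat = math.radians(lat_deg)
    return R * math.radians(lon_deg), R * math.log(math.tan(math.pi / 4 + lat / 2))

point = (80.27, 13.08)            # hypothetical point: longitude, latitude in degrees
x1, y1 = equirectangular(*point)
x2, y2 = mercator(*point)
offset = y2 - y1                  # northing mismatch between the two projections
print(round(offset), "metres apart in y")
```

The two x values agree, but the y values do not, so layers digitized from the two source maps would not overlay until one is re-projected.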
Transformation:
Data derived from different sources may also be referenced using different co-ordinate
systems. The grid systems used may have different origins, different units of
measurement or different orientation.
If so, it will be necessary to transform the co-ordinates of each of the input data sets onto
a common grid system.
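The three differences named above (origin, units, orientation) correspond to a shift, a scale and a rotation. The following sketch assumes all three are already known; the function name `to_common_grid` and the sample values are hypothetical.

```python
import math

def to_common_grid(x, y, origin=(0.0, 0.0), scale=1.0, rotation_deg=0.0):
    """Transform a local-grid co-ordinate onto a common grid.

    origin       -- position of the local grid's origin on the common grid
    scale        -- common-grid units per local unit (e.g. 0.3048 for feet to metres)
    rotation_deg -- anticlockwise rotation of the local axes relative to the common axes
    """
    a = math.radians(rotation_deg)
    xr = x * math.cos(a) - y * math.sin(a)   # rotate
    yr = x * math.sin(a) + y * math.cos(a)
    return origin[0] + scale * xr, origin[1] + scale * yr  # scale, then shift

# A point 100 ft along the local x axis of a grid rotated by 90 degrees.
cx, cy = to_common_grid(100, 0, origin=(5000, 2000), scale=0.3048, rotation_deg=90)
print(round(cx, 2), round(cy, 2))
```

When the parameters are not known in advance, they are usually estimated from control points common to both data sets.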
Generalization:
Data may be derived from maps of different scales. The accuracy of the output from a GIS
analysis can only be as good as the worst input data. Thus, if source maps of widely
differing scales are to be used together, data derived from larger-scale mapping should be
generalized to be comparable with the data derived from smaller-scale maps.
Edge matching and rubber sheeting:
When a study area extends across two or more map sheets, small differences or
mismatches between adjacent map sheets may need to be resolved. Normally, each map
sheet would be digitized separately and then the adjacent sheets joined after editing,
re-projection, transformation and generalization. The joining process is known as edge
matching.
Updating and maintaining spatial databases:
The world is a very dynamic place and things change, often rapidly and especially in urban
areas where new buildings and roads are being built. Spatial data can therefore go out
of date and need regular updating.

3.2. RASTER DATA INPUT:


3.2.1. SCANNER:
Scanning converts paper maps into digital format by capturing features as individual
cells, or pixels, producing an automated image. Maps are generally considered the
backbone of any GIS activity.
But most of the time paper maps are not easily available in a form that
can be readily used by the computers. Most of the paper maps had been prepared on
the basis of old conventional surveys. New maps can be produced using improved
technologies but this requires time as it increases the volume of work.
Thus, these paper maps have to be first converted into a digital format usable by the
computer. This is a critical step as the quality of the analog document must be preserved in
the transition to the computer domain.
The technology used for this kind of conversions is known as scanning and the instrument
used for this kind of operation is known as a scanner.
A scanner can be thought of as an electronic input device that converts analog information
of a document like a map, photograph or an overlay into a digital format that can be used
by the computer.
Scanning automatically captures map features, text, and symbols as individual cells,
or pixels, and produces an automated image.
The scanned file shows map features as raster lines (a series of connected pixels)
and must be vectorized to complete the process of digitizing.
A variety of scanning devices exist for the automatic capture of spatial data. While
several different technical approaches exist in scanning technology, all have the advantage
of being able to capture spatial features from a map at a rapid rate of speed.
Scanners are generally expensive to acquire and operate. Most scanning devices
have limitations with respect to the capture of selected features , e.g. text and symbol
recognition.

Working of a Scanner:
The most important component inside a scanner is the scanner head, which can
move along the length of the scanner. The scanner head contains either a charge-coupled
device (CCD) sensor or a contact image sensor (CIS).
A CCD consists of a number of photosensitive cells or pixels packed together on
a chip. The most advanced large-format scanners use CCDs with 8000 pixels per chip to
provide very good image quality.
While scanning, a bright white light from the scanner strikes the image to be
scanned and is reflected onto the photosensitive surface of the sensor placed on the
scanner head. Each pixel transfers a gray tone value (a value given to the different
shades of black in the image, ranging from 0 (black) to 255 (white), i.e. 256 values) to
the scan board (software).
The software interprets the value in terms of 0 (black) or 1 (white), thereby forming a
monochrome image of the scanned portion. As the head moves ahead, it scans the
image in tiny strips and the sensor continues to store the information in a sequential
fashion. The software running the scanner pieces together the information from the
sensor into a digital form of the image. This type of scanning is known as one-pass
scanning.
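The conversion from gray tone values to a monochrome image can be sketched as a simple threshold. The cut-off of 128 and the sample strip are assumptions for illustration; the notes do not state how actual scanner software chooses the cut-off.

```python
def to_monochrome(gray_rows, threshold=128):
    """Convert rows of 0-255 gray values into rows of 0 (black) / 1 (white)."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray_rows]

# Two hypothetical scan strips: bright paper with a dark line running through it.
strip = [[250, 245, 30, 240],
         [248, 20, 25, 243]]
mono = to_monochrome(strip)
for row in mono:
    print(row)
```

Here the dark line survives as a run of 0 cells, which is the raster line that later has to be vectorized.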
Scanning a colour image is slightly different in which the scanner head has to scan
the same image for three different colours i.e. red, green, and blue. In older colour
scanners, this was accomplished by scanning the same area three times over for the
three different colours. This type of scanner is known as three-pass scanner.
The primary function of any scanner is to convert measured quantities of light to electrical
analogs. The light that is measured may be light that has been transmitted through the
material.
For GIS and other computer applications, the electrical analogs are subsequently
converted to a binary form suitable for computer processing. If the output of the scanner
is to be used as input to a GIS, care must be taken to preserve the spatial integrity of
the item being scanned.
Preservation of spatial integrity is normally accomplished by describing the scanned
document as an orthogonal array of grid cells (raster array). Each grid cell represents an
instantaneous field of view within which the scanner makes a measurement. The manner
in which the grid cell is defined depends upon the particular scanner being used. The
following four types of scanner are commonly used in GIS and Remote Sensing.

Mechanical scanner:
It is called a drum scanner, since a map or an image placed on a drum is digitized
mechanically by rotating the drum and shifting the sensors. It is accurate but slow.
Video Scanner:
A video camera with a CRT (cathode ray tube) is often used to digitize a small part of a map
or film. This is not very accurate but cheap.
CCD Camera:
An area CCD camera (called a digital still camera) can be used instead of a video camera and
is also convenient for acquiring digital image data. It is not more accurate than a video camera.

CCD Scanner:
A flatbed or roll-feed scanner with a linear CCD (charge-coupled device) is now
commonly used to digitize analog maps in raster format, in either monotone or colour
mode. It is accurate but expensive.

Types of Scanners:

There are several different types of scanners performing the same job but handling
the job differently using different technologies and producing results depending on
their varying capabilities.

Flatbed scanner:

The most commonly used scanner is the flatbed scanner, also known as a desktop
scanner. It has a glass plate on which the picture or the document is placed. The
scanner head placed beneath the glass plate moves across the picture and the
result is a good quality scanned image. For scanning large maps or toposheets, wide-format
flatbed scanners can be used.

Drum scanner:
Drum scanners are mostly used by printing
professionals. In this type of scanner, the image or the document is placed on a glass
cylinder that rotates at very high speed around a centrally located sensor containing a
photo-multiplier tube instead of a CCD. Prior to the advances in the field of
sheet-fed scanners, drum scanners were extensively used for scanning maps and
other documents.
Fig: Types of scanner
Hand-held scanner:
Hand-held scanners, although portable, can only scan images up to about four inches
wide. They require a very steady hand for moving the scan head over the document.
They are useful for scanning small logos or signatures and are of virtually no use for
scanning maps and photographs.

Types or methods of Scanning:


Scanning captures map features, text, and symbols as individual cells, or pixels, and
produces an automated image. Different scanning procedures are followed depending on
the document to be scanned.
Black and White Raster Scanning:
Image is scanned in Black and White. It is the simplest method of converting any
document and can be performed on line drawings, reduced media, text or any one colour
document. This is the appropriate solution for archiving and storage projects, in which
the documents will be viewed and printed but never changed. It is, therefore, an ideal
solution as the first stage in a planned document conversion project.
Grey Scale and Colour Raster Scanning:
The image is scanned in greyscale or in colour. Greyscale and (especially)
colour images can be quite large. It must be made sure that the system is capable of
handling files whose size is often measured in tens of megabytes. Because virtually
every pixel is populated with a value, an attempt to compress the file results in little or
no reduction in file size.
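A rough, uncompressed size estimate shows where these file sizes come from. The sheet size (24 x 36 inches), the resolution (300 dpi) and the function name are assumptions chosen for illustration, and header or padding bytes of real file formats are ignored.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size of a scanned image, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8   # round up to whole bytes

# Hypothetical 24 x 36 inch map sheet scanned at 300 dpi.
black_white = scan_size_bytes(24, 36, 300, 1)    # 1 bit per pixel
greyscale   = scan_size_bytes(24, 36, 300, 8)    # 8 bits per pixel
colour      = scan_size_bytes(24, 36, 300, 24)   # 8 bits each for red, green, blue
for name, size in [("B/W", black_white), ("grey", greyscale), ("colour", colour)]:
    print(name, round(size / 1_000_000, 1), "MB")
```

Under these assumptions the greyscale scan is eight times the size of the black and white scan, and the colour scan three times larger again.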
Limitations in use of Scanners
• Hard copy maps are often unable to be removed to where a scanning device is
available, e.g. most companies or agencies cannot afford their own scanning device
and therefore must send their maps to a private firm for scanning.
• Hard copy data may not be in a form that is viable for effective scanning, e.g. maps
are of poor quality, or are in poor condition;
• Geographic features may be too few on a single map to make scanning practical or
cost-justifiable.
• Often on busy maps a scanner may be unable to distinguish the features to be
captured from the surrounding graphic information, e.g. dense contours with labels.
• With raster scanning it is difficult to read unique labels (text) for geographic
features effectively.
• Scanning is much more expensive than manual digitizing, considering all the
cost/performance issues.
• Raster data provide a matrix of cells whose values represent a coordinate location and
are sometimes linked to an attribute table, which makes combining many layers much
simpler. Raster data are also easy to modify or program because of the simple data
structure.
• Rasters are in part defined by their pixel depth. Pixel depth defines the range of
distinct values the raster can store. For example, a 1-bit raster can only store 2
distinct values: 0 and 1.
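The relationship between pixel depth and the number of storable values is a power of two. A short sketch, with hypothetical function names and assuming unsigned integer cell values:

```python
def distinct_values(bit_depth):
    """Number of distinct values a raster cell of the given pixel depth can hold."""
    return 2 ** bit_depth

def value_range(bit_depth):
    """Smallest and largest unsigned value for the given pixel depth."""
    return 0, 2 ** bit_depth - 1

for depth in (1, 8, 16):
    print(depth, "bit:", distinct_values(depth), "values, range", value_range(depth))
```

The 8-bit case gives 256 values from 0 to 255, matching the gray tone values produced by the scanner.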
Vector Data Input:
Digitizer:
Digitizing is the process of interpreting and converting paper map or image data to
vector digital data.
Digitizing is the process by which coordinates from a map, image, or other sources of
data are converted into a digital format in a GIS. This process becomes necessary when
available data is gathered in formats that cannot be immediately integrated with other
GIS data.
Digitization produces vector features, which are often stored as shapefiles.
Manual digitization is a tedious job, and an inefficient operator may introduce several
digitizing errors. Hence, it has to be done with great skill and caution. Operator fatigue
(eye strain, back soreness, etc.) seriously degrades the data quality, so managers must
limit the number of hours an operator works at one time. A commonly used quality check
is to produce a verification plot of the digitized data that is visually compared with the
map from which the data were originally digitized.
Tablet digitizers with a free cursor connected to a personal computer are the most
common devices for digitizing spatial features with planimetric coordinates from
analog maps. The analog map is placed on the surface of the digitizing tablet as shown
in the figure. The size of a digitizer usually ranges from A3 to A0.
Fig: Tablet Digitizer
Digitizing operation is as follows:
Step 1: A map is affixed to a digitizing table.
Step 2: Control points (tics) at the four corners of the map sheet are digitized and
input to the PC together with the map coordinates of the four corners.
Step 3: Map contents are digitized according to the map layers and map code system, in
either point mode or stream mode at a short time interval.
Step 4: Errors such as small gaps at line junctions, overshoots and duplicates are
edited out to give a clean dataset.
Step 5: Digitizer coordinates are converted to map coordinates for storage in a spatial
database.
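Steps 2 and 5 can be sketched as fitting a transformation from the four digitized tics to their known map coordinates and then applying it to every digitized point. The sketch below assumes an affine model fitted by least squares; the function names, tic positions and map coordinates are all hypothetical, and GIS packages may use a different model.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(tics_digitizer, tics_map):
    """Least-squares affine transform from digitizer to map coordinates."""
    rows = [(x, y, 1.0) for x, y in tics_digitizer]
    # Normal equations (A^T A) p = A^T b, solved once for X and once for Y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in (0, 1):
        atb = [sum(r[i] * m[k] for r, m in zip(rows, tics_map)) for i in range(3)]
        params.append(solve3(ata, atb))
    (a, b, c), (d, e, f) = params
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)

# Hypothetical tics: table positions in inches, map coordinates in metres.
table_tics = [(1, 1), (21, 1), (21, 16), (1, 16)]
map_tics = [(500000, 4000000), (510000, 4000000), (510000, 4007500), (500000, 4007500)]
to_map = fit_affine(table_tics, map_tics)
mx, my = to_map(11, 8.5)   # a point digitized at the centre of the sheet
print(round(mx, 2), round(my, 2))
```

Using all four tics rather than the minimum of three lets the fit average out small errors in digitizing the control points.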
Major problems of map digitization are:
• The map stretches or shrinks from day to day, which makes newly digitized points slightly
offset from previously digitized points.
• The map itself has errors.
• Discrepancies across neighbouring map sheets produce disconnected features.
Manual digitizing has many advantages. These include :
• Low capital cost, e.g. digitizing tables are cheap
• Low cost of labour
• Flexibility and adaptability to different data types and sources
• Easily taught in a short amount of time - an easily mastered skill
• Generally the quality of data is high;
• Digitizing devices are very reliable and most often offer a greater precision than the data
warrants;
• Ability to easily register and update existing data.

Heads-up digitization:
This method uses a scanned copy of the map or image, and digitization is done on the screen
of the computer monitor. The scanned map is displayed vertically and can be viewed without
bending the head down, which is why the method is called heads-up digitization. Semi-automatic
and automatic methods of digitizing require post-processing but save a lot of time and
resources compared to the manual method.
Heads-down digitization:
Digitizers are used to capture data from hardcopy maps. Heads-down digitization is done on
a digitizing table using a hand-held cursor known as a puck. The position of the cursor or
puck is detected when it is passed over a table inlaid with a fine mesh of wires. The function
of a digitizer is to input the coordinates of points and lines correctly. Digitization can be
done in two modes.
Point mode :
In this mode, digitization is started by placing a point that marks the beginning of the
feature to be digitized and after that more points are added to trace the particular feature
(line or polygon). The number of points to be added to trace the feature and the space
interval between two consecutive points are decided by the operator.
Stream mode :
In stream digitizing, the cursor is placed at the beginning of the feature, a command is then
sent to the computer to place the points at either equal or unequal intervals as per the
position of the cursor moving over the image of the feature.
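The stream mode described above can be sketched in a few lines. This is a minimal illustration, not a digitizer driver: it assumes the cursor positions are already available as a list of (x, y) tuples and that points are kept at a minimum distance interval. The names `stream_sample` and `min_spacing` are hypothetical.

```python
import math

def stream_sample(cursor_path, min_spacing):
    """Keep a cursor position only when it lies at least min_spacing
    away from the previously kept point (distance-based stream mode)."""
    if not cursor_path:
        return []
    kept = [cursor_path[0]]  # the first position marks the start of the feature
    for x, y in cursor_path[1:]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) >= min_spacing:
            kept.append((x, y))
    return kept

# Hypothetical cursor positions reported while tracing a straight line
cursor_path = [(0.0, 0.0), (0.2, 0.0), (0.6, 0.0), (1.1, 0.0), (1.3, 0.0), (2.2, 0.0)]
vertices = stream_sample(cursor_path, min_spacing=1.0)
print(vertices)
```

In point mode the operator would choose each vertex by hand, whereas here the spacing rule decides which positions become vertices.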
Georeferencing:
Raster data is obtained from many sources, such as satellite images, aerial cameras, and
scanned maps. Modern satellite images and aerial cameras tend to have relatively accurate
location information, but might need slight adjustments to line up all your GIS data.
Scanned maps and historical data usually do not contain spatial reference information. In
these cases you will need to use accurate location data to align or georeference your raster
data to a map coordinate system.
A map coordinate system is defined using a map projection method by which the curved
surface of the earth is portrayed on a flat surface.
When georeferencing raster data, define its location using map coordinates and assign the
coordinate system of the map frame. Georeferencing raster data allows it to be viewed,
queried, and analyzed with other geographic data. In general, there are four steps to
georeference your data:
1. Add the raster dataset that you want to align with your projected data.
2. Use the Georeference tab to create control points that connect the raster to known
positions in the map.
3. Review the control points and the errors.
4. Save the georeferencing result when you are satisfied with the alignment.

Aligning the raster with control points:


The process involves identifying a series of ground control points (known x,y coordinates )
that link locations on the raster dataset with locations in the spatially referenced data.
Control points are locations that can be accurately identified on the raster dataset and in
real-world coordinates. Many different types of features can be used as identifiable
locations, such as road or stream intersections, the mouth of a stream, rock outcrops, the
end of a jetty of land, the corner of an established field, street corners, or the intersection
of two hedgerows.
The control points are used in conjunction with the transformation to shift and warp the
raster dataset from its existing location to the spatially correct location. The connection
between one control point on the raster dataset (the from point) and the corresponding
control point on the aligned target data (the to point) is a control point pair.
The number of links you need to create depends on the complexity of the transformation
you plan to use to transform the raster dataset to map coordinates. However, adding more
links will not necessarily yield a better registration. If possible, spread the links over the
entire raster dataset rather than concentrating them in one area. Typically, having at least
one link near each corner of the raster dataset and a few throughout the interior produces
the best results.
Generally, the greater the overlap between the raster dataset and target data, the better the
alignment results, because you'll have more widely spaced points with which to georeference
the raster dataset. For example, if the target data only occupies one-quarter of the area of
raster dataset, the points used to align the raster dataset would be confined to that area of
overlap. Thus, the areas outside the overlap area are not likely to be properly aligned.
Georeferenced data is only as accurate as the data to which it is aligned.
Transforming the Raster:
When enough control points have been created, transform the raster dataset to the map
coordinates of the target data. You can choose among several types of transformation, such
as polynomial, spline, adjust, projective, or similarity, to determine the correct map
coordinate location for each cell in the raster.
The polynomial transformation uses a polynomial built on control points and a least-squares
fitting (LSF) algorithm. It is optimized for global accuracy but does not guarantee local
accuracy.
The polynomial transformation yields two formulas: one for computing the output x-
coordinate for an input (x,y) location and one for computing the y-coordinate for an input
(x,y) location. The goal of the least-squares fitting algorithm is to derive a general formula
that can be applied to all points, usually at the expense of slight movement of the to
positions of the control points. The number of the noncorrelated control points required for
this method must be 1 for a zero-order shift, 3 for a first order affine, 6 for a second order,
and 10 for a third order. The lower order polynomials tend to give a random type error, while
the higher order polynomials tend to give an extrapolation error.
A zero-order polynomial is used to shift your data. This is commonly used when data is
already georeferenced, but a small shift will better line up the data. Only one control point is
required to perform a zero-order polynomial shift.
The first-order polynomial transformation is commonly used to georeference an image.
Use a first-order or affine transformation to shift, scale, and rotate a raster dataset. This
generally results in straight lines on the raster dataset mapped as straight lines in the
warped raster dataset. Thus, squares and rectangles on the raster dataset are commonly
changed into parallelograms of arbitrary scaling and angle orientation. The affine (first-order)
polynomial transformation takes the form x' = Ax + By + C and y' = Dx + Ey + F, where (x, y)
is a cell's column and row position in the raster and (x', y') is its map coordinate. The six
parameters A to F define how the raster's rows and columns transform into map coordinates.
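A first-order fit can be sketched as follows. This is a minimal illustration in plain Python, assuming the standard six-parameter affine form x' = A*x + B*y + C and y' = D*x + E*y + F and an ordinary least-squares fit through the normal equations. The function names and the control point values are hypothetical, and GIS packages use their own implementations.

```python
def solve_3x3(m, v):
    """Solve m * p = v by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix copy
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [ar - factor * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(from_pts, to_pts):
    """Least-squares estimate of (A, B, C, D, E, F) from control point pairs."""
    if len(from_pts) < 3:
        raise ValueError("a first-order fit needs at least 3 control points")
    rows = [(x, y, 1.0) for x, y in from_pts]
    # Normal equations: (M^T M) p = M^T b, solved once per output coordinate
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def fit(targets):
        mtb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
        return solve_3x3(mtm, mtb)

    a, b, c = fit([p[0] for p in to_pts])
    d, e, f = fit([p[1] for p in to_pts])
    return a, b, c, d, e, f

def apply_affine(params, x, y):
    """Map a raster (column, row) position to map coordinates."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f

# Hypothetical control point pairs: raster (column, row) -> map (x, y)
from_pts = [(0, 0), (100, 0), (0, 100), (100, 100)]
to_pts = [(1000, 5000), (1200, 5000), (1000, 4800), (1200, 4800)]
params = fit_affine(from_pts, to_pts)
print(apply_affine(params, 50, 50))
```

With exactly three non-collinear control points the fit passes through every point; with more points the parameters are a compromise, which is why residuals appear at the control points.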

The higher the transformation order, the more complex the distortion that can be
corrected. However, transformations higher than third order are rarely needed. Higher-
order transformations require more links and, thus, will involve progressively more
processing time. In general, if a raster dataset needs to be stretched, scaled, and
rotated, use a first-order transformation. If, however, the raster dataset must be bent or
curved, use a second- or third-order transformation.
Interpret the root mean square error:
When the general formula is derived and applied to the control points, a measure of the
residual error is returned. The error is the difference between where the from point
ended up and the actual location that was specified (the to point). The total error is
computed by taking the root mean square (RMS) of all the residuals, which gives
the RMS error. This value describes how consistent the transformation is between the
different control points. When the error is particularly large, remove and add control
points to adjust the error.
Residuals closer to zero are considered more accurate. You can permanently
transform the raster dataset after georeferencing it.
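The RMS error calculation can be sketched as below. This is a minimal illustration that assumes each residual is the straight-line distance between a transformed from point and its to point, and that the RMS error is the square root of the mean squared residual. The coordinate values are made up for the example.

```python
import math

def rms_error(transformed_pts, to_pts):
    """Return the per-point residuals and their root mean square."""
    residuals = [
        math.hypot(tx - x, ty - y)
        for (tx, ty), (x, y) in zip(transformed_pts, to_pts)
    ]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return residuals, rms

# Where the from points ended up after the transformation (hypothetical values)
transformed_pts = [(100.0, 200.0), (303.0, 404.0), (500.0, 600.0)]
# Where the operator said they should be (the to points)
to_pts = [(100.0, 200.0), (300.0, 400.0), (500.0, 600.0)]

residuals, rms = rms_error(transformed_pts, to_pts)
print(residuals, round(rms, 3))
```

Here only the second control point has a non-zero residual, so it would be the first candidate to remove or re-enter if the overall error were too large.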

Raster File Format:


Raster data represents the world as a surface divided into a regular grid of cells. Raster
data models are useful for storing data that varies continuously, as in an aerial
photograph, a satellite image or an elevation surface.
There are two types of raster data: continuous and discrete. A raster stores data as a
digital image made up of a grid that can be reduced or enlarged, and each cell in the grid
contains a value representing information. Continuous data represents phenomena such as
temperature, while discrete data represents features such as land use or soils.
Raster data provides a matrix of cells, each holding a value for its location and
sometimes linked to an attribute table. Combining many layers is much simpler with raster
data, and its simple data structure makes it easy to modify or program.
Rasters are in part defined by their pixel depth. Pixel depth defines the range of distinct
values the raster can store. For example, a 1-bit raster can only store 2 distinct values: 0
and 1.
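The relationship between pixel depth and the number of storable values can be checked with a short calculation. This sketch assumes unsigned integer pixels, so an n-bit raster stores 2^n distinct values ranging from 0 to 2^n - 1. The function names are illustrative.

```python
def distinct_values(bit_depth):
    """Number of distinct values an unsigned raster of this pixel depth can store."""
    return 2 ** bit_depth

def value_range(bit_depth):
    """Smallest and largest unsigned value for this pixel depth."""
    return 0, 2 ** bit_depth - 1

for bits in (1, 8, 16):
    print(bits, distinct_values(bits), value_range(bits))
```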
There is a wide range of raster file formats used in the GIS world. Some of the most
popular ones are listed below.

Graphic Interchange Format (GIF):


A bitmap file format for image files, commonly used on the Internet and generally used
for small images. It is well-suited for images with sharp edges and relatively few
gradations of color.

Tagged Image File Formats (TIFF):


This format is associated with scanners. It saves the scanned images and reads them.
TIFF can use run-length and other image compression schemes. It is not limited to
256 colors like GIF and is in widespread use in the desktop publishing world. It serves
as an interface to several scanners and graphic arts packages. TIFF supports black-and-
white, grayscale, pseudo color, and true color images, all of which can be stored in a
compressed or uncompressed format.
Geo Tagged Image File Formats (GeoTIFF):
A TIFF variant enriched with GIS-relevant metadata. As part of the header in the TIFF
format, it stores the Lat/Long at the edges of the pixels. The GDAL GeoTIFF driver supports
reading, creation and update of internal overviews. Internal overviews can be created on
GeoTIFF files opened in update mode (with gdaladdo, for instance).
If the GeoTIFF file is opened as read only, the overviews are created in an
external .ovr file. The GeoTIFF format is fully compliant with TIFF 6.0, so software
incapable of reading and interpreting the specialized metadata will still be able to open a
GeoTIFF format file.

RS Landsat:
Landsat satellite imagery and BIL information are used in RS Landsat. In one format,
using BIL, pixel values from each band are pulled out and combined. Programs that use
this kind of information include IDRISI, GRASS, and MapFactory. It is fairly easy to
exchange information from within these raster formats.

Joint Photographic Experts Group (JPEG2000):


An open raster format that allows both lossy and lossless compression. JPEG 2000 is a
non-proprietary image compression format based on ISO standards and typically uses .jp2
as the file extension. Its advantages are that it offers lossy and lossless compression,
and that world files (.j2w) can be used to georeference an image in GIS software.
Compression ratios are similar to those of the MrSID and ECW formats.

Portable Network Graphics (PNG):


Provides well-compressed, lossless storage for raster files. It supports a large
range of bit depths from monochrome to 64-bit color. Its features include indexed color
images of up to 256 colors and effective 100 percent lossless images of up to 16 bits per
pixel.

Digital Elevation Models(DEM):


A DEM is the representation of continuous elevation values over a topographic surface by a
regular array of z-values, referenced to a common vertical datum. DEM is sometimes used as
a generic term for DSMs and DTMs when it represents height information without any further
definition of the surface.
A DEM can be represented as raster data (a grid of squares, also known as a heightmap
when representing elevation) or as a vector-based triangular irregular network (TIN).
Digital Elevation Models (DEMs) are available in two types.
The first is 30-meter elevation data from 1:24,000 seven-and-a-half-minute quadrangle
maps. The second is the 1:250,000 3 arc-second digital terrain data. DEMs are produced
by the National Mapping Division of USGS.

JPEG File Interchange Format (JFIF):


A standard compression technique for storing full-color and grayscale images. Support
for JPEG compression is provided through the JFIF file format.

VECTOR FILE FORMAT:


Geospatial data are stored in many different file formats. Each geographic information
system (GIS) software package, and each version of these software packages, supports
different formats. This is true for both vector and raster data.
The most common vector file format is the shapefile. Shapefiles, developed by ESRI in
the early 1990s are simple, nontopological files developed to store the geometric
location and attribute information of geographic features. Shapefiles are incapable of
storing null values, as well as annotations or network features. Field names within the
attribute table are limited to ten characters, and each shapefile can represent only point,
line, or polygon feature sets. Supported data types are limited to floating point, integer,
date, and text. Shapefiles are supported by almost all commercial and open-source GIS
software.
Shapefiles
The Shapefile format is a popular geospatial vector data format for geographic
information system (GIS) software for storing the location, shape, and attributes of
geographic features. It is developed and regulated by Esri as a (mostly) open
specification for data interoperability among Esri and other GIS software products.
A Shapefile is stored in a set of related files and contains one feature class. The Shapefile
is the most common geospatial file type and has become the industry standard. A complete
set of mandatory files is needed to make up a Shapefile.

The required files are –


.shp is a mandatory Esri file that gives features their geometry. Every Shapefile has its
own .shp file that represents spatial vector data, for example points, lines or
polygons in a map.
.shx is the mandatory Esri and AutoCAD shape index position file. This type of file is used
to search forward and backward through the features.
.dbf is a standard database file used to store attribute data and object IDs. A .dbf file is
mandatory for Shapefiles. You can open .dbf files in Microsoft Access or Excel.
.prj is an optional file that contains the metadata associated with the Shapefile's
coordinate and projection system. If this file does not exist, you will get the error
“unknown coordinate system”. To fix this error, use the “define projection” tool, which
generates a .prj file.
.xml is an optional file that contains the metadata associated with the Shapefile. If you
delete this file, you essentially delete your metadata. You can open and edit the .xml file
in any text editor.
.sbn is an optional spatial index file that optimizes spatial queries. This file type is saved
together with a .sbx file. These two files make up a spatial index that speeds up spatial
queries.
.sbx is similar to .sbn in that it speeds up loading times. It works with the .sbn
file to optimize spatial queries. We tested the .sbn and .sbx extensions and found faster
load times when these files existed: 27.3 sec with them versus 33.3 sec without, a
difference of 6 seconds.
.cpg is an optional plain text file that describes the encoding applied to create the
Shapefile. If your Shapefile doesn’t have a .cpg file, then it has the system default
encoding.
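The file list above can be turned into a simple completeness check. This is a minimal sketch using only the Python standard library. It assumes the sidecar files share the .shp file's base name and uses lower-case extensions. The .xml metadata file is left out because its naming can differ between tools, and the function name `check_shapefile` and the file name `roads` are hypothetical.

```python
import tempfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".cpg", ".sbn", ".sbx")

def check_shapefile(shp_path):
    """Report which mandatory files are missing and which optional files exist."""
    base = Path(shp_path)
    missing_required = [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]
    present_optional = [ext for ext in OPTIONAL if base.with_suffix(ext).exists()]
    return {
        "complete": not missing_required,
        "missing_required": missing_required,
        "present_optional": present_optional,
    }

# Demonstration with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf", ".prj"):
        (Path(folder) / f"roads{ext}").touch()
    report = check_shapefile(Path(folder) / "roads.shp")

print(report)
```

In this demonstration the .shx file is deliberately absent, so the set is reported as incomplete even though the geometry and attribute files exist.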

The earliest vector file format for use in GIS software packages, which is still in use
today, is the ArcInfo coverage. This georelational file format supports multiple feature
types (e.g., points, lines, polygons, annotations) while also storing the topological
information associated with those features. Attribute data are stored as multiple files in
a separate directory labeled “Info.” Due to its creation in an MS-DOS environment,
these files maintain strict naming conventions. File names cannot be longer than
thirteen characters, cannot contain spaces, cannot start with a number, and must be
completely in lowercase.
RASTER GIS FILE FORMATS:
Raster data is made up of pixels (also referred to as grid cells). They are
usually regularly spaced and square, but they don’t have to be. Each pixel of a raster
is associated with a value (continuous) or a class (discrete).

File types, extensions and descriptions:

ERDAS Imagine (IMG), extension .IMG:
ERDAS Imagine IMG files are a proprietary file format developed by Hexagon Geospatial.
IMG files are commonly used for raster data to store single and multiple bands of
satellite data.
IMG files use a hierarchical format (HFA) that can optionally store basic information
about the file. For example, this can include file information, ground control points and
sensor type.
Each raster layer in an IMG file contains information about its data values. For
example, this includes projection, statistics, attributes, pyramids and whether it is a
continuous or discrete type of raster.

American Standard Code for Information Interchange ASCII Grid, extension .ASC:
ASCII uses a set of numbers (including floats) between 0 and 255 for information storage
and processing. The files also contain header information with a set of keywords.
In their native form, ASCII text files store GIS data in a delimited format. This could
be comma-, space- or tab-delimited. To go from non-spatial to spatial data, you can run a
conversion tool such as ASCII to Raster.

GeoTIFF, extensions .TIF, .TIFF, .OVR:
The GeoTIFF has become an industry-standard image file for GIS and satellite remote
sensing applications. GeoTIFFs may be accompanied by other files:
• TFW is the world file that is required to give your raster geolocation.
• XML files optionally accompany GeoTIFFs and hold your metadata.
• AUX auxiliary files store projections and other information.
• OVR pyramid files improve performance for raster display.

IDRISI Raster, extensions .RST, .RDC:
IDRISI assigns the RST extension to all raster layers. They consist of numeric grid cell
values as integers, real numbers, bytes and RGB24.
The raster documentation file (RDC) is a companion text file for RST files. It
records the number of columns and rows of the RST file. In addition, it records the file
type, coordinate system, reference units and positional error.

Envi RAW Raster, extensions .BIL, .BIP, .BSQ:
Band interleaved files are a raster storage format for single-band and multi-band aerial
and satellite imagery.
Band interleaved by line (BIL) stores pixel information row by row, writing each row for
all bands in turn.
Band interleaved by pixel (BIP) stores the values of all bands for each pixel together,
pixel by pixel along each row.
Band sequential format (BSQ) stores each band in full, row by row, before the next band.
BIL files are accompanied by a header file (HDR) that describes the number of columns,
rows, bands, bit depth and layout in an image.

Esri Grid (no extension):
Grid files are a proprietary format developed by Esri. Grids have no extension and are
unique because they can hold attribute data in a raster file. But the catch is that
you can only add attributes to integer grids.
Attributes are stored in a value attribute table (VAT): one record for each unique
value in the grid, and the count representing the number of cells.
The two types of Esri Grid files are integer and floating point grids. Land cover
would be an example of a discrete (integer) grid. Each class has a unique integer cell
value. Elevation data is an example of a floating point grid. Each cell holds a
floating point elevation value.
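The difference between the BIL, BIP and BSQ layouts is easiest to see on a tiny image. This is a minimal sketch, assuming a made-up image of 2 bands, 2 rows and 2 columns held as nested Python lists. It only shows the order in which pixel values would be written, not a real file reader or writer.

```python
def to_bsq(bands):
    """Band sequential: all rows of band 1, then all rows of band 2, and so on."""
    return [value for band in bands for row in band for value in row]

def to_bil(bands):
    """Band interleaved by line: row 1 of every band, then row 2 of every band."""
    n_rows = len(bands[0])
    return [value for r in range(n_rows) for band in bands for value in band[r]]

def to_bip(bands):
    """Band interleaved by pixel: all band values of pixel 1, then pixel 2."""
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    return [band[r][c] for r in range(n_rows) for c in range(n_cols) for band in bands]

# Hypothetical 2-band image, indexed as bands[band][row][column]
bands = [
    [[1, 2], [3, 4]],  # band 1
    [[5, 6], [7, 8]],  # band 2
]

print("BSQ", to_bsq(bands))
print("BIL", to_bil(bands))
print("BIP", to_bip(bands))
```

The same eight values are stored in all three cases; only the order changes, which is why the header file has to state the layout.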

VECTOR GIS FILE FORMATS:

Vector data is not made up of grids of pixels. Instead, vector graphics are
comprised of vertices and paths. The three basic symbol types for vector data are
points, lines and polygons (areas).

File types, extensions and descriptions:

Esri Shapefile, extensions .SHP, .DBF, .SHX:
The shapefile is by far the most common geospatial file type you’ll encounter. All
commercial and open-source GIS packages accept the shapefile as a GIS format. It’s so
ubiquitous that it has become the industry standard.
But you’ll need a complete set of three files that are mandatory to make up a shapefile.
The three required files are:
• SHP is the feature geometry.
• SHX is the shape index position.
• DBF is the attribute data.

Geographic JavaScript Object Notation (GeoJSON), extensions .GEOJSON, .JSON:
The GeoJSON format is mostly for web-based mapping. GeoJSON stores coordinates as text
in JavaScript Object Notation (JSON) form. This includes vector points, lines and
polygons as well as tabular information.
GeoJSON stores objects within curly braces {} and in general has less markup overhead
(compared to GML). GeoJSON has straightforward syntax that you can modify in any text
editor.
Because web browsers run JavaScript, and JavaScript can parse JSON text directly into
native objects, GeoJSON is a common web format.

Geography Markup Language (GML), extension .GML:
GML allows for the use of geographic coordinates as an extension of XML. eXtensible
Markup Language (XML) is both human-readable and machine-readable.
GML stores geographic entities (features) in the form of text. Similar to GeoJSON, GML
can be updated in any text editor. Each feature has a list of properties, geometry
(points, lines, curves, surfaces and polygons) and a spatial reference system.
There is generally more overhead when comparing GML with GeoJSON. This is because GML
results in more data for the same amount of information.

Google Keyhole Markup Language (KML/KMZ), extensions .KML, .KMZ:
KML stands for Keyhole Markup Language. This GIS format is XML-based and is primarily
used for Google Earth. KML was developed by Keyhole Inc., which was later acquired by
Google.
KMZ (KML-Zipped) replaced KML as the default Google Earth geospatial format because it
is a compressed version of the file. KML/KMZ became an international standard of the Open
Geospatial Consortium in 2008.
The longitude and latitude components (decimal degrees) are as defined by the World
Geodetic System of 1984 (WGS84). The vertical component (altitude) is measured in meters
from the WGS84 EGM96 Geoid vertical datum.

GPS eXchange Format (GPX), extension .GPX:
GPS Exchange format is an XML schema that describes waypoints, tracks and routes
captured from a GPS receiver. Because GPX is an exchange format, you can openly transfer
GPS data from one program to another based on its description properties.
The minimum requirements for GPX are latitude and longitude coordinates. In addition,
GPX files optionally store location properties including time, elevation and geoid height
as tags.
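Since GeoJSON is plain JSON text, it can be written and read with any JSON library. The sketch below uses Python's standard `json` module to build a single point feature, serialise it to text and parse it back. The place name and coordinate values are illustrative only. GeoJSON lists the longitude before the latitude.

```python
import json

# One point feature with a geometry and tabular properties
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [80.2707, 13.0827]},  # lon, lat
    "properties": {"name": "Chennai"},
}
collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection, indent=2)  # the text that would be saved as .geojson
parsed = json.loads(text)                # back to native objects

lon, lat = parsed["features"][0]["geometry"]["coordinates"]
print(text)
print(lon, lat)
```

The printed text shows the curly-brace structure described above and can be edited by hand in any text editor.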
Data Editing:
During data encoding, one cannot expect to input an error-free data set into a GIS.
Data may include errors derived from the original source data, as well as errors that have
been introduced during the encoding process.
There may be errors in co-ordinate data as well as inaccuracies and uncertainty in
attribute data.
It is better to intercept errors before they contaminate the GIS database and go on to
infect (propagate) the higher levels of information that are generated. The process is
known as data editing or ‘cleaning’.
Data editing can be likened to the filter between the fuel tank and the engine that keeps
the fuel clean and the engine running smoothly.
Data editing consists of the following four phases:
detection and correction of errors; re-projection, transformation and generalization; edge
matching and rubber sheeting; and updating of spatial databases.
Detecting and Correcting Errors:
Errors in input data may derive from three main sources: errors in the source data; errors
introduced during encoding; and errors propagated during data transfer and conversion.
Errors in source data may be difficult to identify.
For example, there may be subtle errors in a paper map source used for digitizing because
of the methods used by particular surveyors, or there may be printing errors in paper
based records used as source data.
During encoding a range of errors can be introduced. During keyboard encoding it is easy
for an operator to make a typing mistake; during digitizing an operator may encode the
wrong line; and folds and stains can easily be scanned and mistaken for real
geographical features.
During data transfer, conversion of data between different formats required by different
packages may lead to a loss of data. Errors in attribute data are relatively easy to spot
and may be identified using manual comparison with the original data. For example, if the
operator notices that a hotel has been coded as a café, then the attribute database may be
corrected accordingly. Various methods, in addition to manual comparison, exist for the
correction of attribute errors. These are described in Box 5.7.
TOPOLOGY:
Topology is the mathematical representation of the physical relationships that
exist between geographical elements. Topology has long been a key GIS
requirement for data management and integrity.
In general, a topological data model manages spatial relationships by representing
spatial objects (point, line, and area features) as an underlying graph of topological
primitives—nodes, faces, and edges. These primitives, together with their relationships to
one another and to the features whose boundaries they represent, are defined by
representing the feature geometries in a planar graph of topological elements.
Topology is useful in GIS because many spatial modeling operations don’t require
coordinates, only topological information. For example, to find an optimal path between
two points requires a list of the arcs that connect to each other and the cost to traverse
each arc in each direction. Coordinates are only needed for drawing the path after it is
calculated.
The topological structure supports three major topological concepts:
• Connectivity: Arcs connect to each other at nodes.
• Area definition: Arcs that connect to surround an area define a polygon.
• Contiguity: Arcs have direction and left and right sides.

Connectivity
Connectivity is defined through arc-node topology. This is the basis for many
network tracing and path finding operations. Connectivity allows you to identify a route to
the airport, connect streams to rivers, or follow a path from the water treatment plant
to a house.

In the arc-node data structure, an arc is defined by two endpoints: the from-node
indicating where the arc begins and the to-node indicating where it ends. This is
called arc-node topology.
Figure: Arc-Node topology example
Arc-node topology is supported through an arc-node list. The list identifies the
from- and to-nodes for each arc. Connected arcs are determined by searching
through the list for common node numbers. In the above example, it is
possible to determine that arcs 1, 2, and 3 all intersect because they share node
11. The computer can determine that it is possible to travel along arc 1 and turn
onto arc 3 because they share a common node (11), but it's not possible to turn
directly from arc 1 onto arc 5 because they don't share a common node.
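The search through an arc-node list can be sketched with a dictionary. Because the figure is not reproduced here, the node numbers below are hypothetical except for node 11, which the text says is shared by arcs 1, 2 and 3. The function names are illustrative.

```python
from collections import defaultdict

# Arc-node list: arc id -> (from-node, to-node). Node numbers are assumed.
arc_nodes = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (13, 15),
    5: (15, 16),
}

def arcs_by_node(arc_nodes):
    """Index the list by node so that connected arcs can be looked up directly."""
    index = defaultdict(set)
    for arc, (from_node, to_node) in arc_nodes.items():
        index[from_node].add(arc)
        index[to_node].add(arc)
    return index

def shared_nodes(arc_nodes, arc_a, arc_b):
    """Nodes common to two arcs; an empty set means no direct turn is possible."""
    return set(arc_nodes[arc_a]) & set(arc_nodes[arc_b])

print(arcs_by_node(arc_nodes)[11])     # arcs meeting at node 11
print(shared_nodes(arc_nodes, 1, 3))   # a turn from arc 1 onto arc 3 is possible
print(shared_nodes(arc_nodes, 1, 5))   # no common node, so no direct turn
```

No coordinates appear anywhere in this example, which reflects the point that connectivity questions need only topological information.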

Containment:
Many of the geographic features that may be represented cover a distinguishable
area on the surface of the earth, such as lakes, parcels of land, and census tracts. An area
is represented in the vector model by one or more boundaries
defining a polygon. Although this sounds counterintuitive, consider a lake with an
island in the middle. The lake actually has two boundaries: one that defines its outer
edge and the island that defines its inner edge. In the terminology of the vector
model, an island defines an inner boundary (or hole) of a polygon.

The arc-node structure represents polygons as an ordered list of arcs rather
than a closed loop of x and y coordinates. This is called polygon-arc topology. In the
illustration below, polygon F is made up of arcs 8, 9, 10, and 7 (the 0 before the 7
indicates that this arc creates an island in the polygon).

Figure: Polygon-Arc topology example
Each arc appears in two polygons (in the above example, arc 6 appears in the list
for polygons B and C). Since the polygon is simply the list of arcs defining its boundary,
arc coordinates are stored only once, thereby reducing the amount of data and ensuring
that the boundaries of adjacent polygons don't overlap.

Contiguity:
Two geographic features that share a boundary are called adjacent. Contiguity is
the topological concept that allows the vector data model to determine adjacency.
Polygon topology defines contiguity. Polygons are contiguous to each other if they
share a common arc. This is the basis for many neighbor and overlay operations.

Recall that the from-node and to-node define an arc. This indicates an arc's
direction so the polygons on its left and right sides can be determined. Left-right
topology refers to the polygons on the left and right sides of an arc. In the below
example, polygon B is on the left of arc 6, and polygon C is on the right. Thus we
know that polygons B and C are adjacent.

Figure: Left-Right topology example


Notice that the label for polygon A is outside the boundary of the area.
This polygon is called the external, or universe, polygon and represents the
world outside the study area. The universe polygon ensures that each arc
always has a left and right side defined.
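Adjacency can be derived from a left-right list without any coordinates. The sketch below assumes a small hypothetical list. Only arc 6, with polygon B on its left and polygon C on its right, comes from the text, and polygon A stands for the universe polygon.

```python
# Left-right list: arc id -> (left polygon, right polygon). Arcs 4 and 5 are assumed.
arc_left_right = {
    4: ("A", "B"),
    5: ("A", "C"),
    6: ("B", "C"),
}

def adjacency(arc_left_right):
    """Polygons on either side of the same arc share that arc, so they are adjacent."""
    neighbours = {}
    for arc, (left, right) in arc_left_right.items():
        if left != right:
            neighbours.setdefault(left, set()).add(right)
            neighbours.setdefault(right, set()).add(left)
    return neighbours

neighbours = adjacency(arc_left_right)
print(sorted(neighbours["B"]))
```

If the universe polygon should not count as a neighbour, it can be filtered out of the result afterwards.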
Topology Rules:
There are many topology rules you can implement in your geodatabase,
depending on the spatial relationships that are most important for your organization
to maintain. You should carefully plan the spatial relationships you will enforce on
your features. Some topology rules govern the relationships of features within a
given feature class, while others govern the relationships between features in two
different feature classes or subtypes. Topology rules can be defined between subtypes
of features in one or another feature class. This could be used, for example,
to require street features to be connected to other street features at both ends, except
in the case of streets belonging to the cul-de-sac or dead-end subtypes.

Many topology rules can be imposed on features in a geodatabase. A well-designed
geodatabase will have only those topology rules that define key spatial
relationships needed by an organization. Most topology violations have fixes that you can
use to correct errors.

Topology rules based on points:


Must Coincide With:
It is required that the points in one feature class (or subtype) be coincident
with points in another feature class (or subtype). This is useful for cases where
points must be covered by other points, such as transformers must coincide with power
poles in electric distribution networks and observation points must coincide with stations.
Must Be Disjoint:
It is required that the points be separated spatially from other points in the same
feature class (or subtype). Any points that overlap are errors. This is useful for
ensuring that points are not coincident or duplicated within the same feature class,
such as in layers of cities, parcel lot ID points, wells, or streetlamp poles.

Must Be Covered By Endpoint of:


It is required that the points in one feature class be covered by the
endpoints of lines in another feature class. This rule is similar to the line rule
Endpoint Must Be Covered By except that, in cases where the rule is violated, it is
the point feature that is marked as an error rather than the line. Boundary corner
markers might be constrained to be covered by the endpoints of boundary lines.

Point Must Be Covered By Line:


Requires that points in one feature class be covered by lines in another feature
class. It does not constrain the covering portion of the line to be an endpoint. This rule
is useful for points that fall along a set of lines, such as highway signs along highways.
Must Be Properly Inside Polygons:

Requires that points fall within area features. This is useful when the point
features are related to polygons, such as wells and well pads or address points and
parcels.
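
A point-in-polygon test of this kind can be sketched with the classic ray-casting method. This is an assumed technique chosen for illustration (the notes do not say how any GIS package implements the rule), and the names `on_segment` and `properly_inside` and the sample parcel are hypothetical. Points on the boundary are rejected, matching the "properly inside" wording.

```python
def on_segment(p, a, b, eps=1e-9):
    """True if point p lies on the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False  # not collinear with the segment
    return (min(ax, bx) - eps <= px <= max(ax, bx) + eps and
            min(ay, by) - eps <= py <= max(ay, by) + eps)

def properly_inside(point, polygon):
    """True if point is strictly inside polygon (a list of (x, y) vertices)."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    if any(on_segment(point, a, b) for a, b in edges):
        return False  # boundary points do not count as properly inside
    px, py = point
    inside = False
    for (ax, ay), (bx, by) in edges:
        if (ay > py) != (by > py):  # edge crosses the horizontal ray
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

parcel = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical square parcel
print(properly_inside((2, 2), parcel))  # True
print(properly_inside((4, 2), parcel))  # False (on the boundary)
print(properly_inside((5, 2), parcel))  # False (outside)
```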

Must Be Covered By Boundary of:


Requires that points fall on the boundaries of area features. This is useful when the
point features help support the boundary system, such as boundary markers, which must
be found on the edges of certain areas.

Topology rules based on Lines:

Must Not Have Dangles:


Requires that a line feature must touch lines from the same feature class (or
subtype) at both endpoints. An endpoint that is not connected to another line is called a
dangle. This rule is used when line features must form closed loops, such as when they
are defining the boundaries of polygon features. It may also be used in cases where lines
typically connect to other lines, as with streets. In this case, exceptions can be used
where the rule is occasionally violated, as with cul-de-sac or dead-end street segments.
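
Dangle detection can be sketched by counting how many line ends meet at each endpoint. The sketch assumes the lines are already split at every junction, so that lines only meet at endpoints; the function name `find_dangles` and the sample street network are hypothetical.

```python
from collections import Counter

def find_dangles(lines):
    """Return endpoints used by only one line end (dangles).

    `lines` maps a line ID to its list of (x, y) vertices.
    Assumes lines touch each other only at their endpoints.
    """
    endpoint_use = Counter()
    for coords in lines.values():
        endpoint_use[coords[0]] += 1
        endpoint_use[coords[-1]] += 1
    return sorted(pt for pt, n in endpoint_use.items() if n == 1)

streets = {
    "A": [(0, 0), (1, 0)],
    "B": [(1, 0), (1, 1)],
    "C": [(1, 1), (0, 0)],
    "D": [(1, 0), (2, 0)],  # dead-end street: its far end is a dangle
}
print(find_dangles(streets))  # [(2, 0)]
```

In a street layer, the dangle at the end of street D would normally be marked as an exception rather than corrected.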

Must Not Overlap:
Requires that lines not overlap with lines in the same feature class (or subtype).
This rule is used where line segments should not be duplicated, for
example, in a stream feature class. Lines can cross or intersect but cannot share
segments.

Must Not Self-Overlap:


Requires that line features not overlap themselves. They can cross or touch
themselves but must not have coincident segments. This rule is useful for features, such
as streets, where segments might touch in a loop but where the same street should not
follow the same course twice.

Must Not Self-Intersect:


Requires that line features not cross or overlap themselves. This rule is useful for
lines, such as contour lines, that cannot cross themselves.

Must Not Intersect Or Touch Interior:

Requires that a line in one feature class (or subtype) touch other lines of the same
feature class (or subtype) only at endpoints. Any line segment in which features overlap,
or any intersection not at an endpoint, is an error. This rule is useful where lines must be
connected only at endpoints, such as lot lines, which must split (only connect to
the endpoints of) back lot lines and cannot overlap each other.

Must Not Have Pseudo Nodes:


Requires that a line connect to at least two other lines at each endpoint. Lines that
connect to one other line (or to themselves) are said to have pseudo nodes. This rule is used
where line features must form closed loops, such as when they define the boundaries of
polygons or when line features logically must connect to two other line features at each end,
as with segments in a stream network, with exceptions being marked for the originating ends
of first-order streams.
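
Pseudo nodes can be found with the same endpoint-counting idea used for dangles: an endpoint where exactly two line ends meet is a pseudo node. The sketch below assumes lines meet only at endpoints; the function name `find_pseudo_nodes` and the sample stream network are hypothetical.

```python
from collections import Counter

def find_pseudo_nodes(lines):
    """Return endpoints where exactly two line ends meet (pseudo nodes).

    `lines` maps a line ID to its list of (x, y) vertices.
    """
    end_count = Counter()
    for coords in lines.values():
        end_count[coords[0]] += 1
        end_count[coords[-1]] += 1
    return sorted(pt for pt, n in end_count.items() if n == 2)

streams = {
    "s1": [(0, 4), (2, 2)],  # tributary
    "s2": [(4, 4), (2, 2)],  # tributary
    "s3": [(2, 2), (2, 1)],  # main channel, upper reach
    "s4": [(2, 1), (2, 0)],  # main channel, lower reach
}
print(find_pseudo_nodes(streams))  # [(2, 1)]
```

Here the node at (2, 1) only joins s3 to s4, so the two segments could be merged into one line. The confluence at (2, 2) joins three line ends and is a true node.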

Must Be Larger Than Cluster Tolerance:
Requires that a feature does not collapse during a validate process. This rule is
mandatory for a topology and applies to all line and polygon feature classes. In instances
where this rule is violated, the original geometry is left unchanged.

Topology rules based on Polygons:

Must Not Overlap:


Requires that the interiors of polygons in the feature class not overlap. The polygons
can share edges or vertices. This rule is used when an area cannot belong to two or more
polygons. It is useful for modeling administrative boundaries, such as ZIP
Codes or voting districts, and mutually exclusive area classifications, such as land
cover or landform type.
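
The distinction between overlapping interiors (an error) and shared edges (allowed) can be sketched with axis-aligned rectangles standing in for polygons. This is a deliberate simplification for illustration; general polygons need a proper geometry library. The function name `interiors_overlap` and the district extents are hypothetical.

```python
from itertools import combinations

def interiors_overlap(a, b):
    """a and b are rectangles given as (xmin, ymin, xmax, ymax).

    Returns True only if the rectangles share interior area;
    touching along an edge or at a corner does not count.
    """
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width > 0 and height > 0

districts = {
    "D1": (0, 0, 10, 10),
    "D2": (10, 0, 20, 10),  # shares an edge with D1: allowed
    "D3": (8, 5, 12, 15),   # overlaps both D1 and D2: error
}
errors = [(p, q) for p, q in combinations(districts, 2)
          if interiors_overlap(districts[p], districts[q])]
print(errors)  # [('D1', 'D3'), ('D2', 'D3')]
```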

Must Not Have Gaps:


This rule requires that there are no voids within a single polygon or between adjacent
polygons. All polygons must form a continuous surface. An error will always
exist on the perimeter of the surface. You can either ignore this error or mark it as an
exception. Use this rule on data that must completely cover an area. For
example, soil polygons cannot include gaps or form voids; they must cover an entire
area.

Contains Point:

Requires that a polygon in one feature class contain at least one point from another
feature class. Points must be within the polygon, not on the boundary. This is useful
when every polygon should have at least one associated point, such as
when parcels must have an address point.

Contains One Point:


Requires that each polygon contain one point feature and that each point feature
fall within a single polygon. This is used when there must be a one-to-one correspondence
between features of a polygon feature class and features of a
point feature class, such as administrative boundaries and their capital cities. Each point
must be properly inside exactly one polygon and each polygon must properly contain exactly
one point. Points must be within the polygon, not on the boundary.

Must Not Overlap With:


Requires that the interior of polygons in one feature class (or subtype) not
overlap with the interior of polygons in another feature class (or subtype).
Polygons of the two feature classes can share edges or vertices or be completely
disjoint. This rule is used when an area cannot belong to two separate feature
classes. It is useful for combining two mutually exclusive systems of area classification,
such as zoning and water body type, where areas defined within the
zoning class cannot also be defined in the water body class and vice versa.

Must Cover Each Other:

Requires that the polygons of one feature class (or subtype) must share
all of their area with the polygons of another feature class (or subtype). Polygons may share
edges or vertices.
Any area defined in either feature class that is not shared with the other is an error. This rule
is used when two systems of classification are used for the same geographic area, and any
given point defined in one system must also be defined in the other. One such case occurs
with nested hierarchical datasets, such as census blocks and block groups or small
watersheds and large drainage basins. The rule can also be applied to non-hierarchically
related polygon feature classes, such as soil type and slope class.

Area Boundary Must Be Covered By Boundary of:

Requires that boundaries of polygon features in one feature class (or subtype)
be covered by boundaries of polygon features in another feature class (or subtype).
This is useful when polygon features in one feature class, such as subdivisions, are composed
of multiple polygons in another class, such as parcels, and the shared boundaries must be
aligned.

ATTRIBUTE DATA LINKING:

There are two types of GIS data: spatial data (coordinate and projection
information for spatial features) and attribute data. Attribute data is additional
information, stored in tabular format, that describes the characteristics of spatial
features. It is linked to the spatial data through a unique identifier (the feature ID).
The spatial data contains information about where a feature is, while the attribute data
can contain information about what, where, and why.
Figure: Attribute data and spatial data linking
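
The link through a feature ID can be sketched with two Python dictionaries keyed on the same ID. The IDs, coordinates, attribute names and the helper `describe` are all hypothetical, chosen only to show how one key ties geometry to its attribute record.

```python
# Spatial data: feature ID -> (x, y) coordinates (hypothetical values).
geometries = {
    101: (512300.0, 1447800.0),
    102: (512950.0, 1448120.0),
}

# Attribute data: feature ID -> one row of the attribute table.
attributes = {
    101: {"name": "Well A", "depth_m": 42},
    102: {"name": "Well B", "depth_m": 57},
}

def describe(feature_id):
    """Combine the geometry and the attribute row that share a feature ID."""
    return {"feature_id": feature_id,
            "location": geometries[feature_id],
            **attributes[feature_id]}

print(describe(101))
```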

Joins:

When all the data is in a single table, a particular row can be retrieved from that
table directly. If the data we need is spread across two or more tables, a join is
used to retrieve it. A join fetches data from two or more tables and presents it as a
single set of data, combining columns from the tables by using values common to
both. There are several types of JOIN, including INNER, LEFT OUTER and RIGHT OUTER;
each does something slightly different, but the basic theory behind them is the same.

Inner Join:

An INNER JOIN returns a result set that contains the common elements of the tables,
i.e. the intersection where they match on the joined condition. An INNER JOIN focuses
on the commonality between two tables. When using an INNER JOIN, there must be
at least some matching data between two (or more) tables that are being compared.
INNER JOINs are the most frequently used JOIN operation.
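
An INNER JOIN can be tried out with Python's built-in `sqlite3` module. The `parcels` and `owners` tables, their columns and their rows are invented for this sketch; only rows whose `parcel_id` appears in both tables are returned.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT);
    CREATE TABLE owners  (parcel_id INTEGER, owner TEXT);
    INSERT INTO parcels VALUES (1, 'residential'), (2, 'commercial'), (3, 'vacant');
    INSERT INTO owners  VALUES (1, 'Asha'), (2, 'Ravi'), (4, 'Meena');
""")

# Parcel 3 has no owner row and owner row 4 has no parcel, so both drop out.
inner_rows = con.execute("""
    SELECT p.parcel_id, p.land_use, o.owner
    FROM parcels AS p
    INNER JOIN owners AS o ON p.parcel_id = o.parcel_id
    ORDER BY p.parcel_id
""").fetchall()
print(inner_rows)  # [(1, 'residential', 'Asha'), (2, 'commercial', 'Ravi')]
```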

Left Outer Join:
A LEFT JOIN or a LEFT OUTER JOIN takes all the rows from one table, defined as the
left table, and joins it with a second table. A LEFT JOIN will always include the rows from the
LEFT table, even if there are no matching rows in the table it is joined with.

Figure: Left outer join
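
A LEFT OUTER JOIN can be sketched the same way with Python's `sqlite3` module. The tables and rows are hypothetical; every parcel is kept, and the owner column is `None` (SQL NULL) where no matching owner exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT);
    CREATE TABLE owners  (parcel_id INTEGER, owner TEXT);
    INSERT INTO parcels VALUES (1, 'residential'), (2, 'commercial'), (3, 'vacant');
    INSERT INTO owners  VALUES (1, 'Asha'), (2, 'Ravi'), (4, 'Meena');
""")

# parcels is the left table, so parcel 3 survives with a NULL owner.
left_rows = con.execute("""
    SELECT p.parcel_id, p.land_use, o.owner
    FROM parcels AS p
    LEFT OUTER JOIN owners AS o ON p.parcel_id = o.parcel_id
    ORDER BY p.parcel_id
""").fetchall()
print(left_rows)
# [(1, 'residential', 'Asha'), (2, 'commercial', 'Ravi'), (3, 'vacant', None)]
```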

Right Outer Join:


A RIGHT OUTER JOIN is similar to a LEFT OUTER JOIN except that the roles of
the two tables are reversed: all the rows of the second table are
included along with any matching rows from the first table. In other words, a RIGHT JOIN
will always include the rows from the RIGHT table, even if there are no matching rows in
the table it is joined with.
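
The reversed roles can be seen by writing the right outer join as a left outer join with the table order swapped. The sketch takes this route because older SQLite builds do not accept the RIGHT JOIN keyword; the tables and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT);
    CREATE TABLE owners  (parcel_id INTEGER, owner TEXT);
    INSERT INTO parcels VALUES (1, 'residential'), (2, 'commercial'), (3, 'vacant');
    INSERT INTO owners  VALUES (1, 'Asha'), (2, 'Ravi'), (4, 'Meena');
""")

# Same rows as "parcels RIGHT OUTER JOIN owners": every owner row is kept,
# and the parcel columns are NULL where no parcel matches.
right_rows = con.execute("""
    SELECT p.parcel_id, p.land_use, o.owner
    FROM owners AS o
    LEFT OUTER JOIN parcels AS p ON p.parcel_id = o.parcel_id
    ORDER BY o.parcel_id
""").fetchall()
print(right_rows)
# [(1, 'residential', 'Asha'), (2, 'commercial', 'Ravi'), (None, None, 'Meena')]
```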

Relates:
Relates can help us to discover specific information within our data. A relate (also
called a table relate) is a property of a layer. We can create a table relate so
that we can query and select features in one layer and see all
the related features in another layer or table. Unlike joining tables,
relating tables simply defines a relationship between two tables. The associated data isn't
appended to the layer's attribute table as it is with a join. Instead, we can access the
related data through selected features or records in our layer or table.
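
The difference from a join can be sketched in plain Python: the layer's table is left untouched, and related records are looked up on demand for whatever features are selected. The table contents and the helper `related_records` are hypothetical.

```python
from collections import defaultdict

# Layer attribute table: one row per well (hypothetical data).
wells = {"W1": {"field": "North"}, "W2": {"field": "South"}}

# Related table: many samples may belong to one well.
samples = [
    {"well_id": "W1", "date": "2023-01-10", "ph": 7.1},
    {"well_id": "W1", "date": "2023-02-14", "ph": 6.9},
    {"well_id": "W2", "date": "2023-01-12", "ph": 7.4},
]

# The relate is only an index from key to related rows; nothing is appended to `wells`.
relate = defaultdict(list)
for record in samples:
    relate[record["well_id"]].append(record)

def related_records(selected_ids):
    """Return the related sample rows for each selected well ID."""
    return {well_id: relate.get(well_id, []) for well_id in selected_ids}

print(related_records(["W1"]))
```

Because the related rows are never merged into the layer's table, a one-to-many relationship does not duplicate feature rows the way a join would.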

Relationship Class:

A relationship class is an object in a geodatabase that stores information about a
relationship between two feature classes, between a feature class and a non-spatial table,
or between two non-spatial tables. Both participants in a relationship class must be
stored in the same geodatabase.

A relationship class stores information about associations among features and
records in a geodatabase and can help ensure your data's integrity. Relates that are
added to a layer or table in a map are essentially the same as simple relationship classes
defined in a geodatabase, except that they are saved with the map instead of in the
geodatabase.

E-References

GIS Unit III Quiz –
https://quizizz.com/print/quiz/5d41149e8f3224001a75849c

Introduction to Geographic Information Systems
https://youtu.be/WYy-owOcFsY

What is Geographic Information System
https://youtu.be/vJAQHA5XQWI

Different types of vector data and concept of topology
https://youtu.be/DQLkdS7omcg

Raster data models and comparisons with vector
https://youtu.be/Dwho7aF4yH

Errors in GIS and Key Elements of Maps
https://youtu.be/LFUaNBg7oVY

ASSIGNMENT - II

Different types of vector data and concept of topology
UNIT III - DATA INPUT AND TOPOLOGY

1. Define data input.

The creation of digital spatial data.

2. What are the different methods of data input?

• Keyboard entry
• O.C.R.
• Digitizing
• Manual digitizing
• Automatic digitizing
• Scanning
• Automatic line follower
• Electronic data transfer

3. What is digitizing?
Digitizing is the most common method employed in encoding data from a paper map.
• Manual digitizing
• Automatic digitizing
• Scanning
• Automatic line follower
4. What are the errors in digitizing?
• Scale and resolution of the source/base map.
• Quality of the equipment and the software used.
• Incorrect registration.
• A shaky hand.
• Line thickness.
• Overshoot.
• Undershoot.
• Spike.
• Displacement.
• Polygonal knot.
• Psychological errors.
5. What is scanning?
Scanning converts an analogue source document into digital raster format using a
scanner, a piece of hardware built around a light-sensitive device.
• Most commonly used method.
• When raster data are to be encoded, scanning is the most appropriate option.

6. What are the practical problems in scanning?

• Possibility of optical distortion associated with the usage of flatbed scanners.
• Automatic scanning of unwanted information.
• Selection of appropriate scanning tolerance to ensure important data are encoded,
and background data ignored.
• The format of files produced and the input of data into G.I.S. software.
• The amount of editing required to produce data suitable for analysis.
7. What are the different types of scanners?
The following four types of scanner are commonly used in GIS and remote sensing.
a. Mechanical Scanner
b. Video Camera with CRT (cathode ray tube)
c. CCD Camera
d. CCD Scanner
8. What is GPS?
GPS or Global Positioning System is a constellation of 27 satellites orbiting the earth at
about 12000 miles. These satellites are continuously transmitting a signal and anyone
with a GPS receiver on earth can receive these transmissions at no charge.
9. What are the different types of grid based models?
The grid based models can be broadly classified as
1) Weighted summation model
2) Weighted mean model
3) Unique combination model
10. Write the components of a scanner.
• A light source.
• A background.
• A lens.
11. Write short notes on topology in GIS.
In geodatabases, a topology is a set of rules that defines how point, line, and polygon
features share coincident geometry. Topology describes the means whereby lines,
borders, and points meet up, intersect, and cross. This includes how street centrelines
and census blocks share common geometry, and how adjacent soil polygons share their
common boundaries.
Another example could be how two counties that have a common boundary between
them will share an edge, creating a spatial relationship. Common terms used when
referring to topology include dimensionality, adjacency, connectivity, and containment,
with all but dimensionality dealing directly with the spatial relationships of features.
Dimensionality is the distinction between point, line, area, and volume, which are said to
have topological dimensions of 0, 1, 2, and 3 respectively.
12. What is overlay?
Map overlay is the process of taking two or more different thematic map layers of the
same area and laying them on top of one another to form a new composite layer. The
technique can also be used to overlay vector data on a raster image. In vector-based
systems, map overlay is time consuming, complex and computationally expensive. In
raster-based systems it is quick, straightforward and efficient.
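
Raster overlay is quick because it reduces to a cell-by-cell operation on grids of the same size. The sketch below adds two small layers; the function name `overlay_sum` and the suitability scores are hypothetical, and addition is only one of several possible overlay operators.

```python
def overlay_sum(layer_a, layer_b):
    """Add two raster layers cell by cell; both must have the same dimensions."""
    same_rows = len(layer_a) == len(layer_b)
    same_cols = all(len(ra) == len(rb) for ra, rb in zip(layer_a, layer_b))
    if not (same_rows and same_cols):
        raise ValueError("layers must have the same grid dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(layer_a, layer_b)]

slope_score = [[1, 2],
               [3, 1]]
soil_score = [[2, 2],
              [1, 3]]
print(overlay_sum(slope_score, soil_score))  # [[3, 4], [4, 4]]
```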
13. What data is collected by GPS?
The Global Positioning System (GPS) is a satellite-based navigation
system. A GPS unit determines its position using satellites that orbit the earth. Each
satellite's position, as well as the current time, is transmitted via radio signals.

14. What are the applications of GPS?

Some of the industries that use GPS include:
• Aviation. Most modern aircraft use GPS receivers to provide the pilots and
passengers with real-time aircraft position.
• Marine.
• Farming.
• Science.
• Surveying.
• Military.
• Financial Services.
• Telecommunications.
15. What is the difference between GPS, remote sensing and GIS?

In a typical workflow, remote sensing comes first, then GNSS (GPS), and finally GIS.
GPS (Global Positioning System) is a way to assign a location to a point on the Earth.
Remote sensing is the use of sensors on board planes or satellites to collect data,
usually in a grid-like pattern of pixels called raster data. GIS is the system in which
such spatial data is stored, analysed and displayed.
16. What is Conflation?

Conflation occurs when the identities of two or more individuals, concepts, or places, sharing
some characteristics of one another, seem to be a single identity; the differences appear to
become lost. In logic, it is the practice of treating two distinct concepts as if they were one,
which produces errors or misunderstandings, as a fusion of distinct subjects tends to obscure
analysis of relationships that are emphasized by contrasts. However, if the distinctions
between the two concepts appear to be superficial, intentional conflation may be desirable for
the sake of conciseness and recall.
17. List out the types of scanner.

1) Flatbed scanner
2) Rotating drum scanner
3) Large format feed scanner

18.What are the different GPS systems?

The four global GNSS systems are – GPS (US), GLONASS (Russia), Galileo (EU),
BeiDou (China). Additionally, there are two regional systems – QZSS (Japan) and
IRNSS or NavIC (India)

19.Define Topology.
Topology is a mathematical approach that allows us to structure data based on the
principles of feature adjacency and feature connectivity. It is in fact the mathematical
method used to define spatial relationships.
20.Why is topology important in GIS?
Topology is very important in GIS because it effectively models the relationship of spatial
entities. ... Topology facilitates the editing of shared features between different spatial
layers and is a mechanism to ensure integrity with spatial data

21. What is topological data model?

The topology data model of Oracle Spatial lets you work with data about nodes, edges,
and faces in a topology. For example, United States Census geographic data is provided in
terms of nodes, chains, and polygons, and this data can be represented using the Spatial
topology data model.
22.Do shapefiles have topology?
In GIS, topology is implemented through data structure. A shapefile is a non-
topological data structure that does not explicitly store topological relationships.
However, unlike other simple graphic data structures, shapefile polygons are
represented by one or more rings.

23. List topology rules in GIS.

There are many topology rules that you can use in creating spatial datasets, for example:
points in Feature Class X must lie within polygons in Feature Class Y; polygons in Feature
Class X must completely cover polygons in Feature Class Y; lines in Feature Class X must
intersect lines in Feature Class Y. For a more complete list of rules, see the ArcGIS
Desktop Help (or Online Help): open the index and navigate to Topology Rules :
Topology Rules (Editing in ArcMap).

24. How is topology implemented in GIS?

1. Topology exists between Feature Classes in a Feature Dataset.
2. A Feature Dataset can be considered an association of Feature Classes, but a Feature
Dataset itself has spatial properties such as a spatial reference system and XY domain.
Any Feature Class in a Feature Dataset must adhere to the same spatial properties as
defined in the Feature Dataset.
3. Topology rules reside within a Feature Dataset. These rules have two characteristics:
(1) the type of rule (within, intersecting, overlapping...) and (2) the feature classes that
are members of that rule.
4. Most of the topological operations in ArcMap are available from two resources:
a. ArcToolbox : Data Management : Topology
b. ArcMap : Editor toolbar : Advanced Editing : Topology
26. State the advantages and disadvantages of topological data model.
Advantages:
The topological model is utilized because it effectively models the relationship of
spatial entities. Accordingly, it is well suited for operations such as contiguity and
connectivity analyses. Contiguity involves the evaluation of feature adjacency, e.g.
features that touch one another, and proximity, e.g. features that are near one
another. The primary advantage of the topological model is that spatial analysis can be
done without using the coordinate data. Many operations can be done largely, if not
entirely, by using the topological definition alone. This is a significant advantage over
the CAD or spaghetti vector data structure that requires the derivation of spatial
relationships from the coordinate data before analysis can be undertaken.

Disadvantages:
The major disadvantage of the topological data model is its static nature. It can be a
time consuming process to properly define the topology depending on the size and
complexity of the data set. For example, building the topology for 2,000 forest stand
polygons will take considerably longer than for 2,000 municipal lot boundaries. This is
due to the inherent complexity of the features, e.g. lots tend to be rectangular while
forest stands are often long and sinuous. This can be a consideration when evaluating
the topological building capabilities of GIS software.

27. What is raster format?

Two common data formats based on the raster data model are grids and images.
Grids are an ESRI file format used to store both discrete
features such as buildings, roads, and parcels, and continuous phenomena such as
elevation, temperature, and precipitation.

28. What are the types of raster and vector data formats?
Raster data is cell-based, and this data category also includes aerial and satellite
imagery. There are two types of raster data: continuous and discrete. An example
of discrete raster data is a land cover classification. Continuous data examples are
temperature and elevation measurements.
29.What are raster file formats?
Raster graphics are the most common type of image files. While some raster image
formats are uncompressed, most use some type of image compression. Common raster
image file extensions include .BMP, .TIF, .JPG, .GIF, and .PNG. Other image file
categories include Vector Graphic and 3D Image files.

30. Define GPS Satellite Navigation System.

GPS is a satellite navigation system used to determine the ground position of an
object. Satellite navigation was first used by the United States military in the 1960s and
expanded into civilian use over the next few decades.
PART-B

1. Explain about scanners in detail.
2. List raster data file formats and explain in detail.
3. Describe topology and its consistency rules.
4. Explain the concept of ODBC in detail.
5. Explain the concept of GPS based mapping in detail.
6. Briefly explain the scanners for raster data input in GIS.
7. Write detailed notes on the raster data input system in GIS.
8. Briefly explain the digitization processes in GIS.
9. Write detailed notes on raster data file formats.
10. Describe the vector data input in GIS.
11. Briefly explain the digitization processes in GIS.
12. Briefly explain topology in GIS.
13. Write detailed notes on linking the attribute data to the spatial data.
14. Briefly explain Open Database Connectivity (ODBC).
15. Describe GPS, the uses of GPS, the structure of GPS, and GPS positioning.
16. Briefly explain the concept of GPS based mapping.
Supportive online Certification courses

NPTEL : https://nptel.ac.in/courses/105/107/105107155/

Swayam : https://swayam.gov.in/nd1_noc20_ce20/preview

coursera : https://www.coursera.org/learn/gis

Udemy : https://www.udemy.com/course/gis-for-everyone/

Youtube : https://www.youtube.com/channel/UC5U9dlGMhR2qXk5u8Vx5fcg
Real time Applications in day to day life and to Industry

1. Any real-time application that handles spatial data input and
topology needs the support of a GIS tool.
Assessment Schedule

Internal Assessment Test I: 09.09.2023

Internal Assessment Test II: 26.10.2023

Model Examination: 15.11.2023

End Semester: 05.12.2023

Prescribed Textbooks & Reference Books

TEXT BOOKS:

1. Kang-Tsung Chang, Introduction to Geographic Information Systems, McGraw Hill
Publishing, 2nd Edition, 2011.
2. Ian Heywood, Sarah Cornelius, Steve Carver, Srinivasa Raju, An Introduction to
Geographical Information Systems, Pearson Education, 2nd Edition, 2007.

REFERENCE:

1. Lo. C.P., Albert K.W. Yeung, Concepts and Techniques of Geographic Information
Systems, Prentice-Hall India Publishers, 2006.
Mini Project suggestions

Create an Interactive GIS Project for Your Favorite Bookstore, Restaurant Location, etc.
Thank you
