An Introduction To Spatial Database Spatial Data Types and Spatial Index Analysis Using Multidimensional Geodatabase
An Introduction To Spatial Database Spatial Data Types and Spatial Index Analysis Using Multidimensional Geodatabase
An Introduction To Spatial Database Spatial Data Types and Spatial Index Analysis Using Multidimensional Geodatabase
Abstract — This paper covers the overview of Spatial Database System which offers Spatial Data Structure along with the
Spatial Datatypes. It is most commonly used in Geographic Information Systems (GIS) and other applications. Many
applications operate on spatial data which include lines, points, regions and polygons etc. Spatial data are large in size and
complex in structure and relationship hence spatial indexes are required to retrieve the desired result-set from the large
dataset in optimum timeframe. The main emphasis on hierarchical data structures, including the number of indexing
techniques like HHCode, R-tree, R+tree, Quadtree, Octree, UBtree etc. which are often used to improve query processing
time in spatial database. Such techniques are also known as spatial indexing methods. Comparative study also covered in
the paper for the R*-Tree and R-Tree indexing techniques, and results shown that the better performance of the queries
using appropriate index.
Keywords- Spatial Index, GIS, Quadtree, R-tree, R*tree.
I. INTRODUCTION
Spatial data denotes to all types of data objects or elements that are present in a geographical space or horizon. It enables
the tracing of any locating, individuals or devices anywhere in the earth. Spatial data is also known as geospatial data, spatial
information or geographic information. Spatial data is used in GIS and other location or Positioning Services. Spatial data
consists of points, lines, polygons and other geographic and geometric data primitives, which can be mapped by location,
stored with an object as metadata or used by a communication system to detect user position. Spatial data can be classified as
scalar or vector data. Each one provides different information related to geographical or spatial positions.
The data that contains the particulars of any location {latitude and longitude, or height and depth} objects is the spatial data.
When the object is rendered, this spatial data is used to project the locations of the objects on a 2-dimensional piece of paper.
A GIS is frequently used to store, retrieve, and render geographic spatial data. Other types of spatial data which can be stored
using the Spatial Data Option besides GIS data. It includes data from Computer Aided Design (CAD) and Computer Aided
Manufacturing (CAM) systems. Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller
scale such as for an auto-mobile machine or much smaller scale as for printed circuit boards.
A. Point Data
Point is basically a location of any particular place which can be identify with the name or landmark. Generally point data is
used to denote unconnected locations. Points have no dimensions; hence it can neither be used to measure length nor the area
of the location. Points have various examples like: hotel, School and training center etc. The example below shows the
location of ATM and Bank Branch. Point features are also used to represent abstract points. For example, point locations
could represent city locations or name of any places as mentioned in the picture (Fig 1) below:
Fig 1. GIS Point data showing Location of ATM & Bank branch
B. Line Data
Line segment is a distance between two distinct points. It has beginning point and ending point. Line features have one
dimension i.e. length. Line data is used to represent collinear points; some of the common examples are national highways,
rivers and canals etc. Line types (solid lines versus dashed lines) and combinations can be representing using colors, line type
and line thicknesses [1, 2]. In the below example roads are distinguished from the internal roads designating roads as a dashed
blue line and the national highways as solid yellow line.
Fig 2. Internal roads are shown as dashed blue lines and National Highways major roads as solid yellow lines.
The data of both the line and point entities represent polygon data on a much smaller scale. It helps reduce confusion by
simplifying data positions. As the functions approach, the location of a school point is more realistically represented by a
series of building footprints that show the physical location of the campus. The line features of a line file of a road represent
only the physical location of the road. If a greater degree of spatial resolution is required, a file of the width of the road curve
will be used to show the width of the road and any characteristics such as median and rights of way (or sidewalks).
C. Polygon Data
Polygons are used to represent areas such as the border of a city (on a large-scale map), the lake or the forest. The
characteristics of the polygon are two-dimensional and, therefore, can be used to measure the area and the perimeter of a
geographical feature. The characteristics of the polygon are most commonly distinguished using a thematic mapping
symbology (color schemes), models or, in the case of numerical gradation, a color gradation scheme that could be used.
Fig 3. In polygon based dataset, different region areas are shown in color symbology
With the HHCode being an open-source data format, several spatial data and software companies have adopted it in various
products targeted at very large corporate data users, namely Helical Systems Inc. and CubeWerx.
c) Z-order (curve)
The Z-order can be used to efficiently construct a quadtree for a set of points [5]. The main idea is to sort the input set
according to the Z-order. Once classified, the points can be stored in a binary search tree and used directly, which is called
linear quadtree or can also be used to build a quadtree based on the pointer.
d) Quadtree
A quadtree is a tree data structure in which each internal node has exactly four children. It is basically has two-dimensional
analog of the most frequently used cotrees to divide a two-dimensional space, subdividing it recursively into four quadrants or
regions [6]. The data associated with a child cell varies depending on the application, but the child cell represents an
"interesting spatial information unit".
The Quadtrees can be classified according to the type of data they represent, including areas, points, lines and curves etc. The
Quadtrees can also be classified according to whether the shape of the tree is independent of the order in which the data is
processed.
©2018, IJCSMA All Rights Reserved, www.ijcsma.com 32
Dr. P. G. Khot et al, International Journal of Computer Science and Mobile Applications,
Vol.6 Issue. 3, March- 2018, pg. 29-37 ISSN: 2321-8363
Impact Factor: 5.515
Following are some common types of dials.
Region Quadtree
Point Quadtree
Point-Region Quadtree
Edge Quadtree
Compressed Quadtree
e) Octree
The Octrees are a natural extension of the concept of quadtree. To describe any object that we wish to represent within a given
space, the best approach seems to be an approximate of the position of the object, followed by subsequent refinements that
increase the resolution of the objects' details. Such combinations of global and detailed specifications produce a hierarchical
manifestation of the object's space.
An octree is a tree data structure in which each internal node has exactly eight children [7]. Octrees are often used to divide
three-dimensional spaces by recursively dividing them into eight octants. The octrees are the three-dimensional analog of
quadtrees. The Octrees are often used in the graphics and in the 3D game engine.
f) UB-tree
The UB tree is used to efficiently store and retrieve the multidimensional data proposed by Rudolph Bayer and Volker Mark
[8]. Basically it is a B + tree (information only in the leaves) with the records memorized according to the Z-order, also called
Morton's order. The Z-order is simply calculated by the bit-level interlacing of the keys.
Insertion, deletion, and point query are done as with ordinary B+ trees. To perform range searches in multidimensional point
data, however, an algorithm must be provided for calculating, from a point encountered in the data base, the next Z-value
which is in the multidimensional search range.
g) R-tree
Spatial data is two dimensional, so it is not possible to use B-tree index for spatial data. R-tree is the indexing methods which
organize data in a tree shaped structure with grouped objects at nodes. Using R-tree index, search would be more efficient and
less time taking because it eliminates the objects which are outside the area of interest and perform the search only on selected
are of interest.
h) R+ tree
The R+ tree is a tree data structure, a variant of the R tree, used to index spatial information. The R+ tree is a method to
search for data using a location, often (P, Q), and often for positions on the surface of the Earth. Searching in a number is a
solved problem; observing two or more and asking for close positions in the P and Q directions, requires more advanced
algorithms [9].
i) R* tree
In data processing, the R* trees are a variant of the R trees used to index spatial information. The R* trees have a slightly
higher construction cost than standard R-trees, as the data may need to be reinserted; but the resulting tree will generally have
a better query performance. Like the standard R-tree, it can store point and spatial data. It was proposed by Norbert
Beckmann, Hans-Peter Kriegel, Ralf Schneider and Bernhard Seeger in 1990 [10].
j) Hilbert R tree
Hilbert R-tree, an R-tree variant, is an index for multidimensional objects such as lines, regions, 3D objects or parametric
objects based on high-dimensional features. It can be considered an extension of the B + tree for multidimensional objects.
The performance of the R-trees depends on the quality of the algorithm that groups the data rectangles into a node. The
Hilbert R-trees use space-filling curves, and in particular the Hilbert curve, to impose a linear order on the data rectangles.
l) kd tree
K-d tree (abbreviation of the k-dimensional tree) is a space-partitioning data structure to organize the points in a k-
dimensional space. K-d trees are a data structure useful for different applications, such as searches that involve a
multidimensional search key (for example, search for intervals and searches for a nearer neighbor). The k-d trees are a special
case of binary partition trees.
m) m tree
M-tree index can be utilized for the effective determination of similar queries on complex objects as compared using an
arbitrary metric.
The primary filter allows a quick selection of candidate records to be passed to the secondary filter. The primary filter
compares geometric approximations to reduce the complexity of the calculation and is considered a lower cost filter. Because
the primary filter compares geometric approximations, it returns a superset of the exact result set. The secondary filter applies
exact calculations to the geometries resulting from the primary filter. The secondary filter produces a precise response to a
spatial query. The secondary filter operation is computationally expensive, but only applies to the primary filter results, not to
the entire data set.
After creating database in PostgreSQL, sample data populated into the tables. After polluting the data into the tables. We tried
to run 3 queries (complex queries using SQL joins) without using any index and captured the execution time of the queries
without and with indexes for the experiment purpose.
Now it is the time to create the index of all the tables using R-Tree and R*-Tree indexing structure. Then again ran the same
queries using both indexes separately.
Creating Index:
CREATE INDEX (“index_name”) on (“table_name”) Using rtree (“column_name”);
Table 1 shows the time taken by the index namely, R*-Tree and R-Tree to implement in the dataset. The use of indexes clearly
increases the processing speeds of the queries.
We compared the response time of the queries in different tables and it is clearly shown as per above histogram that, R-Tree is
the least time consuming indexing structure. However R*-Tree is also an option when there is no arithmetic operations in the
queries.
VII. ACKNOWLEDGEMENT
I would like to sincerely thank to Dr. Anita Soni, Associate Professor, Department of Computer Application, TIT, Bhopal for
constant support & guidance for this study. Her constant support and supervision always encourage me to achieve new
dimension in research field.
REFERENCES
[1] Freeman, H.1974. Computer processing of line{drawing images.ACM Computing Surveys,6, 1 (Mar.), 57-97.
[2] Hoel, E. G. and Samet, H.1992. A qualitative comparison study of data structures for large line segment databases.
InProceedings of the SIGMOD Conference, (San Diego, June), pp. 205-214.
[3] Jagadish, H. V.1990. On indexing line segments. InMcLeod, D., Sacks-Davis, R., and Schek, H., Eds.,Proceedings of
the Sixteenth International Conference on Very Large Databases (VLDB), (Brisbane, Australia).
[4] Varma, H., T. Milne, and G. Kierstead, The Origins of HH Codes, Internal report of Bedford Institute of
Oceanography, Canada (1996).
[5] Varma, H. et al. A Data Structure for Spatio-Temporal Databases (1990). International Hydrographic Review,
Monaco, LXVII(1).
[6] Bern, M.; Eppstein, D.; Teng, S.-H. (1999), "Parallel construction of quadtrees and quality triangulations", Int. J.
Comp. Geom. & Appl., 9 (6): 517–532.
[7] Gargantini, I. (1982), "An effective way to represent quadtrees", Communications of the ACM, 25 (12): 905–910.
[8] H. NOBORIO, S. FUKUDA, S. ARIMOTO: Construction of the octree Approximating Three Dimensional Objects
by Using Multiple Views, University of Toyonaka, Osaka, Japan, IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 10 no. 6 , November 1988, pp. 769-781.
[9] Rudolfh Bayer, "The universal B-Tree for multidimensional Indexing: General Concepts", World-Wide Computing
and its Applications '97 (WWCA '97), 1997.
[10] A. Guttman, R-trees: a dynamic index structure for spatial searching, Proceedings of the SIGMOD Conference,
Boston, June 1984, 47–57.
[11] Beckmann, N.; Kriegel, H. P.; Schneider, R.; Seeger, B. (1990). "The R*-tree: an efficient and robust access method
for points and rectangles". Proceedings of the 1990 ACM SIGMOD international conference on Management of data
- SIGMOD '90 (PDF). p. 322. ISBN 0897913655.
[12] Berchtold, Stefan; Keim, Daniel A.; Kriegel, Hans-Peter (1996). "The X-tree: An Index Structure for High-
Dimensional Data". Proceedings of the 22nd VLDB Conference. Mumbai, India: 28–39.
[13] Fei-Ching Kuo, Shuang Liu, T. Y. Chen, Testing a binary space partitioning algorithm with metamorphic testing,
Pages: 1482-1489.
[14] Schumacker, Robert A.; Brand, Brigitta; Gilliland, Maurice G.; Sharp, Werner H (1969). Study for Applying
Computer-Generated Images to Visual Simulation (Report). U.S. Air Force Human Resources Laboratory, Page. 142.
[15] Fuchs, Henry; Kedem, Zvi. M; Naylor, Bruce F. (1980). "On Visible Surface Generation by A Priori Tree Structures"
(PDF). SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques.
ACM, New York. pp. 124–133.
[16] Brinkhoff, Kriegel, Schneider, R. (1993a), Comparison of Approximations of Complex objects used for Query
processing in Spatial Database System, in the proceeding of 9th International conference on data Engineering.