Exercise 2: 2. Data Storage: Digitizing and Data Structure


Exercise 2

2. DATA STORAGE: DIGITIZING AND DATA STRUCTURE

Introduction
This exercise deals with spatial data storage in GIS. During this course you will mainly use the relational data model
as a method of structuring data. This means that you structure data as collections of tables that are logically related
to each other by shared attributes.

The first part of this exercise deals with data storage in vector format, which is based on tables. You will create a new vector
dataset by digitizing points as a method of (secondary) data capture. Digitizing is the process of using a mouse to
automatically store locations of geographic features by converting their map positions to series of x, y coordinates in
a computer file or an associated table in a database. Subsequently, you will view, create and fill tables. In addition
you will also join tables.
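
The conversion from a mouse position to stored x, y coordinates can be sketched in a few lines of Python. This is only
an illustration of the idea, not what ArcMap does internally: the function name `screen_to_map`, the window size and
the map extent are all assumed values.

```python
def screen_to_map(col, row, width, height, extent):
    """Convert a screen pixel (col, row) to map coordinates (x, y).

    extent is (xmin, ymin, xmax, ymax) of the map area shown in the window.
    Screen rows count downward from the top, map y values count upward.
    """
    xmin, ymin, xmax, ymax = extent
    x = xmin + (col / width) * (xmax - xmin)
    y = ymax - (row / height) * (ymax - ymin)
    return (x, y)


# Hypothetical view: an 800 x 600 pixel window showing a 4000 m x 3000 m area.
extent = (172000.0, 440000.0, 176000.0, 443000.0)
clicks = [(0, 0), (400, 300), (800, 600)]  # mouse positions in pixels
points = [screen_to_map(c, r, 800, 600, extent) for c, r in clicks]
print(points)
```

Each click becomes one coordinate pair; a digitized point dataset is in essence a list of such pairs plus a table.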

The second part deals with data storage in raster. You will see that spatial data is structured differently in a raster
environment.

In this exercise:

• Creating a new dataset.
• Adding attributes to tables of vector data.
• Calculating the area of polygon features.
• Joining tables.
• Comparison of the vector and raster data structures.

Objectives

After having completed this exercise you will be able:

• to create a new dataset using ArcMap;
• to create and fill a table using ArcMap;
• to describe the different data types;
• to distinguish between discrete and continuous rasters and between zones and regions;
• to understand the data structure of vector and raster datasets.

ArcMap document: Data storage.mxd

Data structure of a vector dataset


The vector data model is an object-based description of the real world. Geographic phenomena are represented as
point, line or polygon features. A collection of features of one type with the same attributes and spatial reference
(often referred to as feature class by ArcGIS) are stored in a vector dataset: the shapefile. For example the shapefile
‘Landuse.shp’ contains polygon features that represent land use types. Attribute data associated with the features are
stored in tables. Each feature has one record entry in the table. Each field (column) describes a particular attribute.
In the case of a shapefile the attribute data are saved in a dBase file (.dbf).
Important note regarding dataset names!

Never use blank spaces in the names of datasets! Use the underscore “_” if you want to separate words. Avoid the
use of symbols (e.g. +, -) in dataset names and try to limit the length of the name to 10-12 characters. If you only use
ESRI GIS software, 12 characters is safe; if you also use other GIS software, limit the length of the name to 10
characters to avoid interoperability problems.
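
The naming rules above can be checked before a dataset is created. The sketch below is a hypothetical helper, not an
ArcGIS function; it assumes that everything other than letters, numbers and the underscore counts as a symbol.

```python
import re


def check_dataset_name(name, max_length=10):
    """Return a list of problems found in a dataset name (empty list = fine)."""
    problems = []
    if " " in name:
        problems.append("contains blank spaces")
    # Assumption: anything except letters, digits, underscore (and the
    # blank space reported above) is treated as a symbol.
    if re.search(r"[^A-Za-z0-9_ ]", name):
        problems.append("contains symbols")
    if len(name) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


print(check_dataset_name("Soil_pts"))  # []
print(check_dataset_name("soil points+2024"))
```

Pass `max_length=12` if you only work with ESRI software.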

Creating a point dataset

If your data consist of features that are too small to be depicted as lines or areas, if the features have no
dimensionality, or if you are not interested in the dimensionality or geometry of the features, then you should
create an ArcMap point dataset. Points represent discrete locations such as wells, shops, and telephone booths.
During the next exercise, you will create a dataset representing soil points sampled in the southern part of
Wageningen. First, you will have to digitize the locations of the points and subsequently fill the attribute table with
attribute values.

INSTRUCTIONS:

1. Open ArcToolbox, as described in the previous exercise.


2. Expand Data Management Tools Feature Class and double click Create Feature Class. A dialog
box opens. Fields marked with a green point have to be filled in before you run the tool.
3. Select a Feature Class Location for the new dataset, i.e. where it is stored.
4. Give the new dataset a name in the Feature Class Name field.
5. Select the Geometry Type (point) and press OK; leave the rest of the options on Default.
6. As you can see, a new but still empty point dataset is created.
7. Click the Editor dropdown arrow in the Editor toolbar (Figure 1), click Start Editing and select the
dataset you want to edit in the pop-up window. If the toolbar is not yet open, click Customize on the
Menu bar, point to Toolbars and tick Editor.
8. Click OK.
9. A Create Features window appears with all datasets that have the same source as the one you selected
in 7.
10. Select the dataset you want to edit. Now it is possible to ‘draw’ point features within the new dataset.
11. When you are finished, click the Editor dropdown arrow and click Save Edits and afterwards Stop
Editing.

If you discover that a point is entered at the wrong location, you can move that point. You can always modify the
point features.

12. Click Start Editing.


13. Click the Pointer tool on the Editor toolbar and select the point you want to move. Selection handles
will appear around the selected features.
14. Move the selected point by dragging it to its new location.
15. When you are finished moving points, click Save Edits and afterwards Stop Editing.

Figure 1. The Editor toolbar.


Figure 2. Locations of soil point observations (X1 - X9).

1.

Open ArcMap document ‘Data storage.mxd’.

In this exercise you are going to create a new point dataset by digitizing the soil point observations as presented in
Figure 2. Activate data frame ‘Wag_south’ and display the datasets ‘Soil_types’ and ‘Roads’.

a. Create a new point dataset. Choose and add your workspace directory ‘Workspace’ as Name (do not double
click!), so that D:\IGI\workspace becomes your Feature Class Location. Give the new dataset the name ‘Soil_pts’
in the Feature Class Name text box.

b. Digitize the locations of the soil point observations as drawn in Figure 2 in order of profile code: start with
point X1 and finish with X9!

c. Open the attribute table of the new point dataset. How many records does the attribute table of dataset ‘Soil_pts’
contain?
d. Save your edits.

Adding attributes to point features

In the previous exercise you digitized the locations of the soil points. No tabular data (attribute data) describing
certain characteristics of these points have been added yet. When you create a new vector dataset in ArcMap, an
attribute table is automatically created for this dataset. For each digitized point a record is automatically added to
the attribute table.

Initially the attribute table will contain three fields, called FID, ID and Shape. The FID field contains unique
identifiers of the features in the vector dataset. The unique identifier links the thematic (attribute) data to the
geometry of a geographic feature. These unique identifiers cannot be changed. In the ID field a user-defined
identifier can be stored. The Shape field stores the feature type of the geographic feature (Point, Line or Polygon).
This field is maintained by ArcMap and cannot be edited.

You can add new fields to this table at any time to store additional attribute data for the features. When you create a
new field in the attribute table you must select a data type for that field. The data type determines the kind of data
(e.g. text, number) that can be stored in a field. ArcMap supports several data types, but the most important ones are:
float, integer, text, and date.

Important:
• NEVER use blank spaces or symbols in attribute names, only letters, numbers and underscores.
Attribute names can be at most 10 characters long.
• Adding and deleting fields should be done using ArcToolbox as described below.
• When adding or deleting fields or rows in a table, make sure that you are NOT in edit mode,
otherwise it will not work.

INSTRUCTIONS:

1. Open ArcToolbox, select Data Management Tools > Fields > Add Field.


2. Select the dataset you want to edit in the Input table field, give the field a Field Name, define the Field
(data) type, define the width of the field in the Field Length in case of text, or the Field Precision (number
of digits the field can contain) and Field scale (number of decimal places) in case of numeric values.
3. Click OK.
4. If you want to delete a field, select Data Management Tools > Fields > Delete Field.
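
To get a feeling for Field Precision (total number of digits) and Field Scale (number of decimal places), the sketch
below counts the digits a value needs after rounding to the given scale. The function `fits_field` is hypothetical and
only illustrates the digit counting; it does not show how ArcMap itself reacts to a value that is too large.

```python
from decimal import Decimal


def fits_field(value, precision, scale):
    """Check whether a number fits a numeric field definition.

    precision = total number of digits the field can hold,
    scale     = number of those digits placed after the decimal point.
    """
    # Round the value to 'scale' decimal places, e.g. scale 2 -> steps of 0.01.
    rounded = Decimal(str(value)).quantize(Decimal(10) ** -scale)
    digits = len(rounded.as_tuple().digits)
    return digits <= precision


print(fits_field(6.5, 4, 2))       # 6.50 needs 3 digits
print(fits_field(123.456, 4, 2))   # 123.46 needs 5 digits
```

With precision = 4 and scale = 2 the largest value that fits is therefore 99.99.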

An alternative way of adding and deleting fields is using the Table Options button on the top of an attribute table for
adding a field and right click in the field with the field name to delete this field.

Adding attribute values to point features

With the instructions listed before you can add new attributes. However, when you have a look at the table
afterwards, you will see that the fields are empty. You should now start to add the attribute values.

INSTRUCTIONS:

1. Click Start Editing.


2. Open the attribute table, click the cell you want to edit and type the attribute value. To make sure that the
right feature is being edited, select the feature (clicking on the left of the row, see Figure 3) and see in
the view window which feature lights up.
3. When finished, stop editing and save your edits.

2.

a. Write down the meaning of the data types integer, float, text and date. Use the ArcGIS Desktop Help System to
find the definitions. Hint: use the keyword ‘add field’ in the Search field of the help.

b. Add six attributes to the table of the dataset ‘Soil_pts’: ‘Prof_code’, ‘pH’, ‘Clay’, ‘Silt’, ‘Sand’ and ‘Soilcode’.
Use the field definitions presented in Table 1.

c. When you have defined the new fields of attribute table ‘Soil_pts’ enter the attribute values according to Figure
3. Do not forget to save your edits. Then save the ArcMap document.

Table 1. Field definitions of the attributes of the digitized point features.

Important: When you have saved a field definition (i.e. completed the Add field action), it is not possible anymore
to change this field definition! If you make a mistake in the field definition, you have to delete the field and add a
new field to the attribute table.
Figure 3. The attribute table of vector dataset ‘Soil_ points’.

Calculating the area of polygon features

Polygons represent discrete areas such as houses, provinces or land uses. A polygon is two-dimensional. As a
consequence it has, in addition to location, the properties area and perimeter. In ArcMap you can calculate the area
of the polygon features of a dataset. There are three ways to do this with ArcMap: with ArcToolbox, with the Field
Calculator and with the Calculate Geometry option. This section contains instructions to make you acquainted with
the last method. First of all you have to add an extra field to the dataset. After adding an extra field to the dataset it’s
possible to calculate the areas of all polygon features.

INSTRUCTIONS (Calculate Geometry):

1. Activate the data frame that contains the dataset you want to calculate the area for.
2. Add a field to the attribute table of the dataset you want to calculate the area for. Name the field for
example ‘Area’ (data type = double, precision = 10, scale = 2).
3. Right-click the field heading for this new field and click Calculate Geometry.
4. A Calculate Geometry window appears (Figure 4). Choose the geometry property you want to calculate (in
the case of a polygon area, perimeter, x-coordinate of centroid or y-coordinate of centroid), the coordinate
system (more about this in Exercise 3) you want to use, and the units in which the geometry should be
expressed. Click OK.
Figure 4. The Calculate Geometry window.
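
Area and perimeter follow directly from the stored x, y coordinates of a polygon. The sketch below uses the standard
shoelace formula for planar coordinates; it is meant to show the principle and is not claimed to be the method
ArcMap uses. The rectangle is a made-up example.

```python
import math


def polygon_area(vertices):
    """Planar area of a simple polygon given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the lengths of all polygon edges."""
    n = len(vertices)
    return sum(
        math.hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                   vertices[(i + 1) % n][1] - vertices[i][1])
        for i in range(n)
    )


# Hypothetical parcel of 100 m x 50 m, coordinates in metres.
parcel = [(0, 0), (100, 0), (100, 50), (0, 50)]
print(polygon_area(parcel), polygon_perimeter(parcel))
```

The result is expressed in the square of the coordinate units, which is why the Calculate Geometry window asks for a
coordinate system and units.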

3.

a. Calculate the area of the polygon features of the dataset ‘Soils’.

Creating a new table

Until now you have seen how to add new fields and attribute values to a dataset’s attribute table. Another way to get
your data into ArcMap is to create a new, empty table that you can fill yourself. If your tabular data is not stored on
the hard drive of your computer yet, creating a new table is a good way to get it into ArcMap. Creating a new table
is more flexible than simply adding attribute information to the attribute table of an existing dataset as you did
during the previous exercises. By putting your data in a separate table, you can work with it independent of any
particular dataset, and you can join it to any appropriate dataset whenever you want it (see next section).

INSTRUCTIONS:

1. Select Data Management Tools Table Create Table.


2. Select the Table Location (your workspace) and give the table a Table Name with extension .dbf. Click OK.
Whether or not the created table appears in the Table Of Contents, depends on the way the datasets are listed. This
can be changed with the buttons at the top of the Table Of Contents:
• Use List By Drawing Order to author the contents of your map, such as to change the display order of
datasets on the map, rename or remove datasets, and create or manage layers and group layers. All the Data
Frames in your map are listed when the table of contents is sorted by drawing order.
• Use List By Source to show the layers organized by the folders or databases in which the referenced data
sources can be found. This view will also list tables that have been added to the map document as data.
• Use List By Visibility to see a dynamic listing of the datasets currently displayed in the active Data Frame.
The way layers are listed updates automatically as you pan and zoom, select features, and turn layers on
and off.
• Use List By Selection to group layers automatically by whether or not they are selectable and have selected
features. A selectable layer means that features in the layer can be selected using the interactive selection
tools.

So, to make the table visible, select the List By Source option. When the table is opened, you can see that it
contains three fields: ‘Rowid’, ‘OBJECTID’ and ‘FIELD1’. Adding and deleting fields and attribute values to this
table works the same as adding fields to the other datasets. Note that you cannot remove or edit the Rowid value
and that a table must always contain at least two fields.

4.

a. Create a new table called ‘Landscape.dbf’. Save it in your workspace directory. Add two fields to the new table:
‘LU_CODE’ and ‘LU_DESCRIP’. The field definitions for these fields are presented in the table below.

b. Enter the attribute values according to Figure 5 and save the edits.

Figure 5. Information of landscape stored in a table.

Joining tables

You can add tabular data to an existing dataset by joining it to the dataset’s attribute table. When you join a table to
an attribute table, all fields (attributes) from the join table are appended to the attribute table of the dataset. You can
use any of these joined fields to symbolize, label, query, or analyze the dataset’s features.
A join is based on the values of an attribute that can be found in both tables. The name of the attribute does not
have to be the same in both tables, but the data type has to be the same and (at least some of) the attribute values
have to correspond (Figure 6).
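
The principle of a join can be sketched with two small tables in Python. The function `join_tables`, the field name
‘LANDUSE’ and all attribute values are hypothetical. Records without a matching value are kept here and get empty
(None) values, which is an assumption of this sketch.

```python
def join_tables(layer_rows, join_rows, input_field, output_field):
    """Append the fields of join_rows to layer_rows where the key values match.

    Assumes the two tables share no field names. Records without a match are
    kept and get None in the appended fields (an assumption of this sketch).
    """
    lookup = {row[output_field]: row for row in join_rows}
    appended_fields = list(join_rows[0].keys())
    result = []
    for row in layer_rows:
        match = lookup.get(row[input_field])
        extra = {f: (match[f] if match else None) for f in appended_fields}
        result.append({**row, **extra})  # new record, originals stay unchanged
    return result


# Attribute table of a dataset and a separate table (made-up values).
layer = [{"FID": 0, "LANDUSE": 10}, {"FID": 1, "LANDUSE": 20}, {"FID": 2, "LANDUSE": 99}]
table = [{"LU_CODE": 10, "LU_DESCRIP": "Forest"}, {"LU_CODE": 20, "LU_DESCRIP": "Grassland"}]

joined = join_tables(layer, table, "LANDUSE", "LU_CODE")
for record in joined:
    print(record)
```

Note that the key fields have different names (‘LANDUSE’ and ‘LU_CODE’) but the same data type and corresponding
values, which is all a join needs.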

INSTRUCTIONS:

1. To make a join, go to ArcToolbox and click Data Management Tools > Joins > Add Join.
2. Select the dataset (Layer name) to which the table will be joined and select the field of this dataset on
which the join is based (Input join field).
3. Select the table which is going to be joined in the Join table field and select the Output join field of this
table on which the join is based. Click OK.
4. When the attribute table of the dataset to which a table is joined is opened, it can be seen that all fields of
the join table are appended into the dataset’s attribute table. The fields appear at the right hand side of the
table.
5. To remove the join, double click on the dataset name, select the tab Joins & Relates, select the join and
click Remove, or Remove all in the Joins field.

Figure 6. Result of a join based on a common field (Landuse Code).

5.

a. Join the fields of the table ‘Landscape’ to the attribute table of dataset ‘Soil_types’ by common field
‘LU_CODE’.

b. Display a map based on the attribute ‘LU_DESCRIP’.

c. Remove the join.


A second method to establish a link between attribute tables is the Relate functionality. The difference with
Join is that one attribute table is not appended to the other. Relate simply defines a relationship between two tables
based on a shared attribute. You can only access related data by working with the attributes of a dataset. ‘Relate’ is
treated in more detail in the follow-up course Geo-information Tools (GRS 20806).
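
The difference with a join can be sketched as follows: the related table is only indexed on the shared attribute and
consulted when a feature is selected, while the feature record itself stays as it is. The helper `build_relate`, the
‘REMARK’ field and all values are hypothetical.

```python
from collections import defaultdict


def build_relate(related_rows, key_field):
    """Index the related table on the shared attribute (nothing is appended)."""
    index = defaultdict(list)
    for row in related_rows:
        index[row[key_field]].append(row)
    return index


features = [{"FID": 0, "LU_CODE": 10}, {"FID": 1, "LU_CODE": 20}]
remarks = [
    {"LU_CODE": 10, "REMARK": "thinned in spring"},
    {"LU_CODE": 10, "REMARK": "new footpath"},
    {"LU_CODE": 20, "REMARK": "mown twice a year"},
]

relate = build_relate(remarks, "LU_CODE")
selected = features[0]
related = relate[selected["LU_CODE"]]  # look up the related records on demand
print(selected)
print(related)
```

Because nothing is appended, one feature can have several related records without being duplicated.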

Data structure of a raster dataset


Until this moment, this exercise dealt with spatial data stored in a vector data structure. Another widely used data
structure to store geographic information is a raster or grid (Figure 7). A raster is a location-based data structure.
Space is partitioned into a regular matrix of equally sized cells, arranged in rows and columns (left part of Figure 7).
Each cell is given a value (right part of Figure 7) to correspond to a spatial characteristic of its location (e.g.
elevation, soil or land use type), as opposed to a vector structure, which associates attributes with geographic
objects.

Figure 7. The raster data structure.
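
A raster can be thought of as a matrix of values plus the information needed to place it on the map. The sketch below
assumes a raster that is positioned by the x coordinate of its left edge, the y coordinate of its top edge and the
cell size; the function `cell_value` and all numbers are made up for illustration.

```python
def cell_value(raster, x, y, x_min, y_max, cell_size):
    """Return the value of the cell that contains map location (x, y)."""
    col = int((x - x_min) // cell_size)   # columns count from the left edge
    row = int((y_max - y) // cell_size)   # rows count from the top edge
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("location lies outside the raster")
    return raster[row][col]


# Hypothetical raster of 3 rows x 4 columns with cells of 25 m x 25 m.
grid = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
print(cell_value(grid, 60, 70, x_min=0.0, y_max=75.0, cell_size=25.0))
print(cell_value(grid, 10, 10, x_min=0.0, y_max=75.0, cell_size=25.0))
```

No coordinates are stored per cell: the location follows from the row and column position, which is what makes the
raster a location-based structure.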

Discrete vs. continuous rasters

Depending on the information it represents, a raster dataset may be created out of either integer values (whole
numbers) or floating point values (numbers with decimals). In ArcGIS, a raster dataset created out of integer
(discrete) values can have an associated raster cell value attribute table. The unique attribute value combinations are
saved in this table. Raster datasets created out of floating point (continuous) values will not have associated tables.
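
The idea of such a table can be sketched by counting how often each integer value occurs. The field names VALUE and
COUNT and the small raster are used for illustration only; the function name is hypothetical.

```python
from collections import Counter


def value_attribute_table(raster):
    """One record per unique cell value, with the number of cells holding it."""
    counts = Counter(value for row in raster for value in row)
    return [{"VALUE": v, "COUNT": counts[v]} for v in sorted(counts)]


lu_grid = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
vat = value_attribute_table(lu_grid)
for record in vat:
    print(record)
```

For a floating point raster nearly every cell could have its own value, so a table like this would have almost as
many records as there are cells.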

Discrete rasters represent geographic features that have definable boundaries, sometimes referred to as categorical
or discontinuous data. Examples of discrete terrain objects are: lakes, forests, buildings, roads etc.

Continuous rasters represent geographic phenomena that vary spatially without discrete steps. Each cell value is a
measure of the concentration or level of that location. Continuous geographic phenomena, in general, do not have
distinct boundaries like discrete geographic features. A geographic feature, such as a lake, has a real and definable
boundary. However a geographic phenomenon, like lake depth, continuously changes. Potentially, each cell in a
continuous raster can have a different value. Examples of geographic phenomena include contamination levels, heat
from a fire or elevation.

Important: Rasters are always rectangular. Every cell location in a raster has a value assigned to it.
When information is insufficient or unavailable for a cell location, the location will be assigned the value of
NoData. NoData and 0 are not the same: 0 is a valid value that can be used in geoprocessing whereas NoData is
excluded from geoprocessing.
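
The difference between NoData and 0 shows up as soon as you calculate with cell values. In this sketch NoData is
represented by Python's None, which is an assumption made for the example; real raster formats use their own NoData
marker.

```python
NODATA = None  # stand-in for the NoData marker (assumption of this sketch)


def mean_of_cells(cells):
    """Mean of all cells that hold a value; NoData cells are left out."""
    values = [v for v in cells if v is not NODATA]
    if not values:
        return NODATA
    return sum(values) / len(values)


cells = [4, 0, NODATA, 8]
mean_excluding_nodata = mean_of_cells(cells)
mean_if_nodata_were_zero = mean_of_cells([0 if v is NODATA else v for v in cells])
print(mean_excluding_nodata, mean_if_nodata_were_zero)
```

The cell with value 0 counts in both calculations, the NoData cell only in the second (wrong) one.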

INSTRUCTIONS:

1. Display the raster dataset in the view window. The colors are assigned to the raster cells based on the cell
value. Each value is symbolized with one color.
2. Click on the Identify tool to identify a cell value.
3. Click on a cell in the View Window.

6.

Activate the data frame ‘Vector vs. Raster’ and display the raster dataset ‘LU_raster’.

a. Use the Identify tool to explore the raster. Explain the meaning of the attributes of the dataset ‘LU_raster’ as
they are displayed in the Identify window.

b. Is the raster dataset ‘LU_raster’ an example of a discrete or a continuous raster? Why?

c. Open the attribute table of ‘LU_raster’ (right click on dataset). How many different values (i.e. land use classes)
does the dataset of ‘LU_raster’ contain?

Zone vs. region in raster

Cells in a discrete raster that share the same value represent the same type of geographic object. Clusters of
contiguous raster cells with the same value are called a region. A region represents a discrete geographic object, e.g.
a building or a lake. All regions with the same value make up a zone (Figure 8). Zones represent all geographic
objects with the same value, e.g. all buildings or lakes. Thus, the zones in a thematic raster dataset “Land_use” are
land use types.

Note that a region is the raster equivalent of a vector point, line or polygon feature: a discrete object that represents
one geographic feature.
Figure 8. Raster cells belong to zones and regions. This raster contains five zones. The zone with value 4 is made up
of three regions.
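
The relation between zones and regions can be made concrete by grouping contiguous cells with the same value. The
sketch below treats cells as contiguous when they share an edge (4 neighbours), which is an assumption. The small
raster is made up and is not the one shown in Figure 8.

```python
from collections import deque


def count_regions(raster):
    """Return {zone value: number of regions} using edge (4-neighbour) contiguity."""
    rows, cols = len(raster), len(raster[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            value = raster[r][c]
            regions[value] = regions.get(value, 0) + 1  # a new region starts here
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # visit every cell connected to the starting cell
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and raster[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return regions


zone_grid = [
    [1, 1, 2, 4],
    [1, 2, 2, 4],
    [4, 4, 2, 3],
    [4, 3, 3, 3],
]
regions_per_zone = count_regions(zone_grid)
print(regions_per_zone)
```

The number of keys is the number of zones; each count is the number of regions that make up that zone.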

Of the two GIS data structures discussed in this exercise (raster and vector), the raster data structure provides the
more comprehensive modeling environment and operators for spatial analysis. ArcToolbox contains a
comprehensive toolset to perform cell-based (raster) operations. These tools can be found in the Spatial Analyst
Tools toolbox and will be discussed in Exercise 7.

7.

a. Do you select a zone or region when you select one record in the attribute table of ‘LU_raster’? Explain your
answer!

8.
a. Compare the attribute tables of the datasets ‘LU_raster’ and ‘Land_use’ (Figure 9) and write down the main
differences in data storage between the two tables.

Figure 9. Attribute table in raster (left) and vector (right).
