Database Design for Dynamic Online Surveys
Roy P. Pargas1, James C. Witte2, Kowshik Jaganathan1, John S. Davis3
Department of {1Computer Science, 2Sociology, 3Management}, Clemson University
Clemson, SC 29631
{pargas, jwitte, jkowshi, davis}

Abstract

This paper discusses the architecture and implementation of dynamic web-based surveys with an emphasis on the recently completed Survey2001 project. Survey2001 was made available at the National Geographic website for several months starting October 2001 and could be taken in four different languages: English, German, Spanish and Italian. This paper discusses surveys in general and the advantages of web-based surveys, lays the background for Survey2001, and describes the details of the database used and the manner in which transitions were conducted in this dynamic web-based survey. It also lists the results, including other surveys developed using the same database structure, and concludes with a look to the future.

1 Introduction

Web-based survey research represents the most recent addition to a growing repertoire of computer-assisted survey tools, including computer-assisted telephone interview (CATI) and computer-assisted personal interview (CAPI) systems, dating back to 1971 [2,3]. There are several advantages to web-based surveys. To begin with, as with all forms of computer-assisted survey research, a web-based instrument allows for complicated skip patterns that tailor the survey to the respondent and eliminate redundant or irrelevant questions [6]. For example, a person who is unemployed would not be asked questions about his/her work culture or how many emails he/she received at work. Moreover, as the idiosyncratic performances of individual interviewers are eliminated, customized surveys are implemented with a degree of accuracy and transparency unmatched by CATI or CAPI methods.

As a self-administered survey format, a web-based survey potentially mitigates interviewer effects and permits a degree of anonymity not found in survey modes that depend on respondent-interviewer interaction [9]. At the same time, web-based surveys may include detailed help functions to guide and assist the respondent to a degree that is not possible with paper and pencil, self-administered formats [3,5].

Web-based surveys are also less expensive to maintain and make it easy to manipulate large volumes of data. Programming a web-based survey can be costly, particularly if the instrument involves complex skip patterns or elaborate design elements. However, this cost is fixed. Unlike face-to-face or telephone surveys, increasing the sample size is not associated with added interviewer costs. As with other computer-assisted formats, a web-based survey also eliminates the time and expense of data entry, which is performed by the respondent in the course of the survey [1]. Finally, a web-based survey may draw on the multi-media capabilities of the Internet to yield an instrument that collects data in an engaging and interactive format.

Web surveys can be either static or dynamic. Dynamic surveys benefit both survey respondent and administrator through the use of interactive forms. With such forms, feedback can be displayed that is specifically tailored to the content of the responses supplied by the user, thereby giving the respondent instant feedback. With dynamic surveys, respondents are no longer merely giving information, but are also receiving information in return for their efforts. Where respondents are made aware that they will benefit by participating, they are likely to exhibit increased motivation. If respondents know that the feedback they receive is about them, and based on the data that they provide, then they are likely to supply accurate and thoughtful responses [8].

This paper discusses the architecture and implementation of dynamic web-based surveys with an emphasis on the recently completed Survey2001 project. Survey2001 was made available at the National Geographic website for several months starting October 2001 and could be taken in four different languages: English, German, Spanish and Italian. This paper describes the technical details of the implementation of this dynamic web-based survey.

2 Survey2001: Background

As part of their coverage of the millennium, researchers at the National Geographic Society began collaborating with researchers at Northwestern University and a half dozen other universities to use a brand new research tool, an online survey, to tackle an age-old question, “How does where you live shape who you are?” Launched in October 1999, Survey2000 was an ambitious experiment in web survey methodology and technology. Survey2000 asked how often people moved, how strongly they felt about their communities, how extensively they used the Internet, and measured their global and local cultural preferences in terms of food, music, or authors. Magazine and television advertisements were used to reach potential respondents and more than 80,000 respondents in over
175 countries participated in the survey over a ten-week period [10].

Survey2000 was so successful that a follow-up project, Survey2001, was funded by the National Science Foundation (NSF). Once again with the cooperation of the National Geographic Society, Survey2001 studied the impact information technology, particularly the Internet, has had on contemporary society. The survey focussed on impacts in three areas of society: community, culture and conservation. The overarching substantive goal of the survey was to use these three areas to describe how the Internet redefines our sense of geography, particularly the distinction between the global and the local.

The important methodological aim of Survey2001 was to further explore sampling issues associated with web surveys. Like Survey2000, Survey2001 relied primarily on convenience sampling techniques, with the majority of respondents initiating their survey participation through the National Geographic Magazine’s homepage. However, in this instance the survey instrument was accessed by a number of distinct URLs. These URLs shared the same server and database, but each indicated a separate portal to the survey. Particular Internet sites (for example, the Sierra Club) were assigned unique URLs, which they posted on their web sites with an encouragement to their visitors to participate in Survey2001. Thus, the characteristics of sample subsets based on each portal may be compared, providing a unique opportunity to contrast convenience samples recruited from different online locations. Individual respondents were also given the opportunity to send email invitations to others to participate in Survey2001. A summary of the participants in Survey2001 is provided in Table 1. Finally, for comparative purposes a telephone survey based on a subset of Survey2001 questions was conducted with a randomly selected national sample.

In this paper we focus on the Survey2001 database design, which allowed for a dynamic online questionnaire in an open Internet environment. The dynamic character of Survey2001 is introduced in the opening screen, where respondents select to take the survey in one of four languages. Similarly, on the following screen respondents are steered to an adult or a youth version of the survey depending on their age. Next, respondents supply additional demographic information, including current primary residence, marital status and household composition. Subsequent questions asked about race and ethnicity, educational enrollment and attainment, and current employment status. This initial section was very important because many subsequent questions were based on the respondent’s demographic data.

Questions about the clients’ usage of the Internet and Internet tools were asked in the second section. The third category was composed of questions on environmental issues both global and local to where the respondent lived. The user was then randomly presented one of four sections: Community, Reading and Politics, Science or Lifestyles. After this randomly selected section, each respondent was given an opportunity to quit the survey or continue with another module, randomly selected from the remaining survey modules. This process continued until the respondent either quit or completed all survey modules. Thus there were dynamic aspects to Survey2001 within, between and across substantive survey modules. Indeed the inherently dynamic nature of the survey instrument defined the critical constraints in carrying out this survey project. Further, the solution to this problem, as outlined below, was intended to provide the basic database design and presentation software tools for a wide range of web-based survey instruments regardless of content or complexity.

3 Problem Statement

None of the online survey authoring tools available at the time could satisfy all of the requirements of Survey2001. Specifically, Survey2001 (a) was multi-lingual, (b) had one version for minors (under eighteen) and another for adults, (c) required moderately complex skip patterns (transitions) from one block of questions to others based on responses to multiple questions, (d) required images to accompany some of the questions for half of the respondents and no images for the other half, (e) had four mandatory and four optional categories of questions, (f) presented optional categories in random order, (g) allowed the respondent to enter comments at any time, (h) recorded what questions the client was viewing when the comment was made, (i) provided visual cues indicating progress made through the survey, and (j) worked with an external database (of zip codes and the corresponding cities and states) in developing the text for one of the questions.

The decision was made to develop the database according to the needs of the survey, enter the data manually or semi-automatically, and develop presentation software to deploy the survey on the web. The total effort involved eight sociologists, three computer scientists, and six Clemson University and National Geographic personnel and took over six months to enter the data and develop the software to create Survey2001.

Section 4 explains how the requirements mentioned above were addressed by a combination of database design and presentation software.

4 Architecture

An overview of the software and the database supporting Survey2001 is shown in Figure 1. The database system used is MySQL (version 3.23.47). The survey is deployed by a collection of servlets, called the Presentation Manager, whose HTML output is served to the client by an Apache web server and Tomcat servlet engine (version 3.2.3).
The database tables are divided into five general groups, supporting (a) questions and answers, (b) question blocks, (c) responses and comments, (d) presentation format and layout, and (e) overall survey information. Details on each are provided in Sections 4.1-4.5. Section 4.6 provides a description of OnQ, an authoring tool that works with this database and allows a survey developer to more easily create an online survey.

4.1 Questions and Answers

Consider a survey that contains the item shown below.

Example 1

Rate each of the following foods

                 awful   so-so   good   great
hamburgers         ο       ο      ο       ο
soft tofu          ο       ο      ο       ο
pizza              ο       ο      ο       ο
fried chicken      ο       ο      ο       ο
bean sprouts       ο       ο      ο       ο

which have been suggested for the office picnic lunch.

Questions and answers have multiple components. In the database, this example is represented by five questions whose root fields contain the words “hamburgers”, “soft tofu”, “pizza”, “fried chicken”, and “bean sprouts”. The answer type for each question is “radio button” and each question has four answer options: “awful”, “so-so”, “good”, and “great”. Only the first question has a prefix and a suffix field, which hold the fragments “Rate each of the following foods” and “which have been suggested for the office picnic lunch,” respectively. Finally, the questions are grouped into a single radio button table with answer options aligned in columns.

The same five questions can be presented in different ways. The answer type for one or more of the questions may be drop-down menu, instead. The questions may also be presented individually rather than in a table. Each question may have a different prefix and suffix, may have up to three media components (still images, audio, or video), and may be laid out in one of several ways. Or all five questions may be completely reorganized as one multiple-response, check box question in which the client clicks on all acceptable choices (“hamburgers”, etc.).

4.2 Question Blocks and Transitions

Questions are grouped into categories (for example, some of the categories in Survey2001 were Demography, Internet, Conservation, and Culture). Within a category, questions are grouped into question blocks.

The concept of a question block, and the ability of the survey author to specify that transitions, or transfers of control, conditionally take place from one block to another, provide the dynamism in online surveys. Figure 2 shows an example of four question blocks (QB1, QB4, QB25, and END) and transitions from one to another. In the example, control transfers from QB1 to QB4 if Boolean expression T1 evaluates TRUE. Transitions are arbitrary Boolean expressions whose variables are questions that the client has already seen and answered. For example, transition T1 from QB1 may read:

    Q5=A10 and Q7=A9

which is interpreted by the Presentation Manager as: “From block QB1, move to block QB4 if the client answered ‘Yes’ (i.e., answer A10) to question Q5 and ‘No’ (i.e., answer A9) to question Q7.”

If the expression is FALSE, then control will transfer from QB1 to another question block (not shown).

From QB4, control transfers to QB25 if T1 evaluates TRUE, or the survey ends if T2 evaluates TRUE. Otherwise control transfers back to QB1. Note that T1 from QB1 is not the same as T1 from QB4 and that the default transition from any node always evaluates TRUE. Boolean expressions are evaluated in increasing index order and the default expression is evaluated last.

This simple mechanism is a powerful tool. The author may now view the entire survey as a directed graph representing a deterministic finite state automaton where nodes are question blocks and transitions are controlled by Boolean expressions. An example is shown in Figure 3. Two distinguished blocks, void of questions and called START and END, mark the start and end of the survey. Initially, START is designated as current; at any other time, the block whose questions the client is answering at the present time is considered current. The client’s answers determine which Boolean expressions will evaluate TRUE. These, in turn, determine which block is selected next.

The two tables used to support transitioning are shown in Figure 4. Table question_block assigns an ID number to each block. Questions contained in the block are listed in field q_block_seq. Fields rand_parameter and rand_type deal with whether the questions in the block are to be randomized before being presented to the client. Table block_sequence provides 4-tuple records of the form:

    (current_block, expression, next_block, seqnum)

enabling the Presentation Manager to decide which block of questions to present next. Expressions are evaluated in seqnum order and the next_block whose expression evaluates TRUE first is selected. The field survey_type_id allows the survey author to develop different graphs for different survey types. For example, a similar set of questions, possibly blocked and sequenced differently, may be used in a telephone survey. A telephone operator may click through the survey as he or she asks the respondent the questions. Or a survey taker in a public area, such as a shopping mall, may use a handheld digital device with a wireless connection to the
Internet may use an abbreviated version of a survey to assess public opinion.

4.3 Responses and Comments

Client responses are kept in two tables, one for fixed answer choices (Figure 5, Table user_response) and another for open-ended answers (Table user_response_text). Both record the client identification (user_id) and the question number (q_id). In both, answers are date- and time-stamped (r_datetime in user_response, r_text_datetime in user_response_text), providing the survey developer with an answer trail for clients who click the back button on their browser and answer questions multiple times. They differ only in the answer field. Table user_response stores an index (a_id) into the table of answer choices. Table user_response_text records the client’s text answer.

At any time during the survey, the client may submit a comment (Figure 5, Table comments). Comments are date- and time-stamped. The current question block number is stored in field q_block_id in order to provide to the survey developer the context within which the client made the comment. However, the number of questions within the question block may be large. To narrow the context down, field questions lists precisely those questions which the client has on his or her screen when the comment is submitted.

4.4 Presentation Format

The survey developer can provide the Presentation Manager with formatting requests for individual questions and for blocks of questions. For example, say that question block 1 of survey type 1 contains questions 1 through 10, but the developer wants to present to the client first three, then five, and finally two questions on three separate web pages. Entries of (3,1,1,"EP") and (8,1,1,"EP") in Table presentation_format (Figure 6) tell the Presentation Manager to end a web page (“EP” stands for “end page”) with questions 3 and 8. The final two questions of the block, i.e., questions 9 and 10, will be presented on a separate page because the Presentation Manager will end a web page automatically when a question block is exhausted.

If, further, the survey developer wants the five questions on the second page to form a radio button table, presentation_format entries of (4,1,1,"RBT1"), (5,1,1,"RBT1"), …, (8,1,1,"RBT1 EP") inform the Presentation Manager that questions 4 through 8 are to be presented collectively in a radio button table.

The field q_presentation_format also enables the author to select from several question layouts, which determine the relative positions of questions and answers on a page.

4.5 Overall Survey Information

One table, called survey_instance, contains general information about the client. This information includes the language in which the client chose to take the survey, the time the client started, information about the client’s computing environment (for example, browser name and version), the URL with which the client accessed the survey, whether the client is a referral by an earlier respondent, and if so, the user id of the referring client.

The URL from which the client linked to the survey is of particular interest to the survey analyst because this may provide insight into potential biases common among clients who frequent that URL. For example, it may be informative to group responses from clients who linked to a survey from a politically moderate website and contrast them with responses from clients who connected from liberal and conservative websites. This may assist the analyst in more accurately assessing true public opinion.

4.6 OnQ: An Authoring Tool

Entering survey information into the tables is difficult without additional software. An authoring tool, called OnQ [7], short for Online Questionnaire, is being developed by researchers at Clemson University. The tool provides a graphical user interface that helps a survey developer (a) enter questions and answers, (b) select answer types, (c) select media components, (d) create and sequence question blocks, (e) view question blocks and transitions in graph form, and (f) format questions and question blocks. The graphs in Figures 2 and 3 were automatically generated by OnQ from a sample survey.

5 Results

As noted above, an important aim of our efforts to produce a database design for dynamic online surveys is to produce the means to efficiently deploy web surveys without undermining the strengths of web survey technology. Flexibility and complexity in question skip patterns, along with respondent-friendly presentation of questions and answers (where the means used to elicit responses may include photo images and sound and video files, as well as simple text), constitute the primary advantages of web surveys. Our database design intends to go beyond existing web survey tools by retaining these advantages while allowing surveys to be developed and deployed with few programming resources.

Our strategy has been to develop a system of survey implementation, where the database design is unaffected by question and answer content. Launching a new survey simply means creating a clone of the general database structure and entering new question, answer and transition rules into the appropriate tables. Similarly, changes to the presentation manager are minimal; new graphic files may be referenced to give each survey a distinctive look and feel but the essential operation of the presentation manager is unchanged.

Using this approach developed as part of the Survey2001 project we have deployed additional web surveys, in each
instance we have refined the basic tool as part of the process. For example, an online survey project for the Clemson University Office of Access and Equity will track a cohort of incoming freshmen and survey them annually as to their attitudes and behavior regarding campus diversity. For the initial survey, Diversity2001, a means to invite and register respondents was added to the database. Unlike Survey2001, participation in this survey was not open to all visitors to particular web sites, but was limited to a sample of invited students who needed to enter their student identification numbers to participate. Moreover, these identification numbers had to be retained to track individual students in subsequent years, and this had to be done in a way that protected the confidentiality of respondents and satisfied university requirements protecting human research participants.

Another web survey, UnitedWay2002, a community survey of neighborhood issues and challenges for the United Way of Greenville County, South Carolina, marked our first efforts to use the OnQ authoring tool to insert questions and answers into our generic database design. At the same time we identified and standardized the graphic elements that make up the basic building blocks used by the presentation manager in the initiation and introductory pages of a survey, as well as for the standard pages that contain survey questions.

The second year Clemson University diversity survey, Diversity2002, included a number of complicated skip patterns in a series of questions regarding university housing. Implementing this survey with OnQ demonstrated that the authoring tool could effectively enter such a survey pattern into the database, but also showed that changes to the authoring tool are needed to do so effectively. The most complicated skip patterns came as part of a series of questions regarding each student’s housing situation. These questions needed to be repeated for each semester with slight modifications to the questions to remind the respondent as to which semester was currently under consideration. Cyclical sequences of questions are not uncommon in survey research (e.g., questions about each job a respondent has ever held, or each consumer product ever tried) and our aim is to improve the next version of OnQ so that a set of

offered a new challenge. This survey included a series of radio button tables that involved rather complicated instructions to the respondents. Incorporating these instructions could have meant significant modifications and customization of the presentation manager were it not for the database design we have employed. One of the question types permitted in the design is called “splash and continue,” which we have typically used as a separate screen to mark transitions from one set of questions to the next and to orient respondents to a change in topic. These “questions” are distinctive in that they do not allow for an answer. Rather they serve as clarification or instructions for subsequent questions. Simply by placing a question of this type, one which contained the instructional text for the following set of questions, on the same page as the radio button items we were able to provide the appropriate instructions without customizing the presentation manager.

6 Summary and Future Work

The database described in this paper has served us well over the past one-and-a-half years, and continues to do so. New surveys with new content and completely different skip patterns are now developed quite easily. Each survey, however, suggests new ideas on how to make OnQ more powerful and our goal is to continue improving the database, modifying tables as new needs arise. For the immediate future, changes to the database will: (a) allow the author to specify type fonts and sizes of prefixes, roots, and suffixes, (b) enable the author to specify different backgrounds for different categories, (c) add a new question type, the email invitation question, that causes the Presentation Manager to send email invitations (to take the survey) to addresses specified by the client, and (d) print paper equivalents (with appropriate skip instructions) of an online survey.

7 Acknowledgements

Support for this work has been provided by the National Science Foundation (NSF-ITR/Soc, Award # 0082750), the Clemson University Office of Access and Equity, the United Way of Greenville County, and the Clemson University Department of Industrial Engineering.

Figure 1. Overview of Survey2001: development and deployment.

[Diagram labels: Survey2001 Development and Deployment; Manual / Semi-Automatic Data Entry; Apache Web Server / Tomcat Servlet Engine]

Figure 2. Question blocks and transitions.

Figure 3. View of an entire survey.

Figure 4. Tables supporting question blocking and transitioning.

table question_block;
| Field | Type | Null | Key | Default |
| q_block_id | bigint(20) | | PRI | 0 |
| q_block_seq | text | YES | | NULL |
| rand_type | bigint(20) | YES | | NULL |
| rand_parameter | bigint(20) | YES | | NULL |

table block_sequence;
| Field | Type | Null | Key | Default |
| survey_type_id | bigint(20) | | | 0 |
| current_q_block_id | bigint(20) | | PRI | 0 |
| boolean_expression | text | YES | | NULL |
| next_q_block_id | bigint(20) | | | 0 |
| sequence_number | bigint(20) | | PRI | 0 |
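
The transition mechanism of Section 4.2 can be illustrated with a minimal Python sketch. It is not the Presentation Manager's servlet code. It assumes that expressions use only atoms of the form Qn=Am combined with and, or, not and parentheses, that a NULL boolean_expression marks the default transition, and that a client's answers are held in a dictionary from question ID to a set of answer IDs. The names evaluate, next_block, ROWS and END, and every expression other than T1 from QB1, are hypothetical.

```python
import re

# Atoms such as "Q5=A10", the words and/or/not, and parentheses.
TOKEN = re.compile(r"\s*(Q\d+=A\d+|and|or|not|\(|\))", re.IGNORECASE)


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected text in expression: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, answers):
    """Evaluate a transition expression against {question ID: set of answer IDs}."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()          # always parse the right side
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "and":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        if peek() == "not":
            take()
            return not parse_unary()
        if peek() == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return value
        if peek() is None or "=" not in peek():
            raise ValueError("expected an atom such as Q5=A10")
        question, answer = take().upper().split("=")
        return answer in answers.get(question, set())

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


def next_block(current, rows, answers, survey_type=1):
    """rows: (survey_type_id, current_q_block_id, boolean_expression,
    next_q_block_id, sequence_number), as in Table block_sequence."""
    candidates = [r for r in rows if r[0] == survey_type and r[1] == current]
    # Increasing sequence number; the default (NULL expression) goes last.
    candidates.sort(key=lambda r: (r[2] is None, r[4]))
    for _, _, expr, target, _ in candidates:
        if expr is None or evaluate(expr, answers):
            return target
    return None


END = 999  # hypothetical ID for the END block
ROWS = [
    (1, 1, "Q5=A10 and Q7=A9", 4, 1),         # T1 from QB1, as in Section 4.2
    (1, 1, None, 2, 2),                       # default from QB1 (target assumed)
    (1, 4, "Q12=A3", 25, 1),                  # hypothetical T1 from QB4
    (1, 4, "Q12=A4 or not Q13=A1", END, 2),   # hypothetical T2 from QB4
    (1, 4, None, 1, 3),                       # default: back to QB1
]
```

With these rows, a client who answered A10 to Q5 and A9 to Q7 moves from block 1 to block 4, and any other combination falls through to the default row. Sorting the NULL expression last mirrors the rule that the default expression is evaluated last.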
Figure 5. Client response tables.
Table user_response;
| Field | Type | Null | Key | Default |
| user_id | bigint(20) | | PRI | 0 |
| q_id | bigint(20) | | PRI | 0 |
| a_id | bigint(20) | | PRI | 0 |
| r_datetime | datetime | | PRI | 0000-00-00 00:00:00 |

Table user_response_text;
| Field | Type | Null | Key | Default |
| user_id | bigint(20) | | PRI | 0 |
| q_id | bigint(20) | | PRI | 0 |
| r_text | text | YES | | NULL |
| r_text_datetime | datetime | | PRI | 0000-00-00 00:00:00 |

Table comments;
| Field | Type | Null | Key | Default |
| user_id | bigint(20) | | MUL | 0 |
| q_block_id | bigint(20) | | | 0 |
| questions | text | YES | | NULL |
| comment | text | YES | | NULL |
| c_datetime | datetime | | | 0000-00-00 00:00:00 |
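
Because the timestamp is part of the primary key of user_response, a client who goes back and answers a question again leaves several rows for that question. The sketch below shows one way to read that answer trail and to pick an answer of record. It uses Python's built-in sqlite3 module as a stand-in for MySQL and assumes that the most recent timestamp identifies the answer to analyse; the paper does not state that rule. The sample rows and the names latest_answers and answer_trail are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
conn.execute(
    """CREATE TABLE user_response (
           user_id    INTEGER NOT NULL,
           q_id       INTEGER NOT NULL,
           a_id       INTEGER NOT NULL,
           r_datetime TEXT    NOT NULL,
           PRIMARY KEY (user_id, q_id, a_id, r_datetime))"""
)
conn.executemany(
    "INSERT INTO user_response VALUES (?, ?, ?, ?)",
    [
        (7, 5, 9, "2001-10-03 14:02:11"),   # first answer to question 5
        (7, 7, 9, "2001-10-03 14:02:40"),
        (7, 5, 10, "2001-10-03 14:05:27"),  # question 5 answered again
    ],
)

# For each question, keep only the rows carrying that question's newest timestamp.
LATEST = """
    SELECT r.q_id, r.a_id
    FROM user_response AS r
    WHERE r.user_id = ?
      AND r.r_datetime = (SELECT MAX(r2.r_datetime)
                          FROM user_response AS r2
                          WHERE r2.user_id = r.user_id
                            AND r2.q_id = r.q_id)
    ORDER BY r.q_id, r.a_id
"""


def latest_answers(connection, user_id):
    """Most recent (q_id, a_id) pairs for one client."""
    return connection.execute(LATEST, (user_id,)).fetchall()


def answer_trail(connection, user_id, q_id):
    """Every answer one client gave to one question, oldest first."""
    rows = connection.execute(
        "SELECT a_id, r_datetime FROM user_response "
        "WHERE user_id = ? AND q_id = ? ORDER BY r_datetime",
        (user_id, q_id),
    )
    return rows.fetchall()
```

Timestamps are stored here as text in YYYY-MM-DD HH:MM:SS form, so string comparison orders them chronologically. A multiple-response question answered in a single submission would keep all of its a_id rows, since they share one timestamp.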

Figure 6. Question and block formatting information.

Table presentation_format;
| Field | Type | Null | Key | Default |
| q_id | bigint(20) | | PRI | 0 |
| q_block_id | bigint(20) | | PRI | 0 |
| survey_type_id | bigint(20) | | PRI | 0 |
| q_presentation_format | text | YES | | NULL |
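
The page-splitting example of Section 4.4 can be traced with a short Python sketch. It assumes that q_presentation_format holds whitespace-separated codes, as the entry "RBT1 EP" suggests, and that a page also ends when the block runs out of questions. The function names and the in-memory row list are hypothetical and are not taken from the Presentation Manager.

```python
# Rows follow Figure 6: (q_id, q_block_id, survey_type_id, q_presentation_format).
ROWS = [
    (3, 1, 1, "EP"),
    (4, 1, 1, "RBT1"),
    (5, 1, 1, "RBT1"),
    (6, 1, 1, "RBT1"),
    (7, 1, 1, "RBT1"),
    (8, 1, 1, "RBT1 EP"),
]


def formats_for(rows, q_block_id, survey_type_id):
    """Map q_id to its list of format codes for one block and survey type."""
    return {
        q_id: (fmt or "").split()
        for q_id, block, survey_type, fmt in rows
        if block == q_block_id and survey_type == survey_type_id
    }


def paginate(block_questions, formats):
    """Split a block's questions into pages, closing a page after each "EP"."""
    pages, page = [], []
    for q_id in block_questions:
        page.append(q_id)
        if "EP" in formats.get(q_id, []):
            pages.append(page)
            page = []
    if page:  # the end of the block closes the last page automatically
        pages.append(page)
    return pages


def radio_tables(block_questions, formats):
    """Group questions that share a radio button table code such as "RBT1"."""
    tables = {}
    for q_id in block_questions:
        for code in formats.get(q_id, []):
            if code.startswith("RBT"):
                tables.setdefault(code, []).append(q_id)
    return tables


BLOCK_1 = list(range(1, 11))       # questions 1 through 10
FORMATS = formats_for(ROWS, 1, 1)  # block 1 of survey type 1
```

Here paginate(BLOCK_1, FORMATS) yields three pages of three, five and two questions, and radio_tables(BLOCK_1, FORMATS) groups questions 4 through 8 under "RBT1", matching the example in the text.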
Table 1. Origin and language of Survey2001 respondents

| | Total | NGS site | Other site | Email referral |
| Surveys: initiated | 23,192 | 14,064 | 8,569 | 559 |
| Survey language | | | | |
| English | 75.0% | 77.7% | 69.7% | 86.8% |
| German | 6.9% | 6.8% | 7.1% | 5.4% |
| Italian | 7.2% | 6.3% | 8.8% | 3.0% |
| Spanish | 11.0% | 9.2% | 14.1% | 4.8% |
| Surveys: demographics complete adults | 12,361 | 7,583 | 4,470 | 408 |
| Surveys: complete | 7,767 | 4,831 | 2,669 | 267 |
| Survey language | | | | |
| English | 85.1% | 85.1% | 84.8% | 88.8% |
| German | 6.6% | 7.3% | 5.5% | 5.6% |
| Italian | 3.5% | 3.6% | 3.6% | 3.2% |
| Spanish | 4.7% | 4.0% | 6.1% | 3.4% |

