IET Intelligent Trans Sys - 2015 - Grant Muller - Enhancing Transport Data Collection Through Social Media Sources Methods

Published in IET Intelligent Transport Systems
Received on 30th December 2013
Revised on 6th September 2014
Accepted on 13th September 2014
doi: 10.1049/iet-its.2013.0214

ISSN 1751-956X

Enhancing transport data collection through social media sources: methods, challenges and opportunities for textual data
Susan M. Grant-Muller (1), Ayelet Gal-Tzur (2), Einat Minkov (3), Silvio Nocera (4), Tsvi Kuflik (3), Itay Shoor (5)
(1) Institute for Transport Studies, University of Leeds, Woodhouse Ln, Leeds LS1 3HE, UK
(2) Transportation Research Institute, Technion, Technion City, Haifa 32000, Israel
(3) Information Systems Department, University of Haifa, Mount Carmel, Haifa 31905, Israel
(4) IUAV University of Venice, Research Unit TTL - Transport, Territory and Logistics, Convento delle Terese - Dorsoduro 2206, I-30123 Venice, Italy
(5) Computer Science Department, University of Haifa, Mount Carmel, Haifa 31905, Israel
E-mail: [email protected]

Abstract: Social media data now enriches and supplements information flow in various sectors of society. The question addressed
here is whether social media can act as a credible information source of sufficient quality to meet the needs of transport planners,
operators, policy makers and the travelling public. A typology of primary transport data needs, current and new data sources is
initially established, following which this study focuses on social media textual data in particular. Three sub-questions are
investigated: the potential to use social media data alongside existing transport data, the technical challenges in extracting
transport-relevant information from social media and the wider barriers to the uptake of this data. Following an overview of
the text mining process to extract relevant information from the corpus, a review of the challenges this approach holds for the
transport sector is given. These include ontologies, sentiment analysis, location names and measuring accuracy. Finally,
institutional issues in the greater use of social media are highlighted, concluding that social media information has not yet
been fully explored. The contribution of this study is in scoping the technical challenges in mining social media data within
the transport context, laying the foundation for further research in this field.

1 Introduction

In recent years there has been a dramatic upsurge in the popularity of social media. Nowadays millions of people make use of a variety of online web platforms to express their opinions, thoughts and experiences. It has already been shown that social media can serve as a reliable resource for public opinions as well as factual information across several disciplines. Examples include the analysis of user opinions posted on social media relating to specific products in the marketing domain [1, 2], the assessment of political support rates as an alternative to polls [3], aggregate financial measures [4–6] and the tracking of public health indicators to identify the outbreak of diseases [7, 8]. Social media has also created an opportunity for transport stakeholders and policy makers, where information flow has strategic importance both in long-term planning and short-term tactical system management. Specifically, social media may serve as a near real-time information source for tactical measures that require travel times, network demand or incident detection [9, 10] as well as supporting the development of strategic policies, such as those concerning levels of service quality [11, 12]. Carrasco et al. [13] mined social networks data in order to study activity-travel patterns. It is important to note that they used traditional tools such as surveys and interviews for building social networks, while nowadays this information is freely available and easily accessible. Recently, Efthymiou and Antoniou [14] demonstrated the usefulness of today’s social networks data for eliciting transport information in a study that analysed carsharing and bikesharing using a questionnaire distributed by email. They also noted and briefly demonstrated the potential of mining Twitter data for that purpose.

Compared with the traditional means of collecting user generated inputs (such as traffic counts, roadside and household surveys and focus groups), the main advantages of social media data are that very fresh and recent information can be obtained and that it is a relatively low-cost method. The potential of a social media platform as a means to conduct transport surveys has already been recognised [15] and demonstrated [14]. However, there are obstacles to overcome in order to extract useful information from social media effectively. First, there is the ‘needle in a haystack’ problem, whereby relevant data must be identified from a very large data mass. In addition, social media content is mostly in natural language form, which, unlike other data forms, cannot be readily interpreted, queried or aggregated. For these reasons, relevant information must be ‘harvested’ using text mining techniques [2, 16]. A key question is whether social media information is of sufficient quality to meet the needs of the system operators/policy makers and travellers who, through traveller information systems, may also be end-users of the information.

To illustrate the type of transport-related content in social media and the challenges involved in the automatic processing of social media data, consider the following example short texts. Each corresponds to an authentic message posted on Twitter, a popular micro-blogging service.

Message 1: ‘Just looked to get train to Liverpool to see giant exhibition. Two adults two children 358.20 could drive my 6cyl volvo up & back for 70’.
Categories: Transport-related? (‘Yes’) Subjective? (‘Yes’).
Message 2: ‘Can you get a day rider that takes you to Liverpool on a bus other than the X1 obv an arriva one! Haven’t got buses properly in ages :-(’.
Categories: Transport-related? (‘Yes’) Subjective? (‘No’).
Message 3: ‘A return ticket to Chester is 10 cheaper than a return ticket to Liverpool. Chester is further!!!!!’
Categories: Transport-related? (‘Yes’) Subjective? (‘Yes’).
Message 4: ‘Time to walk the dawn - seeing what is going on in the football can wait a while’.
Categories: Transport-related? (‘No’) Subjective? (‘No’).

Messages 1–3 discuss transport-related topics, and may be of interest from several perspectives. For instance, Messages 1 and 2 indicate user intention to travel to a specified destination. If a large pool of similar messages were processed, it may be possible to elicit popular trip routes and needs. Messages 1 and 3 also both include opinions on the quality of public transport (PT) services (specifically, relating to price). Although such information may be useful, the processing of textual messages into meaningful data is not straightforward. Crucially, only a proportion of social media messages concern genuine transport issues; Message 4, for example, contains the term ‘walk’, though not in a transport context.

The task of automatically associating text with specified categories, such as ‘transport’ or ‘subjectivity’, is a well-researched field in text mining that is typically addressed using classification methods. Another important component of text mining is information extraction (IE). IE tasks include identifying names in the text (e.g. location names, underlined in the example above) and processing different references to the same entity unambiguously, for example, ‘NYC’ and ‘New York’. Classifying message content, identifying location names and treating name variations are just three challenges that need to be addressed in order to obtain useful information on transport-related topics from textual content.

In this paper, the goal is therefore to review both the opportunity and challenges that are involved in harvesting large amounts of social media data in the area of transport. Specifically, we are interested in addressing the following questions:

(1) In which ways can social media data be used alongside or potentially instead of current transport data sources?
(2) What technical challenges in text mining social media data create difficulties in generating high-quality data for the transport sector?
(3) Are there wider institutional barriers in harnessing the potential of social media data for the transport sector in addition to the technical challenges?

The answers to these questions will enable policy makers to assess the challenges in obtaining information from social media text sources and subsequently either fuse the data with that from other sources or process it as an additional data stream.

This paper provides a review of the state-of-the-art in the text mining techniques needed to obtain transport-related data from social media text sources. A reflection on institutional issues is also made to assess whether technical or institutional issues are the greater obstacle, informing the direction for future research. This paper continues in Section 2 with a summary of the uses of current transport data and their relationship with social media data. An overview of text mining methodology is given in Section 3, whereas Section 4 outlines text mining challenges, the state-of-the-art and a reflection on the implications for transport data requirements. In Section 5, the evidence concerning wider barriers to transport stakeholders (local government, practitioners and suppliers) using social media is outlined and Section 6 concludes.

2 Role of data in transport planning and policy

Information flow plays a central role in the decisions made by transport system users in how, when and whether to travel. It also supports recovery in cases of unexpected disruption [17, 18]. The question is therefore whether current data streams can be either integrated with (or replaced by) new sources, to provide cost effective and potentially more complete information. The many sources of new technology-enabled data include textual social media, geographic information systems and digital data from intelligent transport systems. The potential for information enrichment arises from data collection at various levels of aggregation and with some sources providing associated ‘clues’ to the socio-economic characteristics associated with individual data units.

A typology of primary data needs, data sources and the potential role of social media is given in Table 1, which draws on sources including the Common Highways Agency Rijkswaterstaat Model specifications [19] and other PT service performance specifications (for example [17]). Existing transport systems often comprise layers of technologies and monitoring equipment that have accumulated as technology has advanced [20]. However, the distribution of instrumentation can be patchy, resulting in some geographic areas with dense data collection and others with sparse data (typically rural). This gives rise to two challenges: the first is whether new data forms can be integrated with other data sources to create new or better quality knowledge [13] and the second is whether there is the possibility to adopt various ‘user generated data’ where current data collection is sparse [21]. Uses may include monitoring system performance, informing new policies based on expected demand, providing cost effective, more detailed and potentially more complete information on context, improving understanding of behaviour and perceptions that underlie mode choice [22] and enriching the understanding of scheme impacts [23–27]. They may serve to improve the efficiency and effectiveness of current databases, for example, through reconciling data contradictions and reducing redundant data collection. To answer both challenges fully, a further significant tranche of research is needed. The remainder of this paper therefore focuses on one particular stream of new data only, that is, social media textual data.
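The keyword ambiguity and name-variation issues raised with the example messages in Section 1 can be made concrete with a minimal sketch. The keyword list, the alias table and the function names below are hypothetical and chosen only for illustration; they are not taken from any system described in this paper.

```python
import re

# Hypothetical keyword list; in practice it might be derived from a transport lexicon.
TRANSPORT_KEYWORDS = {"train", "bus", "ticket", "drive", "walk"}

# Hypothetical alias table mapping name variants to one canonical place name.
PLACE_ALIASES = {"nyc": "New York", "new york": "New York"}

# The four example messages quoted in Section 1.
MESSAGES = {
    1: "Just looked to get train to Liverpool to see giant exhibition. "
       "Two adults two children 358.20 could drive my 6cyl volvo up & back for 70",
    2: "Can you get a day rider that takes you to Liverpool on a bus other "
       "than the X1 obv an arriva one! Haven't got buses properly in ages :-(",
    3: "A return ticket to Chester is 10 cheaper than a return ticket to "
       "Liverpool. Chester is further!!!!!",
    4: "Time to walk the dawn - seeing what is going on in the football "
       "can wait a while",
}


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_keywords(text, keywords=TRANSPORT_KEYWORDS):
    """Return the sorted keywords that occur as whole tokens in the text."""
    return sorted(set(tokenize(text)) & set(keywords))


def canonical_place(name, aliases=PLACE_ALIASES):
    """Map a known name variant to its canonical form, else return it unchanged."""
    return aliases.get(name.strip().lower(), name)


# Keyword matching flags all four messages, including Message 4.
flagged = {number: matched_keywords(text) for number, text in MESSAGES.items()}
for number, hits in flagged.items():
    print(number, hits)
```

Under these assumptions every message is flagged, including Message 4, which matches ‘walk’ although it was labelled as not transport-related. This is the kind of false positive that motivates the classification step that follows keyword matching.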

Table 1 Overview of current and new transport data sources (source: authors)
Each entry below gives the three columns of the table in turn: data purpose, data issues, and potential role of social media or new technology.

Data purpose: speed monitoring
Inputs to estimated travel time for traffic management and advanced traveller information systems (ATIS).
Current data sources: (1) ANPR camera; (2) loops (e.g. motorway incident detection and automatic signalling (MIDAS)) embedded in the highway.
Data issues:
(1) automatic number plate recognition (ANPR)
• automatic data collection following installation (fixed location)
• can capture large proportions of population at location
• costly to maintain
• accuracy in plate reading and matching, for example, in poor weather
• data are obtained without explicit consent
• no user characteristics for each trip
• applicable to motorised vehicles only
(2) loops (MIDAS)
• automatic data collection following installation (fixed location or temporary loops)
• can capture large proportions of population at location
• faulty loops generate data gaps and errors, which can be substantive
• loop maintenance costs and costs of downloading/processing data
• speed is inferred and smoothed, missing spikes in actual vehicle speed
• no user characteristics for each trip
• applicable to vehicles only
Potential role of social media or new technology:
GPS tracking (e.g. on mobile phone)
• can generate speeds
• tracking has some errors in measurement
• data are generated automatically
• GPS data are given by consent
• can be applied to non-motorised modes (such as bike riders and pedestrians)
cellular-based monitoring
• less accurate than GPS based (accuracy is usually proportional to cell size)
• however, it does not require activation of any GPS-based app
• the number of samples is much higher (to compensate for the accuracy of identification of any single probe)
• cellular operators provide the data to the companies calculating speeds/travel time
• privacy is preserved by assigning each probe an ID which is different than the original ID of phone

Data purpose: constructing O–D movements
Input estimates of demand to policy decisions and traffic management strategies.
Current data sources: (1) ANPR camera and (2) RP/SP questionnaires.
Data issues:
(1) ANPR (issues as above)
(2) revealed preference (RP)/stated preference (SP)
• allows bespoke design and can collect user characteristics
• resource intensive
• potential sampling and other sources of bias, for example, in administration of complex design
• response rate/participation may not be high
Potential role of social media or new technology:
social media text content
• content can contain O–D data, but coverage may be limited
• depends on users choosing to contribute content
• socio-economic and contextual data may be present alongside O–D
GPS tracking
• can generate individual O–D movements for full trip

Data purpose: link demand
Input estimates of demand to policy decisions and traffic management strategies.
Current data sources: (1) roadside counts, (2) loops (e.g. MIDAS) embedded in the highway and (3) RP/SP questionnaires.
Data issues:
(1) roadside counts
• manual process, resource intensive therefore limited sample possible
• human and other errors
(2) MIDAS
• inaccuracies in vehicle classification
• other issues as above
(3) RP/SP (issues as above)
Potential role of social media or new technology:
social media text content
• may generate additional data on demand, but unlikely to capture total demand
• unlikely to identify specific link within a journey as part of content
GPS tracking
• can generate link demand, but would need large-scale monitoring to estimate total demand

Data purpose: PT mode demand
Inputs to policy making and/or commercial decision making by suppliers.
Current data sources: (1) in-mode counts, (2) patronage data (e.g. ticket sales) and (3) RP/SP questionnaires.
Data issues:
(1) in-mode counts
• can be targeted to areas/services of interest
• manual, therefore resource intensive
• limited sampling practical
(2) patronage data
• automatically collected
• large proportions of population can potentially be captured
• commercially sensitive
• some inaccuracies, for example, in cross-mode
(3) RP/SP (issues as above)
Potential role of social media or new technology:
social media text content
• useful for understanding mode choice rationale
• unlikely to capture total demand
• may capture evidence on demand for responsive services
GPS tracking
• can generate evidence on mode demand
• large-scale monitoring needed to estimate mode demand with confidence

Data purpose: service quality and driver comfort
Inputs to policy planning and operations.
Current data sources: (1) RP/SP questionnaires.
Data issues:
(1) RP/SP (issues as above)
Potential role of social media or new technology:
social media text content
• analysis of text content effective in generating service quality data

Data purpose: public opinion (e.g. new schemes or services)
Inputs to long-term policy development and planning decision.
Current data sources: (1) focus groups, committees and consultation meetings and (2) household questionnaires.
Data issues:
(1) groups and meetings
• resource intensive
• limited samples possible
• sources of bias, for example, in participation
(2) household questionnaires
• resource intensive
• some biases, for example, in response rates
Potential role of social media or new technology:
social media text content
• analysis of text content effective in supplementing or replacing public opinion data sources

Data purpose: detection of abnormal or undesirable event (various modes of transport)
Inputs to operations and ATIS. Includes incidents along the road network, train delays, packed bus, missing rental bikes etc.
Current data sources: (1) various types of physical devices (e.g. video cameras, loops, in-mode counters etc.) and (2) systems (such as patronage data, bike rental systems etc.).
Data issues:
(1) physical devices
• continuous monitoring
• level of accuracy is usually sufficient
• high coverage is often costly
(2) management/operational/control systems
• systems often belong to private operators and the quality of data sharing is often a challenging issue
• such systems do not necessarily enable real-time data processing which is required for event detection
Potential role of social media or new technology:
social media text content
• low cost for authority
• even a small number of similar reports constitute a solid basis for verifying the event
• many types of events can be detected in the same manner
• depends on human reporting
• time constraints require the use of very efficient text mining techniques
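The final entry of Table 1 notes that even a small number of similar reports can form a basis for verifying an event. A minimal sketch of that idea follows, assuming each report has already been reduced by earlier text mining steps to a (timestamp, event type, location) tuple. The function name, the threshold of three reports and the 15 minute window are hypothetical values chosen for illustration, and the sample reports are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def verified_events(reports, min_reports=3, window=timedelta(minutes=15)):
    """Return (event_type, location) pairs backed by enough reports close in time.

    reports: iterable of (timestamp, event_type, location) tuples.
    min_reports and window are illustrative thresholds, not recommended values.
    """
    stamps_by_event = defaultdict(list)
    for timestamp, event_type, location in reports:
        stamps_by_event[(event_type, location)].append(timestamp)

    verified = set()
    for event, stamps in stamps_by_event.items():
        stamps.sort()
        start = 0
        # Slide a time window over the sorted timestamps of this event.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= min_reports:
                verified.add(event)
                break
    return verified


# Made-up reports for illustration only.
t0 = datetime(2014, 9, 13, 8, 0)
sample_reports = [
    (t0, "train delay", "Leeds"),
    (t0 + timedelta(minutes=4), "train delay", "Leeds"),
    (t0 + timedelta(minutes=9), "train delay", "Leeds"),
    (t0 + timedelta(minutes=2), "accident", "M1"),
]
print(verified_events(sample_reports))
```

Grouping by exact event type and location is a simplification: in practice, deciding that two messages describe the same event depends on the classification and location-resolution steps discussed in Sections 3 and 4.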

The premise that social media contains valuable transport information forms the rationale for the examination here of the mining task needed to extract the data for use. The characteristics of social media that are of particular value for the transport sector include: the potential for all users to contribute content, dematerialisation of data collection, community facilities (such as discussion boards/blogs, video sharing) and virtual meetings. Information harvesting may be: (i) dynamic, informing short-term decisions by system operators and users or (ii) off-line, supporting policy makers and stakeholders in forming improved policies. An overview of the text mining process follows as a background to a discussion of specific transport-related challenges in Section 4.

3 Mining transport data from social media text

Text mining uses a set of techniques and tools which need to be carefully adapted for the task in hand [28]. Fig. 1 outlines a general flow of the text mining process, as applied to social media, with the following main steps.

Fig. 1 Overview of text mining process (source: authors)

3.1 Initial message filtering

Owing to the computational intensity of the task, a set of potentially relevant messages must first be identified from the general social media message stream. Meta-data is often useful for this purpose if available. For example, Twitter’s streaming application programming interface (an interface provided by Twitter for access to the real-time tweet stream) allows message filtering using criteria such as date and geographical meta-data. In addition, keyword specification allows the extraction of messages that contain a set of pre-specified words from the general stream, which may be highly effective in asserting message relevance in some contexts. For example, in the political arena, messages containing ‘Obama’ were analysed to assess presidential approval rates over time [6]. In the transport domain [12], a list of train names was used to filter social media message content, with the goal of eliciting user opinions on the transit system from social media. Further research by Mai and Hranac [9] involved the collection of incident statistics from social media. A set of word collocations was used to filter potentially relevant messages including the collocations ‘traffic accident’ and ‘car crash’.

In this paper, we consider the scenario where a transport authority is interested in processing social media data about a wide variety of transport issues within its remit. Message relevancy in this case requires that texts are related to a broad range of transport issues. A reasonable strategy for keyword specification would be to use keywords that are typical for the transport sector, possibly deriving them from an existing transport lexicon or ontology. Although some contexts may involve keywords with unique meaning (such as ‘Obama’, ‘influenza’ or train names), typical words for the transport sector may be highly ambiguous. For example, in the texts ‘cross the bridge when we get there’ or ‘wash the car’, the terms ‘bridge’ and ‘car’ are associated with transport, but are used in an irrelevant sense. Filtering messages by keywords may therefore yield very noisy results. However, having identified candidate messages using the initial criteria, an improved assessment of relevance and more detailed interpretation of content can be performed using further text mining steps, as described below.

3.2 Semantic annotation

The initial pool of filtered raw texts (‘source texts’, Fig. 1) can be further annotated with useful semantic information [29]. Specifically, named entity recognition (NER) techniques annotate the scope and types of entities of interest, including place names, facilities, organisation and person names. Recent NER models have been adapted to handle informal text such as social media [30, 31]. It is also useful to annotate transport-related concepts in the text, linking textual phrases to domain ontology as discussed in Section 4. This level of annotation can be used to assist in further decoding the meaning of the whole message, while place names provide evidence on the location orientation of the message (see Section 4.3).

3.3 Message relevancy

The relevance of annotated messages to the transport authority can then be more thoroughly evaluated. The automatic association of text with a topic (transport here) typically uses supervised machine learning approaches. In these approaches, a model is learned based on labelled examples, implying that a ‘dataset’ must be constructed containing example texts with their correct labels. Manual labelling may be costly, especially if domain expertise is required. To learn a classification model that fits labelled examples and generalises to new examples, example texts are abstracted into pre-defined ‘feature’ values. In the popular ‘bag-of-words’ feature schema, a document is represented as an unordered set of the words [16]. This simple representation can give good performance, for example, documents containing the terms ‘train’, ‘bus’, and ‘ticket’ are likely to be transport-related. Similarly, it may be useful to model word bigrams, or trigrams, capturing collocations like ‘car accident’. Enhanced feature schemes encode additional semantic information rather than surface words; for example, features indicating whether location names or transport-related terms are observed in the text, as indicated by the semantic annotation [29]. Various classification paradigms are known to give good performance on text categorisation problems, including support vector machines, Bayesian models and more [16]. Similar classification techniques are well established in other fields of transport modelling, for example, [32]. Once classified, messages that are identified as irrelevant to transport are discarded at this stage. Finally, another aspect of relevancy is the location orientation of a message. We discuss methods for further identifying messages that are relevant with respect to location in Section 4.3.

3.4 Semantic processing

Messages judged as relevant can then be classified into finer categories within the transport domain; for example, identifying messages that report accidents, messages in which users express a wish to travel to some known destination etc. Similarly, messages may be automatically associated (using dedicated classifiers) with transport modes or sentiment analysis may be used (see Section 4.2, [30]).

3.5 Summarisation and presentation

The final stage is to aggregate and present the mining outcomes, to support decision making [33], for example, through graphical presentations of positive against negative public sentiment [34] towards a service.

Automated text mining is inherently imperfect; however, this does not imply that the data cannot be used within the transport information cycle. On the contrary, an appreciation of where data quality is either strong or weak allows a more confident utilisation of the data. Performance considerations that are particularly relevant to the transport sector are discussed in Section 4.4.

4 Text mining challenges for transport data needs

In this section, some more detailed attention is given to four challenging areas of particular relevance to uses of social media in transport. The aim is to highlight the state-of-the-art in the technical process and reflect on the implications for increased uptake of social media as a transport information source.

4.1 Transport ontology

A main barrier to automating text processing in general (and micro-blogs such as Twitter in particular) is the lack of accompanying context. By way of illustration, to infer that the text ‘the 61C was late this morning’ is relevant to transport, it must be known that ‘61c’ is the name of a bus line and is being used intentionally in a transport context. One of the most effective ways to represent background (world) context is through ontologies. ‘Ontologies’ serve as a methodological framework for representing contextual information as a networked structure of objects or concepts, with related items linked by labelled relationships. Freebase [35] is a popular example of a general-purpose ontology. The mining process involves ‘annotating’ text, linking text segments to concepts in the ontology, rendering the text amenable to semantic search and processing. Following the example above, an ontology is needed that represents the term ‘61c’ as an entity, connected with an ‘is-a’ (hyponymy) relation to the concept ‘bus’, where ‘bus’ in turn is mapped as a hyponym of the ‘transport mode’ concept etc. This allows an association of the text with transport categories at various granularities; for example, ‘transport’, ‘transport mode’, ‘bus’ etc.

A literature review of transport-related ontologies reveals two main categories. The first concerns the type of activity for which the ontology was created. Some work has focused on very specific tasks such as the transmission of communication between in-vehicle and external systems [36]. Others have targeted more general processes such as micro-simulation [37] or journey planning [38]. Generally, an ontology required for specific activities is narrower than one for more general processes. The second category concerns the transport mode the ontology covers, with some focusing on a single mode while others cover multi-modal travel. Combining both categories of ontology-related transport research means that all combinations appear in the literature:

• Ontologies addressing a specific activity and a single mode: Private vehicle context-aware services [36]; customer satisfaction of travellers of mass transit system [39]; situation awareness of city tunnel traffic [40].
• Ontologies addressing a specific activity and multi-modal journey: Military transport planning and scheduling [41].
• Ontologies addressing a general activity and a single mode: Personalised private vehicles route planning [42]; activity-based carpooling micro-simulation [37].
• Ontologies addressing a general activity and multi-modal journey: Public transport query system [42]; journey planning [43].

Despite the substantial contribution of existing research, the work to date only provides a partial solution for the problem of creating an overall comprehensive transportation ontology (see also [44]). Constructing such an ontology would be resource intensive as it involves the abstraction and conceptualisation of the transport domain, typically conducted by domain experts. Fusing existing ontological resources may alleviate this effort and some attempts in this direction have already been made, for example, [45]. The dynamic and geography dependent nature of the transport-related social media content further contributes to the complexity in creating an ontology. A full-scale ontology should, for example, capture the reality in which the underground system in London (UK) is called ‘tube’, whereas at the same time ‘T’ is commonly used for informally referring to the one in Boston (USA). To the best of our knowledge, this aspect has not yet been dealt with by researchers in this field. Ideally, a transport ontology would also be maintained using collaborative intelligence and drawing on contributions by non-experts, in a similar fashion to Wikipedia. Given the current lack of such a resource for the transport community, future research activities are likely to include modelling relevant semantic information given pre-specified tasks and consolidating dictionaries that are available in different formats.

4.2 Sentiment analysis

Sentiment analysis (or ‘opinion mining’) is the process of extracting opinions concerning an event or an entity from the text. This area generally has drawn a lot of recent attention [2], with the bloom of user generated content in social media boosting research efforts [46]. Addressing some of the challenges in sentiment analysis is important in order that social media plays an increasing role in the transport sector information loop. Opinion data includes bus, train or plane passenger views (e.g. on service quality) and views arising in governance processes that include public participation, for example, consultations concerning new transport schemes [47].

Sentiment analysis often begins by creating a lexicon of words marked with their prior polarity, that is, negative/positive [48], which is then used to analyse emotions in the text [49]. Sood et al. [49] suggested a methodology for detecting sentiments based on three steps: (i) collecting a training corpus of texts, often manually annotated according to the sentiments expressed in them; (ii) building a set of categories associated with positive, negative and neutral sentiments; and (iii) training a system to classify new texts automatically into the desired categories. Pang and Lee [2] claim that in order to perform a sentiment analysis task, labelled training data from within the same domain must be used. As with the issues surrounding ontology (see Section 4.1), some consideration of a mode-specific or task-specific lexicon may be appropriate. Given the nature of natural language, identifying negative/positive meaning is challenging [50]. For example, ‘busy’ may be positive in […] sarcastic. Inferring sentiment can be posed as a text classification task [53], associating the text with the categories of positive, negative and neutral sentiments. Pang and Lee [2] claim that labelled training data from within the same domain must be used. As with ontology (see Section 4.1), some consideration of either a mode-specific or task-specific lexicon in the learning process may be appropriate. The social media platform from which the content is harvested might also influence sentiment analysis. For example, sites dedicated to complaints, such as , might be biased towards negative sentiments [47]. Despite the challenges and the fact that there are great differences in success of sentiment analysis in different domains, Pang and Lee [2] noted that machine learning techniques can achieve >80% accuracy in sentiment analysis.

In recent years, Twitter has been the target of intensive research with respect to content analysis and especially opinion mining, given its nature as a short and immediate response to events. Challenges in mining sentiments in Twitter’s textual contents are exacerbated by the length of the text, its contextual nature and its lifespan. However, it has been widely used for estimating public mood [53], trends such as stock market behaviour [54], political election results [55] and also in the transport sector. Collins et al. [12] used Twitter as an information source for evaluating transit rider satisfaction. Focusing on the rapid transit system of the Chicago Transit Authority as a case study, researchers have found a correlation between irregular events (such as extreme delays) and the volume of negative sentiments. This correlation supports the notion that Twitter is a valid source of information for inferring transport-related sentiments. The short message length creates difficulties in the transport sector as in others. Still, it also has the advantage that users have a lower payload in sending a message than for other types of social media, for example, Facebook. The need for contextual data is potentially more of an issue than for other domains because of the dynamic and spatial nature of travel and with journeys often involving more than one mode. The lifespan of transport sentiment data is possibly less of a problem given the links into both long-term planning and short-term responses (Section 1).

4.3 Location data

Most transport operators and managers are primarily concerned with identifying transport-related information from social media that is closely associated with the transport services for which they have responsibility. It is a reasonable assumption that most messages posted on the formal websites for a transport authority (or supplier) will have relevance to the associated locality. However, the
describing some transport contexts, for example, ‘the road transport system inherently contains networks (e.g. of roads,
is busy and should qualify for upgrade’, but negative in PT services). As a result, both upstream and downstream
others ‘the road is busy and unsuited for further housing transport activities may be of relevance to a particular
development’. A text may say that a policy is ‘not at all geographic location. The governance of particular sections
desirable’ (negative sentiment) or a product is ‘terribly of the transport system that together form networks may be
good’ (positive sentiment). Natural language may also undertaken by different authorities with different websites.
include irony and sarcasm which add to the challenge [51]. For example, complaints about connections between
Analysis of transport sentiment data [52] illustrated the inter-city and local services may be posted on the web site
difficulty with sarcasm in service quality related text. The of inter-city service operators, but be of interest to local
message ‘train service is just fantastic’ needs the providers seeking to improve connection services.
surrounding context for interpretation. In this case, clues in It is therefore necessary to identify those messages (from
the preceding or subsequent content (e.g. relating to late the very many that will be available) relevant to the
running trains) may indicate whether it is genuine or location and/or specific transport services for the task. Two

possible location identification approaches are either (a) to identify the current location of the person posting the message and/or (b) to correctly identify locations from message content. Fig. 2 outlines the process involved for an example case of PT messages based on the fusion of information either within the message or attached to it.

A primary source of information on the location of the person posting the text message is voluntarily posted geo-meta-data associated with the social media-user (‘user’) account. In practice, many users do not provide this information [9] and even if a message is geo-tagged, it may be inaccurate. The message may also relate to transport in locations distinct from the user’s home town, for example, while travelling. Mobile device global positioning system (GPS) coordinates offer further implicit meta-data indicating the user’s location, but these cover only a portion of all social media traffic and user consent is required to enable the functionality. Research continues to maximise the precision of location inference from pervasive devices [56]. Given current limitations in coverage of these types of meta-data [57, 58], other implicit information sources have been investigated for potential location inference. Social network structures can be used for this purpose as users tend to live in close geographic proximity to their social network peers [59]. An estimate of user location may be inferred based on the message content [60]. In particular, it has been shown that fine geographical distinctions are possible based on local language characteristics [61, 62].

The second approach to identifying location data is from the contents of the message. This task is especially challenging when considering the high ambiguity of names of places. For example, ‘Liverpool’ is the name of a UK city, a London rail station (Liverpool Street), a city in the USA and an Australian suburb.

Several approaches have been proposed for identifying geo-location based on message content. NER techniques can automatically annotate the text with mentions of entity names. Having extracted candidate location names, disambiguation is needed to align inferred location with any other contextual information, in conjunction with relevant sources of location names.

Web-a-Where, a system for associating geography with web pages, was one of the early works to tackle this problem [63]. TwitterTagger [64] geotags tweets based on comparing their content with the United States Geological Survey (USGS) database of locations. Another approach for content-based geo-location of multilingual tweets is based on collating contextual tweets into a document using a user-tweeting-frequency-based temporal window [65]. An approach based on ‘local’ words, that is, words that are typical of a specific location such as ‘howdy’ for Texas, was proposed by Cheng et al. [66] for estimating a Twitter user’s city-level location.
The availability of data sources containing transport-related
entities (e.g. PT line identifications, station formal and
informal names, names of parking facilities) may constitute
a valuable asset for identifying locations for transport-related
messages. Bry et al. [67] provide interesting examples of the
use of such data sources in building a world model for
geospatial data. The world model presented consists of
concrete data (such as train connections) as well as logically
formalised ontologies of transport networks. Following
conjectures on the location of the message based on the
different approaches, the data analyst can check for possible
inconsistencies and choose whether to discard messages
where there is low confidence in geographical orientation.
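
The consistency check described above can be illustrated with a short sketch. It is an illustration under stated assumptions rather than a method taken from the cited works: the gazetteer entries, the context terms and the function name `disambiguate` are all hypothetical, and the naive substring matching stands in for a proper NER step. Each candidate reading of an ambiguous place name (here ‘Liverpool’) is scored by counting the transport-related context terms found in the message, and the message is discarded when no reading is clearly supported.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Place:
    name: str                      # lower-case surface form searched for in the message
    region: str                    # resolved location this entry stands for
    context_terms: FrozenSet[str]  # lower-case terms that support this reading


# Hypothetical gazetteer: illustrative entries only, not a real data source.
GAZETTEER: List[Place] = [
    Place("liverpool", "Liverpool, UK",
          frozenset({"merseyrail", "lime street"})),
    Place("liverpool", "Liverpool Street station, London",
          frozenset({"liverpool street", "central line"})),
    Place("liverpool", "Liverpool, NSW, Australia",
          frozenset({"sydney", "new south wales"})),
]


def disambiguate(message: str, min_margin: int = 1) -> Optional[str]:
    """Return the best-supported region, or None when confidence is low."""
    text = message.lower()
    scores = []
    for place in GAZETTEER:
        if place.name in text:  # naive substring match (assumption)
            support = sum(term in text for term in place.context_terms)
            scores.append((support, place.region))
    if not scores:
        return None  # no candidate place name mentioned at all
    scores.sort(key=lambda item: item[0], reverse=True)
    best_support, best_region = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    # Discard the message when no reading has supporting context,
    # or when the best reading does not clearly beat the next one.
    if best_support == 0 or best_support - runner_up < min_margin:
        return None
    return best_region


print(disambiguate("Delays on Merseyrail into Liverpool again"))
print(disambiguate("Liverpool is busy today"))
```

In this sketch the first message resolves to the UK city because of the operator name, while the second is discarded because nothing in the text supports any one reading. In practice the context terms would come from the transport data sources mentioned above (line identifiers, station names, parking facilities) rather than from a hand-written list.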

4.4 Measuring the accuracy of the text mining


A quantitative evaluation of the text mining process (Section 3) is needed to tune the system and evaluate the degree of
success. Evaluation generates common performance
measures originating from the information retrieval domain
[12], namely precision and recall. Precision measures the
accuracy of the predictions made by the system (i.e. how
many of the texts classified as relevant are indeed relevant).
Recall corresponds to coverage ratio (how many of the total
transport-relevant texts were classified as relevant). Given a
dataset of examples associated with correct class labels on
one hand, and automatically inferred labels on the other
hand, it is possible to evaluate the system’s performance
both in terms of precision and recall.
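
As a minimal sketch of the two measures just defined, the following Python fragment computes precision and recall from a list of correct (gold) relevance labels and a list of automatically inferred labels. The function name `precision_recall` and the toy label lists are assumptions made for illustration only; they are not data from this study.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall for the 'relevant' class.

    gold[i] is True when text i is truly transport-relevant;
    predicted[i] is True when the system classified it as relevant.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if p and not g)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)
    # Precision: share of texts classified as relevant that are relevant.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of truly relevant texts that were classified as relevant.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Toy example: four relevant texts, two irrelevant ones.
gold = [True, True, True, True, False, False]
predicted = [True, True, False, False, True, False]
precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

Here two of the three texts flagged as relevant are correct (precision 2/3), while only two of the four truly relevant texts are found (recall 1/2). The convention of returning 0.0 when a denominator is zero is a choice made for this sketch.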
Fig. 2 Analysis of geo-location data in social media transport data

An important step in social media text mining is to classify an initial pool of texts as relevant or not (Section 3). For the potential uses here (see Table 1), a text is considered relevant if it contains either objective or subjective information about the transport system or a journey undertaken by an individual(s). This may include origin and/or destination information, mode-specific commentary, opinions and experiences about transport or transport-related activities, observations on the

state of the system and information concerning system changes or interventions. In practice, however, relevance is a subjective notion, difficult to define deterministically [12] and subject to personal interpretation. To evaluate subjectivity, it is common practice to have a shared set of examples manually annotated with the target class. The inter-annotator agreement rate provides an indication of the level of subjectivity in the task. Automatic methods that learn a concept class based on annotated examples cannot be expected to outperform the inter-annotator agreement rate.

The infinite nature of the message stream in social media is challenging from several perspectives. From a performance perspective, as the content posted on social media changes rapidly over time, periodic monitoring and possibly re-tuning of the system is required. From an evaluation perspective, it is impossible to identify all relevant messages in the data stream and as a result one cannot compute recall precisely. The large mass of data on social media, however, also carries an important advantage. Social media information is characterised by a high degree of redundancy, with multiple messages that are phrased differently conveying similar content. This means that while some relevant messages may be overlooked by the text mining process, this may not have a drastic effect on the output of the text analysis process. It has been shown that text analysis of social media can yield results that are consistent with formally conducted polls [28].

The individual components of the text processing pipeline, including the classifier and text annotators, can be evaluated using labelled examples that were set aside for testing purposes. Each component is typically tuned until the output performance measures are considered satisfactory. In general, an important factor affecting the performance of learning systems is the size of the labelled data that is available to learn from. Rather than manually labelling large amounts of data, which is costly, automatic and semi-automatic methods for labelling examples can be applied. Using the pseudo-relevance feedback approach, for example, texts that are classified with high confidence at early iterations are processed as labelled examples to retrain improved models [12]. Finally, semi-automatic settings may be preferred where the text processing outputs are further assessed by a human.

Summarising the findings overall, text mining provides a means for automatic identification of transport-relevant messages in a stream of incoming messages. Specific challenges remain and solutions are needed where the user must be in the loop for periodic monitoring and enhancement of the system.

5 Harnessing social media data in practice

This section seeks to address the final research question: are there wider institutional or other barriers in harnessing the potential of social media data in transport in addition to the technical issues? A review of institutional attitudes to social media use is followed by some findings on social media use by transport authorities in practice.

Although there is a growing tranche of literature concerned with the attitudes and perceptions of ‘individuals’ in social media use, rather less has been published on the formal stance of ‘organisations’ on social media use. This is particularly the case for those in a governmental (or public sector) role, which is often the case for the transport sector. Given the important role of information in both operational activity and strategic planning [47], improved understanding of the barriers and enablers to accelerating effective uptake of social media in the transport sector is needed. Initial attempts to provide authorities with guidelines for effective use of social media have already been made [15, 68]. Based on more general social media literature, the following points arise as possible organisational stances.

A reluctance to engage with social media may arise from the need to be active as a ‘key requirement of success’ [58], potentially related to the need for resource input (see also [11, 52, 69–71]). There may well be concerns about safeguarding corporate image, given the dynamic nature of messaging. Less opportunity for ‘lagged’ responses may

Table 2 Summary of findings on the use of social media by authorities

Authority: Summary of findings

Buckinghamshire and City of Edinburgh [73]: using Twitter to give timely information to drivers about conditions on the road network

Transportation Safety Board of Canada [74]: mainly used as an alternative method for accessing the material shared through the Transportation Safety Board of Canada’s Really Simple Syndication (RSS) feeds and websites

American Association of State Highway and Transportation Officials, Third Annual State DOT Social Media Survey [72]: just under 90% of states are using Twitter. More than three-quarters of states are using Facebook

Minnesota’s Local Road Research Board [75]: 25 cities and 25 counties were selected for closer examination. Among the 50 governments sampled, Facebook was found to be the most common social media outlet (used by 19 for any reason and by 10 for transport communications) followed by Twitter (used by 15 for any reason and by nine for transport). Across all social media channels, the most common transport-related topics for communication were planning and zoning, road construction and street closures

New York region’s major transport providers [76]: links to social media accounts from the homepage, in tandem with service alert tools. This feature allows riders to comprehend urgent information in the context of social media’s engagement, showing the complementary nature of the different resources

various US authorities [77]: although it should be noted that social media could greatly enhance a transport organisation’s public outreach, there are also some potential dangers. Some aspects of social media are beyond the authority’s control: hackers are a threat, people will post old or false news and some leaks will occur

Virginia DOT [78]: the Virginia DOT is expanding its use of social media to communicate with the 7.5 million Virginians who depend on us to connect them with the things that are most important in their lives

local authorities in California [69]: cities are generally more interested in information-sharing through social media than constituent engagement

Bay Area Rapid Transit (BART), Oakland, CA [70]: Facebook page is mostly used to promote contests, highlight agency news and make followers aware of upcoming public hearings. The Twitter account mostly includes service alerts

Fig. 3 Flow of information to and from transport authority social media sites

give rise to fears around ‘sending out the wrong messages’. Gal-Tzur et al. [52] outline evidence of a ‘code of conduct’ having been established in the case of one heavily engaged transport supplier (Koninklijke Luchtvaart Maatschappij N.V.). Lack of formal evaluation and ‘proof of concept’ may be a further issue, as a body of evidence on the benefits for the transport sector has yet to be established [71], although anecdotally they may be substantive. Further research to fill in the formal evidence base may be needed. A willingness to engage with social media may be a result of perceived advantages in closing the perceptual distance between public and governmental services, resulting in increased public satisfaction and trust [65, 70]. Social media can be used to create a positive image (e.g. for PT use and encouraging PT use through building a community of customers) and to support operational objectives (see Table 1). Finally, authorities may benefit through promoting and connecting related activities – social media can act in an integrative way for those who have a range of activities rather than just transport.

The willingness of transport authorities to engage with social media as a working tool is reflected by the interest of the state Departments of Transport (DOTs, USA) in improving the effectiveness of their social media programmes [72]. Not all agencies publish reports concerning social media related activity and therefore the evidence found is not complete. However, descriptions within social media sites, articles, interviews with officials and surveys conducted by various organisations reveal a set of typical activities conducted either frequently or occasionally. Table 2 contains some examples of such activities.

This evidence of uptake supports previous findings [79] on both the volume and pertinence of the information contained in social media textual data. It also highlights potential uses of social media information that have not yet been explored by authorities. The most prominent concerns the potential to aggregate travellers’ information. Aggregated information can serve as a basis for identifying major needs and perceived satisfaction that serve as a vital input for decision making in the medium and long term. Fig. 3 depicts the principal data flows to and from official transport authority social media sites, covering those reported as currently implemented and additional potential future flows.

6 Conclusions

The goal of this paper was to address three research questions, in brief, whether social media data may be used either alongside or potentially instead of current transport data, the

technical challenges in text mining social media for high-quality transport data and whether institutional barriers to harnessing the potential of social media data in transport sit alongside technical issues.

For the first, it is clear that both established data sources and new social media data have strengths and weaknesses; however, the advantages of social media sources include the ability to capture the whole trip, preserve elements of the associated context and/or the individual socio-characteristics, garner qualitative data on a large scale, and finally cost effectiveness.

For the second, challenges arise from the dynamic, location-dependent and informal nature of transport textual content. This contributes to the complexity of establishing an ontology and of sentiment analysis, where contextual data are potentially more of an issue for transport than for other domains. Efforts directed at creating transport-related ontologies are already emerging and future efforts are likely to set the basis for increasing the efficiency of the text mining task. Combining geo-meta-data associated with the user account/location with place names included in the message also promises to improve the quality of this task.

For the final research question, a literature review suggested that the need to be active for success with social media, the resource requirement and concerns about safeguarding corporate image may be issues. Willingness to engage with social media may be based on the clear advantages in closing the perceptual distance between the public and governmental services. Image building (e.g. for PT) and the ability to support operational objectives may also be incentives to engage. Evidence within the ‘grey’ literature revealed that an increasing number of authorities appreciate the advantages of rising above the barriers and routinely engage with social media. However, the potential of this engagement has not yet been fully exploited, especially with regard to the use of aggregated social media information for transport planning and management, performance measurement and quality evaluation.

Overall, it is possible to conclude that social media has an increasingly important role in the transport sector, potentially filling some gaps and enriching other data sources. Although challenges in data quality remain, addressing institutional perspectives may yield much ‘low-hanging fruit’ through greater uptake of the social media data already available.

7 References

1 Kushal, D., Lawrence, S., Pennock, D.M.: ‘Mining the peanut gallery: opinion extraction and semantic classification of product reviews’. Proc. of the 12th Int. Conf. on World Wide Web, 2003, pp. 519–528
2 Pang, B., Lee, L.: ‘Opinion mining and sentiment analysis’, Found. Trends Inf. Retr., 2008, 2, (1–2), pp. 1–135
3 Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: ‘Predicting elections with twitter: what 140 characters reveal about political sentiment’. Proc. of the Fourth Int. AAAI Conf. on Weblogs and Social Media, 2010
4 Antweiler, W., Frank, M.Z.: ‘Is all that talk just noise? The information content of internet stock message boards’, J. Finance, 2004, 59, (3), pp. 1259–1294
5 Koppel, M., Shtrimberg, I.: ‘Good news or bad news? Let the market decide’. AAAI Spring Symp. on Exploring Attitude and Affect in Text: Theories and Applications, 2004
6 O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: ‘From tweets to polls: linking text sentiment to public opinion time series’. Proc. of the Fourth Int. AAAI Conf. on Weblogs and Social Media (ICWSM), Washington, DC, 2010, pp. 122–129
7 Grishman, R., Huttunen, S., Yangarber, R.: ‘Information extraction for enhanced access to disease outbreak reports’, J. Biomed. Inform., 2002, 35, (4), pp. 236–246
8 Corley, C., Cook, D., Mikler, A., Singh, K.: ‘Text and structural data mining of influenza mentions in web and social media’, Int. J. Environ. Res. Public Health, 2010, 7, (2), pp. 596–615
9 Mai, E., Hranac, R.: ‘Twitter interactions as a data source for transportation incidents’. TRB 92nd Annual Meeting Compendium of Papers, 2013
10 Pender, B., Currie, G., Delbosc, A., Shiwakoti, N.: ‘Social media use in unplanned passenger rail disruptions – an international study’. TRB 93rd Annual Meeting, 2014
11 Schweitzer, L.: ‘How are we doing? Opinion mining customer sentiment in US transit agencies and airlines via twitter’. Presented at the 91st Annual Meeting of the Transportation Research Board, Washington, DC, 2012
12 Collins, C., Hasan, S., Ukkusuri, S.V.: ‘A novel transit rider satisfaction metric: rider sentiments measured from online social media data’, J. Public Transp., 2013, 16, (2), pp. 21–45
13 Carrasco, J.A., Hogan, B., Wellman, B., Miller, E.J.: ‘Collecting social network data to study social activity-travel behavior: an egocentric approach’, Environ. Plan. B: Plan. Des., 2008, 35, (6), pp. 961–980
14 Efthymiou, D., Antoniou, C.: ‘Use of social media for transport data collection’, Procedia – Soc. Behav. Sci., 2012, 48, pp. 775–785, ISSN 1877–0428. Available at 06.1055
15 Barron, E., Peck, S., Venner, M., Malley, W.G.: ‘Suggested Practices Guidance Resource’, NCHRP 25–25 TASK 80, September 2013
16 Manning, C., Raghavan, P., Schütze, H.: ‘Introduction to information retrieval’ (Cambridge University Press, NY, USA, 2008)
17 Nocera, S.: ‘An operational approach for quality evaluation in public transport services’, Ing. Ferrov., 2010, 65, (4), pp. 363–383
18 Nocera, S.: ‘The key role of quality assessment in public transport policy’, Traffic Eng. Control, 2011, 52, (9), pp. 394–398
19 ‘Common Highways Agency Rijkswaterstaat Model (CHARM)’, 2013. [online] Available at documents/1524978/1866952/CHARM+business+specification/b5f6281d-8701-4287-84e9-c00d266a15b3, accessed: 11 December 2013
20 Grant-Muller, S.M., Usher, M.: ‘Intelligent transport systems: the propensity for environmental and economic benefits’. Technological Forecasting and Social Change, 2013, doi 10.1016/j.
21 Caceres, N., Romero, L.M., Benitez, F.G., del Castillo, J.M.: ‘Traffic flow estimation models using cellular phone data’, IEEE Trans. Intell. Transp. Syst., 2012, 13, (3), pp. 1430–1441
22 Libardo, A., Nocera, S.: ‘Transportation elasticity for the analysis of Italian transportation demand on a regional scale’, Traffic Eng. Control, 2008, 49, (5), pp. 187–192
23 Nocera, S., Cavallaro, F.: ‘Policy effectiveness for containing CO2 emissions in transportation’, Procedia – Soc. Behav. Sci., 2011, 20, pp. 703–713
24 Nocera, S., Cavallaro, F.: ‘Economical evaluation of future carbon impacts on the Italian highways’, Procedia – Soc. Behav. Sci., 2012, 54, pp. 1360–1369
25 Nocera, S., Cavallaro, F.: ‘A methodological framework for the economic evaluation of CO2 emissions from transport’, J. Adv. Transp., 2014, 45, pp. 138–164
26 Nocera, S., Maino, F., Cavallaro, F.: ‘A heuristic method for evaluating CO2 efficiency in transport planning’, Eur. Transp. Res. Rev., 2012, 4, pp. 91–106
27 Nocera, S., Tonin, S.: ‘A joint probability density function for reducing the uncertainty of marginal social cost of carbon evaluation in transport planning’. Advances in Intelligent Systems and Computing, 2013, accepted for publication
28 Aggarwal, C.C., Zhai, C.-X.: ‘Mining text data’ (Springer, 2012)
29 Schulz, A., Ristoski, P., Paulheim, H.: ‘I see a car crash: real-time detection of small scale incidents in microblogs’. ‘The semantic web: ESWC 2013 satellite events’, Berlin Heidelberg, New York, 2013 (LNCS, 7955), pp. 22–33
30 Li, C., Weng, J., He, Q., et al.: ‘TwiNER: named entity recognition in targeted twitter stream’. Proc. of the Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2012
31 Ritter, A., Clark, S., Mausam, Etzioni, O.: ‘Named entity recognition in tweets: an experimental study’. Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP), 2011
32 Oppenheim, N.: ‘Urban travel demand modeling: from individual choices to general equilibrium’ (John Wiley and Sons, Inc., New York, 1995)
33 Nugroho, A.S., Endarnoto, S.K., Pradipta, S., Purnama, J.: ‘Traffic condition information extraction & visualization from social media twitter for android mobile application’. Proc. of the Int. Conf. on Electrical Engineering and Informatics (ICEEI), 2011

34 Kaur, A., Gupta, V.: ‘A survey on sentiment analysis and opinion mining techniques’, J. Emerging Technol. Web Intell., 2013, 5, (4), pp. 367–371
35 Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: ‘Freebase: a collaboratively created graph database for structuring human knowledge’. Proc. of the ACM SIGMOD Int. Conf. on Management of Data, Vancouver, BC, Canada, 2008, pp. 1247–1250, ISBN 978-1-60558-102-6
36 Madkour, M., Maach, A.: ‘Ontology-based context modeling for vehicle-aware services’, J. Theor. Appl. Inf. Technol., 2011, 34, (2), pp. 158–166
37 Cho, S., Kang, J.Y., Yasar, A., et al.: ‘An activity-based carpooling microsimulation using ontology’, Procedia Comput. Sci., 2013, 19, pp. 48–55
38 Niaraki, A.S., Kim, K.: ‘Ontology based personalized route planning system using a multi-criteria decision making approach’, Expert Syst. Appl., 2009, 36, pp. 2250–2259
39 Trappey, C., Wu, H.Y., Liu, K.L.: ‘Knowledge discovery of customer satisfaction and dissatisfaction using ontology-based text analysis of critical incident dialogues’. Proc. of the 2012 IEEE 16th Int. Conf. on Computer Supported Cooperative Work in Design, Wuhan, 2012, pp. 470–475
40 Li, L., Wu, W., Liu, N.: ‘Ontology model for situation awareness of city tunnel traffic’. Proc. of the Second Int. Symp. on Computer, Communication, Control and Automation (ISCCCA-13), Atlantis Press, Paris, France, 2013, pp. 601–603
41 Becker, M., Smith, S.F.: ‘An ontology for multi-modal transportation planning and scheduling’, Technical Report, CMU-RI-TR-98-15, Robotics Institute, Carnegie Mellon University, 1997
42 Wang, J., Ding, Z., Jiang, C.: ‘An ontology-based public transport query system’. Proc. of the First Int. Conf. on Semantics and Grid, SKG, 2005
43 Houda, M., Khemaja, M., Oliveira, K., Abed, M.: ‘A public transportation ontology to support user travel planning’. Proc. of the Fourth Int. Conf. on Research Challenges in Information Science (RCIS), Nice, France, 2010, pp. 127–136
44 Grosenick, S.: ‘Real-Time Traffic Prediction Improvement through Semantic Mining of Social Networks’. Thesis (Master’s), University of Washington, 2012. URI available at
45 Yang, W.D., Wang, T.: ‘The fusion model of intelligent transportation systems based on the urban traffic ontology’, Phys. Procedia, 2012, 25, pp. 917–923
46 Pak, A., Paroubek, P.: ‘Twitter as a corpus for sentiment analysis and opinion mining’, Computer, 2010, 10, pp. 1320–1326
47 Musakwa, W.: ‘The use of social media in public transit systems: the case of the Gautrain, Gauteng province, South Africa: analysis and lessons learnt’. Proc. REAL CORP 2014 Tagungsband, Vienna, Austria, 21–23 May 2014. Available at
48 Wilson, T., Wiebe, J., Hoffmann, P.: ‘Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis’, Comput. Linguist., 2009, 35, (3), pp. 399–433
49 Sood, S., Owsley, S., Hammond, K., Birnbaum, L.: ‘Reasoning through search: a novel approach to sentiment classification’, WWW2007, Northwestern University, Electrical Engineering and Computer Science Department Technical Report, NWU-EECS-07–05, Banff, Canada, 21 July 2007, papers/paper10171.pdf, accessed 7th July 2013
50 Wiegand, M., Balahur, A., Roth, B., Klakow, D., Montoyo, A.: ‘A survey on the role of negation in sentiment analysis’. Proc. of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP ’10), Association for Computational Linguistics, Stroudsburg, PA, USA, 2010, pp. 60–68
51 Davidov, D., Tsur, O., Rappoport, A.: ‘Semi-supervised recognition of sarcastic sentences in Twitter and Amazon’. Proc. of the Fourteenth Conf. on Computational Natural Language Learning, Uppsala, Sweden, 2010, pp. 107–116
52 Gal-Tzur, A., Grant-Muller, S.M., Minkov, E., Nocera, S.: ‘The impact of social media usage on transport policy: issues, challenges and recommendations’, Procedia – Soc. Behav. Sci., 2014, 111, pp. 937–946
57 Leetaru, K., Wang, S., Cao, G., Padmanabhan, A., Shook, E.: ‘Mapping the global twitter heartbeat: the geography of twitter’, First Monday, 2013, 18, (5), doi:10.5210/fm.v18i5.4366
58 Kaplan, A.M., Haenlein, M.: ‘Users of the world, unite! The challenges and opportunities of social media’, Bus. Horiz., 2010, 53, (1), pp. 59–68
59 Davis, C.A.Jr, Pappa, G.L., de Oliveira, D.R.R., de L Arcanjo, F.: ‘Inferring the location of twitter messages based on user relationships’, Trans. GIS, 2011, 15, (6), pp. 735–751
60 Priedhorsky, R., Culotta, A., Del Valle, S.Y.: ‘Inferring the origin locations of tweets with quantitative confidence’. Proc. of the 17th ACM Conf. on Computer Supported Cooperative Work and Social Computing (CSCW), Baltimore, MD, 15–19 February 2014
61 Eisenstein, J., O’Connor, B., Smith, N.A., Xing, E.P.: ‘A latent variable model for geographic lexical variation’. Proc. of Empirical Methods in Natural Language Processing, Stroudsburg, PA, USA, 2010, pp. 1277–1287
62 Khanwalkar, S., Seldin, M., Srivastava, A., Kumar, A., Colbath, S.: ‘Content-based geo-location detection for placing tweets pertaining to trending news on map’. Fourth Int. Workshop on Mining Ubiquitous and Social Environments (MUSE), Prague, Czech Republic, September 2013
63 Amitay, E., Har’El, N., Sivan, R., Soffer, A.: ‘Web-a-where: geotagging web content’. SIGIR’04 Proc. of the 27th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2004, pp. 273–280
64 Paradesi, S.: ‘Geotagging tweets using their content’. Proc. of the 24th Int. Florida Artificial Intelligence Research Society Conf., 2011, pp. 335–356
65 Tapscott, D., Williams, A.D., Herman, D.: ‘Government 2.0: transforming government and governance for the twenty-first century’, New Paradigm, January 2008. Available at http://www.mobility.
66 Cheng, Z., Caverlee, J., Lee, K.: ‘You are where you tweet: a content-based approach to geo-locating Twitter users’. Proc. of CIKM’10 Proc. of the 19th ACM Int. Conf. on Information and Knowledge Management, New York, 2010, pp. 759–768
67 Bry, F., Lorenz, B., Ohlbach, H.J., Rosner, M.: ‘A geospatial world model for the semantic web’. Principles and Practice of Semantic Web Reasoning, Berlin, Heidelberg 2005 (LNCS, 3703), pp. 145–159
68 Gao, L., Zhang, Z., Wu, H.: ‘Analyzing the use of Facebook page among state DOTs’. TRB 92nd Annual Meeting Compendium of Papers, 2013
69 Zimmer, C.G.: ‘Social Media Use in Local Public Agencies: A Study of California’s Cities’, Master thesis, Department of Public Policy and Administration, California State University, Sacramento, 2012
70 Cotey, A.: ‘Social media: transit agencies connect with riders in new ways’, Progressive Railroading, January 2011. Available at http://www. agencies-connect-with-riders-in-new-ways–25447
71 Barron, E., Peck, S., Venner, M., Malley, W.G.: ‘Potential Use of Social Media in the NEPA Process’, NCHRP 25–25 TASK 80, September 2013
72 Third Annual State DOT Social Media Survey, AASHTO, September 2012. Available at Documents/Social_Media_Survey_2012.pdf
73 Use of Social Networking to promote public transport and sustainable travel. Available at Social+Media+to+promote+PT+$26+Sustainable+Travel.pdf, accessed 1 August 2013
74 Transportation Safety Board of Canada. Social media terms of use. Available at, accessed 1 August 2013
75 Minnesota Department of Transportation, Office of Policy Analysis: ‘Use of Social Media by Minnesota Cities and Counties’, Transportation Research Synthesis, November 2011. Available at http:// media/reports/TRS1104.pdf, accessed August 2013
76 Moss, M.L., Kaufman, S.: ‘How Social Media Moves in New York – Final report’. Available at Final-Report-Social-Media-NYC.pdf, accessed 1 August 2013
53 Bollen, J., Pepe, A., Mao, H.: ‘Modeling public mood and emotion: twitter sentiment and socio-economic phenomena’. Proc. of the Fifth
77 Shepherd, P.A.: ‘The Transportation World Should Embrace Social
Int. AAAI Conf. on Weblogs and Social Media (ICWSM), Barcelona, Media... Carefully’, Eno Center of Transportation. Available at http://
Spain, 17–21 July 2011, pp. 450–453
54 Bollen, J., Mao, H., Zeng, X.J.: ‘Twitter mood predicts the stock social-media-carefully, accessed 1 August 2013
market’, J. Comput. Sci., 2011, 2, pp. 1–8 78 Virginia Department of Transportation. VDOT on Social Media.
55 Chung, J., Mustafaraj, E.: ‘Can collective sentiment expressed on twitter Available at,
predict political elections?’. Proc. of the 25th AAAI Conf. on Artificial accessed 1 August 2013
Intelligence, San Francisco, CA, USA, 2011, pp. 1770–1771 79 Gal-Tzur, A., Grant-Muller, S.M., Kuflik, T., Minkov, E., Nocera, S.,
56 Bie, J., Bijlsma, M., Broll, G., et al.: ‘Move better with tripzoom’, Shoor, I.: ‘The potential of social media in delivering transport policy
Int. J. Adv. Life Sci., 2012, 4, pp. 125–135 goals’, Transp. Policy, 2014, 32, pp. 115–123

