Performance Evaluation of Trajectory Queries On Multiprocessor and Cluster
Christine Niyizamwiyitira and Lars Lundberg
Department of Computer Science and Engineering,
Blekinge Institute of Technology
SE-37179 Karlskrona, Sweden
[email protected], [email protected]
ABSTRACT
In this study, we evaluate the performance of trajectory queries handled by Cassandra,
MongoDB, and PostgreSQL. The evaluation is conducted on a multiprocessor and on a cluster.
Telecommunication companies collect large amounts of data from their mobile users. These data
must be analysed in order to support business decisions, such as infrastructure planning. The
optimal choice of hardware platform and database can differ from one query to another. We use
data collected by Telenor Sverige, a telecommunication company that operates in Sweden. These
data were collected every five minutes for an entire week in a medium-sized city. The execution
time results show that Cassandra performs much better than MongoDB and PostgreSQL for
queries that do not have spatial features. Stratio's Cassandra Lucene Index incorporates a
geospatial index into Cassandra, which allows Cassandra to perform similarly to MongoDB on
spatial queries. Of four use cases, namely distance query, k-nearest neighbor query,
range query, and region query, Cassandra performs much better than MongoDB and
PostgreSQL in two, namely the range query and the region query. Scalability is also good for
these two use cases.
KEYWORDS
Databases evaluation, Trajectory queries, Multiprocessor and cluster, NoSQL database,
Cassandra, MongoDB, PostgreSQL.
1. INTRODUCTION
Large-scale organisations continuously generate data at very high speeds. These data are often
complex and heterogeneous, and data analysis is a high priority. One challenge is which
technology, in terms of software and hardware, to use in order to handle the data efficiently.
Such analysis is needed in different fields, such as transportation optimization and business
analytics for telecommunication companies that seek to identify common patterns among their
mobile users.
Jan Zizka et al. (Eds) : CCSEIT, AIAP, DMDB, MoWiN, CoSIT, CRIS, SIGL, ICBB, CNSA-2016
pp. 145-163, 2016. CS & IT-CSCP 2016
DOI : 10.5121/csit.2016.60612
The analysis comprises querying points of interest in big data sets. Querying big data can
be time consuming and expensive without the right software and hardware. Various
databases have been proposed to analyse such data. However, there is no single database that
fits all queries. The same holds for hardware infrastructure: there is no single hardware
platform that fits all databases. We consider the case of a telecommunication company, where
analysing large volumes of trajectory data of mobile users becomes very important. We
evaluate the performance of trajectory queries in order to contribute to business decision support
systems.
Trajectory data is information that describes the location of a user in time and
space. In the context of this paper, a telecommunication company wants to optimize the use of
cell antennas and locate different points of interest in order to expand its business.
Processing trajectory data successfully requires a proper choice of databases and hardware that
respond efficiently to different queries.
We use trajectory data collected by Telenor Sverige (a telecommunication company
that operates in Sweden). The position of each mobile user is tracked every five minutes for an
entire week (Monday to Sunday) in a medium-sized city. We are interested in how mobile users
move around the city during the hour, the day, and the week. This gives insights into typical
behavior in a certain area at a certain time. We expect periodic movement in some areas, e.g., at
the locations of stores, or at restaurants during lunch time.
Without loss of generality, we define queries that return points of interest, such as the nearest
cell location from a certain position of a mobile user. The contribution of this study is to
address a complex business problem, namely:
- Definition of queries that optimize the points of interest, e.g., the nearest point of interest,
  the most visited place at a certain time, and more.
- Choice of database technology to use for different types of query.
- Choice of hardware infrastructure to use for each of the databases.
This data is modelled as spatio-temporal data, where at a given time $t$ a mobile user is located
at a position $(x, y)$. The location of a mobile user is a triple $(x, y, t)$, such that the user's
position is represented as a spatio-temporal point $p_i$ with $p_i = (x_i, y_i, t_i)$.
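The triple model above can be sketched in a few lines of Python. This is only an illustration: the class name `TrajectoryPoint`, the helper `build_trajectory`, and the sample coordinates and timestamps are hypothetical and do not come from the Telenor data set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TrajectoryPoint:
    """One sample p = (x, y, t) of a mobile user's position."""
    x: float      # first spatial coordinate
    y: float      # second spatial coordinate
    t: datetime   # sampling time


def build_trajectory(samples):
    """Return the samples of one user ordered by time."""
    return sorted(samples, key=lambda p: p.t)


# Hypothetical samples taken five minutes apart, given out of order.
samples = [
    TrajectoryPoint(57.10, 12.25, datetime(2016, 1, 4, 12, 5)),
    TrajectoryPoint(57.11, 12.26, datetime(2016, 1, 4, 12, 0)),
    TrajectoryPoint(57.12, 12.27, datetime(2016, 1, 4, 12, 10)),
]
trajectory = build_trajectory(samples)
```

Ordering by `t` turns a set of independent samples into a trajectory, i.e., positions indexed by time.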
To optimize points of interest, different types of queries are proposed. They differ in terms of
their inputs and outputs:
- Distance query, which finds points of interest that are located within a given distance,
  e.g., one kilometer, from a certain position of a mobile user.
- K-nearest neighbor query, which finds the K nearest points of interest from a certain position
  of a mobile user.
- Range query, which finds points of interest within a spatial range from a certain position of a
  mobile user.
- Region query, which finds the region that a mobile user frequently passes through at a certain
  time throughout the week.
The performance of the different queries is evaluated on three open source databases: Cassandra,
MongoDB, and PostgreSQL. We choose open source databases in order to make the study
reproducible. The hardware configuration is a single node and multiple
nodes (distributed) in a cluster. The execution time of each query on each database and on each
hardware infrastructure is measured. Once the company knows which locations are the
most or the least visited during a certain time, antenna planning can be updated accordingly in
order to avoid overloading and underloading at such locations. For business expansion, a
location that is busy during lunch time is, e.g., convenient for opening a restaurant. Moreover, the
performance measurements show which database is better for which specific query on which
hardware infrastructure, thus contributing to business support systems.
The rest of the paper is organized as follows: Section 2 defines the concepts, Section 3
summarizes the related work, Section 4 describes the configuration and gives an overview of the
databases, Section 5 presents results and discussions, and finally Section 6 draws conclusions.
2. TRAJECTORY DATA
2.1 Definition of Trajectory
A trajectory is a function from a temporal domain to a range of spatial values, i.e., it has a
beginning time and an ending time during which a space has been travelled (see Equation 1) [1].
(1)
(2)
In this study, the trajectory data has a spatial extent that is represented by latitude and longitude,
with $x$ representing latitude and $y$ representing longitude, and a time that is represented by $t$;
i.e., a mobile user is located at position $(x, y)$ at time $t$.
Example: find cells that are located less than 1 km from a certain mobile user's position.
In terms of latitude and longitude coordinates, a query that covers a circle of 10 km radius around
a user position at $(x, y) = (1.3963, 0.6981)$ is expressed for the different databases as follows:
In Cassandra
cell_city _( , : : "", :
In MongoDB
. . ( $
[0.6981, 1.3963], $: 10 , _: 1)
In PostgreSQL
_
arccos (sin sin( )+ cos ) cos ( )cos ( ( ))) <= 10 ;
(3)
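The arccos expression in the PostgreSQL variant has the shape of the spherical law of cosines. As a minimal sketch of what a distance query computes, the following Python code filters cells by great-circle distance. The cell names, the cell coordinates, and the Earth radius of 6371 km are assumptions made for illustration; coordinates are taken to be in radians, as in the example above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points in radians (spherical law of cosines)."""
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


def distance_query(cells, lat, lon, radius_km):
    """Return the names of the cells within radius_km of (lat, lon)."""
    return [name for name, (clat, clon) in cells.items()
            if great_circle_km(lat, lon, clat, clon) <= radius_km]


# Hypothetical cells, coordinates in radians.
cells = {
    "cell_a": (1.3963, 0.6981),  # same position as the user
    "cell_b": (1.3970, 0.6981),  # roughly 4.5 km north
    "cell_c": (1.4100, 0.6981),  # roughly 87 km north
}
nearby = distance_query(cells, 1.3963, 0.6981, 10.0)
```

This version scans every cell. A database with a spatial index avoids the full scan, which is one reason why the indexed systems behave differently from the purely mathematical formulation.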
Example: find the five nearest cells from a mobile user's position. A typical query that selects
the 5 nearest cells within 10 km is as follows:
In Cassandra
cell_city
_(, : "", : [: ""_,
: "", : 1.3963,
: 0.6981, _: "10" ] ) 5
In MongoDB
. . ( $ [ 0.6981, 1.3963], $: 0.10 ,
cell_city: 1). (5)
In PostgreSQL
>= ( _ )
( => <
= ) )(+ )(
, 5
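The k-nearest neighbor query can be sketched in the same style: compute the distance from the user to each cell, optionally discard cells beyond a maximum distance, and keep the k smallest. The cell positions below are hypothetical, and the linear scan with `heapq.nsmallest` is an illustrative choice, not the strategy used internally by any of the three databases.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Distance in km between two points in radians (spherical law of cosines)."""
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, cos_angle)))


def k_nearest(cells, lat, lon, k, max_km=None):
    """Return up to k (distance_km, name) pairs, nearest first."""
    candidates = ((great_circle_km(lat, lon, clat, clon), name)
                  for name, (clat, clon) in cells.items())
    if max_km is not None:
        candidates = (c for c in candidates if c[0] <= max_km)
    return heapq.nsmallest(k, candidates)


# Hypothetical cells spaced about 1.27 km apart, north of the user.
cells = {f"cell_{i}": (1.3963 + 0.0002 * i, 0.6981) for i in range(8)}
nearest = k_nearest(cells, 1.3963, 0.6981, k=5, max_km=10.0)
```

The `max_km` argument mirrors the "within 10 km" condition of the example, and `k` plays the role of the result limit of 5.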
2.2.3. Range Query
Definition: A range query returns all points of interest (e.g., gas stations) that are located
within a certain spatial shape (polygon) [2].
Figure 5 shows the inputs to a range query that finds the cells that belong to a polygon.
In Cassandra
cell_city _( ,
: "", : [: ""_, : "",
_: 11.300398, _: 12.300398
_: 56.569256, _: 58.569256 ] )
In MongoDB
. . ( $:
$: [ [ 12.300398, 57.569256], [ 11.300398, 56.569256 ],
[12.300398, 58.569256 ] ] , _)
In PostgreSQL
The following is a typical range query between two points $p_1 = (x_1, y_1)$ and
$p_2 = (x_2, y_2)$, where $x$ is latitude and $y$ is longitude.
Example: $p_1 = (1.2393, 1.8184)$ and $p_2 = (1.5532, 0.4221)$.
>= ( _ 1.2393 1.5532)
( => 1.8184 =< 0.4221)
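For an axis-aligned rectangle, a range query reduces to four comparisons. The sketch below illustrates this. The bounds reuse the numbers from the Cassandra example, but which number is a latitude and which is a longitude is an assumption here, as are the cell names and positions. A general polygon would need a point-in-polygon test instead.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned rectangle in latitude/longitude space."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def range_query(cells, box):
    """Return the names of the cells whose position lies inside the box."""
    return [name for name, (lat, lon) in cells.items()
            if box.contains(lat, lon)]


# Assumed interpretation of the example bounds (degrees).
box = BoundingBox(min_lat=56.569256, max_lat=58.569256,
                  min_lon=11.300398, max_lon=12.300398)

# Hypothetical cells.
cells = {
    "cell_in": (57.5, 12.0),
    "cell_north": (59.0, 12.0),
    "cell_east": (57.5, 13.0),
}
inside = range_query(cells, box)
```

Because the predicate only touches the cells inside the rectangle once an index is available, a range query reads a small part of the data, which is relevant to the scalability results discussed later.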
2.2.4. Region Query
Generally, the trajectories of mobile users are independent of each other; however, they share
common behavior traits, such as passing through a region at a certain regular period, e.g., passing
through the shopping center during lunch time.
Definition: A region query identifies the region that is most likely to be passed by a given user at
a certain time, based on the many other regions relevant to that user [2]. In the context of this
study, knowledge about the region reveals which cell city most users pass by; this cell city
might contain points of interest such as stores or a highway junction.
Figure 6 shows the inputs to a region query that finds the cell city that is the most visited at a
certain time.
Example: find the cell city that is frequently passed by the same mobile users during a certain
time every day for the entire week. A typical query that returns the cell city that is the most
visited during the interval [12:10:00, 13:10:00] is as follows:
In Cassandra
In MongoDB
. . ( : $: 12: 10: 00, $: 13: 10: 00,
cell_city: 1 ). )(. )(
In PostgreSQL
cell_city, )(
12: 10: 00
13: 10: 00 cell_city O ""
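Conceptually, the region query is a filter on time followed by a group-and-count on the cell city. The following sketch shows that logic on a handful of hypothetical visit records; the record layout `(user_id, cell_city, time_of_day)` and all values are assumptions for illustration.

```python
from collections import Counter
from datetime import time


def most_visited_cell(visits, start, end):
    """Return (cell_city, count) for the busiest cell in [start, end], or None.

    visits is an iterable of (user_id, cell_city, time_of_day) records.
    """
    counts = Counter(cell for _user, cell, t in visits if start <= t <= end)
    return counts.most_common(1)[0] if counts else None


# Hypothetical visit records.
visits = [
    ("u1", "cell_mall", time(12, 15)),
    ("u2", "cell_mall", time(12, 40)),
    ("u3", "cell_station", time(12, 50)),
    ("u1", "cell_station", time(9, 0)),
    ("u2", "cell_station", time(18, 30)),
]
result = most_visited_cell(visits, time(12, 10), time(13, 10))
```

The query has no spatial predicate at all: time is the only input, and the remaining work is aggregation.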
3. RELATED WORK
In [7], the authors propose an approach to and an implementation of spatio-temporal database
systems. This approach treats time-changing geometries, whether they change in discrete or
continuous steps. The same can be used to tackle spatio-temporal data in other databases. We
instead evaluate trajectory queries on existing general purpose databases, notably Cassandra,
PostgreSQL, and MongoDB. In [8], the author describes requirements for databases that support
location-based services for spatio-temporal data. A list of ten representative queries for stationary
and moving reference objects is proposed. Some of the queries that are related to this study are
given in Section 2.
In [9], Pfoser studied the trajectories of moving point objects and explained three scenarios,
namely constrained movement, unconstrained movement, and movement in networks. The
techniques used to index and query these scenarios determine their respective processing
performance. The author modelled the trajectory as triples (x,y,t); we use the same model in this
study.
In [10], the authors introduced querying of moving objects (trajectories) in SECONDO, a
DBMS prototyping environment particularly geared towards extension by algebra modules for
nonstandard applications. The querying is done using an SQL-like language. In our study, we
query moving objects using SQL and Not Only SQL (NoSQL) query languages on top of
different databases. Subsequently, the authors provide a benchmark on range queries and nearest
neighbor queries on the SECONDO DBMS for moving object data in Berlin. The moving object
data was generated using computer simulation based on the map of Berlin [11]. This benchmark
could be extended to other queries such as region queries, distance queries, and so on. In our
study, we apply these queries to real world trajectory data, i.e., mobile users' trajectories from
Telenor Sverige.
In [5], the authors introduced a new type of query, Reverse Nearest Neighbor (RNN), which is
the opposite of Nearest Neighbor (NN). RNN can be useful in applications where moving objects
agree to provide some kind of service to each other: whenever a service is needed, it is requested
from the nearest neighbor. Using RNN, an object knows which objects it will serve in the future.
RNN and NN are to some extent represented by the distance query in our study. In [12], the
authors studied an aggregate query language over GIS and non-spatial data stored in a data
warehouse. In [13], the authors studied k-nearest neighbor search algorithms for historical moving
object trajectories; the k-nearest neighbor query is one of the queries considered in our study.
In [14], the authors present techniques for indexing and querying moving object trajectories. The
data is represented in three-dimensional (3D) space, where two dimensions correspond to space
and one dimension corresponds to time. We also represent our data in 3D as (x,y,t), where x,y
represent space and t represents time.
Query processing on multiprocessors has been studied in [15], where the authors implemented an
emulator of a parallel DBMS that uses a cluster with multiprocessors. That study differs from
ours in that we evaluate query processing on real physical hardware with existing general
purpose databases. Query processing on FPGA and GPU for spatio-temporal data was studied in
[16]. The authors present an FPGA and GPU implementation that processes complex queries in
parallel. That study did not investigate the performance of various existing databases, and a
distributed environment was not considered either, whereas in our study we investigate query
processing on various databases on top of different computational platforms, including a cluster.
In [17], the authors conducted a survey on mining massive-scale spatio-temporal trajectory data
based on parallel computing platforms such as GPU, MapReduce, and FPGA; again, existing
general purpose databases were not evaluated. In [18], the authors presented a hardware
implementation for converting geohash codes to and from longitude/latitude pairs for
spatio-temporal data; the study shows that longitude and latitude coordinates are the key points
for modelling spatio-temporal data. In our paper, we also use these coordinates for location-based
querying.
In the key-value data model, a value corresponds to a key. A column-based database uses tables as
the data model; the data is stored by column, each column is an index of the database, and queries
are applied to columns, whereby each column is treated one by one. A document-based database
stores data in JSON or XML format; each document (similar to a row in an RDBMS) is indexed
and has a key.
4.1. Cassandra
Apache Cassandra is an open-source NoSQL column-based database. It is a top level Apache
project born at Facebook and built on Amazon's Dynamo and Google's BigTable. It is a
distributed database for managing large amounts of structured data across many commodity
servers, while providing a highly available service with no single point of failure. In terms of
CAP, Cassandra provides availability and partition tolerance (AP) with eventual consistency.
Cassandra offers continuous availability, linear scale performance, operational simplicity, and
easy data distribution across multiple data centers and cloud availability zones. Cassandra has a
masterless ring architecture [23]. A keyspace is similar to a database in an RDBMS; inside a
keyspace there are tables, which are similar to tables in an RDBMS, and columns and rows are
similar to those of RDBMS tables. The query language, Cassandra Query Language (CQL), is
very similar to SQL in an RDBMS [24].
4.2. MongoDB
MongoDB is an open-source NoSQL document database written in C++.
MongoDB has databases; inside a database there are collections, which are like tables in an
RDBMS. Inside a collection there are documents, which are like tuples/rows in an RDBMS, and
inside a document there are fields, which are like columns in an RDBMS [21], [22]. MongoDB is
consistent and partition tolerant.
4.3. PostgreSQL
PostgreSQL is an open source object-relational DBMS. In terms of the CAP theorem, it provides
two properties: availability, i.e., each user can always read and write, and consistency, i.e., all
users have the same view of the data. PostgreSQL organises data in columns and
rows [20, p. 3].
The processor configuration (4x Xeon X7550) has 32 cores; each core is hyperthreaded into 2
virtual cores, which gives 64 virtual cores. At the time of the experiment this server was also
running other work, i.e., it was not dedicated to our databases. This affects the execution times of
our databases; however, trends such as the variability between queries across the databases are
not affected. The standard deviation of the execution time follows the same trends.
MongoDB partitions data across nodes, i.e., MongoDB scales horizontally by dividing and
distributing data over multiple servers that are called shards. Each shard is an independent
database, and collectively, the shards make up a single logical database. Sharding reduces the
number of operations each shard handles: each shard processes fewer operations as the cluster
grows. As a result, a cluster can increase capacity and throughput horizontally (by adding nodes
to the cluster). Sharding also reduces the amount of data that each server needs to store: each
shard stores less data as the cluster grows [26]. A sharded cluster contains shards, config servers,
and mongos instances. We use three shards, each on its own node. Figure 8 shows the config
servers, which hold the metadata about the cluster, such as the shard location of the data; there
must be three config servers. There is also a mongos server, which serves as the routing service
that processes queries throughout the cluster. Mongos is installed on its own node, whereas the
config servers and shards are installed on the same nodes (1, 2, 3), as shown in Figure 8. Since
we have three shards, each shard contains a third of the total data. MongoDB becomes highly
available only if each shard is replicated on different nodes. In this study we install MongoDB
3.0.9. MongoDB has built-in spatial query functions.
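The effect of sharding on the amount of data per server can be illustrated with a small sketch. It assumes a hashed shard key, which the study does not specify, and it uses hypothetical keys of the form `user-<n>`; the functions `shard_for` and `distribute` are illustrative names, not part of MongoDB.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (assumed hashed sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Place every key on exactly one shard and return the per-shard key lists."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical document keys spread over three shards, as in the study's setup.
keys = [f"user-{i}" for i in range(3000)]
shards = distribute(keys, 3)
sizes = [len(s) for s in shards]
```

With three shards, each shard ends up with roughly a third of the keys. A router in the role of mongos only needs the same mapping to forward a lookup to the right shard.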
PostgreSQL is installed on the cluster in master/slave replication mode (see Figure 9). The nodes
serve each other in a pool using pgpool2 [25]. Pgpool2 provides load balancing and data
redundancy. In order to keep the data available, each slave holds a read-only copy of the data;
there are three slaves, thus three replicas. In order to keep the data consistent, only the master
can both read and write. The master and pgpool2 are installed on the same node. In this
study we install PostgreSQL version 9.3.11. PostgreSQL does not have explicit spatial query
functions; thus, we have to use mathematical functions in order to query the database using
geographical coordinates.
We observe that Cassandra has the shortest execution time for range and region queries,
particularly for the region query. The region query has one input, which is time; it does not
involve spatial features or geographical shapes, e.g., sphere, near, within. Cassandra clearly
outperforms MongoDB and PostgreSQL for such general purpose queries. For queries that
contain geographical or specific spatial features, MongoDB seems to perform almost as well as
Cassandra when the latter is indexed by Stratio's Lucene Index (see Figures 10, 11, 12). In
Figure 13, MongoDB has a longer execution time than Cassandra and PostgreSQL for the region
query; this is caused by the aggregation query process, which seems to be slower in MongoDB.
For all queries, we observed that the first run of a query on Cassandra takes longer than the
subsequent runs. Lucene makes heavy use of caching, so the first query is especially slow due to
the cost of initializing caches [26]. Thus, we disregard the first run of each query when
measuring the performance.
For MongoDB and PostgreSQL, in contrast, the same query on the same hardware runs with
almost the same execution time each time. Spatial queries have the longest execution time in
PostgreSQL; the reason is that we have to use mathematical functions to represent geographical
locations. This involves several calculation steps, which makes execution longer (see Figures 10,
11, 12).
Scalability with an increasing number of nodes is significant for Cassandra and
MongoDB for the range query. The reason is that the range query involves only a partition of the
data, selected according to the range specification, hence the cache is not overloaded. Scalability
is less noticeable for the other queries, which cover the whole data set and thus consume
much cache, which slows down execution. In terms of processing, PostgreSQL
does not exploit the increase in the number of nodes, since the nodes are used for replication in
order to keep the database available. MongoDB distributes data across shards; in order to provide
high availability, each shard needs to be replicated on its own server. In our case we have
three shards, so a second copy of the whole data requires three more servers, i.e., 6 servers in
total for 2 copies. For Cassandra, however, we have a full copy of the data at
each node, i.e., a 4-node cluster holds 4 copies of the data. This feature makes Cassandra more
attractive than MongoDB in cases where the number of servers is a constraint. Furthermore, if
mongos fails, the whole database fails; the same holds for PostgreSQL: if the master goes down,
the whole database can no longer operate. For Cassandra, in contrast, if any node goes down, the
others keep working.
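The server arithmetic in this paragraph can be written down explicitly. The sketch below only restates the counting argument; the function names are hypothetical, and the MongoDB count covers shard servers only, leaving config servers and mongos aside.

```python
def mongodb_shard_servers(num_shards, copies):
    """Servers needed when every shard is stored `copies` times, one server each."""
    return num_shards * copies


def cassandra_copies(num_nodes):
    """Copies of the data when every node holds a full copy, as in the study."""
    return num_nodes


# Three shards with a second copy of the whole data set.
servers_for_two_copies = mongodb_shard_servers(num_shards=3, copies=2)

# A 4-node Cassandra cluster with a full copy on each node.
copies_on_four_nodes = cassandra_copies(4)
```

Under these assumptions, two copies of the data take 6 shard servers in the MongoDB setup but only 2 nodes in the Cassandra setup, which is the economy argument made above.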
6. CONCLUSIONS
In this study, we evaluated the performance of trajectory queries on Cassandra, MongoDB, and
PostgreSQL in a multiprocessor and a cluster environment. The evaluation was conducted on data
collected from a telecommunication company. We observed that Cassandra performs much
better than MongoDB and PostgreSQL on queries that do not contain special geographical
features such as a sphere shape or near coordinates (for example, the region query, which takes
time as input). MongoDB natively has a built-in function for spatial queries, which speeds up the
query response time. In order to speed up Cassandra when handling spatial queries, we
incorporated Stratio's Cassandra Lucene Index, which holds spatial indexes. This gives the same
performance as MongoDB, and even better performance for some queries. MongoDB seems to
handle aggregate queries more slowly than Cassandra and PostgreSQL (e.g., the region query
involves two steps of aggregation).
Since we are using open source databases, the choice of which database to use depends mainly on
needs and preferences; for instance, MongoDB is well documented compared to Cassandra.
MongoDB stores data as JSON-like documents, a format that is widely understood on the
internet; thus, if one would like to work with different data traffic over the internet, MongoDB is
a good choice. From a developer perspective, it is easier to implement and integrate plugins into
Cassandra than into MongoDB. Cassandra seems to be updated every couple of weeks, and these
tick-tock releases are not immediately compatible with some plugins, as is the case in this paper:
we had to use Cassandra 3.0.3 in order to be able to use Stratio's Cassandra Lucene Index 3.0.4,
while at the time of writing the current release is 3.4. One would choose PostgreSQL if relational
database features are important for handling the data.
In terms of servers, if the number of servers is constrained, Cassandra is preferable
since it needs fewer servers than MongoDB requires to provide the same features.
APPENDIX
In Tables 1 to 5, E. time is the average execution time over ten runs and Stdev is the standard
deviation.
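A measurement procedure of this kind can be sketched as follows: discard a warm-up run, time ten further runs, and report their mean and standard deviation. The helper `measure` and the stand-in workload `fake_query` are hypothetical, and the use of the sample standard deviation is an assumption, since the paper does not state which estimator was used.

```python
import statistics
import time


def measure(run_query, runs=10, warmup=1):
    """Return (mean, stdev) of the execution time in seconds over `runs` runs.

    The first `warmup` executions are run but not timed, so that cache
    initialization does not distort the average.
    """
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


calls = []


def fake_query():
    """Stand-in for a database query; records that it was called."""
    calls.append(1)
    sum(range(1000))


e_time, stdev = measure(fake_query)
```

In a real experiment, `run_query` would wrap the call that sends the query through the database driver and waits for the full result.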
Table 1. Query processing time (in seconds) on a 4-node installation (Dell PowerEdge R320).
Query type         Cassandra            MongoDB              PostgreSQL
                   E. time   Stdev      E. time   Stdev      E. time   Stdev
Distance Q         0.036     0.077      0.024     0.011      0.79      0.0005
K-n Neighbors Q    0.029     0.005      0.024     0.0107     0.881     0.015
Range Q            0.008     0.008      0.021     1.83E-18   0.621     0.001
Region Q           0.045     0.011      1.562     0.030      1.221     0.001
Table 2. Query processing time (in seconds) on a 3-node installation (Dell PowerEdge R320).
Query type         Cassandra            MongoDB              PostgreSQL
                   E. time   Stdev      E. time   Stdev      E. time   Stdev
Distance Q         0.073     0.013      0.039     0.010      0.666     0.0005
K-n Neighbors Q    0.018     0.006      0.039     7.31E-18   0.886     0.015
Range Q            0.0130    0.024      0.04      0.011      0.666     0.001
Region Q           0.0515    0.021      1.593     0.030      1.222     0.001
Table 3. Query processing time (in seconds) on a 2-node installation (Dell PowerEdge R320).
Query type         Cassandra            MongoDB              PostgreSQL
                   E. time   Stdev      E. time   Stdev      E. time   Stdev
Distance Q         0.060     0.008      0.059     0.011      0.766     0.0008
K-n Neighbors Q    0.031     0.025      0.059     0.017      0.822     0.0007
Range Q            0.0147    0.002      0.045     7.31E-18   0.611     0.0006
Region Q           0.0518    0.024      1.633     0.030      1.225     0.001
Table 4. Query processing time (in seconds) on a single node installation (Dell PowerEdge R320).
Query type         Cassandra            MongoDB               PostgreSQL
                   E. time   Stdev      E. time   Stdev       E. time   Stdev
Distance Q         0.017     0.076      0.001     2.2857E-19  0.789     0.0008
K-n Neighbors Q    0.012     0.007      0.001     2.29E-19    0.882     0.0007
Range Q            0.028     0.023      0.048     0.000422    0.621     0.0006
Region Q           0.054     0.019      2.526     0.066       1.225     0.0016
Table 5. Query processing time (in seconds) on a single node installation (Fujitsu RX600S5).
Query type         Cassandra            MongoDB              PostgreSQL
                   E. time   Stdev      E. time   Stdev      E. time   Stdev
Distance Q         1.121     0.001      1.579     0.298      2.243     0.181
K-n Neighbors Q    1.012     0.012      1.432     0.089      2.363     0.001
Range Q            1.432     0.001      1.654     0.068      2.154     0.001
Region Q           2.132     0.002      4.260     0.257      3.268     0.0009
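To make the scalability of the range query easier to read off the Dell PowerEdge R320 tables, the following sketch computes the speedup relative to the single node installation. The execution times are copied from the range query rows of the tables for 1, 2, 3, and 4 nodes; the definition of speedup as single node time divided by n-node time is a common convention assumed here, not a metric defined in the paper.

```python
# Average range query execution times in seconds, keyed by number of nodes.
range_query_times = {
    "Cassandra": {1: 0.028, 2: 0.0147, 3: 0.0130, 4: 0.008},
    "MongoDB": {1: 0.048, 2: 0.045, 3: 0.04, 4: 0.021},
    "PostgreSQL": {1: 0.621, 2: 0.611, 3: 0.666, 4: 0.621},
}


def speedup(times):
    """Return {nodes: T(1) / T(nodes)} for one database."""
    base = times[1]
    return {nodes: base / t for nodes, t in times.items()}


speedups = {db: speedup(times) for db, times in range_query_times.items()}

for db, values in speedups.items():
    row = ", ".join(f"{n} nodes: {s:.2f}x" for n, s in sorted(values.items()))
    print(f"{db}: {row}")
```

On four nodes this gives a speedup of about 3.5 for Cassandra and about 2.3 for MongoDB, while PostgreSQL stays at 1.0, which matches the observation that its additional nodes are used for replication only.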
ACKNOWLEDGEMENTS
This work is part of the research project "Scalable resource-efficient systems for big data
analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden. We also thank
HPI-FSOC, and Telenor Sverige.
REFERENCES
[1]
[2] Y. Zheng and X. Zhou, Computing with spatial trajectories. Springer Science & Business Media,
2011.
[3] N. Pelekis and Y. Theodoridis, Mobility data management and exploration. Springer, 2014.
[4] J. P. Matuschek, "Finding Points Within a Distance of a Latitude/Longitude Using Bounding
Coordinates." [Online]. Available:
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates#SQLQueries. [Accessed: 07-Mar-2016].
[5] R. Benetis, C. S. Jensen, G. Karčiauskas, and S. Šaltenis, "Nearest neighbor and reverse nearest
neighbor queries for moving objects," in Database Engineering and Applications Symposium, 2002.
Proceedings. International, 2002, pp. 44-53.
[6] E. Frentzos, K. Gratsias, N. Pelekis, and Y. Theodoridis, "Nearest neighbor search on moving object
trajectories," in Advances in Spatial and Temporal Databases, Springer, 2005, pp. 328-345.
[7]
[8] Y. Theodoridis, "Ten benchmark database queries for location-based services," Comput. J., vol. 46,
no. 6, pp. 713-725, 2003.
[9] D. Pfoser, "Indexing the trajectories of moving objects," IEEE Data Eng Bull, vol. 25, no. 2, pp. 3-9,
2002.
[10] V. T. De Almeida, R. H. Güting, and T. Behr, "Querying moving objects in SECONDO," in
Proceedings of the 7th International Conference on Mobile Data Management (MDM), 2006, p. 47.
[11] C. Düntgen, T. Behr, and R. H. Güting, "BerlinMOD: a benchmark for moving object databases,"
VLDB J., vol. 18, no. 6, pp. 1335-1368, 2009.
[12] L. I. Gómez, B. Kuijpers, and A. A. Vaisman, "Aggregation languages for moving object and places
of interest," in Proceedings of the 2008 ACM symposium on Applied computing, 2008, pp. 857-862.
[13] Y.-J. Gao, C. Li, G.-C. Chen, L. Chen, X.-T. Jiang, and C. Chen, "Efficient k-nearest-neighbor search
algorithms for historical moving object trajectories," J. Comput. Sci. Technol., vol. 22, no. 2, pp.
232-244, 2007.
[14] D. Pfoser, C. S. Jensen, Y. Theodoridis, and others, "Novel approaches to the indexing of moving
object trajectories," in Proceedings of VLDB, 2000, pp. 395-406.
[15] K. Y. Besedin and P. S. Kostenetskiy, "Simulating of query processing on multiprocessor database
systems with modern coprocessors," in Information and Communication Technology, Electronics and
Microelectronics (MIPRO), 2014 37th International Convention on, 2014, pp. 1614-1616.
[16] R. Moussalli, I. Absalyamov, M. R. Vieira, W. Najjar, and V. J. Tsotras, "High performance FPGA
and GPU complex pattern matching over spatio-temporal streams," GeoInformatica, vol. 19, no. 2,
pp. 405-434, Aug. 2014.
[17] P. Huang and B. Yuan, "Mining Massive-Scale Spatiotemporal Trajectories in Parallel: A Survey," in
Trends and Applications in Knowledge Discovery and Data Mining, Springer, 2015, pp. 41-52.
[18] R. Moussalli, M. Srivatsa, and S. Asaad, "Fast and Flexible Conversion of Geohash Codes to and
from Latitude/Longitude Coordinates," in Field-Programmable Custom Computing Machines
(FCCM), 2015 IEEE 23rd Annual International Symposium on, 2015, pp. 179-186.
[19] J. Han, E. Haihong, G. Le, and J. Du, "Survey on NoSQL database," in Pervasive computing and
applications (ICPCA), 2011 6th international conference on, 2011, pp. 363-366.
[20] "What is Apache Cassandra?," Planet Cassandra, 18-Jun-2015. [Online]. Available:
http://www.planetcassandra.org/what-is-apache-cassandra/. [Accessed: 23-Feb-2016].
AUTHORS
Christine Niyizamwiyitira is currently a PhD student in Computer Science at the Department of
Computer Science and Engineering, Blekinge Institute of Technology (BTH), Sweden. She
completed her master's degree in computer engineering in 2010 at Korea
University of Technology (KUT) in South Korea. She works at the University of Rwanda
as an assistant lecturer. Her research interests include real-time systems, cloud
computing, high performance computing, database performance, and voice-based
applications. Her current research focuses on scheduling of real-time systems on
virtual machines (uniprocessor and multiprocessor) and big data processing.
Lars Lundberg is a professor in Computer Systems Engineering at the Department of
Computer Science and Engineering at Blekinge Institute of Technology in Sweden. He
has an M.Sc. in Computer Science from Linköping University (1986) and a Ph.D. in
Computer Engineering from Lund University (1993). His research interests include
parallel and cluster computing, real-time systems, and software engineering. Professor
Lundberg's current work focuses on performance and availability aspects.