1. Introduction
Traditional Geographic Information Systems (GIS) are well-adapted to offline algorithms over static data. In an offline environment, a GIS application is expected to have complete information about the static input data to be processed [1]. With the proliferation of Global Navigation Satellite System (GNSS)-equipped devices and wireless sensor networks, current GIS applications (e.g., location-based services and webGIS) need to be much better suited to online processing in order to deal with large volumes of highly dynamic geo-streaming data. A number of successful attempts have been made to address this challenge. For example, Galić et al. [2] present a formal framework consisting of the data types and operations needed to support geo-streaming data. In [3], a spatio-temporal query language is proposed to process semantic geo-streaming data. Furthermore, Moby Dick [4,5], a distributed framework for GeoStreams, has been developed for efficient real-time management and monitoring of mobile objects through distributed geo-streaming data processing on large clusters. A more comprehensive introduction to processing GeoStreams is available in [6].
However, the above-mentioned literature focuses only on the spatial dimension of GeoStreams. In fact, a GeoStream has a textual dimension as well. Concretely, massive amounts of geo-textual data have recently been generated, including geo-tagged micro-blogs, photos with both tags and geo-locations, points of interest (POIs) and so on [7,8]. For example, according to [9], about 30 million users send geo-tagged data to the Twitter service, and 2.2% of global tweets (about 4.4 million tweets a day) provide location data together with the text of their posts. These data often arrive in a rapid streaming fashion in many important applications, such as social networks (e.g., Facebook, Flickr, FourSquare and Twitter) and location-based services (e.g., location-based advertising) [9]. Monitoring geo-textual streaming data is critical to efficiently supporting the above-mentioned GIS applications. For instance, in Twitter applications, users often register subscription requests containing both location information and textual content. Thus, the GIS application needs to monitor incoming geo-textual streaming data to discover matching messages and notify the users over a period of time [7]. Continuous queries are an effective technique for monitoring streaming data. These queries are issued once and then logically executed continuously over the data streams to provide a prolonged perspective on how the streaming data change [10]. However, periodic query re-execution is a notoriously expensive operation [11]. The optimization of continuous query processing has remained an open issue in the data stream management community for the past two decades. For geo-textual streaming data, the performance problem of continuous queries is even more serious, since both location information and textual description need to be matched for each incoming streaming data tuple.
Query indexing approaches are the dominant choice for optimizing continuous query processing, since they avoid the expensive index maintenance operations required by data indexing alternatives [12]. Recently, several query indexing methods have been proposed to address this performance issue: systems can quickly filter stream data tuples using an index structure built over the spatial-keyword queries. These indexing methods can be roughly categorized into three classes: keyword-first indexing methods (e.g., the Ranked-key Inverted Quadtree (RQ-tree) [9]), spatial-first indexing methods (e.g., the Rt-tree [13] and the Inverted File Quadtree (IQ-tree) [7]), and adaptive indexing methods based on both location and textual information (e.g., the Adaptive spatial-textual Partition tree (AP-tree) [9]). However, existing continuous spatial-keyword query indexing approaches face two drawbacks in current applications. First, the existing indexing structures lack support for approximate keyword matching. Approximate keyword matching is necessary when users have a fuzzy search condition, when a query contains a spelling error, or when the strings in the database contain some degree of uncertainty or error. Keyword search that retrieves approximate string matches is often required when searching geo-textual objects, according to [14]. Second, because geo-textual streaming data tuples arrive rapidly from data sources, high-performance data processing is a key requirement for current continuous query methods. Therefore, there is a need for an efficient continuous query indexing approach for spatial approximate keyword queries over geo-textual streaming data.
To address these research challenges, we first employ an approximate string search method to enhance the AP-tree indexing structure to support approximate keyword matching. Furthermore, a GPU platform is used to improve the query performance of our query indexing method.
The main contributions of this study are as follows:
We introduce an advanced AP-tree indexing method that supports continuous spatial approximate keyword queries by efficiently embedding min-wise signatures into the AP-tree structure based on one-permutation hashing [15].
We design a parallel version of the AP-tree on a GPU platform, which further improves the performance of our indexing structure for fast processing of geo-textual streaming data. The GPU-aided method parallelizes the approximate keyword matching of the AP-tree based on a parallel min-wise hashing algorithm [16].
We further employ a data streaming communication method [17] to optimize the I/O overheads between GPU and CPU during continuous processing of geo-textual streaming data.
Additionally, this study handles only the range query among the variety of continuous queries, since our work is based on the AP-tree and the spatial query supported by the AP-tree is the range query.
This paper has been expanded from its previous conference version [18] to include a description of a parallel scheme for approximate keyword matching, the data streaming communication method between CPU and GPU in [17] used to optimize the performance of the GPU-aided AP-tree, an additional data set for the experiments, and more experiments evaluating the proposed indexing methods. The remainder of this paper is organized as follows:
Section 2 discusses work relating to indexing methods for geo-textual data.
Section 3 presents our materials and methods.
Section 4 presents the experimental results and discussions.
Section 5 concludes with a summary.
2. Related Work
This section describes the most salient work on indexing methods for geo-textual data. Indexing methods for geo-textual data can be roughly classified into two categories, i.e., those for static data and those for streaming data. The approaches for static data assume that all spatial objects are stored in a spatial database and that each spatial object is described by a set of keywords. These methods therefore index both the location information and the textual keywords of each spatial object to support spatial-keyword queries. Among them, the R-tree has been widely extended to support geo-textual data. For instance, the authors in [19] proposed a hybrid indexing structure that maintains classical inverted lists for rare document terms and additional extended R-trees for more frequent geo-textual terms. Similarly, Zhang et al. proposed the Information Retrieval R-tree (IR-tree) [20], which combines the R-tree with inverted files for searching geo-textual data. Other spatial indexing structures have also been adapted to geo-textual data. For example, the inverted linear Quadtree (IL-Quadtree) [21], based on the linear Quadtree and an inverted index, was presented to deal with the problem of top-k spatial keyword search. In [22], an inverted-KD tree was developed for indexing geo-textual data.
With the emergence of social networks and location-based services, geo-textual data often arrive in a rapid streaming fashion. Indexing solutions for static geo-textual data cannot be directly applied to geo-textual streaming data. As a consequence, several query indexing attempts, which index continuous queries in order to filter geo-textual streaming data, have recently been made to address this issue. These indexing methods can be roughly categorized into three classes: keyword-first indexing methods, spatial-first indexing methods, and adaptive indexing methods. The representative keyword-first indexing method is the RQ-tree [9]. The RQ-tree first uses a ranked-key inverted list, which stores the least frequent keywords, to partition queries into posting lists; multiple Quadtrees are then built, one per posting list. In contrast, the spatial-first indexing methods prioritize the spatial factor during index construction, regardless of the keyword distribution of the query set. For example, the IQ-tree [7] employs a Quadtree to organize queries so that each query is attached to one or multiple Quadtree cells; every query in each cell is assigned to the posting list of its frequent keyword by a ranked-key inverted list. Similarly, the Rt-tree [13] first uses an R-tree to index queries based on their search regions, and each R-tree node then records the keywords of its descendant queries for textual filtering purposes. The tree structures of both the IQ-tree [7] and the Rt-tree [13] are determined only by the spatial feature; thus, their overall performance unavoidably deteriorates under different keyword and location distributions of the query workload. Zhang et al. therefore proposed an adaptive spatial-textual partition tree (AP-tree [9]) that uses an f-ary tree structure so that queries are indexed in an adaptive and flexible way with respect to the query workload. However, current indexing methods for continuous spatial-keyword queries lack support for approximate keyword search.
In [14], the authors proposed the Min-wise signature with linear Hashing R-tree (MHR-tree), which combines the R-tree with min-wise signatures to deal with spatial approximate keyword queries. Unlike the MHR-tree, our indexing structure targets continuous spatial approximate keyword queries over streaming data, while the MHR-tree is designed for one-pass queries over static data. Moreover, the MHR-tree belongs to the spatial-first indexing scheme, while our method adapts to the query workload. Furthermore, the MHR-tree is based on a family of ℓ independent permutations, which may incur high maintenance costs in the case of dynamic continuous queries. Compared with the MHR-tree, our indexing approach overcomes this issue by using the one-permutation hashing method in [15] to generate signatures instead of ℓ independent permutations. Finally, our indexing approach considers the GPU platform.
In contrast to the existing indexing methods for continuous spatial-keyword queries, this paper focuses on the challenges of (1) approximate keyword search and (2) providing a high-performance solution that maintains the computational performance of the proposed indexing approach over streaming data. The proposed method is the first indexing method for parallel processing of continuous spatial approximate keyword queries.
3. Materials and Methods
In this section, our problem is first formulated and background knowledge about the AP-tree is provided; then, the advanced AP-tree is proposed. Finally, the GPU-aided AP-tree is presented.
3.1. Problem Formulation
First, the notation for a geo-textual data stream is provided. Then, the definitions concerning the continuous spatial approximate keyword query over a geo-textual data stream are introduced. Finally, the problem is stated.
Definition 1 (Geo-textual tuple). A geo-textual tuple, denoted as t = (ϕ, ρ, ts), is a textual message with a geo-location, where t.ϕ is a set of distinct keywords from a vocabulary set, t.ρ is a geo-location, and t.ts is the timestamp labelling the creation time of the tuple.
Definition 2 (Geo-textual data stream). A geo-textual data stream, denoted as S = {tᵢ | i ∈ [1, +∞) ∧ tᵢ.ts ≤ tᵢ₊₁.ts}, is an unbounded data set of geo-textual tuples in timestamp order.
Definition 3 (Continuous spatial approximate keyword query). A continuous spatial approximate keyword query, defined as q = (ψ, r), where q.ψ is a set of distinct keywords and q.r is a range region, is a long-running query until it is deregistered. A geo-textual tuple t in S matches q if and only if the following two conditions are satisfied: (1) the similarity between t.ϕ and q.ψ is large enough (i.e., sim(t.ϕ, q.ψ) ≥ τ, where τ ∈ [0, 1] is a similarity threshold), and (2) t.ρ is within q.r.
In this paper, given a set Q of continuous spatial approximate keyword queries, for each incoming tuple t from a geo-textual data stream S, we aim to employ an indexing technique over Q to rapidly deliver t to its approximately matching queries. The Abbreviations section summarizes the mathematical notations.
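The matching predicate of Definition 3 can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the type and function names are our own, and sim() is taken here to be plain Jaccard similarity over keyword sets (the paper later refines this with q-grams and min-wise signatures).

```python
# Hypothetical sketch of Definition 3's matching predicate.
from dataclasses import dataclass

@dataclass
class GeoTextualTuple:
    keywords: set   # t.phi: distinct keywords
    loc: tuple      # t.rho: (x, y) geo-location
    ts: int         # t.ts: creation timestamp

@dataclass
class Query:
    keywords: set   # q.psi
    region: tuple   # q.r as (xmin, ymin, xmax, ymax)

def sim(a, b):
    """Jaccard similarity between two keyword sets (a placeholder for sim())."""
    return len(a & b) / len(a | b) if a | b else 0.0

def matches(t, q, tau):
    """t matches q iff t.rho lies in q.r and sim(t.phi, q.psi) >= tau."""
    xmin, ymin, xmax, ymax = q.region
    in_region = xmin <= t.loc[0] <= xmax and ymin <= t.loc[1] <= ymax
    return in_region and sim(t.keywords, q.keywords) >= tau
```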
3.2. AP-Tree
Adaptive spatial-textual Partition tree (AP-tree for short) is an f-ary tree where queries are recursively divided by spatial or keyword partitions (nodes).
Given a set of spatial-keyword queries Q, an AP-tree is constructed by employing keyword partition and spatial partition methods to recursively divide Q in a top-down manner. Under the assumption that there is a total order among the keywords in the vocabulary, the keyword partition method assigns queries to a node N called a keyword node (k-node) and then partitions the queries into f ordered cuts according to their o-th keywords, where o is the partition offset of the node N. An ordered cut is an interval of the ordered keywords, denoted as [wl, wu], where wl and wu (wl ≤ wu) are boundary keywords; [wl, wu] is denoted as [w] if there is only one keyword in the cut. The spatial partition method recursively partitions the space region of a node N called a spatial node (s-node) into f grid cells and pushes each query into the grid cells whose space regions overlap the region of the query. A leaf node of the AP-tree is called a query node (q-node) and holds at most a fixed number of queries.
As the AP-tree structure is constructed in a way that adapts to the query workload by carefully choosing keyword or spatial partitions, the two partition methods are measured by the cost model shown in Formula (1):

C(P) = Σ_{B ∈ P} w_B · p_B    (1)

where P is a partition over the set of queries on one node, and C(P) is the expected matching cost of the partition P. Meanwhile, B is a bucket of the partition, w_B is B's weight, which is the number of queries associated with B, and p_B is the probability that B is explored during query matching. Given that the object workload can be simulated by the query workload, p_B can be estimated by the following equation:

p_B = Σ_{w ∈ B} f(w) / Σ_{w ∈ W_Q} f(w) for a keyword bucket, and p_B = Area(B) / Area(N) for a spatial bucket    (2)

In Equation (2), f(w) is the frequency of keyword w among all queries in Q, W_Q is the set of all keywords appearing in Q, Area(B) is the area of the bucket (i.e., cell) B, and Area(N) is the region size of the node N. The optimal keyword partition and the optimal spatial partition can be achieved by a dynamic programming algorithm and a local improvement heuristic algorithm introduced in [9], respectively.
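The cost model of Formula (1) with the bucket probabilities of Equation (2) can be sketched as follows. The bucket representation (plain dictionaries) and function names are our own, not the paper's.

```python
# Illustrative sketch of the partition cost model, Formula (1) / Equation (2).
def keyword_partition_cost(buckets, freq):
    """buckets: list of dicts {'queries': [...], 'keywords': [...]}.
    freq: dict mapping each keyword to its frequency among all queries in Q."""
    total_freq = sum(freq.values())
    cost = 0.0
    for b in buckets:
        w_b = len(b['queries'])                                   # bucket weight
        p_b = sum(freq[k] for k in b['keywords']) / total_freq    # explore prob.
        cost += w_b * p_b                                         # Formula (1)
    return cost

def spatial_partition_cost(buckets, node_area):
    """buckets: list of dicts {'queries': [...], 'area': float};
    p_B is the bucket's share of the node's region area."""
    return sum(len(b['queries']) * b['area'] / node_area for b in buckets)
```

Comparing the two costs for a candidate keyword partition and a candidate spatial partition is what lets the AP-tree pick the cheaper partition at each node.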
In addition, for each keyword node N, a query q is assigned to a dummy cut if N cannot find a cut for q because q does not have enough keywords (i.e., |q.ψ| is smaller than the partition offset of N). Similarly, each spatial node N has a dummy cell for queries whose regions contain the region of N.
An example illustrating the structure of the AP-tree is given in Figure 1. Given the set of spatial-keyword queries Q shown in Figure 1a, an AP-tree (shown in Figure 1b) is constructed by employing the keyword partition and spatial partition methods to recursively divide Q in a top-down manner. In this example, the spatial partition is chosen first, generating an s-node with four cells, because the spatial partition is more beneficial for pruning data objects than the keyword partition under the query workload of Figure 1a. Then, according to the queries assigned to one of its cells, a k-node with three cuts is created by the keyword partition, while an s-node with four cells is generated from the queries in another cell; the remaining cells link directly to q-nodes. Further down, the single-keyword cuts of the k-node point to q-nodes holding their assigned queries, and the cut whose query workload warrants it is refined into a child k-node with two cuts, each pointing to a q-node. The cells of the lower s-node likewise link to q-nodes.
3.3. AP-Tree
In this section, we introduce the advanced AP-tree, which supports approximate keyword matching between queries and streaming data tuples.
3.3.1. Developing AP-Tree
The key issue in achieving approximate keyword matching is defining the similarity between the set of query keywords and the textual string of a streaming data tuple. The edit distance based on q-grams has been widely applied for approximate string matching (e.g., in [23,24,25,26]). The main idea behind these methods is to utilize q-grams as signatures: a string can be similar to a query only if it shares enough common signatures (i.e., q-grams) with the query.
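The idea can be sketched as follows; the padding convention ('#'/'$') and the use of Jaccard overlap between gram sets are standard choices, not necessarily those of this paper.

```python
# Illustrative sketch of q-gram extraction and a gram-overlap similarity.
def qgrams(s, q=2):
    """Padded q-grams of s, e.g. qgrams('ab') -> {'#a', 'ab', 'b$'}."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def gram_similarity(s, t, q=2):
    """Jaccard overlap of q-gram sets; similar strings share many grams,
    so a low overlap lets us prune non-matching strings early."""
    gs, gt = qgrams(s, q), qgrams(t, q)
    return len(gs & gt) / len(gs | gt)
```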
In this paper, we attempt to incorporate q-grams into the AP-tree to support approximate keyword matching as well. We call the AP-tree with embedded q-grams the advanced AP-tree. Since the textual message of one query q in one q-node is a set of distinct and ordered keywords, one straightforward approach is to embed the q-grams of all keywords of q into the AP-tree. In fact, for space saving, we do not have to store the q-grams of all keywords of q in a q-node, because some keywords of the query have already been indexed in the k-nodes of the AP-tree. To explain this, we first present a lemma:
Lemma 1. Given a query q with a set of m ordered keywords in one q-node, the first n keywords (where 0 ≤ n ≤ m) of q can be found in keyword nodes if there are n k-nodes among the q-node's parent and ancestor nodes.
Proof of Lemma 1. According to the construction of the AP-tree (see Section 3.2), the query q may be indexed by at most m k-nodes. The indexing rule is as follows: the first keyword of q is indexed in one of the cuts of a k-node if such a keyword node exists; the next k-node on the path stores the second keyword of q if such a keyword node exists; and so on. At most, all m keywords of q can be stored in k-nodes. For instance, in Figure 1, for a query with two keywords, the first keyword is indexed in the upper k-node, and for another query both keywords are respectively indexed in that k-node and its child k-node. Meanwhile, in the worst case, no k-node indexes any keyword of q; for example, for a q-node whose parent and ancestor nodes are all s-nodes. □
Based on Lemma 1, we embed q-grams into an AP-tree using the following rules:
For one q-node, if none of the q-node's parent and ancestor nodes is a keyword node, we store the q-grams of all keywords of each query in the q-node.
For one q-node, if there are n keyword nodes among the q-node's parent and ancestor nodes, we hold the q-grams of only the last m − n keywords of each query with m keywords in the q-node.
We store the q-grams of the keywords of each cut in every k-node.
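The q-node rules above can be sketched as follows: for a query whose q-node has n k-node ancestors, only the last m − n keywords need their q-grams stored (Lemma 1 guarantees the first n are already indexed). The helper names are our own.

```python
# Illustrative sketch of the q-gram storage rules for a q-node.
def qgrams(s, q=2):
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def grams_to_store(ordered_keywords, n_ancestor_knodes, q=2):
    """Return q-grams only for the keywords not yet indexed by ancestor
    k-nodes, i.e., the last m - n keywords of the query."""
    rest = ordered_keywords[n_ancestor_knodes:]
    return {w: qgrams(w, q) for w in rest}
```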
3.3.2. Improving AP-Tree
As can be observed, the AP-tree can effectively support approximate keyword matching by embedding q-grams into its q-nodes and k-nodes. However, according to the description in [14], the problem with q-grams is that they may introduce high storage overhead and increase the query cost for large sets of keywords. In our setting, this problem also arises when most indexing nodes in the AP-tree are keyword nodes and there are many cuts in most k-nodes. To address this issue, we first employ the min-wise signature-based [27] method in [14] to reduce the storage cost caused by holding a large number of q-grams. Then, we introduce how to accelerate the generation of the large number of min-wise signatures.
Reducing Storage Cost with Signatures
According to [14], given a family of min-wise independent permutations F, for a set X and any element x ∈ X, when π is chosen at random from F, the following equation holds:

Pr[min{π(X)} = π(x)] = 1/|X|    (3)

where π(X) is a permutation of X, π(x) is the location value of x in the resulting permutation, and min{π(X)} = min{π(x) | x ∈ X}. With ℓ independent permutations π₁, …, π_ℓ from F, the min-wise signature of X is defined as:

s(X) = {min{π₁(X)}, min{π₂(X)}, …, min{π_ℓ(X)}}    (4)

Thus, the set resemblance of two sets A and B, defined as ρ(A, B) = |A ∩ B| / |A ∪ B|, can be estimated by the similarity of their min-wise signatures s(A) and s(B), defined as the fraction of signature positions on which they agree. That means ρ(A, B) can be estimated by the following equation:

ρ̂(A, B) = |{i | min{πᵢ(A)} = min{πᵢ(B)}}| / ℓ    (5)

Note that, since an actual permutation constitutes an expensive operation, a two-universal (2U) hash function in [28] is used to simulate such permutations.
In our setting, the implementation of the ℓ independent permutations is as follows:
gain the set of all keywords from the query workload.
extract a universe set of ordered q-grams U with D dimensions from this keyword set.
randomly generate ℓ permutations {π₁, …, π_ℓ} of U.
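The scheme above can be sketched as follows, with each of the ℓ permutations simulated by a 2U hash h(x) = (a·x + b) mod p, as described in the text; q-grams are assumed to be pre-mapped to integer ids in the universe U, and all names are illustrative.

```python
# Sketch of min-wise signatures (Equations (4)-(5)) with 2U hash simulation.
import random

P = 2147483647  # a large prime modulus for the 2U hashes (illustrative)

def make_hashes(l, seed=42):
    """Draw l pairs (a, b) defining 2U hashes h(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(l)]

def minwise_signature(ids, hashes):
    """Equation (4): per simulated permutation, keep the minimum hash value."""
    return [min((a * x + b) % P for x in ids) for (a, b) in hashes]

def estimate_resemblance(sig_a, sig_b):
    """Equation (5): fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```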
Additionally, for k sets X₁, …, Xₖ, the min-wise signature of the union of X₁, …, Xₖ can be computed by combining the signatures of the individual sets, as shown in Equation (6):

s(X₁ ∪ … ∪ Xₖ)[i] = min{s(X₁)[i], …, s(Xₖ)[i]},  1 ≤ i ≤ ℓ    (6)
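Equation (6) amounts to an element-wise minimum over the signatures, which can be sketched as:

```python
# Sketch of Equation (6): the signature of a union of sets is the
# element-wise minimum of the individual min-wise signatures.
def merge_signatures(*sigs):
    return [min(vals) for vals in zip(*sigs)]
```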
Using Equation (4), we can store only the min-wise signature of the q-grams of each keyword, instead of the q-grams themselves, to reduce the storage cost of the AP-tree. However, when one cut of a keyword node contains multiple keywords, multiple signatures still exist. We can further reduce the space cost by merging these signatures into one signature based on Equation (6). The scheme is illustrated in Figure 2: min-wise signatures of the q-grams of keywords are stored in both query nodes and keyword nodes. For a q-node, following the rules in Section 3.3.1, we store signatures only for those keywords of a query that are not already indexed by a parent or ancestor k-node; if no parent or ancestor keyword node indexes any keyword of the query, the signatures of all of its keywords are stored in the q-node. On the other hand, for k-nodes, we directly store the signature of the q-grams of a keyword if there is only one keyword in a cut, while the signature of the union of the q-gram sets of the keywords is held if there is more than one keyword in a cut. For instance, in Figure 2, a cut with a single keyword stores the signature of that keyword's q-gram set, whereas the cut containing two keywords stores one signature of the union of their q-gram sets.
Accelerating The Generation of Signatures
In the aforementioned method, we employ a family of ℓ independent permutations F to create the min-wise signatures of q-grams. However, the major drawback of this min-wise hashing method is its expensive preprocessing cost, as the method requires applying a large number of permutations to the data [29]. For example, in [14], ℓ = 50 permutations are used to construct an MHR-tree. According to our previous work [30], continuous queries are dynamic; thus, the AP-tree may suffer from frequent updates and, as a result, the min-wise signatures need to be frequently recomputed as well. To address this problem, we use the one-permutation hashing method in [15] to generate signatures instead of ℓ permutations. The one-permutation method breaks the space evenly into ℓ bins and stores the smallest nonzero location in each bin, instead of storing only the smallest nonzero location of each permutation and repeating the permutation ℓ times. For example, in Figure 3, consider two sets of q-grams S₁, S₂ ⊆ U with D = 12; the index sequence of one permutation π of U is defined as I = {0, 1, …, 11}, and two binary (0/1) data vectors represent the locations of the nonzeros of π(S₁) and π(S₂). We equally divide the sequence I into three bins and find the smallest nonzero element in each bin to generate h(S₁) = [0, 5, 8] and h(S₂) = [1, 6, 8]. Finally, we obtain three min-wise signatures of S₁ from h(S₁) (i.e., 0, 5, 8) and three min-wise signatures of S₂ from h(S₂) (i.e., 1, 6, 8).
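The Figure 3 example can be sketched as follows, with D = 12, ℓ = 3 bins, and nonzero positions chosen (illustratively) so that the bin minima match the figure; the helper name is our own.

```python
# Sketch of one-permutation hashing: smallest nonzero position per bin.
def one_permutation_hash(nonzero_positions, D, l):
    """Divide the index sequence {0, ..., D-1} into l equal bins and keep
    the smallest nonzero position in each bin (None if a bin is empty)."""
    bin_size = D // l
    sig = []
    for b in range(l):
        lo, hi = b * bin_size, (b + 1) * bin_size
        in_bin = [i for i in sorted(nonzero_positions) if lo <= i < hi]
        sig.append(in_bin[0] if in_bin else None)
    return sig

s1 = {0, 5, 8, 10}   # nonzero positions of pi(S1) (illustrative)
s2 = {1, 6, 8, 11}   # nonzero positions of pi(S2) (illustrative)
```

A single pass over one permuted vector thus yields all ℓ signature components, instead of ℓ separate permutations.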
3.4. The GPU-Aided AP-Tree Indexing
This section develops an indexing approach aided by GPU. First, we map an AP-tree into the GPU’s memory to form a G-AP-tree. Then, we employ a GPU-aided approach for set similarity join to accelerate approximate keyword matching of G-AP-tree. Finally, a CPU-GPU data communication scheme is used for efficiently processing geo-textual streaming data with G-AP-tree.
3.4.1. Data Structure for G-AP-tree
Operations on matrices, vectors and arrays naturally suit the GPU architecture [31]. Our design uses a variety of one-dimensional arrays to organize the different components of the AP-tree, including its s-nodes, k-nodes, q-nodes, and the ordered keyword trie. Furthermore, to effectively support approximate keyword matching in the G-AP-tree, we utilize the compact characteristic matrix in [16] to represent the q-grams of all keywords in the AP-tree.
The structure of the G-AP-tree is illustrated in Figure 4. As shown in Figure 4, a spatial node array, a keyword node array, and a query node array are respectively used to store all s-nodes, k-nodes and q-nodes of the AP-tree. The root node of the AP-tree may be an s-node or a k-node; thus, we define the root node of the G-AP-tree as an array holding two index entries (i.e., index 1 and index 2), where index 1 points into the spatial node array and index 2 points into the keyword node array. The value of index 2 is set to −1 and the value of index 1 is an index into the spatial node array if the root node is an s-node, while the value of index 1 is set to −1 and the value of index 2 is an index into the keyword node array if the root node is a k-node.
For each s-node, its m cells are input in turn into the spatial node array. Each cell consists of one region reflecting its spatial area and one index pointing to its corresponding child. Since a child node in the AP-tree may be an s-node, k-node or q-node, the index may represent an index into the spatial node array, the keyword node array or the query node array. Similarly, for each k-node, its l cuts are filled in sequence into the keyword node array. Since k-nodes are generated based on an ordered keyword trie, we also map the ordered keyword trie into GPU memory. Concretely, we use m one-dimensional arrays to store the m levels of the ordered keyword trie. Thus, every cut in the keyword node array holds a pair <index, triple>. The index has the same function as the index in the spatial node array. The triple is denoted as (L, wl, wu), where L represents the level, wl is the lower boundary keyword and wu is the upper boundary keyword in the L-th level of the ordered keyword trie. Furthermore, for each query node, at most n queries can be filled into the query node array. Each query holds two parts: a region that represents its spatial area and a keyword set. Note that all keywords in both k-nodes and q-nodes are represented by min-wise signatures as well.
To support approximate keyword matching, we employ the compact characteristic matrix structure in [16]. As shown in Figure 4, the characteristic matrix assigns the value 1 when a q-gram represented by a row belongs to a keyword represented by a column, and 0 when it does not. Since the characteristic matrix is highly sparse, we employ the Compressed Sparse Row (CSR) format mentioned in [16] to compress the characteristic matrix to fit into GPU memory. Note that all keywords and q-grams are ordered in advance; thus, we only hold indexes of keywords and q-grams in GPU memory. Meanwhile, we dynamically maintain the characteristic matrix based on the keywords of the query workload on the host side, and then transfer the compact characteristic matrix into GPU memory.
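The CSR compression of a 0/1 characteristic matrix can be sketched as follows: since all stored values are 1, only the row pointer array and the column index array need to be kept. The function name and input layout are our own.

```python
# Sketch: a 0/1 q-gram x keyword characteristic matrix in CSR form.
def build_csr(rows):
    """rows: for each q-gram id (one row), the sorted list of keyword ids
    (columns) that contain it. Returns (indptr, indices); the values array
    is omitted because every stored entry is 1."""
    indptr, indices = [0], []
    for cols in rows:
        indices.extend(cols)          # column ids of the 1-entries in this row
        indptr.append(len(indices))   # row boundary after appending this row
    return indptr, indices
```

Row i of the matrix is then recovered as `indices[indptr[i]:indptr[i+1]]`, which is the access pattern a GPU kernel would use.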
3.4.2. Parallelising Approximate Keyword Matching
Once the G-AP-tree is constructed in GPU memory, we can continuously input data segments (see Definition 4) from a geo-textual data stream into the GPU, and then use the G-AP-tree to filter the data tuples of each segment in parallel.
Definition 4 (Segment). A segment, denoted as g = {t₁, …, tₘ}, is a set of m spatial-textual tuples in timestamp order.
During the parallel computing procedure, however, the efficiency of the min-wise-hashing-based similarity computation of keywords between queries and data tuples is a problem. We propose a parallel scheme to solve this issue, shown in Figure 5. For simplicity, we assume that each data tuple tᵢ (1 ≤ i ≤ m) has two keywords. We first construct m signature matrices in parallel based on the first keywords of the tuples and the keywords from the nodes of the G-AP-tree. Each signature matrix stores the min-wise signatures of the q-grams of keywords from both the streaming data tuple and the queries in the G-AP-tree. All signature matrices are constructed in parallel as well. The construction procedure of one signature matrix is shown in Algorithm 1.
Algorithm 1: Constructing a signature matrix on GPU
1 Construction_Procedure(w, T) /* Input: w is one keyword of a data tuple; T is a G-AP-tree. Output: M is a signature matrix. */
2 Retrieve the n signatures of keywords from T that are candidates for matching w.
3 Transfer the n signatures into n q-gram-keyword vectors (v1, v2, …, vn) with the characteristic matrix.
4 Compute the signature of w and transfer it to a q-gram-keyword vector l with the characteristic matrix.
5 Assemble one signature matrix M = {l, v1, v2, …, vn}.
Furthermore, we can simultaneously execute the approximate matching of signatures between the first keyword of each streaming data tuple and the keywords of queries in the G-AP-tree by computing the similarity among the signatures in each signature matrix. Afterwards, we execute the same procedure for the second keywords of all streaming data tuples.
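The per-matrix matching step can be sketched as follows: the first row of a signature matrix is the tuple keyword's signature l, the remaining rows are query keyword signatures, and each row is compared against l using the agreement fraction of Equation (5). A GPU kernel would assign one thread (or thread block) per row; here the data-parallel loop is simulated sequentially, and all names are our own.

```python
# Sketch of approximate matching over one signature matrix.
def match_signature_matrix(M, tau):
    """M: list of equal-length signatures; M[0] is the tuple keyword's
    signature, M[1:] are query keyword signatures. Returns the indices of
    query signatures whose estimated resemblance to M[0] reaches tau."""
    l, rest = M[0], M[1:]
    def agree(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)
    return [i for i, sig in enumerate(rest) if agree(l, sig) >= tau]
```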
3.4.3. A Communication Scheme for Processing Geo-Textual Streaming Data with G-AP-Tree
In our setting, a G-AP-tree is utilized to continuously filter data segments from a geo-textual data stream S. However, several issues complicate the efficient use of the GPU for streaming data filtering. First, the CPU and GPU have separate memories, so we have to explicitly partition the streaming data into segments and copy them into GPU memory, and efficient partitioning is not always straightforward. Second, the PCI-E link that connects the two memories has limited bandwidth and can often become a bottleneck, starving the GPU cores of data. Finally, the high bandwidth of GPU memory can only be exploited when GPU threads executing at the same time access memory in a coalesced fashion, i.e., the threads simultaneously access adjacent memory locations. For efficient streaming data filtering, we applied the data communication scheme between CPU and GPU proposed in [17] to address the above issues. The scheme uses a four-stage pipeline with an automated prefetching method to (i) optimize CPU-GPU communication and (ii) optimize GPU memory accesses.
In our setting, to filter a segment g, which is represented by an array, the four-stage pipeline works as follows:
Prefetch address generation: transforming the read accesses to the g array into addresses stored in a CPU-side address buffer.
Data assembly: using a CPU thread to fetch the corresponding data element from the g array for each address in the address buffer and placing it in a prefetch buffer, which must be a pinned buffer.
Data transfer: executed by the GPU streaming engine, transferring data from the CPU-side prefetch buffer to the GPU-side data buffer.
Kernel computation: filtering the data tuples in the data buffer using the G-AP-tree.
For a data stream S = {g₁, g₂, g₃, g₄, …}, the four-stage pipeline is illustrated in Figure 6.
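The four stages can be sketched as a sequential simulation. A real implementation would overlap the stages across segments using CUDA streams and pinned host buffers; the function names below are illustrative only.

```python
# Sequential simulation of the four-stage CPU-GPU pipeline for one stream.
def run_pipeline(stream_segments, filter_kernel):
    results = []
    for g in stream_segments:
        addresses = list(range(len(g)))              # 1. prefetch address generation
        prefetch_buffer = [g[a] for a in addresses]  # 2. data assembly (pinned buffer)
        data_buffer = list(prefetch_buffer)          # 3. transfer to GPU-side buffer
        results.append(filter_kernel(data_buffer))   # 4. kernel computation
    return results
```

With overlapping, stage 3 of segment gᵢ runs concurrently with stage 4 of segment gᵢ₋₁, which is what hides the PCI-E transfer latency.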