Wavelet Tree Ioi
10, 19–37
© 2016 IOI, Vilnius University
DOI: 10.15388/ioi.2016.02
Abstract. The wavelet tree is a data structure to succinctly represent sequences of elements over
a fixed but potentially large alphabet. It is a very versatile data structure which exhibits interest-
ing properties even when its compression capabilities are not considered, efficiently supporting
several queries. Although the wavelet tree was proposed more than a decade ago, it has not yet
been widely used by the competitive programming community. This paper tries to fill the gap
by showing how this data structure can be used in classical competitive programming problems,
discussing some implementation details, and presenting a performance analysis focused on a
competitive programming setting.
Key words: wavelet tree, data structures, competitive programming, quantile query, range query.
1. Introduction
Let $S = (s_1, s_2, \dots, s_n)$ be a sequence of integers and consider the following query over $S$.
Query 1. Given a pair of indices $(i, j)$ and a positive integer $k$, compute the value of the $k$-th smallest element in the sequence $(s_i, s_{i+1}, \dots, s_j)$.
Notice that Query 1 essentially asks for the value of the element that would occupy the $k$-th position when we sort the sequence $(s_i, s_{i+1}, \dots, s_j)$. For example, for the sequence $S = (3, 7, 5, 2, 3, 2, 9, 3, 5)$ and the query having $(i, j) = (3, 7)$ and $k = 4$, the answer would be 5: if we sort the sequence $(s_3, s_4, s_5, s_6, s_7) = (5, 2, 3, 2, 9)$ we obtain $(2, 2, 3, 5, 9)$, and the fourth element in this sequence is 5. Consider now the following update query.
Query 2. Given an index $i$, swap the elements at positions $i$ and $i + 1$.
That is, if $S = (3, 7, 5, 2, 3, 2, 9, 3, 5)$ and we apply Query 2 with index 5, we obtain the sequence $S' = (3, 7, 5, 2, 2, 3, 9, 3, 5)$.
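As a baseline for testing the structures discussed below, both queries can be answered by brute force. The following C++ sketch is ours, not the paper's: `quantileNaive` and `swapNaive` are hypothetical names, indices are 1-based as in the text, and `std::nth_element` stands in for a full sort since only the $k$-th position is needed. Each Query 1 costs time linear in $j - i + 1$ on average, which is far too slow for $10^5$ queries over $10^6$ elements but handy for cross-checking.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Query 1 by brute force: k-th smallest of s_i..s_j (1-based, inclusive).
// Copies the range and uses std::nth_element (linear time on average).
int quantileNaive(const std::vector<int>& s, int i, int j, int k) {
    assert(1 <= i && i <= j && j <= static_cast<int>(s.size()));
    assert(1 <= k && k <= j - i + 1);
    std::vector<int> part(s.begin() + (i - 1), s.begin() + j);
    std::nth_element(part.begin(), part.begin() + (k - 1), part.end());
    return part[k - 1];
}

// Query 2 by brute force: swap the elements at positions i and i + 1 (1-based).
void swapNaive(std::vector<int>& s, int i) {
    assert(1 <= i && i < static_cast<int>(s.size()));
    std::swap(s[i - 1], s[i]);
}
```

On the example above, `quantileNaive(S, 3, 7, 4)` returns 5, and `swapNaive(S, 5)` turns $S$ into $S'$.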
Consider now a competitive programming setting in which an initial sequence of $10^6$ elements with integer values in the range $[-10^9, 10^9]$ is given as input. Assume that a sequence of $10^5$ queries, each query of either type 1 or type 2, is also given as input. The task is to report the answers to all the queries of type 1, taking into account all the update queries and processing every query in the same order in which it appears in the input. The wavelet tree (Grossi, 2015) is a data structure that can be used to solve this task in a straightforward way within the typical time and memory limits encountered in programming competitions.
20 R. Castro et al.
The wavelet tree was initially proposed to succinctly represent sequences while still
being able to answer several different queries over this succinct representation (Grossi
et al., 2003; Navarro, 2014; Grossi, 2015). Even when its compression capabilities are
not considered, the wavelet tree is a very versatile data structure. One of its main features is that it can handle sequences of elements over a fixed but potentially large alphabet; after an initial preprocessing, the most typical queries (such as Query 1 above) can be answered in time $O(\log \sigma)$, where $\sigma$ is the size of the underlying alphabet. For an input sequence of $n$ elements, the preprocessing phase usually constructs a structure of size $O(n \times \log \sigma)$, times a factor that will depend on what additional data structures we use over the classical wavelet tree construction when solving a specific task.
Although it was proposed more than a decade ago (Grossi et al., 2003), the wave-
let tree has not yet been widely used by the competitive programming community.
We conducted a social experiment by publishing a slightly modified version of Query 1 on a well-known online judge system. We received several submissions from experienced competitive programmers, but none of them used a wavelet tree implementation to solve the task. This paper tries to fill the gap by showing how this structure can be used in classical (and not so classical) competitive programming tasks. As we will show, its good performance on large alphabets, the simplicity of its implementation, and the fact that it can be smoothly composed with other typical data structures used in competitive programming give the wavelet tree a considerable advantage over other structures.
Navarro (2014) presents an excellent survey of this data structure, showing the most important practical and theoretical results in the literature as well as applications in a myriad of cases, well beyond those discussed in this paper. In contrast to Navarro's survey, our focus is less on the properties of the structure in general and more on its practical applications, some adaptations, and its implementation, specifically targeting the issues encountered in programming competitions. Nevertheless, we urge the reader wanting to master wavelet trees to carefully read the work by Navarro (2014).
The wavelet tree (Grossi, 2015) is a data structure that recursively partitions a sequence $S$ into a tree-shaped structure according to the values that $S$ contains. In this tree, every node is associated with a subsequence of $S$. To construct the tree we begin from the root, which is associated with the complete sequence $S$. Then, in every node, if there are two or more distinct values in its corresponding sequence, the set of values is split into two non-empty sets, $L$ and $R$; all the elements of the sequence whose values belong to $L$ form the left-child subsequence, and all the elements whose values belong to $R$ form the right-child subsequence. The process continues recursively until a leaf is reached; a leaf corresponds to a subsequence in which all elements have the same value, and thus no partition can be performed.
Wavelet Trees for Competitive Programming 21
Fig. 1 shows a wavelet tree constructed from the sequence
$$S = (3, 3, 9, 1, 2, 1, 7, 6, 4, 8, 9, 4, 3, 7, 5, 9, 2, 7, 3, 5, 1, 3).$$
We split the values in the first level into the sets $L = \{1, \dots, 4\}$ and $R = \{5, \dots, 9\}$. Thus the left child of $S$ is associated with $S' = (3, 3, 1, 2, 1, 4, 4, 3, 2, 3, 1, 3)$. If we continue with the process from this node, we can split the values into $L' = \{1, 2\}$ and $R' = \{3, 4\}$. In this case we obtain as right child a node associated with the sequence $S'' = (3, 3, 4, 4, 3, 3, 3)$. Continuing from $S''$, if we split the values again (into the sets $\{3\}$ and $\{4\}$), we obtain the subsequence $(3, 3, 3, 3, 3)$ as left child and $(4, 4)$ as right child, and the process stops.
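A minimal sketch of this recursive construction, storing the subsequence of every node explicitly (the abstract view, not the compact representation discussed later for implementations). As a hypothetical convention of this sketch, a node covers the half-open value range [lo, hi) and sends values smaller than mid = lo + (hi − lo)/2 to the left, i.e., $m(S) = \mathit{mid} - 1$; for the root range [1, 10) this reproduces the splits {1, ..., 4}/{5, ..., 9}, {1, 2}/{3, 4} and {3}/{4} described above. Because splits follow the value range, a node can get an empty child when some value is absent, whereas the tree in the text stops as soon as a node holds a single distinct value.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Abstract wavelet tree node that stores its subsequence explicitly.
// A node covers the value range [lo, hi); values < mid go to the left child.
struct Node {
    int lo, hi;
    std::vector<int> seq;                  // subsequence associated with the node
    std::unique_ptr<Node> left, right;     // both null for leaves

    Node(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_), seq(std::move(s)) {
        if (hi - lo <= 1 || seq.empty()) return;  // single value (or no elements): stop
        int mid = lo + (hi - lo) / 2;             // so m(S) = mid - 1
        std::vector<int> ls, rs;
        for (int x : seq) (x < mid ? ls : rs).push_back(x);  // stable partition
        left = std::make_unique<Node>(std::move(ls), lo, mid);
        right = std::make_unique<Node>(std::move(rs), mid, hi);
    }
};
```

Building `Node root(S, 1, 10)` for the sequence of Fig. 1 yields `root.left->seq` equal to $S'$ and `root.left->right->seq` equal to $S''$.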
For simplicity in the exposition, given a wavelet tree $T$ we will usually talk about nodes in $T$ to denote, interchangeably, the actual nodes that form the tree and the subsequences associated with those nodes. Given a node $S$ in $T$, we denote by $\mathrm{Left}(S)$ its left child and by $\mathrm{Right}(S)$ its right child in $T$. The alphabet of the tree is the set of different values that its root contains. We usually assume that the alphabet of a tree $T$ is a set $\Sigma = \{1, 2, \dots, \sigma\}$. Without loss of generality, and in order to simplify the partition process, we will assume that every node $S$ in $T$ has an associated value $m(S)$ such that $\mathrm{Left}(S)$ contains the subsequence of $S$ composed of all elements of $S$ with values $\le m(S)$, and $\mathrm{Right}(S)$ the subsequence of $S$ composed of all elements with values $> m(S)$. (In Fig. 1 the value $m(S)$ is depicted under every node.) We can also associate with every node $S$ in $T$ two values $l(S)$ and $r(S)$, such that $S$ corresponds to the subsequence of the root of $T$ containing all the elements whose values are in the range $[l(S), r(S)]$. Notice that a wavelet tree with alphabet $\{1, \dots, \sigma\}$ has exactly $\sigma$ leaves. Moreover, if the construction is done by splitting the alphabet into halves in every node, the depth of the wavelet tree is $O(\log \sigma)$.
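The two shape claims above can be checked on value ranges alone. This sketch assumes (our convention, not necessarily the paper's) that a node covering the half-open range [lo, hi) splits it at mid = lo + (hi − lo)/2, and that every value of $\Sigma$ occurs, so that no node is empty; `countLeaves` and `depth` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>

// Shape of a wavelet tree that splits the value range [lo, hi) into
// [lo, mid) and [mid, hi), assuming every value occurs (no empty nodes).
int countLeaves(int lo, int hi) {
    assert(hi - lo >= 1);
    if (hi - lo == 1) return 1;              // a single value: leaf
    int mid = lo + (hi - lo) / 2;
    return countLeaves(lo, mid) + countLeaves(mid, hi);
}

// Number of edges on the longest root-to-leaf path.
int depth(int lo, int hi) {
    assert(hi - lo >= 1);
    if (hi - lo == 1) return 0;
    int mid = lo + (hi - lo) / 2;
    return 1 + std::max(depth(lo, mid), depth(mid, hi));
}
```

With this halving rule the depth equals $\lceil \log_2 \sigma \rceil$; for the alphabet {1, ..., 9} of Fig. 1, `countLeaves(1, 10)` is 9 and `depth(1, 10)` is 4.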
As we will see in Section 4, when implementing a wavelet tree the complete infor-
mation of the elements stored in each subsequence of the tree is not actually necessary.
But before giving any details on how to efficiently implement the wavelet tree, we use
the abstract description above to show the most important operations over this data
structure.
The most important abstract operation to traverse the wavelet tree is to map an index in a node into the corresponding indexes in its left and right children. As an example, let $S$ be the root node of the wavelet tree in Fig. 1, and $S' = \mathrm{Left}(S)$. Index 14 in $S$ (marked in the figure with a solid line) is mapped to index 8 in $S'$ (also marked in the figure with a solid line). That is, the portion of sequence $S$ from index 1 to index 14 that is mapped to its left child corresponds to the portion of sequence $S'$ from index 1 to 8. On the other hand, index 14 in the root sequence is mapped to index 6 in $\mathrm{Right}(S)$.
We encapsulate the operations described above into two abstract functions, $\mathrm{mapLeft}(S, i)$ and $\mathrm{mapRight}(S, i)$, for an arbitrary non-leaf node $S$ of $T$. In Fig. 1, if $S$ is the root, $S' = \mathrm{Left}(S)$ and $S'' = \mathrm{Right}(S')$, then we have $\mathrm{mapLeft}(S, 14) = 8$, $\mathrm{mapRight}(S', 8) = 5$ and $\mathrm{mapLeft}(S'', 5) = 3$ (all indexes marked with solid lines in the figure). Function $\mathrm{mapLeft}(S, i)$ essentially counts how many elements of $S$ up to index $i$ are mapped to the left-child partition of $S$. Similarly, $\mathrm{mapRight}(S, i)$ counts how many elements of $S$ up to index $i$ are mapped to the right-child partition of $S$.
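A minimal sketch of how both functions can be answered in constant time: each internal node stores, for every prefix length $k$, how many of its first $k$ elements go to the left child. The names (`WaveletNode`, `c`) and the split rule (values below mid = lo + (hi − lo)/2 go left, so $m(S) = \mathit{mid} - 1$) are assumptions of this sketch; with the root range [1, 10) it reproduces the mappings of Fig. 1 quoted above. Section 4 describes the paper's own 0-based representation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Wavelet tree node over the value range [lo, hi); values < mid go left,
// so m(S) = mid - 1. c[k] = how many of the first k elements of S go left
// (c[0] = 0), hence mapLeft(S, k) = c[k] and mapRight(S, k) = k - c[k].
struct WaveletNode {
    int lo, hi;
    std::vector<int> c;                          // empty for leaves
    std::unique_ptr<WaveletNode> left, right;    // null for leaves

    WaveletNode(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_) {
        if (hi - lo <= 1 || s.empty()) return;   // single value (or no elements)
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) {
            (x < mid ? ls : rs).push_back(x);    // stable partition of S
            c.push_back(static_cast<int>(ls.size()));
        }
        left = std::make_unique<WaveletNode>(std::move(ls), lo, mid);
        right = std::make_unique<WaveletNode>(std::move(rs), mid, hi);
    }
    bool isLeaf() const { return left == nullptr; }
    int mapLeft(int k) const { return c[k]; }        // O(1)
    int mapRight(int k) const { return k - c[k]; }   // O(1)
};
```

For the sequence of Fig. 1, `root.mapLeft(14)` is 8, `root.mapRight(14)` is 6, `root.left->mapRight(8)` is 5, and `root.left->right->mapLeft(5)` is 3.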
As we will describe in Section 4, these two operations can be efficiently implemented (in fact, in constant time). But before going into implementation details, we show how mapLeft and mapRight can be used to answer three different queries by traversing the wavelet tree, namely, rank, range quantile, and range counting.
2.1. Rank
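The rank is an operation performed over a sequence $S$ that counts the occurrences of a value $q$ up to an index $i$ of $S$. It is usually denoted by $\mathrm{rank}_q(S, i)$. That is, if $S = (s_1, \dots, s_n)$ then
$$\mathrm{rank}_q(S, i) = \left|\{\, k \in \{1, \dots, i\} : s_k = q \,\}\right|.$$
repeat this process until we reach a leaf node; if we reach a leaf with this process, we know that $\mathrm{rank}_q(S, i) = i$.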
In Fig. 1 the execution of $\mathrm{rank}_3(S, 14)$ is depicted with solid lines. We map index 14 down the tree using either mapLeft or mapRight, depending on the value $m$ of every node in the path. We first map 14 to 8 (to the left), then 8 to 5 (to the right), and finally 5 to 3 (to the left), reaching a leaf node. Thus, the answer to $\mathrm{rank}_3(S, 14)$ is 3.
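Following this description, rank walks a single root-to-leaf path. The sketch below repeats a minimal node definition so that it compiles on its own; the split rule (values below mid = lo + (hi − lo)/2 go left) and the names `WaveletNode`, `rank` and `rankRange` are our assumptions, with 1-based indexes as in the text.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Node over values [lo, hi); values < mid go left. c[k] = # of first k elements going left.
struct WaveletNode {
    int lo, hi;
    std::vector<int> c;
    std::unique_ptr<WaveletNode> left, right;
    WaveletNode(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_) {
        if (hi - lo <= 1 || s.empty()) return;
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) { (x < mid ? ls : rs).push_back(x); c.push_back(static_cast<int>(ls.size())); }
        left = std::make_unique<WaveletNode>(std::move(ls), lo, mid);
        right = std::make_unique<WaveletNode>(std::move(rs), mid, hi);
    }
    bool isLeaf() const { return left == nullptr; }
    int mapLeft(int k) const { return c[k]; }
    int mapRight(int k) const { return k - c[k]; }
};

// rank_q(S, i): occurrences of q among s_1..s_i, mapping i down one path.
int rank(const WaveletNode* v, int q, int i) {
    if (q < v->lo || q >= v->hi) return 0;          // q is outside the alphabet
    while (!v->isLeaf()) {
        int mid = v->lo + (v->hi - v->lo) / 2;      // q <= m(S) iff q < mid
        if (q < mid) { i = v->mapLeft(i); v = v->left.get(); }
        else         { i = v->mapRight(i); v = v->right.get(); }
    }
    return i;   // leaf: all i elements equal q (or the node is empty and i == 0)
}

// Occurrences of q between positions i and j (inclusive).
int rankRange(const WaveletNode* v, int q, int i, int j) {
    return rank(v, q, j) - rank(v, q, i - 1);
}
```

For the sequence of Fig. 1, `rank(&root, 3, 14)` follows 14 → 8 → 5 → 3 and returns 3.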
Rank is computed by performing $O(\log \sigma)$ calls to (either) mapLeft or mapRight, thus the time complexity is $O(t \times \log \sigma)$, where $t$ is the time needed to compute the map functions. Also notice that a rank operation that counts the occurrences of $q$ between indexes $i$ and $j$ can be computed as $\mathrm{rank}_q(S, j) - \mathrm{rank}_q(S, i - 1)$, and thus its time complexity is also $O(t \times \log \sigma)$.
The range quantile operation is essentially Query 1 described in the introduction: given a sequence $S = (s_1, \dots, s_n)$, $\mathrm{quantile}_k(S, i, j)$ is the value of the $k$-th smallest element in the sequence $(s_i, s_{i+1}, \dots, s_j)$. For instance, in Fig. 1, for the root sequence we have that $\mathrm{quantile}_6(S, 7, 16) = 7$. It was shown by Gagie et al. (2009) that wavelet trees can efficiently solve this query.
To describe how the wavelet tree can solve quantile queries, let us begin with a simpler version. Assume that $i = 1$, and thus we want to find the $k$-th smallest element among the first $j$ elements in $S$. Then, having a wavelet tree for $S$, $\mathrm{quantile}_k(S, 1, j)$ can be easily computed as follows. Let $c = \mathrm{mapLeft}(S, j)$. Recall that $\mathrm{mapLeft}(S, j)$ counts how many elements of $S$ up to index $j$ are mapped to the left child of $S$. Thus, if $k \le c$ then we know for sure that the element that we are searching for is in the left subtree, and it can be computed as
$$\mathrm{quantile}_k(S, 1, j) = \mathrm{quantile}_k(\mathrm{Left}(S), 1, c).$$
On the other hand, if $k > c$ then the element that we are searching for is in the right subtree, but it will no longer be the $k$-th smallest in $\mathrm{Right}(S)$ but the $(k - c)$-th smallest, and thus it can be computed as
$$\mathrm{quantile}_k(S, 1, j) = \mathrm{quantile}_{k-c}(\mathrm{Right}(S), 1, \mathrm{mapRight}(S, j)).$$
This process can be repeated until a leaf node is reached, in which case the answer is the (single) value stored in that leaf.
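When answering $\mathrm{quantile}_k(S, i, j)$ the strategy above generalizes as follows. We first compute $c = \mathrm{mapLeft}(S, j) - \mathrm{mapLeft}(S, i - 1)$, taking $\mathrm{mapLeft}(S, 0) = \mathrm{mapRight}(S, 0) = 0$. Notice that $c$ is the number of elements of $S$ from index $i$ to index $j$ (both inclusive) that are mapped to the left. Thus, if $k \le c$ then the element we are searching for is in the left child of $S$ between the indexes $\mathrm{mapLeft}(S, i - 1) + 1$ and $\mathrm{mapLeft}(S, j)$, and thus the answer is
$$\mathrm{quantile}_k\big(\mathrm{Left}(S),\ \mathrm{mapLeft}(S, i - 1) + 1,\ \mathrm{mapLeft}(S, j)\big).$$
Otherwise, if $k > c$, the answer is
$$\mathrm{quantile}_{k-c}\big(\mathrm{Right}(S),\ \mathrm{mapRight}(S, i - 1) + 1,\ \mathrm{mapRight}(S, j)\big).$$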
As before, the process is repeated until a leaf node is reached, in which case the answer is the value stored in that leaf. In Fig. 1 the complete execution of $\mathrm{quantile}_6(S, 7, 16)$ is depicted with dashed boxes in every visited node.
As for the rank operation, quantile can be computed in time $O(t \times \log \sigma)$, where $t$ is the time needed to compute the map functions.
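The general quantile descent can be sketched iteratively as follows. The block repeats a minimal node definition so it compiles on its own; the split rule (values below mid = lo + (hi − lo)/2 go left) and the names `WaveletNode` and `quantile` are assumptions of this sketch, and indexes are 1-based as in the text.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Node over values [lo, hi); values < mid go left. c[k] = # of first k elements going left.
struct WaveletNode {
    int lo, hi;
    std::vector<int> c;
    std::unique_ptr<WaveletNode> left, right;
    WaveletNode(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_) {
        if (hi - lo <= 1 || s.empty()) return;
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) { (x < mid ? ls : rs).push_back(x); c.push_back(static_cast<int>(ls.size())); }
        left = std::make_unique<WaveletNode>(std::move(ls), lo, mid);
        right = std::make_unique<WaveletNode>(std::move(rs), mid, hi);
    }
    bool isLeaf() const { return left == nullptr; }
    int mapLeft(int k) const { return c[k]; }
    int mapRight(int k) const { return k - c[k]; }
};

// quantile_k(S, i, j): k-th smallest among s_i..s_j (1-based, inclusive).
int quantile(const WaveletNode* v, int i, int j, int k) {
    assert(1 <= i && i <= j && 1 <= k && k <= j - i + 1);
    while (!v->isLeaf()) {
        int li = v->mapLeft(i - 1), lj = v->mapLeft(j);
        int c = lj - li;                          // elements of s_i..s_j going left
        if (k <= c) {                             // the answer lies in Left(S)
            i = li + 1; j = lj; v = v->left.get();
        } else {                                  // (k - c)-th smallest in Right(S)
            k -= c;
            i = v->mapRight(i - 1) + 1; j = v->mapRight(j); v = v->right.get();
        }
    }
    return v->lo;                                 // a non-empty leaf holds one value
}
```

For the sequence of Fig. 1, `quantile(&root, 7, 16, 6)` returns 7, and for the introduction's sequence the call with $(i, j, k) = (3, 7, 4)$ returns 5.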
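The range counting query $\mathrm{range}[x, y](S, i, j)$ counts the number of elements with values between $x$ and $y$ in positions from index $i$ to index $j$. That is, if $S = (s_1, \dots, s_n)$ then
$$\mathrm{range}[x, y](S, i, j) = \left|\{\, k \in \{i, \dots, j\} : x \le s_k \le y \,\}\right|.$$
Note that if range is called on a leaf node $S$, then $l(S) = r(S)$, so the interval $[l(S), r(S)]$ is either completely contained in $[x, y]$ (if $l(S) \in [x, y]$) or completely outside it (if $l(S) \notin [x, y]$). Both cases are already considered.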
It is not difficult to show that for a range counting query we have to make at most $O(\log \sigma)$ recursive calls (Gagie et al. (2012) give a detailed proof), and thus the time complexity is, as for rank and quantile, $O(t \times \log \sigma)$, where $t$ is the time needed to compute the map functions.
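A recursive sketch of range counting: a node whose value range $[l(S), r(S)]$ lies inside $[x, y]$ contributes all of its elements in the mapped index range, a disjoint node contributes nothing, and partial overlaps recurse into both children. The minimal node definition is repeated so the block compiles alone; the split rule (values below mid = lo + (hi − lo)/2 go left, so $l(S) = \mathit{lo}$ and $r(S) = \mathit{hi} - 1$) and the names are assumptions of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Node over values [lo, hi); values < mid go left. c[k] = # of first k elements going left.
struct WaveletNode {
    int lo, hi;
    std::vector<int> c;
    std::unique_ptr<WaveletNode> left, right;
    WaveletNode(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_) {
        if (hi - lo <= 1 || s.empty()) return;
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) { (x < mid ? ls : rs).push_back(x); c.push_back(static_cast<int>(ls.size())); }
        left = std::make_unique<WaveletNode>(std::move(ls), lo, mid);
        right = std::make_unique<WaveletNode>(std::move(rs), mid, hi);
    }
    int mapLeft(int k) const { return c[k]; }
    int mapRight(int k) const { return k - c[k]; }
};

// range[x, y](S, i, j): number of s_i..s_j with x <= value <= y (1-based, inclusive).
int rangeCount(const WaveletNode* v, int i, int j, int x, int y) {
    if (i > j || y < v->lo || v->hi - 1 < x) return 0;    // empty range or disjoint values
    if (x <= v->lo && v->hi - 1 <= y) return j - i + 1;   // [l(S), r(S)] inside [x, y]
    // Partial overlap: the node spans >= 2 values and has elements, so it has children.
    return rangeCount(v->left.get(), v->mapLeft(i - 1) + 1, v->mapLeft(j), x, y) +
           rangeCount(v->right.get(), v->mapRight(i - 1) + 1, v->mapRight(j), x, y);
}
```

For the sequence of Fig. 1, positions 7 to 16 hold (7, 6, 4, 8, 9, 4, 3, 7, 5, 9), so `rangeCount(&root, 7, 16, 3, 5)` returns 4.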
We now discuss some simple update queries over wavelet trees. The idea is to shed light on the versatility of the structure to support less classical operations. We looked for inspiration in typical operations found in competitive programming problems to design update queries that preserve the global structure of the wavelet tree. We only describe the high-level idea of how these queries can be supported by the wavelet tree, and we later (in Section 4) discuss how to efficiently implement them.
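Consider Query 2 in the introduction, denoted by $\mathrm{swap}(S, i)$. That is, a call to $\mathrm{swap}(S, i)$ changes $S = (s_1, \dots, s_i, s_{i+1}, \dots, s_n)$ into the sequence $(s_1, \dots, s_{i+1}, s_i, \dots, s_n)$.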
The operation $\mathrm{swap}(S, i)$ can be easily supported by the wavelet tree as follows. Assume first that $s_i \le m(S)$. Then we have two cases depending on the value of $s_{i+1}$. If $s_{i+1} > m(S)$, we know that $s_i$ is mapped to the left subtree while $s_{i+1}$ is mapped to the right subtree. This means that swapping these two elements does not modify any of the nodes of the tree that are descendants of $S$. In order to modify $S$, besides actually swapping the elements, we should update $\mathrm{mapLeft}(S, i)$ and $\mathrm{mapRight}(S, i)$: $\mathrm{mapLeft}(S, i)$ should be decremented by 1 and $\mathrm{mapRight}(S, i)$ should be incremented by 1, as the new element in position $i$ is now mapped to the right subtree. Notice that these are the only two updates that need to be done to the map functions.
The other case is if $s_{i+1} \le m(S)$. Notice that both $s_i$ and $s_{i+1}$ are mapped to $\mathrm{Left}(S)$, and moreover, they are mapped to contiguous positions in that sequence. In this case, no update should be done to $\mathrm{mapLeft}(S, i)$ or $\mathrm{mapRight}(S, i)$. Thus, besides actually swapping the elements in $S$, we only need to recursively perform the operation $\mathrm{swap}(\mathrm{Left}(S), \mathrm{mapLeft}(S, i))$. The case in which $s_i > m(S)$ is symmetrical. The complete process is repeated until a leaf node is reached, in which case nothing should be done.
To perform the swap, in the worst case we need to traverse the wavelet tree from top to bottom. Moreover, notice that the map functions mapLeft and mapRight are updated in at most one node. Thus the complexity of the process is $O(t \times \log \sigma + u)$, where $u$ is the time needed to update mapLeft and mapRight, and $t$ is the time needed to compute the map functions.
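A sketch of this swap procedure on the prefix-count representation $c[k] = \mathrm{mapLeft}(S, k)$: the walk descends while both elements go to the same child and fixes a single counter where they split. The node definition is repeated so the block compiles alone; names and the split rule (values below mid = lo + (hi − lo)/2 go left) are assumptions, and `access` is a hypothetical helper added only to read values back for checking.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Node over values [lo, hi); values < mid go left. c[k] = # of first k elements going left.
struct WaveletNode {
    int lo, hi;
    std::vector<int> c;
    std::unique_ptr<WaveletNode> left, right;
    WaveletNode(std::vector<int> s, int lo_, int hi_) : lo(lo_), hi(hi_) {
        if (hi - lo <= 1 || s.empty()) return;
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) { (x < mid ? ls : rs).push_back(x); c.push_back(static_cast<int>(ls.size())); }
        left = std::make_unique<WaveletNode>(std::move(ls), lo, mid);
        right = std::make_unique<WaveletNode>(std::move(rs), mid, hi);
    }
    bool isLeaf() const { return left == nullptr; }
    int mapLeft(int k) const { return c[k]; }
    int mapRight(int k) const { return k - c[k]; }
};

// swap(S, i): exchange s_i and s_{i+1} (1-based) in place.
void swapAdjacent(WaveletNode* v, int i) {
    while (!v->isLeaf()) {                          // nothing to do in a leaf
        bool aLeft = v->c[i] - v->c[i - 1] == 1;    // does s_i go left?
        bool bLeft = v->c[i + 1] - v->c[i] == 1;    // does s_{i+1} go left?
        if (aLeft != bLeft) {                       // different children:
            v->c[i] += aLeft ? -1 : 1;              // only mapLeft/mapRight(S, i) change
            return;                                 // and descendants are untouched
        }
        if (aLeft) { i = v->mapLeft(i); v = v->left.get(); }   // contiguous in Left(S)
        else       { i = v->mapRight(i); v = v->right.get(); } // contiguous in Right(S)
    }
}

// Value of s_i (1-based), recovered by walking down the tree.
int access(const WaveletNode* v, int i) {
    while (!v->isLeaf()) {
        if (v->c[i] - v->c[i - 1] == 1) { i = v->mapLeft(i); v = v->left.get(); }
        else                            { i = v->mapRight(i); v = v->right.get(); }
    }
    return v->lo;
}
```

Applying `swapAdjacent(&root, 5)` to a tree built from the introduction's sequence makes `access` return the values of $S' = (3, 7, 5, 2, 2, 3, 9, 3, 5)$.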
Assume that every element in a sequence has two possible states, active or inactive, and that an operation $\mathrm{toggle}(S, i)$ is used to change the state of element $s_i$ from active to inactive, or from inactive to active, depending on its current state. Given this setting, we want to support all the queries mentioned in Section 2, but only considering active elements. For example, assume that $S = (1, 2, 1, 3, 1, 4)$ and only the elements different from 1 are active. Then the query $\mathrm{quantile}_2(S, 1, 6)$ would return 3.
A simple augmentation of the wavelet tree can be used to support this update. Besides mapLeft and mapRight, we use two new mapping/counting functions, activeLeft and activeRight. For a node $S$ and an index $i$, $\mathrm{activeLeft}(S, i)$ is the number of active elements up to index $i$ that are mapped to the left child of $S$, and similarly $\mathrm{activeRight}(S, i)$ is the number of active elements up to index $i$ mapped to the right child. Besides this, we can also have a count function for the leaves of the tree, $\mathrm{activeLeaf}(S, i)$, that counts the number of active elements in a leaf up to position $i$. We next show how these new mapping functions should be updated when a toggle operation is performed. Then we describe how the queries in Section 2 should be adapted.
Upon an update operation $\mathrm{toggle}(S, i)$ we proceed as follows. If $s_i \le m(S)$ then we should update the values of $\mathrm{activeLeft}(S, j)$ for all $j \ge i$, adding 1 to $\mathrm{activeLeft}(S, j)$ if $s_i$ was previously inactive, or subtracting 1 in case $s_i$ was previously active. Now, given that $s_i$ is mapped to the left child of $S$, we proceed recursively with $\mathrm{toggle}(\mathrm{Left}(S), \mathrm{mapLeft}(S, i))$. If $s_i > m(S)$, we proceed symmetrically, updating $\mathrm{activeRight}(S, j)$ for $j \ge i$ and recursively calling $\mathrm{toggle}(\mathrm{Right}(S), \mathrm{mapRight}(S, i))$. We repeat the process until a leaf is reached, in which case activeLeaf should also be updated (similarly as for activeLeft). The complexity of the toggle operation is then $O((u + t) \times \log \sigma)$, where $u$ is the time needed to update activeLeft and activeRight in every level (plus activeLeaf in the last level), and $t$ is the time needed to compute the map functions mapLeft and mapRight.
Consider now the $\mathrm{quantile}_k(S, i, j)$ query. Recall that for this query we first computed a value $c$ representing the number of elements of $S$ from index $i$ to index $j$ that are mapped to the left. If $k \le c$ we proceeded searching for $\mathrm{quantile}_k$ in the left subtree, and if $k > c$ we proceeded searching for $\mathrm{quantile}_{k-c}$ in the right subtree (mapping indexes $i$ and $j$ accordingly in both cases). In order to consider the active/inactive state of each element, we only need to change how $c$ is computed; we now need to consider how many active elements from index $i$ to index $j$ are mapped to the left, and thus $c$ is computed as
$$c = \mathrm{activeLeft}(S, j) - \mathrm{activeLeft}(S, i - 1).$$
between $i$ and $j$ in $S$ is not less than $k$ (which can be easily checked using activeLeft and activeRight).
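Queries rank and range$[x, y]$ are even simpler. In the case of rank we only need to consider the active elements when we reach a leaf: for the last query $\mathrm{rank}_q(S, i)$, in a leaf $S$, we just answer $\mathrm{activeLeaf}(S, i)$. In the case of $\mathrm{range}[x, y](S, i, j)$, we essentially keep the recursive strategy as before, but now when $[l(S), r(S)]$ is totally contained in $[x, y]$ we only have to consider the number of active elements between index $i$ and index $j$, which is computed as
$$\big(\mathrm{activeLeft}(S, j) + \mathrm{activeRight}(S, j)\big) - \big(\mathrm{activeLeft}(S, i - 1) + \mathrm{activeRight}(S, i - 1)\big),$$
or as $\mathrm{activeLeaf}(S, j) - \mathrm{activeLeaf}(S, i - 1)$ if $S$ is a leaf.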
3.3. Adding and Deleting Elements from the Beginning or End of the Sequence
$$\mathrm{mapLeft}(S, j + 1) = \mathrm{mapLeft}(S, j) + 1$$
$$\mathrm{mapRight}(S, j + 1) = \mathrm{mapRight}(S, j)$$
and finally set $\mathrm{mapLeft}(S, 1) = 1$ and $\mathrm{mapRight}(S, 1) = 0$ and perform the call $\mathrm{pushFront}(\mathrm{Left}(S), x)$. If $x > m(S)$ then we should do
$$\mathrm{mapLeft}(S, j + 1) = \mathrm{mapLeft}(S, j)$$
$$\mathrm{mapRight}(S, j + 1) = \mathrm{mapRight}(S, j) + 1$$
and finally set $\mathrm{mapLeft}(S, 1) = 0$ and $\mathrm{mapRight}(S, 1) = 1$ and perform the call $\mathrm{pushFront}(\mathrm{Right}(S), x)$. When a leaf node is reached we just add $x$ at the beginning of the corresponding sequence. The $\mathrm{popFront}(S)$ operation is similar. Let $|S| = n$. If $s_1 \le m(S)$ then we should update $\mathrm{mapLeft}(S, j)$ to $\mathrm{mapLeft}(S, j + 1) - 1$, and $\mathrm{mapRight}(S, j)$ to $\mathrm{mapRight}(S, j + 1)$, for all $j$ from 1 to $n - 1$, and then do $\mathrm{popFront}(\mathrm{Left}(S))$. Symmetrically, if $s_1 > m(S)$ then we should update $\mathrm{mapLeft}(S, j)$ to $\mathrm{mapLeft}(S, j + 1)$, and $\mathrm{mapRight}(S, j)$ to $\mathrm{mapRight}(S, j + 1) - 1$, for all $j$ from 1 to $n - 1$, and then do $\mathrm{popFront}(\mathrm{Right}(S))$. Upon reaching a leaf node, we just delete the value from the front.
The complexity of all the operations above is $O((u + t) \times \log \sigma)$, where $u$ is the time needed to update mapLeft or mapRight in every level, and $t$ is the time needed to compute the map functions. Just notice that for pushFront and popFront we have to update several values of mapLeft and mapRight per level.
4. Implementation
In this section we explain how to build a wavelet tree and how to construct the auxiliary
structures to support the mapping operations efficiently. Based on this construction we
also discuss how to implement the queries explained in the previous section. Additionally, we present an implementation strategy alternative to the direct pointer-based one. We implemented both approaches in C++, and the code is available on GitHub¹.
¹ https://github.com/nilehmann/wavelet-tree
4.1. Construction
We now briefly discuss how every operation in Sections 2 and 3 can be efficiently implemented.
mapLeft and mapRight. These two operations can be easily implemented with the array $C$, where $C[i]$ is the number of 1's in the bitvector $B$ of the node up to position $i$ (a 1 meaning that the element goes to the right child); in a node $S$ the number of elements up to position $i$ that go to the left is $i - C[i] + 1$. Since we are indexing from 0, position $i$ is mapped to the left to position $i - C[i]$. Analogously, a position $i$ is mapped to the right to position $C[i] - 1$. Notice that both mapping functions can thus be computed in constant time, which implies that rank, quantile and range operations can be implemented in $O(\log \sigma)$ time.
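A sketch of these 0-based formulas for a single node, given its bitvector $B$ (1 = the element goes right). `BitNode` and its member names are hypothetical. On the root of Fig. 1, where $m(S) = 4$, index 13 (the 14th element, value 7) is mapped to right position 5, i.e., the 6th element of $\mathrm{Right}(S)$, and index 12 (the 13th element, value 3) is mapped to left position 7, i.e., the 8th element of $\mathrm{Left}(S)$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One node, 0-based: B[i] = 1 if the element at position i goes to the right
// child, and C[i] = number of 1s in B[0..i].
struct BitNode {
    std::vector<int> C;
    explicit BitNode(const std::vector<int>& B) : C(B.size()) {
        int ones = 0;
        for (std::size_t i = 0; i < B.size(); ++i) C[i] = (ones += B[i]);
    }
    int leftCount(int i) const { return i - C[i] + 1; }  // elements of [0..i] going left
    int mapLeft(int i) const { return i - C[i]; }        // meaningful only if B[i] == 0
    int mapRight(int i) const { return C[i] - 1; }       // meaningful only if B[i] == 1
};
```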
swap. The swap operation first maps the position $i$ down the tree until we reach a node where the update needs to be performed. At this point the (virtual) bitvector $B$ is such that $B[i] \ne B[i + 1]$. Swapping both bits can only change the count of 1's up to position $i$, and thus only $C[i]$ should be updated. If $B[i] = 0$ we do $C[i] = C[i] + 1$, and if $B[i] = 1$ we do $C[i] = C[i] - 1$. This shows that the map functions can be updated in constant time after a swap operation, which implies that the complexity of swap is also $O(\log \sigma)$.
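A sketch of this constant-time repair for a single node, with a hypothetical `BitNode::swapBits` on 0-based positions. When the two bits are equal the node does not change and, as described earlier for swap, the operation continues in the child that received both elements; the boolean result signals that case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One node: bitvector B (1 = goes right) and C[i] = number of 1s in B[0..i].
struct BitNode {
    std::vector<int> B, C;
    explicit BitNode(std::vector<int> bits) : B(std::move(bits)), C(B.size()) {
        int ones = 0;
        for (std::size_t i = 0; i < B.size(); ++i) C[i] = (ones += B[i]);
    }
    // Swap B[i] and B[i + 1] (0-based). Only the prefix ending at i can change,
    // and only when the bits differ. Returns false when this node is unaffected
    // (equal bits), in which case the swap must continue in the common child.
    bool swapBits(int i) {
        if (B[i] == B[i + 1]) return false;
        C[i] += (B[i] == 0) ? 1 : -1;   // a 1 moves into (or out of) prefix [0..i]
        std::swap(B[i], B[i + 1]);
        return true;
    }
};
```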
toggle. In this case we only need to implement activeLeft, activeRight and activeLeaf. To mark which positions are active we can use any data structure representing sequences of 0's and 1's that efficiently supports partial sums and point updates. For example, we can use a binary indexed tree (BIT) (Fenwick, 1994), which is a standard data structure in competitive programming that supports both operations in $O(\log n)$ time. Thus, with a BIT we are adding a logarithmic factor to each query, and now rank, quantile and range operations, as well as toggle, can be implemented in $O(\log n \times \log \sigma)$ time. In terms of construction, when using a BIT in every level we are only paying a constant factor in the size of the wavelet tree.
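A minimal sketch of toggle and active-only quantile with one BIT per node. It deviates slightly from the description above: instead of storing activeLeft and activeRight in a node $S$, every node keeps a BIT over the active flags of its own subsequence, so $\mathrm{activeLeft}(S, i)$ is read from the BIT of $\mathrm{Left}(S)$ at $\mathrm{mapLeft}(S, i)$ (symmetrically on the right), and a leaf's BIT plays the role of activeLeaf. The names (`Fenwick`, `ActiveWaveletTree`) and the split rule (values below mid = lo + (hi − lo)/2 go left) are assumptions; indexes are 1-based.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Binary indexed tree (Fenwick, 1994) over positions 1..n.
struct Fenwick {
    std::vector<int> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int d) { for (; i < static_cast<int>(t.size()); i += i & -i) t[i] += d; }
    int sum(int i) const { int s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }  // 1..i
};

// Node over values [lo, hi); values < mid go left; act = active flags of this node.
struct Node {
    int lo, hi;
    std::vector<int> c;                      // c[k] = # of first k elements going left
    Fenwick act;
    std::unique_ptr<Node> left, right;
    Node(const std::vector<int>& s, int lo_, int hi_)
        : lo(lo_), hi(hi_), act(static_cast<int>(s.size())) {
        for (int p = 1; p <= static_cast<int>(s.size()); ++p) act.add(p, 1);  // all active
        if (hi - lo <= 1 || s.empty()) return;
        int mid = lo + (hi - lo) / 2;
        std::vector<int> ls, rs;
        c.assign(1, 0);
        for (int x : s) { (x < mid ? ls : rs).push_back(x); c.push_back(static_cast<int>(ls.size())); }
        left = std::make_unique<Node>(ls, lo, mid);
        right = std::make_unique<Node>(rs, mid, hi);
    }
    bool isLeaf() const { return left == nullptr; }
};

struct ActiveWaveletTree {
    Node root;
    std::vector<char> on;                    // current state of every s_i
    ActiveWaveletTree(const std::vector<int>& s, int lo, int hi) : root(s, lo, hi), on(s.size(), 1) {}

    // toggle(S, i): one BIT update per level, O(log n * log sigma).
    void toggle(int i) {
        int d = on[i - 1] ? -1 : 1;
        on[i - 1] = !on[i - 1];
        Node* v = &root;
        while (true) {
            v->act.add(i, d);
            if (v->isLeaf()) return;
            bool goesLeft = v->c[i] - v->c[i - 1] == 1;
            i = goesLeft ? v->c[i] : i - v->c[i];
            v = goesLeft ? v->left.get() : v->right.get();
        }
    }

    // k-th smallest among the active elements of s_i..s_j.
    int quantile(int i, int j, int k) const {
        assert(1 <= k && k <= root.act.sum(j) - root.act.sum(i - 1));
        const Node* v = &root;
        while (!v->isLeaf()) {
            int li = v->c[i - 1], lj = v->c[j];
            int cnt = v->left->act.sum(lj) - v->left->act.sum(li);  // active going left
            if (k <= cnt) { i = li + 1; j = lj; v = v->left.get(); }
            else { k -= cnt; i = i - li; j = j - lj; v = v->right.get(); }
        }
        return v->lo;
    }
};
```

On the toggle example given earlier, deactivating the three 1's of (1, 2, 1, 3, 1, 4) makes `quantile(1, 6, 2)` return 3.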
pushBack and popBack. These operations only modify the array $C$ in some nodes. Pushing an element at the end updates the (virtual) bitvector $B$ by appending a new 0 or 1 (depending on the comparison between the new element and $m(S)$), so $C$, being a partial sum of $B$, only needs one new entry: if the new bit is stored at position $n$, then $C[n] = C[n - 1] + B[n]$. Popping an element from the end updates $B$ and $C$ by doing the inverse operation, so if $C$ has size $n$ we only need to delete $C[n - 1]$ from memory. Both operations can be done in amortized constant time using a dynamic array; thus the complexity of all queries plus pushBack and popBack is $O(\log \sigma)$ time.
pushFront and popFront. These are similar to pushBack and popBack, but act at the beginning of the bitvector $B$. To prepend a bit $x$ to a bitvector $B$ we must prepend its value to $C$. If the value of $x$ is equal to 1 we must also increment by 1 every value in $C$. Because it is too slow to update every position of $C$, we define a counter $\delta$ that starts at 0 and is incremented by 1 every time a bit equal to 1 is prepended. We then just prepend $x - \delta$ to $C$ (after updating $\delta$), in which case the real count of ones up to position $i$ is obtained as $C[i] + \delta$. Popping an element is as easy as deleting the first element of $C$ from
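A sketch of the end operations on one node's cumulative array, assuming (our choice) a `std::deque` for $C$ so that both ends can grow and shrink in constant time; `DynamicCounts` is a hypothetical name. The popFront rule, which subtracts the removed bit from $\delta$ because every remaining prefix loses that 1, is our completion of the idea rather than the paper's text.

```cpp
#include <cassert>
#include <deque>

// Cumulative counts of a node's (virtual) bitvector B with pushes and pops at
// both ends. The true number of 1s in B[0..i] is C[i] + delta.
struct DynamicCounts {
    std::deque<int> C;
    int delta = 0;

    int size() const { return static_cast<int>(C.size()); }
    int ones(int i) const { return C[i] + delta; }                    // 1s in B[0..i]
    int bit(int i) const { return ones(i) - (i > 0 ? ones(i - 1) : 0); }

    void pushBack(int b) {                  // append bit b: one new prefix value
        int prev = C.empty() ? 0 : ones(size() - 1);
        C.push_back(prev + b - delta);
    }
    void popBack() { assert(!C.empty()); C.pop_back(); }
    void pushFront(int b) {                 // prepend bit b
        delta += b;                         // every existing prefix gains b ones...
        C.push_front(b - delta);            // ...and the new first prefix equals b
    }
    void popFront() {                       // remove the first bit
        assert(!C.empty());
        int b = bit(0);
        C.pop_front();
        delta -= b;                         // every remaining prefix loses b ones
    }
};
```

The 0-based map functions then use `ones(i)` in place of $C[i]$.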
In a competitive programming setting the size of the array $S$ will depend on time restrictions, but typically it will not exceed $10^6$. However, the values stored in $S$ can easily range over an interval of size around $10^9$. Thus the number of values actually appearing in $S$ is much smaller than the range of possible values. For this reason one usually has to map the values that appear in the sequence to a range $[0, \sigma - 1]$. Commonly, this requires a fairly fast operation to translate from one alphabet to the other, with a typical implementation using, for example, a binary search tree or a sorted array combined with binary search.
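A sketch of the sorted-array option, using `std::sort`, `std::unique` and `std::lower_bound`; `Compressor` is a hypothetical name. A wavelet tree can then be built over the ranks $0, \dots, \sigma - 1$, and a value returned by a query (for instance a quantile) is translated back with `value`.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Maps the values of S (e.g., in [-1e9, 1e9]) to ranks in [0, sigma - 1]
// with a sorted, deduplicated copy plus binary search.
struct Compressor {
    std::vector<int> vals;                  // sorted distinct values; vals[r] has rank r
    explicit Compressor(std::vector<int> s) : vals(std::move(s)) {
        std::sort(vals.begin(), vals.end());
        vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
    }
    int sigma() const { return static_cast<int>(vals.size()); }
    int rank(int x) const {                 // O(log sigma); x must occur in S
        auto it = std::lower_bound(vals.begin(), vals.end(), x);
        assert(it != vals.end() && *it == x);
        return static_cast<int>(it - vals.begin());
    }
    int value(int r) const { return vals[r]; }
};
```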
To avoid this mapping step, the wavelet tree could be constructed directly over the range of all possible values, allowing the subsequences of some nodes to be empty. A naive pointer-based construction will require $O(\sigma)$ words, which might be excessive for $\sigma = 10^9$. Because many nodes will represent empty subsequences, one can save some space by explicitly tracking when subsequences become empty in the tree.
There is an alternative implementation of the wavelet tree, called the wavelet matrix (Claude et al., 2015), that was specifically proposed in the literature to account for big alphabets. Given an alphabet $\Sigma$, its size can be extended to match the next power of two, yielding a complete binary tree for the wavelet tree representation. For each level, we could then concatenate the bitvectors of all nodes in that level and represent the structure with a single bitvector per level. The borders between nodes are lost, but they can be computed on the fly when traversing. This requires extra queries, yielding worse performance. Instead, the wavelet matrix breaks the restriction that in each level siblings must be represented in contiguous positions in the bitvector. When partitioning a node at some level $\ell$, the wavelet matrix sends all zeroes to the left section of level $\ell + 1$ and all ones to the right section. The left and right children of a node at level $\ell$ do not occupy contiguous positions in the bitvector at level $\ell + 1$, but the left (resp. right) child is represented in contiguous positions in the left (resp. right) section of level $\ell + 1$. Additionally, a value $z_\ell$ is maintained at each level to record how many elements were mapped to the left.
With this structure the traversing operations can be directly implemented by performing rank operations on the bitvectors at each level. Specifically, instead of maintaining an array $C$ for every node, we maintain an array $C_\ell$ for each level $\ell$. Array $C_\ell$ stores the cumulative number of 1's in level $\ell$. Then, a position $i$ at level $\ell$ is mapped to the left to position $i - C_\ell[i]$ at level $\ell + 1$. The same position is mapped to the right to position $z_\ell + C_\ell[i] - 1$.
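A compact sketch of a wavelet matrix supporting quantile, following the mapping rules above. It assumes values already lie in $[0, 2^{\mathit{bits}})$ (e.g., after compression), stores plain `int` prefix counts instead of a succinct rank structure, and uses hypothetical names (`WaveletMatrix`, `onesBefore`); positions are 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Wavelet matrix over values in [0, 2^bits); level 0 looks at the most
// significant bit. C[l][i] = number of 1s in B_l[0..i]; z[l] = number of 0s.
struct WaveletMatrix {
    int L;
    std::vector<std::vector<int>> C;
    std::vector<int> z;

    WaveletMatrix(std::vector<int> s, int bits) : L(bits), C(bits), z(bits) {
        for (int l = 0; l < L; ++l) {
            int shift = L - 1 - l;
            std::vector<int> zeros, ones;
            C[l].resize(s.size());
            int cnt = 0;
            for (std::size_t i = 0; i < s.size(); ++i) {
                int b = (s[i] >> shift) & 1;
                C[l][i] = (cnt += b);
                (b ? ones : zeros).push_back(s[i]);
            }
            z[l] = static_cast<int>(zeros.size());
            zeros.insert(zeros.end(), ones.begin(), ones.end());  // 0s left, 1s right
            s = std::move(zeros);                                 // sequence of level l + 1
        }
    }

    int onesBefore(int l, int i) const { return i > 0 ? C[l][i - 1] : 0; }  // in B_l[0..i-1]

    // k-th smallest (1-based k) among positions i..j (0-based, inclusive).
    int quantile(int i, int j, int k) const {
        assert(0 <= i && i <= j && 1 <= k && k <= j - i + 1);
        int value = 0;
        for (int l = 0; l < L; ++l) {
            int oi = onesBefore(l, i), oj = C[l][j];
            int zerosInRange = (j - i + 1) - (oj - oi);
            if (k <= zerosInRange) {          // left: position p -> p - C_l[p]
                i -= oi;
                j = i + zerosInRange - 1;
            } else {                          // right: position p -> z_l + C_l[p] - 1
                k -= zerosInRange;
                value |= 1 << (L - 1 - l);
                i = z[l] + oi;
                j = z[l] + oj - 1;
            }
        }
        return value;
    }
};
```

For the sequence of Fig. 1 with `bits = 4`, `quantile(6, 15, 6)` (positions 7 to 16 in 1-based terms) returns 7.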
The wavelet matrix has the advantage of being implementable using only $O(\log \sigma)$ extra words of memory, instead of the $O(\sigma)$ words used to store the tree structure in the pointer-based alternative, while maintaining fast operations. These $O(\log \sigma)$ words are insignificant even for $\sigma = 10^9$, which means that the structure could be constructed directly over the original alphabet. On the other hand, the wavelet matrix is somewhat less adaptable, because it does not directly support the pop and push updates. However, it can support swap and toggle in a way similar to the one described for the wavelet tree.
² http://www.spoj.com/
³ http://codeforces.com/blog/entry/17787
⁴ http://d.hatena.ne.jp/sune2/20131216/1387197255
⁵ These data only cover the period until late March 2016.
Table 1
General submission statistics
ILKQ1 49 9 19 18 3
ILKQ2 32 6 15 8 3
ILKQ3 35 2 12 15 6
We received submissions from several types of users, many of whom can be considered experienced programmers. Among them, even expert coders (rank 100 or better on SPOJ) got many Wrong Answer (WA) or Time Limit Exceeded (TLE) verdicts, which shows the intrinsic difficulty of the problems. Considering the three problems, 5 out of the 10 distinct users who got an Accepted (AC) verdict have rank 60 or better on SPOJ, and 8 are well-known ACM-ICPC World Finalists. For problem ILKQ3 we received only two AC verdicts. Both users solved the problem after several WA or TLE verdicts. For ILKQ1 and ILKQ2 the best-ranked submitter was the top-1 user on SPOJ, who obtained AC in both problems. For ILKQ3, the best-ranked submitter was among the top 5 on SPOJ and obtained only TLE verdicts.
For ILKQ2 and ILKQ3, sorting the queries or any other offline approach is not directly useful, as queries are mixed with updates. For ILKQ2 we received some submissions implementing a square root decomposition strategy, which ran extremely close to the time limit. The most successful strategy in both problems was the use of ideas coming from persistent data structures, in particular persistent segment trees⁶. As in any persistent structure, the main idea is to efficiently store its different states. Exploiting the fact that consecutive states do not differ in more than $O(\log n)$ nodes, it is possible to keep $n$ different segment trees in $O(n \log n)$ space. Persistent segment trees can be used to answer quantile queries, but they need some more work to be adapted to updates such as the swaps in ILKQ3. The two correct solutions that we received for ILKQ3 make use of this structure. It is relevant to notice that, given the input size and the updates, implementing a persistent segment tree for this problem can use a considerable amount of memory. In particular, one of the AC submissions used 500MB and the other 980MB. Our wavelet tree solution uses only 4MB of memory.
6. Performance Tests
Existing experimental analyses of wavelet trees focus mostly on compression characteristics (Claude et al., 2015). Moreover, they do not consider the time required to
build the structure because from the compression point of view the preprocessing time is
not the most relevant parameter. Thus, we conducted a series of experiments focusing on
a competitive-programming setting where the building time is important and restrictions
on the input are driven by typical tight time constraints. The idea is to shed some light on
how far the input size can be pushed. We expect these results to be useful for competitors
as well as for problem setters.
We performed experimental tests for our wavelet tree and wavelet matrix implemen-
tations comparing construction time and the performance of rank, quantile and range
counting queries. We consider only alphabets whose size is at most the size of the sequence. To analyze the impact of the alphabet size, we performed tests over sequences of different profiles. A profile is characterized by the ratio between the size of the alphabet and the size of the sequence. For example, a sequence of size $10^3$ and profile 0.5 has an alphabet of size 500.
Measurements. To measure construction time we generated random sequences of increasing size for different profiles. For each size and profile we generated 1,000 sequences and we report the average time. For queries rank, quantile and range counting, we generated 100,000 uniformly distributed queries and averaged their execution time. The machine used is an Intel® Core™ i7-2600K running at 3.40GHz with 8GB of RAM. The operating system is Arch Linux running kernel 4.4.4. All our code is single-threaded and implemented in C++. The compiler used is gcc version 5.3.0, with optimization flag -O2, as is customary in many programming contests.
⁶ bit.ly/persistent-segment-tree
Results. Not much variance was found in the performance between different profiles but, as may be expected, sequences of profile 1 (i.e., permutations) reported higher times in construction and queries. Thus, we focus on the analysis of permutations to test performance in the most stressing setting. For the range of inputs tested we did not observe big differences between the wavelet tree and the wavelet matrix, both for construction and query time. Though there are small differences, they can be attributed to minor implementation decisions and not to the implementation strategy itself.
Regarding the size of the input (Fig. 2), construction time stays within the order of 250 milliseconds for sequences of size up to $10^6$, but scales up to 2 seconds for sequences of size $10^7$, which can be prohibitive under typical time constraints. For queries rank, quantile and range counting, we report in Fig. 3 the number of queries that can be performed in 1 second for different sizes of the input sequence. For rank and quantile, around $10^6$ queries can be performed in 1 second for an input of size $10^6$. In contrast, for range counting only $10^5$ queries can be performed in the same setting (Fig. 3).
It would be interesting as future work to perform a deep comparison between the
wavelet tree and competing structures for similar purposes such as mergesort trees
and persistent segment trees, testing time and memory usage. From our simple analysis in the previous section one can infer that wavelet trees at least scale better in terms of memory usage, but more experimentation should be done to draw stronger
conclusions.
7. Concluding Remarks
Problems involving advanced data structures are appearing increasingly often in world-
wide programming competitions. In this scenario, competitive programmers often prefer
versatile structures that can be used for a wide range of problems without making a lot of
changes. Structures such as binary indexed trees or (persistent) segment trees, to name a few, form part of the baseline knowledge of competitors, and must be in the toolbox of any programmer. The wavelet tree has proven to be a really versatile structure but, as we have shown, it is not widely used at the moment. However, we have noted that some programmers have already perceived the virtues of the wavelet tree. We believe that the wavelet tree, being quite easy to implement and having so many applications, is probably becoming a structure that every competitive programmer should learn. With
this paper we try to fill the gap and make wavelet trees widely available for the competi-
tive programming community.
Acknowledgments
J. Pérez is supported by the Millennium Nucleus Center for Semantic Web Research,
Grant NC120004, and Fondecyt grant 1140790.
References