Bidirectional string anchors: a New String Sampling Mechanism.

solonas13/bd-anchors

bd-anchors: Bidirectional String Anchors

Bidirectional string anchors (bd-anchors) is a new string sampling mechanism. Given a positive integer ℓ, the mechanism selects the leftmost lexicographically smallest rotation in every sliding window of length ℓ of the input text.
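The definition above can be illustrated with a deliberately naive sketch. It is not the code shipped in this repository: the function names `bd_anchor` and `bd_anchors`, the 0-based positions, and the toy input string are assumptions made for illustration, and the quadratic rotation comparison is far slower than the construction in bd-construct.

```python
def bd_anchor(window: str) -> int:
    """Start index of the leftmost lexicographically smallest rotation of window."""
    best = 0
    for i in range(1, len(window)):
        # Strict '<' keeps the leftmost index when several rotations are equal.
        if window[i:] + window[:i] < window[best:] + window[:best]:
            best = i
    return best


def bd_anchors(text: str, ell: int) -> list:
    """Sorted 0-based text positions selected over all windows of length ell."""
    anchors = set()
    for start in range(len(text) - ell + 1):
        window = text[start:start + ell]
        # Translate the index inside the window into a position in the text.
        anchors.add(start + bd_anchor(window))
    return sorted(anchors)


if __name__ == "__main__":
    # Toy input; consecutive windows tend to agree on the same position.
    print(bd_anchors("aabaaabcbda", 5))
```

On this toy string the first four windows all select the same text position, which is a small instance of the local consistency that keeps the sample sparse.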

Bd-anchors samples are approximately uniform, locally consistent, and computable in O(n) time for any input text of length n and any ℓ. Our current implementation supports an O(nℓ)-time construction.

Our experiments using several datasets show that the bd-anchors sample sizes decrease proportionally to ℓ, and that these sizes are competitive with or smaller than the minimizers sample sizes obtained with the analogous sampling parameters. For instance, for Chromosome 1 of the human genome, which has length n = 230,481,390, and ℓ = 500 (resp. 1000), the set A of order-ℓ bd-anchors has size 1,560,882 (resp. 897,953).
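To put the reported sizes in perspective, the short sketch below turns them into sampling densities. Only the three numbers come from the text above; the helper name `density` and the output format are assumptions for illustration.

```python
# Figures reported above for Chromosome 1 of the human genome.
n = 230_481_390
sample_sizes = {500: 1_560_882, 1000: 897_953}  # ell -> |A|


def density(sample_size: int, text_length: int) -> float:
    """Fraction of text positions that are selected as bd-anchors."""
    return sample_size / text_length


if __name__ == "__main__":
    for ell, size in sample_sizes.items():
        d = density(size, n)
        print(f"ell={ell}: density={d:.5f}, one anchor per {1 / d:.1f} positions")
    ratio = sample_sizes[500] / sample_sizes[1000]
    print(f"doubling ell shrinks the sample by a factor of {ratio:.2f}")
```

For these two data points, doubling ℓ shrinks the sample by a factor of roughly 1.74.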

Constructing the Sample: Our current implementation takes O(nℓ) time. To compile the program, change to directory bd-construct and follow the instructions given in file INSTALL.

We inject bd-anchors in two problems:

Text Indexing: Our index has size n bytes + O(|A|) integers and supports locate operations for any pattern of length at least ℓ in near-optimal time (bd-index-grid). The query time of the bd-index implementation is not bounded, but bd-index is considerably faster in practice, especially when the number of occurrences is high. To compile the program, change to directory bd-index or bd-index-grid and follow the instructions given in file INSTALL.

Top-K Similarity Search under Edit Distance: To compile the program, change to directory bd-search and follow the instructions given in file INSTALL.

When publishing work that is based on the results from bd-anchors, please cite:

G. Loukides, S. P. Pissis, and M. Sweering:
Bidirectional String Anchors for Improved Text Indexing and Top-K Similarity Search.
IEEE Trans. Knowl. Data Eng. DOI: 10.1109/TKDE.2022.3231780

G. Loukides and S. P. Pissis:
Bidirectional String Anchors: a New String Sampling Mechanism.
ESA 2021: 64:1-64:21. DOI: 10.4230/LIPIcs.ESA.2021.64

License: GNU GPLv3 License; Copyright (C) 2021 Grigorios Loukides and Solon P. Pissis.
