Online Grammar Compression for Frequent Pattern Discovery

Fukunaga, Shouhei; Takabatake, Yoshimasa; Tomohiro, I; Sakamoto, Hiroshi


arXiv:1607.04446 (cs)

[Submitted on 15 Jul 2016 (v1), last revised 31 Aug 2016 (this version, v3)]


Abstract: Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG that derives the string deterministically. An efficient grammar compression algorithm builds a smaller CFG by finding duplicated patterns and removing them. This process amounts to frequent pattern discovery by grammatical inference. While any frequent pattern can be obtained in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into memory beforehand. We propose an online algorithm that approximates this problem within compressed space. The main contribution is an improvement of the previously best known approximation ratio $\Omega(\frac{1}{\lg^2 m})$ to $\Omega(\frac{1}{\lg^* N \lg m})$, where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg^*$ is the iterated logarithm base $2$. For a sufficiently large $N$, $\lg^* N$ is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
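
To illustrate why $\lg^* N$ is described as practically constant, the following is a minimal sketch that computes the iterated logarithm and compares the two ratio expressions from the abstract with the constants hidden by $\Omega$ ignored. The function names `lg_star`, `old_ratio`, and `new_ratio` are hypothetical and chosen for this illustration only; they are not taken from the paper, and the sketch does not implement the proposed algorithm.

```python
import math


def lg_star(n):
    """Iterated logarithm base 2: how many times log2 must be
    applied to n before the result is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def old_ratio(m):
    """1 / lg^2 m, the earlier ratio with its hidden constant ignored
    (an assumption made for illustration). Requires m >= 2."""
    return 1.0 / (math.log2(m) ** 2)


def new_ratio(n, m):
    """1 / (lg* N * lg m), the improved ratio with its hidden constant
    ignored (an assumption made for illustration). Requires n, m >= 2."""
    return 1.0 / (lg_star(n) * math.log2(m))


if __name__ == "__main__":
    # lg* grows extremely slowly: it is 5 even for N = 2**65536.
    for n in (2, 16, 65536, 2 ** 65536):
        print(n.bit_length(), "bits -> lg* =", lg_star(n))

    # Example values (hypothetical): a string of length 2**30 and an
    # optimal pattern of length 2**20.
    n, m = 2 ** 30, 2 ** 20
    print("old:", old_ratio(m), "new:", new_ratio(n, m))
```

Because the hidden constants are dropped, the printed numbers only indicate how the two expressions scale with $m$ and $N$; they are not the guarantees proved in the paper.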

Comments:	14 pages
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1607.04446 [cs.DS]
	(or arXiv:1607.04446v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1607.04446

Submission history

From: Hiroshi Sakamoto
[v1] Fri, 15 Jul 2016 10:42:20 UTC (1,689 KB)
[v2] Tue, 19 Jul 2016 06:25:36 UTC (1,689 KB)
[v3] Wed, 31 Aug 2016 02:01:47 UTC (1,692 KB)
