Text mining in unclean, noisy or scrambled datasets for digital forensics analytics
K Xylogiannopoulos, P Karampelas, R Alhajj
2017 European Intelligence and Security Informatics Conference (EISIC), 2017 • ieeexplore.ieee.org
In our era, most communication between people takes the form of electronic messages, especially through smart mobile devices. As a result, the written text exchanged suffers from poor punctuation, misspelled words, continuous chunks of several words without spaces, tables, internet addresses, etc., which make traditional text analytics methods difficult or impossible to apply without serious effort to clean the dataset. The method proposed in this paper can work on massive noisy and scrambled texts with minimal preprocessing: it removes special characters and spaces in order to create a continuous string, then detects all the repeated patterns very efficiently using the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure and a variant of the All Repeated Patterns Detection (ARPaD) algorithm. Meta-analysis of the results can further help a digital forensics investigator detect important information in the chunk of text analyzed.
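The pipeline described in the abstract (strip special characters and spaces, build one continuous string, report every repeated pattern) can be illustrated with a small Python sketch. This is not the authors' LERP-RSA or ARPaD implementation. It is a naive stand-in that sorts suffixes truncated to a fixed maximum length and scans for runs with a shared prefix. The function names `normalize` and `repeated_patterns`, the parameters `min_len` and `max_len`, the lowercasing step, and the restriction to ASCII letters and digits are all assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Build one continuous string: lowercase (an assumption), then drop
    everything except ASCII letters and digits (spaces, punctuation, symbols)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def repeated_patterns(s: str, min_len: int = 3, max_len: int = 12) -> dict:
    """Return {pattern: sorted start positions} for every substring of
    length min_len..max_len that occurs at least twice in s.

    Naive illustration only: suffixes are truncated to max_len characters
    (a hypothetical bound on the longest pattern of interest) and sorted,
    so suffixes sharing a prefix end up adjacent.
    """
    suffixes = sorted((s[i:i + max_len], i) for i in range(len(s)))
    found = {}
    for length in range(min_len, max_len + 1):
        run_prefix, run_positions = None, []
        for suffix, pos in suffixes:
            if len(suffix) < length:
                continue  # too short to contain a pattern of this length
            prefix = suffix[:length]
            if prefix == run_prefix:
                run_positions.append(pos)
            else:
                # A run just ended; keep it if the prefix occurred 2+ times
                if len(run_positions) >= 2:
                    found[run_prefix] = sorted(run_positions)
                run_prefix, run_positions = prefix, [pos]
        if len(run_positions) >= 2:
            found[run_prefix] = sorted(run_positions)
    return found
```

For example, `normalize("Call me@9pm!! callme at 9 pm... CALL ME")` yields `callme9pmcallmeat9pmcallme`, and `repeated_patterns` on that string reports `callme` at positions 0, 9 and 20 even though the three occurrences were written differently in the raw text. The sketch recomputes prefixes for every length and would not scale to massive inputs, which is the problem the paper's data structure and algorithm are designed to address.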