Not all character n-grams are created equal: A study in authorship attribution
Proceedings of the 2015 conference of the North American chapter of …, 2015•aclanthology.org
Abstract
Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morphosyntax, thematic content and style. We evaluate the predictiveness of each of these groups in two AA settings: a single-domain setting and a cross-domain setting where multiple topics are present. We demonstrate that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features. Our study contributes new insights into the use of n-grams for future AA work and other classification tasks.
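The abstract does not spell out how n-grams are assigned to subgroups, so the following is only a minimal sketch under stated assumptions. Each character 3-gram is labeled by where it sits relative to word boundaries and punctuation, using hypothetical category names (`prefix`, `suffix`, `space-prefix`, `whole-word`, `beg-punct`, and so on) loosely modeled on an affix / word / punctuation split; the paper's exact definitions may differ. A toy nearest-profile cosine classifier (a stand-in, not the paper's experimental setup) then shows how features can be restricted to, for example, affix and punctuation n-grams to compare the predictiveness of each subgroup.

```python
import math
import string
from collections import Counter

# Hypothetical, simplified category groups (assumption, not the paper's exact scheme).
AFFIX = {"prefix", "suffix", "space-prefix", "space-suffix"}
WORD = {"whole-word", "mid-word", "multi-word"}
PUNCT = {"beg-punct", "mid-punct", "end-punct"}


def is_word_char(ch: str) -> bool:
    return ch.isalnum()


def ngram_type(text: str, i: int, n: int) -> str:
    """Assign a coarse category to the n-gram text[i:i+n]."""
    gram = text[i:i + n]
    # Any ASCII punctuation puts the n-gram in a punctuation category.
    if any(c in string.punctuation for c in gram):
        if gram[0] in string.punctuation:
            return "beg-punct"
        if gram[-1] in string.punctuation:
            return "end-punct"
        return "mid-punct"
    # A space marks a word boundary inside the n-gram.
    if " " in gram:
        if gram[0] == " " and " " not in gram[1:]:
            return "space-prefix"
        if gram[-1] == " " and " " not in gram[:-1]:
            return "space-suffix"
        return "multi-word"
    # Entirely inside a word: inspect the characters just outside the n-gram.
    starts_word = i == 0 or not is_word_char(text[i - 1])
    ends_word = i + n == len(text) or not is_word_char(text[i + n])
    if starts_word and ends_word:
        return "whole-word"
    if starts_word:
        return "prefix"
    if ends_word:
        return "suffix"
    return "mid-word"


def typed_ngrams(text: str, n: int = 3) -> Counter:
    """Count (category, n-gram) pairs over a whitespace-normalized text."""
    text = " ".join(text.split())
    return Counter(
        (ngram_type(text, i, n), text[i:i + n])
        for i in range(len(text) - n + 1)
    )


def restrict(profile: Counter, keep: set) -> Counter:
    """Keep only the n-grams whose category is in `keep`."""
    return Counter({k: v for k, v in profile.items() if k[0] in keep})


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(doc: str, authors: dict, keep: set) -> str:
    """Return the candidate author whose restricted profile is closest to `doc`."""
    query = restrict(typed_ngrams(doc), keep)
    return max(
        authors,
        key=lambda a: cosine(query, restrict(typed_ngrams(authors[a]), keep)),
    )


# Hypothetical toy training texts, one per candidate author.
SAMPLE_TEXTS = {
    "A": "Walking, talking, singing; it was all so exhausting!",
    "B": "He came. He saw. He left without a word.",
}

if __name__ == "__main__":
    unknown = "Dancing, laughing, and running; so tiring!"
    for name, groups in [("affix+punct", AFFIX | PUNCT), ("word", WORD)]:
        print(name, attribute(unknown, SAMPLE_TEXTS, groups))
```

Running the same attribution with different `keep` sets is the kind of subgroup comparison the abstract describes; a real study would use a proper classifier, held-out data, and the cross-domain split rather than two toy sentences.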