Artificial immune system for illicit content identification in social media
Journal of the American Society for Information Science and Technology, 2012 (Wiley Online Library)
Abstract
Social media is frequently used as a platform for exchanging information and opinions, as well as for disseminating propaganda. However, online content can also be misused to distribute illicit information, such as violent postings in web forums. Illicit content is highly dispersed across social media, while non‐illicit content is unspecific and topically diverse. Labeling a large amount of illicit content (positive examples) and non‐illicit content (negative examples) to train classification systems is costly and time‐consuming. Nevertheless, large volumes of unlabeled content are relatively easy to obtain from social media. In this article, we present an artificial immune system‐based technique to address the difficulties of identifying illicit content in social media. Inspired by the positive selection principle of the immune system, we designed a novel labeling heuristic based on partially supervised learning that extracts high‐quality positive and negative examples from unlabeled datasets. Empirical evaluation on two large hate‐group web forums suggests that the proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.
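The abstract does not spell out the labeling heuristic, so the following is only a minimal sketch of the general idea of selecting training examples from unlabeled data by affinity to known positives. It is not the authors' method. It assumes whitespace-tokenized bag-of-words vectors, cosine similarity as the affinity measure, and two hypothetical thresholds (`pos_threshold`, `neg_threshold`); the function names `vectorize`, `cosine`, and `label_unlabeled` are likewise hypothetical. Each known positive example acts as a "detector": unlabeled items that match some detector strongly are selected as extra positives, items that match no detector are kept as reliable negatives, and the rest stay unlabeled.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so absent words contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_unlabeled(positives, unlabeled, pos_threshold=0.5, neg_threshold=0.1):
    """Hypothetical affinity-based labeling heuristic (illustration only).

    Returns (new_positives, reliable_negatives, still_unlabeled).
    """
    # Each known positive example serves as a detector.
    detectors = [vectorize(p) for p in positives]
    new_pos, new_neg, ambiguous = [], [], []
    for text in unlabeled:
        vec = vectorize(text)
        # Affinity = best match against any detector.
        best = max((cosine(vec, d) for d in detectors), default=0.0)
        if best >= pos_threshold:
            new_pos.append(text)      # strong match: treat as positive
        elif best <= neg_threshold:
            new_neg.append(text)      # matches no detector: reliable negative
        else:
            ambiguous.append(text)    # in between: leave unlabeled
    return new_pos, new_neg, ambiguous
```

For example, with positives `["attack the enemy with violence", "violence against them now"]`, the unlabeled post `"we must attack with violence"` scores 0.6 and is selected as positive, `"recipe for chocolate cake"` scores 0.0 and becomes a reliable negative, and `"meeting about the enemy plans"` scores 0.4 and stays unlabeled. The two-threshold design leaves borderline items out of the training set rather than forcing a noisy label on them.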