Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2012
Abstract
The recent boom of weblogs and social media has made it increasingly important to identify suspicious users with unusual behavior, such as spammers or fraudulent reviewers. A typical spamming strategy is to employ multiple dummy accounts to collectively promote a target, be it a URL or a product. Consequently, these suspicious accounts exhibit coherent anomalous behavior that is identifiable when they are viewed as a collection. In this paper, we propose the concept of Coherent Anomaly Collection (CAC) to capture this kind of collection, and put forward an efficient algorithm to simultaneously find the top-K disjoint CACs together with their anomalous behavior patterns. Compared with existing approaches, our new algorithm can find disjoint anomaly collections with coherent extreme behavior without requiring either their number or their sizes to be specified. Results on real Twitter data show that our approach discovers meaningful and informative hashtag spammer groups of various sizes, which are hard to detect with clustering-based methods.
Keywords
Anomaly/outlier detection, Anomaly collection/cluster
Discipline
Computer Sciences | Databases and Information Systems
Publication
CIKM'12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management: October 29 - November 2, 2012, Maui, Hawaii
First Page
1557
Last Page
1561
ISBN
9781450311564
Identifier
10.1145/2396761.2398472
Publisher
ACM
City or Country
New York
Citation
DAI, Hanbo; ZHU, Feida; LIM, Ee-peng; and PANG, Hwee Hwa.
Mining coherent anomaly collections on web data. (2012). CIKM'12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management: October 29 - November 2, 2012, Maui, Hawaii. 1557-1561.
Available at: https://ink.library.smu.edu.sg/sis_research/2869
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1145/2396761.2398472