The Impact of the Mode of Data Representation for the Result Quality of the Detection and Filtering of Spam

Reda Mohamed Hamou, Abdelmalek Amine
Copyright: © 2013 | Volume: 3 | Issue: 1 | Pages: 17
ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781466630581 | DOI: 10.4018/ijirr.2013010103
MLA

Hamou, Reda Mohamed, and Abdelmalek Amine. "The Impact of the Mode of Data Representation for the Result Quality of the Detection and Filtering of Spam." IJIRR vol.3, no.1 2013: pp.43-59. http://doi.org/10.4018/ijirr.2013010103

APA

Hamou, R. M. & Amine, A. (2013). The Impact of the Mode of Data Representation for the Result Quality of the Detection and Filtering of Spam. International Journal of Information Retrieval Research (IJIRR), 3(1), 43-59. http://doi.org/10.4018/ijirr.2013010103

Chicago

Hamou, Reda Mohamed, and Abdelmalek Amine. "The Impact of the Mode of Data Representation for the Result Quality of the Detection and Filtering of Spam," International Journal of Information Retrieval Research (IJIRR) 3, no.1: 43-59. http://doi.org/10.4018/ijirr.2013010103

Abstract

Spam has overrun the Internet in phenomenal proportions, since it represents a high percentage of all email exchanged. To fight spam, the authors develop in this article a hybrid algorithm that first uses a probabilistic model, in this case Naïve Bayes, to weight the terms of the term-category matrix, and then uses an unsupervised learning algorithm (K-means) to separate messages into two classes, namely spam and ham. To determine the sensitive parameters that improve the classification, the authors study the content of the messages, representing them by language-independent word and character n-grams (because a message may be received in any language), in order to decide which representation to choose for a good classification. The authors have chosen several evaluation metrics to validate their results.
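
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the Laplace-smoothed estimate of P(term | category), the per-message mean log-weight vector, the deterministic K-means seeding, and all function names and sample messages are assumptions made for illustration only.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    """Overlapping character n-grams; no tokenizer, so language-independent."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """Overlapping word n-grams over whitespace-separated tokens."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def nb_term_weights(labeled_docs, extract):
    """Term-category matrix of Laplace-smoothed P(term | category).
    Assumption: the paper's exact weighting is not stated in the abstract."""
    counts = {}
    for text, label in labeled_docs:
        counts.setdefault(label, Counter()).update(extract(text))
    vocab = set().union(*counts.values())
    weights = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        weights[label] = {t: (c[t] + 1) / total for t in vocab}
    return weights

def doc_vector(text, weights, extract, labels):
    """Mean log-weight of a message's terms per category (unknown terms skipped)."""
    terms = extract(text)
    vec = []
    for label in labels:
        known = [math.log(weights[label][t]) for t in terms if t in weights[label]]
        vec.append(sum(known) / len(known) if known else 0.0)
    return vec

def kmeans(points, k=2, iters=20):
    """Plain Lloyd's K-means, seeded with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical toy data, only to show how the pieces fit together.
train = [("win free money now", "spam"), ("cheap pills free offer", "spam"),
         ("meeting moved to monday", "ham"), ("see you at lunch", "ham")]
extract = lambda t: char_ngrams(t, 3)   # or: lambda t: word_ngrams(t, 1)
weights = nb_term_weights(train, extract)
inbox = ["free money offer", "lunch on monday", "win cheap pills", "see you monday"]
vectors = [doc_vector(m, weights, extract, ["spam", "ham"]) for m in inbox]
clusters = kmeans(vectors, k=2)  # cluster ids carry no label; they must be mapped to spam/ham
```

Swapping `extract` between `char_ngrams` and `word_ngrams`, or changing `n`, is the kind of representation choice the abstract says the authors compare; K-means itself only returns cluster indices, so deciding which cluster is spam requires a separate step.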
