ABSTRACT
A new system for spam e-mail annotation by end-users is presented. It relies on the recursive application of handwritten annotation rules by an inference engine based on Logic Programming. Annotation rules allow the user to express nuanced considerations that depend on deobfuscation, word (non-)occurrence and the structure of the message in a straightforward, human-readable syntax. We show that a sample collection of annotation rules is effective on a relevant corpus that we assembled by collecting e-mails that escaped detection by the industry-standard SpamAssassin filter. The system presented here is intended as a personal tool that enforces personalized annotation rules which would not be suitable for general e-mail traffic.
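To make the idea of recursively applied annotation rules concrete, the following is a minimal sketch in Python rather than in the Logic Programming setting the system actually uses. Everything in it is an assumption made for illustration: the character map `LEET`, the `deobfuscate` helper, the rule format, the sample `RULES`, and the `annotate` function are hypothetical and are not taken from the system described above. Rules may require previously derived annotations, which is what makes the application recursive; word non-occurrence is checked only against the words of the message.

```python
import re

# Hypothetical substitution table for common character obfuscations.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "@": "a", "$": "s"})


def deobfuscate(text):
    """Lowercase, undo simple substitutions, drop separators inside words."""
    text = text.lower().translate(LEET)
    # "v.i.a.g.r.a" -> "viagra": remove a separator that sits between
    # two word characters.
    return re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)


# Each rule: (annotation, facts that must hold, word facts that must be absent).
# These sample rules are illustrative only.
RULES = [
    ("pharma", {"word:viagra"}, set()),
    ("unsolicited", {"word:offer"}, {"word:unsubscribe"}),
    ("spam", {"pharma", "unsolicited"}, set()),
]


def annotate(body, rules=RULES):
    """Apply the rules repeatedly until no new annotation can be derived."""
    base = {"word:" + w for w in re.findall(r"[a-z]+", deobfuscate(body))}
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, required, absent in rules:
            if head in derived:
                continue
            # Positive conditions may refer to earlier annotations;
            # absence is tested against the message words only.
            if required <= (base | derived) and not (absent & base):
                derived.add(head)
                changed = True
    return derived
```

In this sketch the `spam` annotation is never matched against the text directly; it follows only once `pharma` and `unsolicited` have both been derived, which mirrors how one annotation can build on another.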
Index Terms
- A rule-based system for end-user e-mail annotations