Detection of Illegitimate Emails Using Boosting Algorithm

Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

doi:10.1007/978-3-7091-0388-3_13

Chapter
First Online: 01 January 2011

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

In this paper, we report on experiments to detect illegitimate emails using a boosting algorithm. We call an email illegitimate if it is not useful to the receiver or to society. We divide the problem into two major areas of illegitimate email detection: suspicious email detection and spam email detection. Boosting can improve the accuracy of traditional classification algorithms, but it requires choosing a suitable weak learner as well as the number of boosting iterations. In this paper, we propose suitable weak learners and parameter settings for the boosting algorithm for this task. We first analyze the problem using base learners alone, and then apply the boosting algorithm with suitable weak learners and parameter settings, such as the number of boosting iterations. We propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm; it achieves maximum performance with very few boosting iterations.
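The idea of boosting a Naive Bayes weak learner for a fixed number of iterations can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the smoothing term, the toy word-presence features, and the choice of discrete AdaBoost with a weighted Bernoulli Naive Bayes are all assumptions made for illustration.

```python
import math

def train_nb(X, y, w):
    """Fit a Bernoulli Naive Bayes model on 0/1 features using example weights w."""
    s = 1.0 / len(X)  # assumed smoothing term so that no probability is exactly 0 or 1
    total = sum(w)
    model = {}
    for c in (0, 1):
        wc = sum(wi for wi, yi in zip(w, y) if yi == c)
        prior = (wc + s) / (total + 2 * s)
        probs = []
        for j in range(len(X[0])):
            wj = sum(wi for wi, xi, yi in zip(w, X, y) if yi == c and xi[j] == 1)
            probs.append((wj + s) / (wc + 2 * s))
        model[c] = (prior, probs)
    return model

def predict_nb(model, x):
    """Return the class (0 or 1) with the higher log posterior."""
    scores = {}
    for c, (prior, probs) in model.items():
        scores[c] = math.log(prior) + sum(
            math.log(p if xj == 1 else 1.0 - p) for p, xj in zip(probs, x))
    return 1 if scores[1] > scores[0] else 0

def adaboost(X, y, rounds):
    """Discrete AdaBoost: reweight examples so later learners focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_nb(X, y, w)
        preds = [predict_nb(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if err >= 0.5:            # weak learner no better than chance: stop
            break
        perfect = err == 0
        err = max(err, 1e-10)     # avoid division by zero in the vote weight
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        if perfect:               # nothing left to correct
            break
        w = [wi * math.exp(alpha if p != yi else -alpha)
             for wi, p, yi in zip(w, preds, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the weak learners; classes are mapped to -1 and +1."""
    vote = sum(a * (1 if predict_nb(m, x) == 1 else -1) for a, m in ensemble)
    return 1 if vote > 0 else 0

# Hypothetical toy data: columns mark presence of four made-up keywords;
# label 1 = illegitimate, 0 = legitimate.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
ensemble = adaboost(X, y, rounds=5)
```

The `rounds` argument plays the role of the number of boosting iterations mentioned in the abstract. On data that the weak learner already separates, the loop stops after a single round.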

Author information

Authors and Affiliations

Counterterrorism Research Lab, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
Sarwat Nizamani, Nasrullah Memon & Uffe Kock Wiil
University of Sindh, Jamshoro, Pakistan
Sarwat Nizamani
Hellenic American University, Manchester, NH, USA
Nasrullah Memon

Corresponding author

Correspondence to Sarwat Nizamani.

Editor information

Editors and Affiliations

The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
Uffe Kock Wiil

About this chapter

Cite this chapter

Nizamani, S., Memon, N., Wiil, U.K. (2011). Detection of Illegitimate Emails Using Boosting Algorithm. In: Wiil, U.K. (eds) Counterterrorism and Open Source Intelligence. Lecture Notes in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0388-3_13

DOI: https://doi.org/10.1007/978-3-7091-0388-3_13
Published: 26 May 2011
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0387-6
Online ISBN: 978-3-7091-0388-3
eBook Packages: Computer Science, Computer Science (R0)
