The hybrid ant colony optimization and ensemble method for solving the data stream e-mail foldering problem

  • S.I. : 2018 India Intl. Congress on Computational Intelligence
  • Neural Computing and Applications

Abstract

The e-mail foldering problem is a special classification problem. It concerns a situation in which e-mail users create new folders and, at the same time, stop using some of the folders created in the past. Additionally, messages arrive in the system at different times. This article proposes a novel ant colony optimization approach adapted to data stream analysis: it revises the ant colony optimization algorithm for the e-mail foldering problem and proposes a new solution adapted to the data stream. The goal of this work is to allow the classification of messages arriving at the system as data packages; however, due to the large number of decision classes (folders in the inbox), successive packages lead to a large concept drift. To ensure the stability of the algorithm, an approach in which memory is represented as a pheromone trail is introduced. This concept is known from ant colony optimization methods. At the same time, multiple classifiers are used (similar to an ensemble method). The proposed approach was tested on real-world data from the Enron e-mail dataset. The two proposed data stream methods were analyzed and compared with methods from the literature. The results, in terms of both accuracy and stability, confirm (according to a statistical analysis) that the proposed solutions classify e-mail messages arriving at the system as data packages better than the compared methods.
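
The idea of keeping memory as a pheromone trail across successive data packages can be illustrated with a minimal sketch. This is not the algorithm from the article: the toy token-voting base learner, the class and function names, the evaporation rate and the ensemble size limit are all assumptions made for illustration. Each package is first classified by a weighted vote of the ensemble and only then used for training; member weights evaporate and are reinforced according to accuracy on the newest package.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy base learner (hypothetical): each token votes for folders it was seen in."""

    def __init__(self, package):
        self.votes = defaultdict(Counter)
        for tokens, folder in package:
            for tok in tokens:
                self.votes[tok][folder] += 1

    def predict(self, tokens):
        tally = Counter()
        for tok in tokens:
            tally.update(self.votes.get(tok, {}))
        return tally.most_common(1)[0][0] if tally else None


class PheromoneEnsemble:
    """Ensemble whose member weights behave like a pheromone trail (illustrative)."""

    def __init__(self, evaporation=0.3, max_members=5):
        self.rho = evaporation          # assumed evaporation rate
        self.max_members = max_members  # assumed ensemble size limit
        self.members = []               # list of [classifier, pheromone]

    def predict(self, tokens):
        # Weighted vote: each member adds its pheromone to the folder it predicts.
        tally = Counter()
        for clf, tau in self.members:
            label = clf.predict(tokens)
            if label is not None:
                tally[label] += tau
        return tally.most_common(1)[0][0] if tally else None

    def update(self, package):
        # Assumes a non-empty package of (tokens, folder) pairs.
        # Evaporate old pheromone, then deposit in proportion to recent accuracy.
        for member in self.members:
            clf = member[0]
            acc = sum(clf.predict(t) == f for t, f in package) / len(package)
            member[1] = (1 - self.rho) * member[1] + self.rho * acc
        # Train a new member on the latest package and keep the strongest ones.
        self.members.append([TokenVoteClassifier(package), 1.0])
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]


def run_stream(packages, ensemble):
    """Test-then-train loop: classify each package before learning from it."""
    accuracies = []
    for package in packages:
        if ensemble.members:
            hits = sum(ensemble.predict(t) == f for t, f in package)
            accuracies.append(hits / len(package))
        ensemble.update(package)
    return accuracies
```

In this sketch a folder that stops receiving messages loses influence gradually, because members trained on it are scored against newer packages and their pheromone decays, while a newly created folder becomes predictable as soon as one member has been trained on a package that contains it.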

Acknowledgements

This paper is co-funded by the National Science Centre, Poland: 2017/01/X/ST6/01477.

Author information

Corresponding author

Correspondence to Przemysław Juszczuk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kozak, J., Juszczuk, P. & Probierz, B. The hybrid ant colony optimization and ensemble method for solving the data stream e-mail foldering problem. Neural Comput & Applic 32, 15429–15443 (2020). https://doi.org/10.1007/s00521-019-04672-1

  • DOI: https://doi.org/10.1007/s00521-019-04672-1
