An Approach Based on Contrast Patterns for Bot Detection on Web Log Files

  • Conference paper
  • Advances in Soft Computing (MICAI 2018)

Abstract

Companies invest resources in detecting non-human accesses in their web traffic. Non-human accesses are usually far fewer than human accesses, which makes this a class imbalance problem; as a consequence, classifiers bias their results toward the human accesses and overlook the non-human ones. In some classification problems, such as non-human traffic detection, high accuracy is not the only desired quality: experts should also be able to understand the model produced by the classifier. For this reason, in this paper we study the use of contrast pattern-based classifiers for building an understandable and accurate model for detecting non-human traffic in web log files. Our experiments over five databases show that the contrast pattern-based approach obtains significantly better AUC results than other state-of-the-art classifiers.
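The two technical ideas in the abstract, contrast patterns and AUC under class imbalance, can be illustrated with a small sketch. It is not the classifier or data used in the paper: the log features (`method`, `referrer`, `ua`), the toy records, the thresholds `min_support` and `min_growth`, and the `score` aggregation are all hypothetical. Here a contrast pattern is a conjunction of feature=value items whose support (the fraction of a class's records it covers) is high in one class and low in the other; the ratio of the two supports is its growth rate. Patterns are found by brute-force enumeration of conjunctions of up to two items, which only scales to toy data. AUC is computed as the probability that a randomly chosen bot record is scored above a randomly chosen human record, with ties counted as one half, which equals the area under the ROC curve.

```python
from itertools import combinations
from math import inf

# Hypothetical toy web-log features; NOT the paper's feature set or data.
BOTS = [
    {"method": "GET", "referrer": "none", "ua": "script"},
    {"method": "GET", "referrer": "none", "ua": "browser"},
    {"method": "HEAD", "referrer": "none", "ua": "script"},
]
HUMANS = [
    {"method": "GET", "referrer": "site", "ua": "browser"},
    {"method": "GET", "referrer": "site", "ua": "browser"},
    {"method": "POST", "referrer": "site", "ua": "browser"},
    {"method": "GET", "referrer": "none", "ua": "browser"},
    {"method": "GET", "referrer": "site", "ua": "browser"},
    {"method": "GET", "referrer": "site", "ua": "browser"},
]


def candidate_patterns(records, max_len=2):
    """All conjunctions of up to max_len feature=value items seen in records."""
    seen = set()
    for record in records:
        items = sorted(record.items())
        for k in range(1, max_len + 1):
            seen.update(combinations(items, k))
    return seen


def matches(pattern, record):
    """True if the record satisfies every feature=value item of the pattern."""
    return all(record.get(feature) == value for feature, value in pattern)


def support(pattern, records):
    """Fraction of the records in one class covered by the pattern."""
    return sum(matches(pattern, r) for r in records) / len(records)


def mine_contrast_patterns(target, other, min_support=0.3, min_growth=2.0):
    """Map each pattern frequent in `target` with high growth rate to its support."""
    result = {}
    for pattern in candidate_patterns(target):
        s_target, s_other = support(pattern, target), support(pattern, other)
        growth = inf if s_other == 0 else s_target / s_other
        if s_target >= min_support and growth >= min_growth:
            result[pattern] = s_target
    return result


def score(record, bot_patterns, human_patterns):
    """Hypothetical vote: share of matched pattern support that points to 'bot'."""
    s_bot = sum(s for p, s in bot_patterns.items() if matches(p, record))
    s_human = sum(s for p, s in human_patterns.items() if matches(p, record))
    total = s_bot + s_human
    return 0.5 if total == 0 else s_bot / total


def auc(pos_scores, neg_scores):
    """Probability that a positive outranks a negative; ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    bot_patterns = mine_contrast_patterns(BOTS, HUMANS)
    human_patterns = mine_contrast_patterns(HUMANS, BOTS)
    for pattern, supp in sorted(bot_patterns.items()):
        print("bot pattern:", pattern, "support:", round(supp, 2))
    bot_scores = [score(r, bot_patterns, human_patterns) for r in BOTS]
    human_scores = [score(r, bot_patterns, human_patterns) for r in HUMANS]
    # Resubstitution estimate: scored on the same records used for mining.
    print("resubstitution AUC:", auc(bot_scores, human_scores))
```

Each mined pattern reads as a rule an analyst can inspect, for example `referrer=none AND ua=script`, which is the kind of understandability the abstract argues for. Because supports are measured relative to each class's own size, a bot pattern covering all three bot records weighs as much as a human pattern covering all six human records, one simple way to keep the majority class from dominating the vote. The printed AUC is computed on the records the patterns were mined from, so it shows the mechanics only. For contrast, a classifier that always answers "human" reaches 6/9 accuracy on this data, yet it gives every record the same score, so its AUC is 0.5.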

Acknowledgment

This research was partly supported by Google under the APRU project “AI for Everyone”. The authors thank Robinson Mas del Risco and Fernando Gómez Herrera for, respectively, providing bot software and helping with bot execution throughout our experiments.

Author information

Corresponding author: Octavio Loyola-González.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Loyola-González, O., Monroy, R., Medina-Pérez, M.A., Cervantes, B., Grimaldo-Tijerina, J.E. (2018). An Approach Based on Contrast Patterns for Bot Detection on Web Log Files. In: Batyrshin, I., Martínez-Villaseñor, M., Ponce Espinosa, H. (eds) Advances in Soft Computing. MICAI 2018. Lecture Notes in Computer Science(), vol 11288. Springer, Cham. https://doi.org/10.1007/978-3-030-04491-6_21

  • DOI: https://doi.org/10.1007/978-3-030-04491-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04490-9

  • Online ISBN: 978-3-030-04491-6

  • eBook Packages: Computer Science, Computer Science (R0)