Identification of Data Breaches from Public Forums

Adnan, Md. Akhtaruzzaman; Younus, Atika; Kawser, Md. Harun Al; Adhikary, Natasha; Habib, Ahsan; Haque, Rakib Ul

doi:10.1007/978-3-031-17510-7_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13195)

Included in the following conference series:

International Conference on Information Technology and Communications Security

Abstract

Adversaries launch cyberattacks against entities such as healthcare and business institutions, and a successful attack results in a data breach. They then publicize their successes in public forums to raise their ranking. Victim entities can be alerted early to a breach if these forums are analyzed properly. Although a few studies have already addressed this area, their datasets and code are not public. More importantly, the sources of their datasets no longer exist, which makes their novelty unclear and their reliability hard to assess. To address these concerns, this study reinvestigates the domain using machine learning, ensemble learning, and deep learning. A web crawler was developed to download the dataset from the public forum of the Nulled website. Features are extracted using TF-IDF and GloVe. Performance analysis shows that an SVM with a linear kernel achieved up to 90.80% accuracy. The implementations are published via a GitHub link.
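To make the TF-IDF feature-extraction step concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the tokenizer, the smoothed IDF variant, the function names, and the three example post titles are all assumptions chosen for illustration. The resulting vectors are the kind of input a linear-kernel SVM could then be trained on; that classification step is not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return (vocab, vectors) with one TF-IDF vector per document.

    Assumed weighting: tf = count / document length, and the smoothed
    idf = ln((1 + N) / (1 + df)) + 1. Other TF-IDF variants exist.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical forum post titles, invented for illustration only.
posts = [
    "leaked database dump with emails and passwords",
    "selling game accounts cheap",
    "fresh database leak emails included",
]
vocab, vectors = tfidf(posts)
```

In this sketch a term that appears in fewer posts (such as "passwords") receives a larger weight than one shared across posts (such as "database"), which is the property that makes TF-IDF useful for separating breach-related posts from unrelated ones.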

This research work is supported by University of Asia Pacific.

A. Younus, M. H. Al Kawser, and N. Adhikary contributed equally.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, 1205, Bangladesh
Md. Akhtaruzzaman Adnan, Atika Younus, Md. Harun Al Kawser & Natasha Adhikary
Indetechs Software Limited, House: 31, Road: 20, Block: K, Banani, 1212, Dhaka, Bangladesh
Ahsan Habib
School of Computer Science & Technology, University of Chinese Academy of Sciences, Shijingshan, Beijing, 100049, China
Rakib Ul Haque

Corresponding author

Correspondence to Rakib Ul Haque.

Editor information

Editors and Affiliations

University of Luxembourg, Esch-sur-Alzette, Luxembourg
Peter Y.A. Ryan
Bucharest University of Economic Studies, Bucharest, Romania
Cristian Toma

Cite this paper

Adnan, M.A., Younus, A., Kawser, M.H.A., Adhikary, N., Habib, A., Haque, R.U. (2022). Identification of Data Breaches from Public Forums. In: Ryan, P.Y., Toma, C. (eds) Innovative Security Solutions for Information Technology and Communications. SecITC 2021. Lecture Notes in Computer Science, vol 13195. Springer, Cham. https://doi.org/10.1007/978-3-031-17510-7_4

DOI: https://doi.org/10.1007/978-3-031-17510-7_4
Published: 13 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17509-1
Online ISBN: 978-3-031-17510-7
eBook Packages: Computer Science (R0)
