DOI: 10.1145/3378936.3378981
research-article

Big Data Analyses of ZeroNet Sites for Exploring the New Generation DarkWeb

Published: 07 March 2020

Abstract

ZeroNet is a typical new-generation dark web that uses Bitcoin cryptography and BitTorrent technology to build a distributed, censorship-resistant communication network. Building on our earlier studies of The Onion Router (Tor), we present a big data analysis framework for automated multi-categorization of ZeroNet websites, intended to improve analysts' situational awareness of new content emerging from this dynamic landscape. Over the last two years, our team has developed a distributed crawling infrastructure called ZeroCrawler that automatically crawls and updates ZeroNet websites in real time. It stores the data in a research repository designed to help better understand ZeroNet's hidden service ecosystem. The analysis component of our framework, Automated Multi-Categorization Labeling (AMCL), introduces a three-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for each category, (2) it derives a probability distribution over those keywords for each category, and (3) it uses these keywords to map ZeroNet website content to several labels. We also present empirical results from AMCL and our ongoing experimentation with it, drawing on our experience applying it to our entire ZeroNet repository, which now holds over 3000 indexed websites. The experimental results show that AMCL can discover categories on previously unlabeled websites, and we discuss applications of AMCL in supporting various analyses and investigations of ZeroNet websites.
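
The three-stage strategy described in the abstract can be illustrated with a small sketch. This is not the authors' AMCL implementation: the abstract does not specify the underlying model, so the sketch substitutes add-one smoothed word frequencies for keyword learning and a naive Bayes style posterior for label assignment. All function names, the `TOY_SITES` data, `top_k`, and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: category -> example site texts (invented data).
TOY_SITES = {
    "market": ["buy bitcoin wallet shop", "shop sell bitcoin price"],
    "forum": ["forum post reply thread", "thread reply user post"],
    "blog": ["blog article author essay", "essay blog author diary"],
}

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]

def learn_keywords(labeled_docs, top_k=3):
    """Stage 1: per category, keep the top_k words that are frequent in the
    category (descriptive) and rare in the other categories (discriminative)."""
    counts = {c: Counter() for c in labeled_docs}
    for c, docs in labeled_docs.items():
        for doc in docs:
            counts[c].update(tokenize(doc))
    vocab = sorted(set().union(*counts.values()))
    keywords = {}
    for c in counts:
        rest = Counter()
        for other, cnt in counts.items():
            if other != c:
                rest.update(cnt)
        n_in, n_out = sum(counts[c].values()), sum(rest.values())
        scores = {}
        for w in vocab:
            # Add-one smoothed frequency inside and outside the category.
            p_in = (counts[c][w] + 1) / (n_in + len(vocab))
            p_out = (rest[w] + 1) / (n_out + len(vocab))
            scores[w] = p_in * math.log(p_in / p_out)
        # Ties are broken alphabetically so the result is deterministic.
        keywords[c] = sorted(vocab, key=lambda w: (-scores[w], w))[:top_k]
    return keywords, counts

def keyword_distributions(keywords, counts):
    """Stage 2: smoothed P(keyword | category) over the pooled keyword set."""
    pool = sorted({w for ws in keywords.values() for w in ws})
    dists = {}
    for c, cnt in counts.items():
        total = sum(cnt[w] for w in pool) + len(pool)
        dists[c] = {w: (cnt[w] + 1) / total for w in pool}
    return dists

def label_site(text, dists, threshold=0.25):
    """Stage 3: return every (category, posterior) pair at or above the
    threshold, so one site can receive several labels."""
    pool = next(iter(dists.values()))
    hits = [w for w in tokenize(text) if w in pool]
    if not hits:
        return []  # no known keyword: leave the site unlabeled
    loglik = {c: sum(math.log(d[w]) for w in hits) for c, d in dists.items()}
    top = max(loglik.values())
    weights = {c: math.exp(v - top) for c, v in loglik.items()}
    z = sum(weights.values())
    ranked = sorted(((c, w / z) for c, w in weights.items()), key=lambda t: -t[1])
    return [(c, p) for c, p in ranked if p >= threshold]

keywords, counts = learn_keywords(TOY_SITES)
dists = keyword_distributions(keywords, counts)
print(label_site("bitcoin shop forum thread post", dists))
```

A site that mixes marketplace and forum vocabulary receives both labels here, because each category's posterior clears the threshold independently. Lowering `threshold` yields more labels per site, and raising it yields fewer.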


Published In

ICSIM '20: Proceedings of the 3rd International Conference on Software Engineering and Information Management
January 2020
258 pages
ISBN: 9781450376907
DOI: 10.1145/3378936

In-Cooperation

  • University of Science and Technology of China

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dark web content analysis
  2. Multi-categorization labels
  3. ZeroNet

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSIM '20
