Abstract
Although Dark Net Markets (DNMs) have attracted growing research interest, most work focuses on the markets themselves and ignores the forums associated with them. Ignoring DNM forums wastes a valuable source of intelligence. Previous work usually applies LDA to darknet data mining. However, traditional topic models cannot handle forum posts of widely varying lengths without incurring unaffordable complexity or degraded performance. In this paper, we propose an improved Bi-term Topic Model, named the Filtered Bi-term Model (FBTM), to extract latent topics from DNM forums while balancing overhead and performance. Experimental results show that the topical words extracted by FBTM are more coherent than those extracted by LDA and DMM. Furthermore, we propose a general framework named pyDNetTopic that automatically extracts content from DNM forums and performs topic modeling on it. The full results of applying pyDNetTopic to the Agora forum demonstrate the capability of FBTM to capture informative intelligence in DNM forums as well as the practicality of pyDNetTopic.
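To illustrate why post length matters for a bi-term model, the following minimal sketch extracts bi-terms (unordered word pairs) from a tokenized post. It follows the general bi-term idea only. The function name `extract_biterms` and the `window` parameter are hypothetical, and the filtering rule that distinguishes FBTM is not described in this section, so `window` is merely a stand-in for some way of limiting the number of pairs.

```python
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return the unordered word pairs (bi-terms) of one tokenized post.

    window=None pairs every two tokens in the post.
    An integer window keeps only pairs whose positions differ by less
    than `window` (a hypothetical limit, not the FBTM filter).
    """
    biterms = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i >= window:
            continue  # tokens too far apart: skip this pair
        # Sort each pair so that (a, b) and (b, a) count as the same bi-term.
        biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms


short_post = ["vendor", "escrow", "btc", "shipping"]
long_post = ["w%d" % k for k in range(100)]

print(len(extract_biterms(short_post)))            # 4 tokens
print(len(extract_biterms(long_post)))             # 100 tokens, all pairs
print(len(extract_biterms(long_post, window=5)))   # 100 tokens, limited pairs
```

Without a limit, a post of n tokens yields n(n-1)/2 bi-terms, so a few long posts can dominate both the running time and the bi-term set, which is the kind of overhead the abstract refers to.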
This work is supported by the National Key Research and Development Program of China (No. 2017YFB0802300).
Notes
1. The code is available at https://github.com/blade-prayer/pyDNetTopic.
Appendices
Appendix A List of Additional Stop Words
The words listed below are common to all topics and provide no useful information. We treat them as general stop words in pyDNetTopic and remove them during preprocessing.
fuck, get, got, shit, see, u0e2a, would, use, think, like, xa0, sr, know, u0e3f, good, tquot, u2591, u25ac, make, fe, day, although, ands, soooo, yet, favs, So, ll, went, br, en, often, knowing, liking, one, get, thinking, even, could, go, going, fucking, fuck, shit, also, use, using, much, got, good, make, making, really, see, want, need, sure, right, still, take, taking (Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13).
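The stop-word removal step described above can be sketched as follows. This is an illustration under stated assumptions, not the pyDNetTopic implementation: the names `ADDITIONAL_STOP_WORDS` and `remove_stop_words` are hypothetical, the set holds only a subset of the list above, and the tokenizer (lowercasing, then keeping runs of letters and digits) is an assumed simplification.

```python
import re

# A subset of the additional stop words from the list above, for illustration.
ADDITIONAL_STOP_WORDS = {
    "get", "got", "see", "would", "use", "using", "think", "like", "know",
    "good", "make", "making", "one", "even", "could", "go", "going", "also",
    "much", "really", "want", "need", "sure", "right", "still", "take",
    "taking",
}


def remove_stop_words(text, stop_words=ADDITIONAL_STOP_WORDS):
    """Lowercase a post, split it into tokens, and drop the stop words."""
    # Assumed tokenizer: runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("I would really like to know the vendor"))
# ['i', 'to', 'the', 'vendor']
```

Standard English stop words such as "i", "to", and "the" survive here because this sketch applies only the additional list; a full pipeline would presumably combine it with a general-purpose stop-word list.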
Appendix B Full Topic Results of Agora Forums in 2014
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Yang, J., Ye, H., Zou, F. (2020). pyDNetTopic: A Framework for Uncovering What Darknet Market Users Talking About. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_8
DOI: https://doi.org/10.1007/978-3-030-63086-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63085-0
Online ISBN: 978-3-030-63086-7
eBook Packages: Computer Science, Computer Science (R0)