pyDNetTopic: A Framework for Uncovering What Darknet Market Users Talking About

Yang, Jingcheng; Ye, Haowei; Zou, Futai

doi:10.1007/978-3-030-63086-7_8

Conference paper
First Online: 12 December 2020

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 335)

Abstract

Although Dark Net Markets (DNMs) have attracted growing research interest, we found that most work focuses on the markets themselves and ignores the forums associated with them. Ignoring DNM forums wastes a rich source of intelligence. Previous work usually applies LDA to darknet data mining. However, traditional topic models cannot handle forum posts of widely varying length, which leads to either unaffordable complexity or degraded performance. In this paper, we propose an improved Bi-term Topic Model, named the Filtered Bi-term Model (FBTM), to extract latent topics in DNM forums while balancing overhead and performance. Experimental results show that the topical words extracted by FBTM are more coherent than those extracted by LDA and DMM. Furthermore, we propose a general framework named pyDNetTopic that automatically extracts content from DNM forums and performs topic modeling on it. The full results of applying pyDNetTopic to the Agora forum demonstrate the capability of FBTM to capture informative intelligence in DNM forums, as well as the practicality of pyDNetTopic.
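
For orientation, a biterm in the original Bi-term Topic Model (Yan et al.) is an unordered pair of words that co-occur in the same short text. The sketch below is not the authors' FBTM code. It only enumerates biterms for a tokenized post and illustrates why long posts are costly: a post of n tokens yields n(n-1)/2 biterms. The function name `extract_biterms` and the optional `window` parameter are illustrative assumptions; the abstract does not specify how FBTM filters biterms, so no filtering rule is reproduced here.

```python
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return unordered word pairs (biterms) from one tokenized post.

    window=None pairs every token with every other token in the post.
    A positive integer keeps only pairs whose positions are at most
    `window` apart (a hypothetical way to bound the cost of long posts).
    """
    biterms = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i > window:
            continue
        # Sort each pair so that (a, b) and (b, a) count as the same biterm.
        biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms


short_post = ["vendor", "escrow", "bitcoin"]
long_post = [f"w{k}" for k in range(100)]

print(extract_biterms(short_post))
print(len(extract_biterms(long_post)))            # 100 * 99 / 2 = 4950 pairs
print(len(extract_biterms(long_post, window=5)))  # 485 pairs
```

The quadratic growth in the unrestricted case is one way to read the "unaffordable complexity" the abstract attributes to posts of varying length.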

This work is supported by the National Key Research and Development Program of China (No. 2017YFB0802300).

Notes

1. The code is available at https://github.com/blade-prayer/pyDNetTopic.

Author information

Authors and Affiliations

School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Jingcheng Yang & Futai Zou
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Haowei Ye

Corresponding author

Correspondence to Futai Zou.

Editor information

Editors and Affiliations

Yonsei University, Seoul, Korea (Republic of)
Noseong Park
George Mason University, Fairfax, VA, USA
Kun Sun
Dipartimento di Informatica, Universita degli Studi, Milan, Milano, Italy
Sara Foresti
University of Florida, Gainesville, FL, USA
Kevin Butler
Division of Nephrology, University of Alabama, Birmingham, AL, USA
Nitesh Saxena

Appendices

Appendix A List of Additional Stop Words

The words listed below are common across all topics and provide no useful information. pyDNetTopic treats them as general stop words and removes them during preprocessing.

fuck, get, got, shit, see, u0e2a, would, use, think, like, xa0, sr, know, u0e3f, good, tquot, u2591, u25ac, make, fe, day, although, ands, soooo, yet, favs, So, ll, went, br, en, often, knowing, liking, one, get, thinking, even, could, go, going, fucking, fuck, shit, also, use, using, much, got, good, make, making, really, see, want, need, sure, right, still, take, taking (Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13).
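
A minimal sketch of how such a list might be applied during preprocessing. It is not taken from the pyDNetTopic repository: the function name `remove_stop_words`, the lowercase whitespace tokenization, and the abbreviated `EXTRA_STOP_WORDS` set (a small subset of the list above) are all assumptions made for illustration.

```python
# Hypothetical subset of the Appendix A list, for illustration only.
EXTRA_STOP_WORDS = {
    "get", "got", "would", "use", "think", "like",
    "know", "good", "make", "really", "also", "want",
}


def remove_stop_words(text, stop_words=EXTRA_STOP_WORDS):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stop_words]


post = "I think the vendor would really use escrow"
print(remove_stop_words(post))  # ['i', 'the', 'vendor', 'escrow']
```

Standard English stop words such as "i" and "the" survive here because this sketch applies only the additional list; a full pipeline would presumably combine it with a general-purpose stop-word list.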

Appendix B Full Topic Results of Agora Forums in 2014

Table 4. Result of 2014-01-09

Table 5. Result of 2014-02-03

Table 6. Result of 2014-02-24

Table 7. Result of 2014-03-19

Table 8. Result of 2014-05-10

Table 9. Result of 2014-06-04

Table 10. Result of 2014-09-08

Table 11. Result of 2014-10-29

Table 12. Result of 2014-11-11

Table 13. Result of 2014-11-19

About this paper

Cite this paper

Yang, J., Ye, H., Zou, F. (2020). pyDNetTopic: A Framework for Uncovering What Darknet Market Users Talking About. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_8

DOI: https://doi.org/10.1007/978-3-030-63086-7_8
Published: 12 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63085-0
Online ISBN: 978-3-030-63086-7
eBook Packages: Computer Science, Computer Science (R0)
