pyDNetTopic: A Framework for Uncovering What Darknet Market Users Talking About

  • Conference paper

Abstract

Although Dark Net Markets (DNMs) have attracted growing interest from researchers, we found that most work focuses on the markets themselves while ignoring the forums associated with them. Ignoring DNM forums wastes a rich source of intelligence. Previous work usually applies LDA to darknet data mining. However, traditional topic models cannot handle forum posts of widely varying lengths without incurring either unaffordable complexity or performance degradation. In this paper, we propose an improved Bi-term Topic Model, named Filtered Bi-term Model (FBTM), to extract latent topics from DNM forums while balancing overhead and performance. Experimental results show that the topical words extracted by FBTM are more coherent than those extracted by LDA and DMM. Furthermore, we propose a general framework named pyDNetTopic that automatically extracts content from DNM forums and performs topic modeling on it. The full results of applying pyDNetTopic to the Agora forum demonstrate the capability of FBTM to capture informative intelligence in DNM forums, as well as the practicality of pyDNetTopic.
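
To make the complexity concern concrete, the following minimal Python sketch enumerates biterms as they are defined in the standard Bi-term Topic Model: unordered pairs of words that co-occur in the same context. It is not the authors' FBTM, whose filtering step is not described in this abstract. The function name `extract_biterms` and the optional `window` parameter are illustrative assumptions.

```python
def extract_biterms(tokens, window=None):
    """Return the unordered word pairs (biterms) that co-occur in `tokens`.

    `window` is a hypothetical knob, not taken from the paper: when set,
    only words at most `window - 1` positions apart are paired.
    """
    biterms = []
    n = len(tokens)
    for i in range(n):
        # Without a window, pair token i with every later token in the post.
        end = n if window is None else min(n, i + window)
        for j in range(i + 1, end):
            # Sort each pair so that (a, b) and (b, a) count as the same biterm.
            biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms


post = ["buy", "vendor", "escrow"]
all_pairs = extract_biterms(post)
nearby_pairs = extract_biterms(post, window=2)
```

Without a window, a post of n tokens yields n(n-1)/2 biterms, so the cost grows quadratically with post length. This is one plausible reading of why forum posts of very different lengths are hard to handle with an unmodified bi-term model.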

This work is supported by the National Key Research and Development Program of China (No. 2017YFB0802300).

Notes

  1. The code is available at https://github.com/blade-prayer/pyDNetTopic.

Author information

Corresponding author

Correspondence to Futai Zou.

Appendices

Appendix A List of Additional Stop Words

The words listed below are common across all topics and provide no useful information. We treat them as general stop words in pyDNetTopic and remove them during preprocessing.

fuck, get, got, shit, see, u0e2a, would, use, think, like, xa0, sr, know, u0e3f, good, tquot, u2591, u25ac, make, fe, day, although, ands, soooo, yet, favs, So, ll, went, br, en, often, knowing, liking, one, get, thinking, even, could, go, going, fucking, fuck, shit, also, use, using, much, got, good, make, making, really, see, want, need, sure, right, still, take, taking (Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13).
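
The removal step described above could look roughly like the following Python sketch. It is an illustration only, not code from the pyDNetTopic repository: the names `EXTRA_STOP_WORDS` and `remove_stop_words`, the regex tokenizer, and the lowercasing are all assumptions, and the set holds only a small subset of the list above.

```python
import re

# A small subset of the additional stop words listed in Appendix A.
EXTRA_STOP_WORDS = {
    "get", "got", "would", "use", "think", "like", "know",
    "good", "make", "one", "also", "really", "still", "take",
}


def remove_stop_words(text, stop_words=EXTRA_STOP_WORDS):
    """Lowercase `text`, split it into alphanumeric tokens, drop stop words."""
    # Assumed tokenizer: runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]


cleaned = remove_stop_words("I would really like good escrow")
```

In a full pipeline this list would presumably be merged with a standard English stop word list before topic modeling, which is why a word such as "i" survives in this sketch.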

Appendix B Full Topic Results of Agora Forums in 2014

Table 4. Result of 2014-01-09
Table 5. Result of 2014-02-03
Table 6. Result of 2014-02-24
Table 7. Result of 2014-03-19
Table 8. Result of 2014-05-10
Table 9. Result of 2014-06-04
Table 10. Result of 2014-09-08
Table 11. Result of 2014-10-29
Table 12. Result of 2014-11-11
Table 13. Result of 2014-11-19

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Yang, J., Ye, H., Zou, F. (2020). pyDNetTopic: A Framework for Uncovering What Darknet Market Users Talking About. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 335. Springer, Cham. https://doi.org/10.1007/978-3-030-63086-7_8

  • DOI: https://doi.org/10.1007/978-3-030-63086-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63085-0

  • Online ISBN: 978-3-030-63086-7

  • eBook Packages: Computer Science, Computer Science (R0)
