From Cracked Accounts to Fake IDs: User Profiling on German Telegram Black Market Channels

Büsgen, André; Klöser, Lars; Kohl, Philipp; Schmidts, Oliver; Kraft, Bodo; Zündorf, Albert

doi:10.1007/978-3-031-37890-4_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1860)

Abstract

Messenger apps like WhatsApp and Telegram are frequently used for everyday communication, but they can also serve as a platform for illegal activity. Telegram allows public groups with up to 200,000 participants. Criminals use these public groups to trade illegal commodities and services, which is a concern for law enforcement agencies, who manually monitor suspicious activity in these chat rooms. This research demonstrates how natural language processing (NLP) can assist in analyzing these chat rooms, providing an explorative overview of the domain and facilitating purposeful analyses of user behavior. We provide a publicly available corpus of text messages annotated with entities and relations from four self-proclaimed black market chat rooms. Our pipeline approach aggregates the product attributes extracted from user messages into profiles and uses these, together with the products each user sells, as features for clustering. The extracted structured information is the foundation for further data exploration, such as identifying the top vendors or fine-grained price analyses. Our evaluation shows that pretrained word vectors perform better for unsupervised clustering than state-of-the-art transformer models, while the latter remain superior for sequence labeling.
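The aggregation and clustering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records in `EXTRACTED`, the vendor names, and the helpers `build_profiles`, `cosine_distance`, and `agglomerative` are hypothetical, and plain product counts stand in for the pretrained word vectors the abstract refers to. The reference list cites scikit-learn's AgglomerativeClustering; here a small hand-written average-linkage variant is used instead so the example needs only the standard library.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical (user, product) pairs standing in for the output of an
# entity and relation extraction step. This is not data from the paper.
EXTRACTED = [
    ("vendor_a", "netflix account"),
    ("vendor_a", "spotify account"),
    ("vendor_b", "netflix account"),
    ("vendor_b", "sky account"),
    ("vendor_c", "fake id"),
    ("vendor_d", "fake id"),
    ("vendor_d", "driver license"),
]


def build_profiles(records):
    """Aggregate per-message extractions into one product counter per user."""
    profiles = defaultdict(Counter)
    for user, product in records:
        profiles[user][product] += 1
    return dict(profiles)


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering over user profiles."""
    clusters = [[user] for user in sorted(profiles)]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        dists = [cosine_distance(profiles[u], profiles[v]) for u in c1 for v in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


profiles = build_profiles(EXTRACTED)
clusters = agglomerative(profiles, n_clusters=2)
print(clusters)
```

On this toy input the account sellers and the document sellers end up in separate clusters. Replacing the count vectors with averaged word vectors of the product names would move the sketch closer to the feature choice the abstract describes.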

Notes

1. https://github.com/Abuesgen/From-Cracked-Accounts-to-Fake-IDs.git
2. A Google search for the keyphrase “telegram groups” returns many results pointing to search engines for such groups.
3. https://core.telegram.org/api/mentions
4. https://core.telegram.org/
5. The lower boundary of the second-best score.
6. https://mlco2.github.io/impact#compute
7. Following Krippendorff’s alpha.
8. At the time of writing, monthly premium package prices in Germany are 5.29 € for NordVPN and, for Sky, 30.00 € during the first year and 66.90 € afterward.

References

Sklearn.cluster.AgglomerativeClustering. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html. Accessed 01 Mar 2022
T-Systems-onsite/cross-en-de-roberta-sentence-transformer · Hugging Face. https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer. Accessed 14 Dec 2022
Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., Vollgraf, R.: FLAIR: an easy-to-use framework for state-of-the-art NLP. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 54–59. Association for Computational Linguistics, Minneapolis, June 2019. https://doi.org/10.18653/v1/N19-4010. https://aclanthology.org/N19-4010
Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638–1649. Association for Computational Linguistics, Santa Fe, August 2018. https://aclanthology.org/C18-1139
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, pp. 2623–2631. Association for Computing Machinery, New York, July 2019. https://doi.org/10.1145/3292500.3330701
Baravalle, A., Lopez, M.S., Lee, S.W.: Mining the dark web: drugs and fake ids. In: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), pp. 350–356, December 2016. https://doi.org/10.1109/ICDMW.2016.0056
Benikova, D., Biemann, C., Reznicek, M.: NoSta-D named entity annotation for German: guidelines and dataset. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 2524–2531. European Language Resources Association (ELRA), Reykjavik, May 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/276_Paper.pdf
Bitkom: Neun von zehn Internetnutzern verwenden Messenger | Bitkom Main (2018). http://www.bitkom.org/Presse/Presseinformation/Neun-von-zehn-Internetnutzern-verwenden-Messenger.html. Accessed 18 Feb 2022
Blankers, M., van der Gouwe, D., Stegemann, L., Smit-Rigter, L.: Changes in online psychoactive substance trade via telegram during the COVID-19 pandemic. Eur. Addict. Res. 27(6), 469–474 (2021). https://doi.org/10.1159/000516853. https://www.karger.com/Article/FullText/516853
Büsgen, A., Klöser, L., Kohl, P., Schmidts, O., Kraft, B., Zündorf, A.: Exploratory analysis of chat-based black market profiles with natural language processing. In: Proceedings of the 11th International Conference on Data Science, Technology and Applications, pp. 83–94. SCITEPRESS - Science and Technology Publications, Lisbon (2022). https://doi.org/10.5220/0011271400003269. https://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0011271400003269
Camacho-Collados, J., Doval, Y., Martínez-Cámara, E., Espinosa-Anke, L., Barbieri, F., Schockaert, S.: Learning cross-lingual embeddings from Twitter via distant supervision, March 2020. http://arxiv.org/abs/1905.07358
Chan, B., Schweter, S., Möller, T.: German’s next language model. arXiv:2010.10906 [cs], December 2020
Chauhan, P., Sharma, N., Sikka, G.: The emergence of social media data and sentiment analysis in election prediction. J. Ambient Intell. Human. Comput. 12(2), 2601–2627 (2021). https://doi.org/10.1007/s12652-020-02423-y
Christin, N.: Traveling the silk road: a measurement analysis of a large anonymous online marketplace. In: Proceedings of the 22nd International Conference on World Wide Web (2013). https://doi.org/10.1145/2488388.2488408
Dangi, D., Dixit, D.K., Bhagat, A.: Sentiment analysis of COVID-19 social media data through machine learning. Multimedia Tools Appl. 81(29), 42261–42283 (2022). https://doi.org/10.1007/s11042-022-13492-w
Dargahi Nobari, A., Sarraf, M., Neshati, M., Daneshvar, F.: Characteristics of viral messages on Telegram; the world’s largest hybrid public and private messenger. Expert Syst. Appl. 168, 114303 (2020). https://doi.org/10.1016/j.eswa.2020.114303
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 [cs], May 2019
Doddington, G., Mitchell, A., Przybocki, M.A., Ramshaw, L., Strassel, S., Weischedel, R.: The automatic content extraction (ACE) program - tasks, data, and evaluation. In: International Conference on Language Resources and Evaluation (2004). https://www.semanticscholar.org/paper/The-Automatic-Content-Extraction-(ACE)-Program-and-Doddington-Mitchell/0617dd6924df7a3491c299772b70e90507b195dc
Eberts, M., Ulges, A.: Span-based joint entity and relation extraction with transformer pre-training, June 2021. https://doi.org/10.3233/FAIA200321. http://arxiv.org/abs/1909.07755
Gomathi, C.: Social tagging system for community detecting using NLP technique. Int. J. Res. Appl. Sci. Eng. Technol. 6, 1665–1671 (2018). https://doi.org/10.22214/ijraset.2018.4279
Griffith, V., Xu, Y., Ratti, C.: Graph theoretic properties of the darkweb. arXiv:1704.07525 [cs] (2017)
Hennig, L., Truong, P.T., Gabryszak, A.: MobIE: a German dataset for named entity recognition, entity linking and relation extraction in the mobility domain. In: Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), pp. 223–227. KONVENS 2021 Organizers, Düsseldorf (2021). https://aclanthology.org/2021.konvens-1.22
Hoseini, M., Melo, P., Benevenuto, F., Feldmann, A., Zannettou, S.: On the globalization of the QAnon conspiracy theory through Telegram. ArXiv, May 2021. https://www.semanticscholar.org/paper/On-the-Globalization-of-the-QAnon-Conspiracy-Theory-Hoseini-Melo/1b0f3a6da334b898ddb070657c980349d31be4e2
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991 [cs], August 2015
Jin, D., et al.: A survey of community detection approaches: from statistical modeling to deep learning. IEEE Trans. Knowl. Data Eng. 35(2), 1149–1170 (2021). https://doi.org/10.1109/TKDE.2021.3104155. https://ieeexplore.ieee.org/document/9511798/
Kartal, G.: What’s up with WhatsApp? a critical analysis of mobile instant messaging research in language learning. Int. J. Contemp. Educ. Res. 6(2), 352–365 (2019). https://doi.org/10.33200/ijcer.599138. https://dergipark.org.tr/en/doi/10.33200/ijcer.599138
Klöser, L., Kohl, P., Kraft, B., Zündorf, A.: Multi-attribute relation extraction (MARE) - simplifying the application of relation extraction. In: Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, pp. 148–156 (2021). https://doi.org/10.5220/0010559201480156. http://arxiv.org/abs/2111.09035
Krippendorff, K.: Reliability. In: Content Analysis: An Introduction to Its Methodology, Revised edition. Sage Publications Inc., Los Angeles, April 2012
Lacoste, A., Luccioni, A., Schmidt, V., Dandres, T.: Quantifying the carbon emissions of machine learning. arXiv:1910.09700 [cs], November 2019
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33(1), 159 (1977). https://doi.org/10.2307/2529310. https://www.jstor.org/stable/2529310?origin=crossref
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(86), 2579–2605 (2008). http://jmlr.org/papers/v9/vandermaaten08a.html
McLean, G., Osei-Frimpong, K.: Examining satisfaction with the experience during a live chat service encounter-implications for website providers. Comput. Hum. Behav. 76, 494–508 (2017). https://doi.org/10.1016/j.chb.2017.08.005. https://linkinghub.elsevier.com/retrieve/pii/S0747563217304727
Naseri, M., Zamani, H.: Analyzing and predicting news popularity in an instant messaging service. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1053–1056, July 2019. https://doi.org/10.1145/3331184.3331301
Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3), 036104 (2006). https://doi.org/10.1103/PhysRevE.74.036104. http://arxiv.org/abs/physics/0605087
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(85), 2825–2830 (2011). http://jmlr.org/papers/v12/pedregosa11a.html
Sang, E.F.T.K., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. arXiv:cs/0306050, Jun 2003
Su, X., et al.: A comprehensive survey on community detection with deep learning. IEEE Trans. Neural Netw. Learn. Syst. 1–21 (2022). https://doi.org/10.1109/TNNLS.2021.3137396. https://ieeexplore.ieee.org/document/9732192/
Subhashini, L.D.C.S., Li, Y., Zhang, J., Atukorale, A.S., Wu, Y.: Mining and classifying customer reviews: a survey. Artif. Intell. Rev. 54(8), 6343–6389 (2021). https://doi.org/10.1007/s10462-021-09955-5
Tsao, S.F., Chen, H., Tisseverasinghe, T., Yang, Y., Li, L., Butt, Z.A.: What social media told us in the time of COVID-19: a scoping review. Lancet Digit. Health 3(3), e175–e194 (2021). https://doi.org/10.1016/S2589-7500(20)30315-0. https://linkinghub.elsevier.com/retrieve/pii/S2589750020303150
Vajjala, S., Majumder, B., Gupta, A., Surana, H.: Social media. In: Practical Natural Language Processing. O’Reilly Media, Inc., June 2020. https://www.oreilly.com/library/view/practical-natural-language/9781492054047/
Wattenberg, M., Viégas, F., Johnson, I.: How to use t-SNE effectively. Distill 1(10), e2 (2016). https://doi.org/10.23915/distill.00002. http://distill.pub/2016/misread-tsne
Zhang, X., et al.: TwHIN-BERT: a socially-enriched pre-trained language model for multilingual tweet representations, September 2022. https://doi.org/10.48550/arXiv.2209.07562. http://arxiv.org/abs/2209.07562
Zhong, Z., Chen, D.: A frustratingly easy approach for entity and relation extraction. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 50–61 (2021). https://doi.org/10.18653/v1/2021.naacl-main.5. https://aclanthology.org/2021.naacl-main.5

Author information

Authors and Affiliations

Aachen University of Applied Sciences, 52066, Aachen, Germany
André Büsgen, Lars Klöser, Philipp Kohl, Oliver Schmidts & Bodo Kraft
University of Kassel, 34109, Kassel, Germany
Albert Zündorf

Corresponding authors

Correspondence to André Büsgen, Lars Klöser, Philipp Kohl, Oliver Schmidts, Bodo Kraft or Albert Zündorf.

Editor information

Editors and Affiliations

University of Calabria, Rende, Italy
Alfredo Cuzzocrea
Ford Motor Company, Commerce Township, MI, USA
Oleg Gusikhin
Siège du Groupe ESEO, Angers, France
Slimane Hammoudi
Hochschule Niederrhein, Krefeld, Nordrhein-Westfalen, Germany
Christoph Quix

Cite this paper

Büsgen, A., Klöser, L., Kohl, P., Schmidts, O., Kraft, B., Zündorf, A. (2023). From Cracked Accounts to Fake IDs: User Profiling on German Telegram Black Market Channels. In: Cuzzocrea, A., Gusikhin, O., Hammoudi, S., Quix, C. (eds) Data Management Technologies and Applications. DATA 2021, DATA 2022. Communications in Computer and Information Science, vol 1860. Springer, Cham. https://doi.org/10.1007/978-3-031-37890-4_9

DOI: https://doi.org/10.1007/978-3-031-37890-4_9
Published: 23 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37889-8
Online ISBN: 978-3-031-37890-4
eBook Packages: Computer Science, Computer Science (R0)
