Parallel and Distributed Architecture for Multilingual Open Source Intelligence Systems

  • Conference paper
Software Architecture. ECSA 2023 Tracks, Workshops, and Doctoral Symposium (ECSA 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14590))

Abstract

The proliferation of publicly available information across multiple languages presents both unique challenges and opportunities for Open Source Intelligence (OSINT) systems. This paper proposes a novel architecture for multilingual OSINT that is both parallel and distributed. The architecture integrates language identification and translation capabilities, enabling it to handle linguistically diverse data by transforming it into a unified format for efficient analysis. Designed specifically to address the challenges of parallel and distributed processing in OSINT systems, this architecture aims to offer scalability and performance benefits when dealing with massive data volumes. Our primary focus has been on devising strategies and tactics that address these concerns, providing a robust solution for the collection, processing and analysis of data in various languages. This work marks a significant step towards the development of more globally inclusive OSINT systems.
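The identify-then-translate flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `detect_language` is a toy stop-word heuristic standing in for a real language identifier, `translate_to_english` is a placeholder for a machine translation model, and `Record` is a hypothetical unified format. A thread pool stands in for the parallel workers; a distributed deployment would presumably replace it with a task queue spread across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical stop-word lists; a real system would use a trained
# language-identification model instead of this toy heuristic.
_STOPWORDS = {
    "en": {"the", "and", "is"},
    "de": {"der", "und", "ist"},
    "tr": {"ve", "bir", "bu"},
}


@dataclass
class Record:
    """Hypothetical unified format: original text, language, English text."""
    source_text: str
    language: str
    english_text: str


def detect_language(text: str) -> str:
    """Pick the language whose stop words overlap most with the text."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in _STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def translate_to_english(text: str, language: str) -> str:
    """Placeholder translator: tags the text instead of translating it."""
    if language == "en":
        return text
    return f"[{language}->en] {text}"


def normalize(text: str) -> Record:
    """Identify the language, then map the document to the unified format."""
    language = detect_language(text)
    return Record(text, language, translate_to_english(text, language))


def process_batch(texts, max_workers=4):
    """Normalize documents in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(normalize, texts))


if __name__ == "__main__":
    for record in process_batch(["the cat is here", "der Hund ist da"]):
        print(record.language, "|", record.english_text)
```

Because each document is normalized independently, the work is embarrassingly parallel, which is what makes a worker-pool or task-queue design a natural fit for this stage.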

Corresponding author

Correspondence to Alper Karamanlioglu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Karamanlioglu, A., Yurtalan, G., Karatas, Y.B. (2024). Parallel and Distributed Architecture for Multilingual Open Source Intelligence Systems. In: Tekinerdoğan, B., Spalazzese, R., Sözer, H., Bonfanti, S., Weyns, D. (eds) Software Architecture. ECSA 2023 Tracks, Workshops, and Doctoral Symposium. ECSA 2023. Lecture Notes in Computer Science, vol 14590. Springer, Cham. https://doi.org/10.1007/978-3-031-66326-0_27

  • DOI: https://doi.org/10.1007/978-3-031-66326-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-66325-3

  • Online ISBN: 978-3-031-66326-0

  • eBook Packages: Computer Science, Computer Science (R0)
