Owner named entity recognition in website based on multidimensional text guidance and space alignment co-attention

  • Regular Paper
  • Published:
Multimedia Systems

Abstract

In recent research, the task of Owner Named Entity Recognition (ONER) in websites has been proposed as a specific, practical application of Multimodal Named Entity Recognition (MNER). ONER aims to identify the true owner of a website on the Internet, which plays a crucial role in network security. Existing methods identify the website owner's name from the text, image, and domain of the website, where owner information usually appears. However, most previous methods simply extract features from the image and the domain as two independent modalities and do not fully exploit the text these modalities contain. In addition, these methods ignore the fact that the features of each modality are learned in that modality's own feature space, and these mismatched spaces make cross-modal interactions difficult to model. To address these two issues, this paper proposes a Multidimensional Text Guidance and Space Alignment Co-Attention (MTGSAC) model for owner named entity recognition in websites. The MTGSAC model uses the text information in the image and domain modalities to guide feature extraction from the text modality. Meanwhile, the model employs a feature fusion module based on the Transformer and a co-attention gate mechanism to model cross-modal interactions effectively. Furthermore, to address the small number of samples and poor diversity of the existing ONER dataset, we extended it to create the ONER-2.0 dataset. Experimental results on both the ONER and ONER-2.0 datasets show that our model achieves state-of-the-art performance.
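
The abstract describes the fusion step only at a high level: features from different modalities are aligned into a common space, then combined through Transformer-style attention and a co-attention gate. The following plain-Python sketch illustrates that general pattern under stated assumptions, not the paper's method: space alignment is assumed to be a linear projection of image or domain features into the text feature space, cross-modal interaction is standard scaled dot-product attention, and the gate is a single sigmoid scalar per text token. The function names (project, cross_attention, gated_fusion) and the gate formula are hypothetical and are not taken from the MTGSAC model.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(features: Matrix, weight: Matrix) -> Matrix:
    """Assumed linear space alignment: map each feature x to W x in the text space."""
    return [[dot(row, x) for row in weight] for x in features]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each text token attends over another modality."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def gated_fusion(text: Matrix, attended: Matrix, gate_w: Vector, gate_b: float) -> Matrix:
    """Hypothetical gate: g = sigmoid(w . [h_text; h_att] + b), fused = h_text + g * h_att."""
    fused = []
    for h_t, h_a in zip(text, attended):
        g = 1.0 / (1.0 + math.exp(-(dot(gate_w, h_t + h_a) + gate_b)))
        fused.append([t + g * a for t, a in zip(h_t, h_a)])
    return fused


# Toy example: 2 text tokens (dim 2) and 1 image region (dim 3).
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[0.5, 0.5, 1.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]                # hypothetical 2x3 alignment matrix
image_shared = project(image, W)                       # [[0.5, 0.5]]
attended = cross_attention(text, image_shared, image_shared)
fused = gated_fusion(text, attended, [0.0] * 4, 0.0)   # zero gate parameters give g = 0.5
print(fused)  # [[1.25, 0.25], [0.25, 1.25]]
```

A learned vector gate and multi-head attention would be more typical in a real model; the scalar gate is used here only to make it easy to see how the gate controls how much image or domain evidence is mixed into each text token.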

Figs. 1–10 (images not available)

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author or the first author on reasonable request.

Acknowledgements

This work was supported by the Major Science and Technology Special Project of Henan Province (No. 201300210400) and the Science and Technology Department of Henan Province (No. 222102520006).

Funding

This article is funded by the Major Science and Technology Special Project of Henan Province (No. 201300210400) and the Science and Technology Department of Henan Province (No. 222102520006).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study’s conception and design. The first draft of the manuscript was prepared by XZ, while XH was responsible for reviewing and editing the paper. YR and XZ carried out the data collection and analysis for the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xin He or Yimo Ren.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zheng, X., He, X., Ren, Y. et al. Owner named entity recognition in website based on multidimensional text guidance and space alignment co-attention. Multimedia Systems 29, 3757–3770 (2023). https://doi.org/10.1007/s00530-023-01170-2

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-023-01170-2

Keywords