Knowledge Extraction: Automatic Classification of Matching Rules

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12647)

Abstract

With the rapid development of information technologies, massive amounts of data are being produced in cyberspace. Traditional web search methods cannot meet users’ demands in a timely and accurate manner, so developing big search techniques for cyberspace has become an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics; by expressing these characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the needed knowledge from massive data when constructing MDATA, and it relies on matching rules to acquire the needed substrings from a string. In practical application scenarios, matching rules can often be divided into several categories: rules in the same category have the same meaning but different expressions. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare; most are random and disordered, and manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to classify matching rules automatically. Word embedding is a representation learning technique commonly used in recommendation systems, relation mining, text similarity matching, and similar tasks; based on neural network models, it converts words into low-dimensional vectors. However, word embedding algorithms capture the relationship between semantic information and context, which requires a large amount of data.
When only the matching rules used in pattern matching are considered, the data are insufficient to reflect these context relationships, so accurate results cannot be derived. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.
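
For matching rules that share a consistent structure, a single regular expression can aggregate them, as noted above. The following is a minimal Python sketch of that case, assuming a hypothetical category of rules that report an exploited CVE identifier; the rules, the `AGGREGATE` pattern, and the `extract` helper are illustrative and are not taken from the chapter.

```python
import re

# Hypothetical matching rules for one category ("exploited vulnerability").
# They share one structure: a form of "exploit" followed by a CVE identifier.
RULES = [
    "exploited CVE-2017-0144",
    "exploits CVE-2014-0160",
    "exploiting CVE-2021-44228",
]

# One regular expression aggregates all three rules; the capture group
# returns the substring that pattern matching is meant to extract.
AGGREGATE = re.compile(r"\bexploit(?:s|ed|ing)?\s+(CVE-\d{4}-\d{4,})\b")


def extract(text: str) -> list[str]:
    """Return every CVE identifier captured by the aggregated rule."""
    return AGGREGATE.findall(text)


if __name__ == "__main__":
    # Each individual rule is fully covered by the single pattern.
    assert all(AGGREGATE.fullmatch(rule) for rule in RULES)
    report = "The worm exploited CVE-2017-0144; later samples exploit CVE-2019-0708."
    print(extract(report))
```

A rule such as "took advantage of CVE-2017-0144" has the same meaning but falls outside this pattern. Covering such heterogeneous phrasings by hand is the laborious case that motivates automatic classification.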
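
The chapter's own small-data classification method is not described in this preview. Purely as a point of comparison, the sketch below classifies heterogeneous matching rules from a handful of labeled examples without any training corpus. It swaps in character trigram count vectors and nearest-centroid classification by cosine similarity in place of word embeddings; the category names, seed rules, and helper functions are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed data: a few labeled matching rules per category.
# Rules in one category share a meaning but not a common surface pattern.
LABELED_RULES = {
    "exploitation": ["exploits the vulnerability", "exploited a flaw in", "takes advantage of the bug"],
    "attribution": ["is attributed to", "was linked to the group", "is associated with"],
}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Represent a rule as a bag of character n-grams (no training needed)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(rules: list[str]) -> Counter:
    """Average the n-gram vectors of the labeled rules in one category."""
    total = Counter()
    for rule in rules:
        total.update(char_ngrams(rule))
    return Counter({gram: count / len(rules) for gram, count in total.items()})


def classify(rule: str, labeled: dict[str, list[str]]) -> str:
    """Assign an unseen rule to the category whose centroid is most similar."""
    vec = char_ngrams(rule)
    centroids = {label: centroid(rules) for label, rules in labeled.items()}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    for new_rule in ["exploiting a vulnerability in", "was attributed to the actor"]:
        print(new_rule, "->", classify(new_rule, LABELED_RULES))
```

Character n-grams pick up shared stems such as "exploit" across inflected forms, but they cannot relate synonyms like "takes advantage of" and "exploits"; capturing that kind of semantic similarity is what learned word embeddings are intended to add.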

Author information

Corresponding author

Correspondence to Zhaoquan Gu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Tang, Y., Wang, L., Chen, X., Gu, Z., Tian, Z. (2021). Knowledge Extraction: Automatic Classification of Matching Rules. In: Jia, Y., Gu, Z., Li, A. (eds) MDATA: A New Knowledge Representation Model. Lecture Notes in Computer Science, vol 12647. Springer, Cham. https://doi.org/10.1007/978-3-030-71590-8_7

  • DOI: https://doi.org/10.1007/978-3-030-71590-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71589-2

  • Online ISBN: 978-3-030-71590-8

  • eBook Packages: Computer Science, Computer Science (R0)