Knowledge Extraction: Automatic Classification of Matching Rules

Tang, Yunyi; Wang, Le; Chen, Xiaolong; Gu, Zhaoquan; Tian, Zhihong

doi:10.1007/978-3-030-71590-8_7

Chapter
First Online: 07 March 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12647)

Abstract

With the rapid development of information technologies, ever larger amounts of data are being produced in cyberspace. Traditional web search methods cannot meet users' demands in a timely and accurate manner, so developing big search techniques for cyberspace has become an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics. By expressing these characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the needed knowledge from massive data for constructing MDATA. Pattern matching relies on matching rules to extract the required substrings from a string. In practical application scenarios, some matching rules can be divided into several categories: rules in the same category share the same meaning but differ in expression. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare, and most matching rules are random and disordered. For random matching rules, manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to automatically classify matching rules. Word embedding is a family of representation learning algorithms commonly used in recommendation systems, relation mining, text similarity matching, and similar tasks. Based on neural network models, it converts words into low-dimensional vectors. However, word embedding algorithms model the relationship between semantic information and context, which requires a large amount of data.
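The contrast drawn above between homogeneous matching rules and random ones can be illustrated with a minimal Python sketch. The rules, the sample sentence, and the names `extract`, `cve_pattern`, and `exploit_pattern` are hypothetical and not taken from the chapter. CVE identifiers share one structure, so a single regular expression covers all of them; synonymous verbs such as "exploits" and "leverages" share no structure, so the "aggregating" regex degenerates into a hand-maintained alternation.

```python
import re

# Hypothetical matching rules, invented for illustration (not from the chapter).
# Homogeneous rules: CVE identifiers share one structure, so one regex covers them all.
cve_pattern = re.compile(r"CVE-\d{4}-\d{4,}")

# Heterogeneous rules: same meaning ("exploit"), no shared structure.
# A regex can only aggregate them as an explicit, hand-maintained alternation.
exploit_rules = ["exploits", "takes advantage of", "leverages", "abuses"]
exploit_pattern = re.compile("|".join(re.escape(rule) for rule in exploit_rules))


def extract(text: str) -> dict:
    """Return the substrings captured by each aggregated set of matching rules."""
    return {
        "vulnerability": cve_pattern.findall(text),
        "exploit_verb": exploit_pattern.findall(text),
    }


sample = "A worm exploits CVE-2017-0144 while a second tool leverages CVE-2019-0708."
print(extract(sample))
# {'vulnerability': ['CVE-2017-0144', 'CVE-2019-0708'], 'exploit_verb': ['exploits', 'leverages']}
```

Every new synonym (for example "weaponizes") has to be appended to `exploit_rules` by hand, which is the kind of manual effort that motivates classifying matching rules automatically.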
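Once matching rules are mapped to vectors, classifying them reduces to comparing vectors. The sketch below assumes a generic nearest-centroid classifier with cosine similarity; it is not the chapter's method, which the abstract does not describe. The three-dimensional vectors in `EMBEDDINGS` are invented for illustration (real word embeddings are learned from a corpus), and a multi-word rule is represented, by assumption, as the average of its token vectors. The category names and helper functions are likewise hypothetical.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real embeddings (e.g. from word2vec) are learned and much higher-dimensional.
EMBEDDINGS = {
    "exploits": [0.9, 0.1, 0.0],
    "leverages": [0.8, 0.2, 0.1],
    "abuses": [0.85, 0.05, 0.1],
    "infects": [0.1, 0.9, 0.1],
    "compromises": [0.2, 0.8, 0.0],
    "spreads": [0.1, 0.7, 0.3],
    "to": [0.0, 0.1, 0.1],
}


def mean(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rule_vector(rule):
    """Embed a matching rule as the average of its known token vectors."""
    vectors = [EMBEDDINGS[t] for t in rule.lower().split() if t in EMBEDDINGS]
    if not vectors:
        raise ValueError(f"no known tokens in rule: {rule!r}")
    return mean(vectors)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify(rule, labelled):
    """Assign `rule` to the category whose centroid is most cosine-similar."""
    centroids = {
        label: mean([rule_vector(r) for r in rules]) for label, rules in labelled.items()
    }
    v = rule_vector(rule)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


# A handful of labelled rules per category (hypothetical categories).
labelled = {
    "exploit": ["exploits", "leverages"],
    "infection": ["infects", "compromises"],
}
print(classify("abuses", labelled))      # exploit
print(classify("spreads to", labelled))  # infection
```

Averaging token vectors is a simple, common way to embed a short phrase, but it ignores word order; it is used here only to keep the example small.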
When only the matching rules used in pattern matching are considered, the available data is insufficient to reflect context relationships, so accurate results cannot be derived. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.
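Why bare matching rules are poor training data for word embeddings can be seen by counting the (center, context) pairs that a skip-gram objective, as used in word2vec, would extract. This minimal sketch assumes a fixed window of 2 on each side (word2vec itself samples a smaller effective window per position and subsamples frequent words) and uses hypothetical tokenized examples. A three-token rule on its own yields only 6 pairs, all internal to the rule, whereas the same rule inside a nine-token sentence yields 30 pairs, including pairs that link the rule to surrounding words such as "worm".

```python
def skipgram_pairs(tokens, window=2):
    """List the (center, context) pairs a skip-gram model would train on.

    Uses a fixed window on each side; word2vec itself samples a smaller
    effective window per position and subsamples frequent words.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized examples: a bare matching rule vs. the rule in context.
rule = ["takes", "advantage", "of"]
sentence = ["the", "worm", "takes", "advantage", "of", "an", "unpatched", "smb", "service"]

print(len(skipgram_pairs(rule)))      # 6
print(len(skipgram_pairs(sentence)))  # 30
print(("takes", "worm") in skipgram_pairs(sentence))  # True
```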

Author information

Authors and Affiliations

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006, China
Yunyi Tang, Le Wang, Xiaolong Chen, Zhaoquan Gu & Zhihong Tian

Corresponding author

Correspondence to Zhaoquan Gu.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, Shenzhen, China
Yan Jia
Guangzhou University, Guangzhou, China
Zhaoquan Gu
National University of Defense Technology, Changsha, China
Aiping Li

About this chapter

Cite this chapter

Tang, Y., Wang, L., Chen, X., Gu, Z., Tian, Z. (2021). Knowledge Extraction: Automatic Classification of Matching Rules. In: Jia, Y., Gu, Z., Li, A. (eds) MDATA: A New Knowledge Representation Model. Lecture Notes in Computer Science, vol 12647. Springer, Cham. https://doi.org/10.1007/978-3-030-71590-8_7

DOI: https://doi.org/10.1007/978-3-030-71590-8_7
Published: 07 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71589-2
Online ISBN: 978-3-030-71590-8
eBook Packages: Computer Science, Computer Science (R0)