Abstract
With the rapid development of information technologies, ever larger amounts of data are produced in cyberspace. Traditional web search methods cannot satisfy users’ demands in a timely and accurate manner, so developing big search techniques for cyberspace is an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics. By expressing these characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the knowledge needed to construct MDATA from massive data. Pattern matching relies on matching rules to extract the required substrings from a string. In practical application scenarios, some matching rules can be divided into several categories. Rules in the same category have the same meaning but different expressions. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare, and most rules are random and disordered. For random matching rules, manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to classify matching rules automatically. Word embedding is a kind of representation learning algorithm that is commonly adopted in recommendation systems, relation mining, text similarity matching, and so on. It converts words into low-dimensional vectors using neural network models. However, word embedding algorithms take into account the relationship between semantic information and context, which requires a large amount of data. When only the matching rules used in pattern matching are considered, the data is insufficient to reflect contextual relationships, and accurate results cannot be derived. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.
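As a rough illustration of the distinction the abstract draws between regular and disordered matching rules, the following Python sketch may help. The rule strings, the `CVE` identifier format, and the function name `extract` are hypothetical examples chosen for illustration and are not taken from the chapter; the sketch shows only the regular-expression side of the problem, not the chapter's classification method.

```python
import re
from typing import List

# Hypothetical matching rules of one category ("a CVE identifier follows").
# They differ only in a single separator character, so they are "regular".
regular_rules = ["CVE ID:", "CVE-ID:", "CVE_ID:"]

# One hand-written regular expression aggregates all three rules.
aggregated = re.compile(r"CVE[ _-]ID:\s*(CVE-\d{4}-\d{4,})")


def extract(text: str) -> List[str]:
    """Return every identifier captured by the aggregated rule."""
    return aggregated.findall(text)


# Hypothetical rules with the same meaning but unrelated surface forms.
# No compact pattern covers them; they can only be enumerated one by one,
# which is the manual effort the abstract describes as laborious.
irregular_rules = ["vulnerability number", "tracked as", "assigned identifier"]
fallback = re.compile("|".join(re.escape(rule) for rule in irregular_rules))
```

For instance, `extract("CVE-ID: CVE-2021-12345")` returns the captured identifier, while the disordered rules have to be grouped by meaning before any such aggregation is possible.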
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Tang, Y., Wang, L., Chen, X., Gu, Z., Tian, Z. (2021). Knowledge Extraction: Automatic Classification of Matching Rules. In: Jia, Y., Gu, Z., Li, A. (eds) MDATA: A New Knowledge Representation Model. Lecture Notes in Computer Science, vol 12647. Springer, Cham. https://doi.org/10.1007/978-3-030-71590-8_7
DOI: https://doi.org/10.1007/978-3-030-71590-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71589-2
Online ISBN: 978-3-030-71590-8
eBook Packages: Computer Science, Computer Science (R0)