Abstract
As information technology develops, processing massive volumes of information has become an important problem in business systems. However, the metadata from different business systems lacks a unified, standardized description method, and mapping data manually is inefficient. An automated data mapping method is therefore needed. In this paper, we treat data mapping as a text classification problem for two reasons: 1) text classification is a mature technique in natural language processing (NLP) and is well suited to processing massive data; 2) a large amount of heterogeneous mapping data can be treated as text. To automate data mapping, we propose a classification model for business systems based on FastText and long short-term memory (LSTM). Guided by the characteristics of mapping data in business systems, we first use FastText to learn word representations that carry semantic information, and then use an LSTM model to extract features automatically for text classification. Experimental results show that the proposed method can automatically classify mapping data in business systems with acceptable quality.
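The two-stage idea in the abstract (subword-aware word vectors feeding an LSTM classifier) can be illustrated with a minimal, untrained sketch in standard-library Python. This is not the authors' implementation: the label set, dimensions, bucket count, the CRC32 hash, and the names `char_ngrams`, `SubwordEmbedding`, `LSTMClassifier`, and `classify` are all assumptions made for illustration, and the weights are random, so the output scores only demonstrate the data flow.

```python
import math
import random
import zlib


def char_ngrams(word, n_min=3, n_max=5):
    """FastText-style subwords: character n-grams of the word wrapped in '<' and '>'."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    # The whole wrapped word is kept as a unit too; dict.fromkeys removes duplicates.
    return list(dict.fromkeys(grams + [wrapped]))


class SubwordEmbedding:
    """Hashed n-gram table with random (untrained) vectors; assumed sizes."""

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim, self.buckets = dim, buckets
        self.table = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                      for _ in range(buckets)]

    def vector(self, word):
        """Average the bucket vectors of all subwords of `word`."""
        grams = char_ngrams(word)
        vec = [0.0] * self.dim
        for g in grams:
            # CRC32 is a stand-in for a deterministic n-gram hash.
            row = self.table[zlib.crc32(g.encode("utf-8")) % self.buckets]
            vec = [v + r for v, r in zip(vec, row)]
        return [v / len(grams) for v in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMClassifier:
    """Single-layer LSTM forward pass followed by a softmax over classes."""

    def __init__(self, input_dim, hidden_dim, num_classes, seed=1):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x_t ; h_{t-1}].
        self.W = {g: mat(hidden_dim, input_dim + hidden_dim) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        self.W_out = mat(num_classes, hidden_dim)

    def _gate(self, name, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[name], self.b[name])]

    def forward(self, vectors):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            z = x + h  # list concatenation: [x_t ; h_{t-1}]
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Classify from the final hidden state with a numerically stable softmax.
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.W_out]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical target fields; a real system would use its own schema.
LABELS = ["customer_name", "order_date", "amount"]


def classify(field_description, emb, model):
    tokens = field_description.lower().split()
    probs = model.forward([emb.vector(t) for t in tokens])
    return dict(zip(LABELS, probs))


emb = SubwordEmbedding(dim=8)
model = LSTMClassifier(input_dim=8, hidden_dim=6, num_classes=len(LABELS))
scores = classify("cust name of buyer", emb, model)
```

Because each word vector is built from character n-grams, abbreviated or misspelled field names such as "cust" still share subwords with related full words. In practice both stages would be trained on labeled mapping data, most likely with an established FastText implementation and a deep learning framework rather than hand-written loops.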
Acknowledgements
This paper is supported by the Shenzhen Development and Reform Commission subject (XMHT20200105010).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Z., Hu, H. (2022). Automated Data Mapping Based on FastText and LSTM for Business Systems. In: Yang, Y., Wang, X., Zhang, LJ. (eds) Cognitive Computing – ICCC 2022. ICCC 2022. Lecture Notes in Computer Science, vol 13734. Springer, Cham. https://doi.org/10.1007/978-3-031-23585-6_7
DOI: https://doi.org/10.1007/978-3-031-23585-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23584-9
Online ISBN: 978-3-031-23585-6
eBook Packages: Computer Science, Computer Science (R0)