Abstract
Classifying data from distributed and heterogeneous XML sources remains an open challenge. Many algorithms for classifying XML documents have been proposed in the literature, yet existing approaches fall short when the documents are fuzzy. In this paper, we propose KPCA-KELM, a classification framework for fuzzy XML documents based on the Kernel Extreme Learning Machine (KELM). First, we propose a novel fuzzy XML document tree model to represent fuzzy XML documents. Second, building on this tree model, we employ an effective vector space model to represent the semantic structure of fuzzy XML documents. Third, we extract features using Kernel Principal Component Analysis (KPCA) and then classify the fuzzy XML documents using KELM. Experimental results demonstrate that the proposed KPCA-KELM approach shortens training time while maintaining the same level of accuracy as state-of-the-art baseline models.
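The third step (KPCA feature extraction followed by KELM classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian (RBF) kernel, the parameter values, the class names `SimpleKPCA` and `SimpleKELM`, and the helper `make_toy_vectors` are all assumptions made for illustration. The random toy vectors merely stand in for the vector space representation of fuzzy XML documents, which the abstract does not specify. The KELM output weights use the standard closed form beta = (K + I/C)^(-1) T.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class SimpleKPCA:
    """Kernel PCA with an RBF kernel (illustrative, hypothetical class)."""

    def __init__(self, n_components, gamma):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        n = K.shape[0]
        # Statistics needed later to centre kernels of unseen points.
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        one = np.full((n, n), 1.0 / n)
        K_centred = K - one @ K - K @ one + one @ K @ one
        # eigh returns eigenvalues in ascending order; keep the largest ones.
        vals, vecs = np.linalg.eigh(K_centred)
        top = np.argsort(vals)[::-1][: self.n_components]
        # Scale eigenvectors so the feature-space components have unit norm.
        self.alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
        return self

    def transform(self, Z):
        Kz = rbf_kernel(Z, self.X, self.gamma)
        Kz_centred = Kz - Kz.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        return Kz_centred @ self.alphas


class SimpleKELM:
    """Kernel ELM classifier: beta = (K + I/C)^(-1) T (illustrative, hypothetical class)."""

    def __init__(self, C, gamma):
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        self.X = X
        self.classes = np.unique(y)
        # One column per class, +1 for the true class and -1 otherwise.
        T = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        K = rbf_kernel(X, X, self.gamma)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Z):
        scores = rbf_kernel(Z, self.X, self.gamma) @ self.beta
        return self.classes[scores.argmax(axis=1)]


def make_toy_vectors(n_per_class, dim, seed):
    """Two Gaussian blobs standing in for document vectors (assumed toy data)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 0.3, (n_per_class, dim)),
                   rng.normal(3.0, 0.3, (n_per_class, dim))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_vectors(20, 5, seed=0)
    X_test, y_test = make_toy_vectors(10, 5, seed=1)
    kpca = SimpleKPCA(n_components=2, gamma=0.1).fit(X_train)
    clf = SimpleKELM(C=10.0, gamma=1.0).fit(kpca.transform(X_train), y_train)
    accuracy = (clf.predict(kpca.transform(X_test)) == y_test).mean()
    print("toy accuracy:", accuracy)
```

Because KELM training reduces to solving one linear system, its cost depends on the number of training documents rather than on an iterative optimisation. Reducing the input dimension with KPCA beforehand makes the subsequent kernel evaluations cheaper, which is one plausible reading of the training-time claim above.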
Acknowledgements
The authors wish to thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. This work was supported by the National Natural Science Foundation of China (61772269, 61370075 & 61976027) and the Scientific Research Projects of Liaoning Educational Committee (LQ2017003).
Cite this article
Zhao, Z., Ma, Z. & Yan, L. An Efficient Classification of Fuzzy XML Documents Based on Kernel ELM. Inf Syst Front 23, 515–530 (2021). https://doi.org/10.1007/s10796-019-09973-3
DOI: https://doi.org/10.1007/s10796-019-09973-3