
An Efficient Classification of Fuzzy XML Documents Based on Kernel ELM

Information Systems Frontiers

Abstract

Classifying distributed and heterogeneous XML data sources remains an open challenge. Many algorithms for classifying XML documents have been proposed in the literature, yet existing approaches fall short in their ability to classify fuzzy XML documents. In this paper, we propose KPCA-KELM, a classification framework for fuzzy XML documents based on the Kernel Extreme Learning Machine (KELM). First, we propose a novel fuzzy XML document tree model to represent fuzzy XML documents. Second, building on this tree model, we use a vector space model to represent the semantic structure of fuzzy XML documents. Third, we extract features with Kernel Principal Component Analysis (KPCA) and then classify the fuzzy XML documents with KELM. Experimental results demonstrate that the proposed KPCA-KELM approach shortens training time while maintaining the same level of accuracy as state-of-the-art baseline models.
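The KPCA-then-KELM pipeline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the input vectors are random stand-ins for the vector space representation of fuzzy XML documents, and the Gaussian kernel, the parameter values (`gamma`, `C`, `n_components`) and all class and function names are assumptions chosen for illustration. The KELM step uses the standard closed-form solution, in which the output weights solve (I/C + Ω)β = T for the training kernel matrix Ω and target matrix T.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KPCA:
    """Kernel PCA via eigendecomposition of the centered training kernel matrix."""

    def __init__(self, n_components, gamma):
        self.n_components = n_components
        self.gamma = gamma

    def _center(self, K):
        # Center a kernel matrix in feature space using training statistics.
        return K - self.col_mean[None, :] - K.mean(axis=1)[:, None] + self.all_mean

    def fit(self, X):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        vals, vecs = np.linalg.eigh(self._center(K))  # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][: self.n_components]
        # Scale eigenvectors so the feature-space principal axes have unit norm.
        self.alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
        return self

    def transform(self, Z):
        return self._center(rbf_kernel(Z, self.X, self.gamma)) @ self.alphas


class KELM:
    """Kernel ELM classifier with closed-form output weights."""

    def __init__(self, C, gamma):
        self.C = C
        self.gamma = gamma

    def fit(self, X, y):
        self.X = X
        self.classes = np.unique(y)
        # One column per class, +1 for the true class and -1 otherwise.
        T = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        omega = rbf_kernel(X, X, self.gamma)
        self.beta = np.linalg.solve(omega + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, Z):
        scores = rbf_kernel(Z, self.X, self.gamma) @ self.beta
        return self.classes[scores.argmax(axis=1)]


# Toy stand-in data (hypothetical): two well-separated classes of 4-dimensional vectors.
rng = np.random.default_rng(0)


def make_blobs(n_per_class):
    X = np.vstack([rng.normal(0.0, 0.5, (n_per_class, 4)),
                   rng.normal(5.0, 0.5, (n_per_class, 4))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


X_train, y_train = make_blobs(20)
X_test, y_test = make_blobs(10)

kpca = KPCA(n_components=2, gamma=0.1).fit(X_train)
Z_train, Z_test = kpca.transform(X_train), kpca.transform(X_test)

model = KELM(C=10.0, gamma=1.0).fit(Z_train, y_train)
accuracy = (model.predict(Z_test) == y_test).mean()
print("held-out accuracy on toy data:", accuracy)
```

Because the output weights come from a single linear solve rather than iterative training, the cost is dominated by the size of the kernel matrix; reducing the feature dimension with KPCA beforehand changes only the inputs to the kernel, not the form of the solution.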

Notes

  1. http://www.ntu.edu.sg/home/egbhuang/

  2. http://www.cs.washington.edu/research/xmldatasets/

  3. http://abcnews.go.com/

  4. http://www.ibm.com/developerworks/develop/

  5. http://wikipedia.c3sl.ufpr.br/

Acknowledgements

The authors wish to thank the anonymous referees for their valuable comments and suggestions, which improved the technical content and the presentation of the paper. This work was supported by the National Natural Science Foundation of China (61772269, 61370075 & 61976027) and the Scientific Research Projects of Liaoning Educational Committee (LQ2017003).

Author information

Corresponding author

Correspondence to Zongmin Ma.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, Z., Ma, Z. & Yan, L. An Efficient Classification of Fuzzy XML Documents Based on Kernel ELM. Inf Syst Front 23, 515–530 (2021). https://doi.org/10.1007/s10796-019-09973-3

  • DOI: https://doi.org/10.1007/s10796-019-09973-3
