Adaptive XML Stream Classification Using Partial Tree-Edit Distance

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8502)

Abstract

XML classification has many applications, ranging from data integration to e-commerce. However, existing classification algorithms are designed for static XML collections, while modern information systems frequently deal with streaming data that must be processed online using limited resources. Furthermore, data stream classifiers have to be able to react to concept drifts, i.e., changes in the stream's underlying data distribution. In this paper, we propose XStreamClass, an XML classifier capable of processing streams of documents and reacting to concept drifts. The algorithm combines incremental frequent tree mining with partial tree-edit distance and associative classification. XStreamClass was experimentally compared with four state-of-the-art data stream ensembles and achieved the best average classification accuracy on real and synthetic datasets simulating different drift scenarios.
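The general idea of classifying a document stream by structural patterns that are refreshed as the stream evolves can be illustrated with a minimal sketch. This is not XStreamClass: the class name `WindowedEdgeClassifier`, the parameters `window_size` and `min_support`, and the helper `edges` are all hypothetical. Frequent parent-child tag pairs stand in for frequent subtrees, and the fraction of a class's pattern edges missing from a document stands in for partial tree-edit distance, which the abstract does not define.

```python
from collections import Counter, deque
import xml.etree.ElementTree as ET


def edges(xml_text):
    """Return the set of (parent_tag, child_tag) pairs in an XML document."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


class WindowedEdgeClassifier:
    """Toy stream classifier (illustrative assumption, not the paper's method).

    Each class keeps a sliding window of recent documents, so patterns
    mined from the window follow changes in the stream (concept drift).
    """

    def __init__(self, window_size=50, min_support=0.5):
        self.window_size = window_size
        self.min_support = min_support
        self.windows = {}  # class label -> deque of edge sets

    def learn(self, xml_text, label):
        window = self.windows.setdefault(label, deque(maxlen=self.window_size))
        # A full deque discards its oldest entry, which forgets old concepts.
        window.append(edges(xml_text))

    def patterns(self, label):
        """Edges present in at least min_support of the class window."""
        window = self.windows[label]
        counts = Counter(edge for doc in window for edge in doc)
        return {e for e, c in counts.items() if c / len(window) >= self.min_support}

    def predict(self, xml_text):
        """Return the class whose pattern is best covered by the document."""
        if not self.windows:
            return None
        doc = edges(xml_text)

        def missing_fraction(label):
            pattern = self.patterns(label)
            if not pattern:
                return 1.0
            # Stand-in distance: share of pattern edges absent from the document.
            return len(pattern - doc) / len(pattern)

        return min(self.windows, key=missing_fraction)


clf = WindowedEdgeClassifier(window_size=2, min_support=0.5)
clf.learn("<book><title/><author/></book>", "book")
clf.learn("<order><item/><price/></order>", "order")
print(clf.predict("<book><title/></book>"))
```

Because the distance only asks how much of the pattern is missing from the document, extra elements in the document are not penalized, which is the intuition behind a "partial" match. Feeding the `book` class two documents with a new structure replaces its whole window when `window_size=2`, so its pattern follows the drift.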

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Brzezinski, D., Piernik, M. (2014). Adaptive XML Stream Classification Using Partial Tree-Edit Distance. In: Andreasen, T., Christiansen, H., Cubero, JC., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2014. Lecture Notes in Computer Science, vol 8502. Springer, Cham. https://doi.org/10.1007/978-3-319-08326-1_2

  • DOI: https://doi.org/10.1007/978-3-319-08326-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08325-4

  • Online ISBN: 978-3-319-08326-1

  • eBook Packages: Computer Science, Computer Science (R0)
