A Kernel Method for Measuring Structural Similarity Between XML Documents

Jeong, Buhwan; Lee, Daewon; Cho, Hyunbo; Kulvatunyou, Boonserm

doi:10.1007/978-3-540-73325-6_57

Buhwan Jeong¹, Daewon Lee¹, Hyunbo Cho¹ & Boonserm Kulvatunyou²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4570)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems


Abstract

Measuring structural similarity between XML documents has become a key component in various applications, such as XML data mining, schema matching, and web service discovery. This paper presents a novel structural similarity measure between XML documents based on kernel methods. Results of preliminary simulations show that the proposed measure outperforms conventional measures.
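As a rough, hypothetical illustration of how a kernel can turn XML structure into a similarity score, the sketch below represents each document by counts of its root-to-element tag paths, applies a linear kernel to those count vectors, and normalizes the result as k(x, y) / sqrt(k(x, x) k(y, y)) so that identical structures score 1. This is not the kernel proposed in the paper, whose details are not given here; the path-count feature map and the names `path_features`, `path_kernel`, and `structural_similarity` are assumptions chosen for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text: str) -> Counter:
    """Hypothetical feature map: count every root-to-element tag path."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Depth-first traversal that carries the tag path from the root.
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


def path_kernel(a: Counter, b: Counter) -> float:
    """Linear kernel: inner product of two path-count vectors."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    return float(sum(n * b[p] for p, n in a.items()))


def structural_similarity(x: str, y: str) -> float:
    """Normalized kernel k(x,y) / sqrt(k(x,x) * k(y,y)), in [0, 1]."""
    fx, fy = path_features(x), path_features(y)
    denom = math.sqrt(path_kernel(fx, fx) * path_kernel(fy, fy))
    return path_kernel(fx, fy) / denom if denom else 0.0


if __name__ == "__main__":
    doc1 = "<order><item><qty/></item><item><qty/></item></order>"
    doc2 = "<order><item><qty/><price/></item></order>"
    print(round(structural_similarity(doc1, doc2), 4))  # 0.8333
```

Normalizing the kernel removes the effect of document size, so a document with many repeated elements is not automatically judged more similar to everything else; any other positive semi-definite kernel over paths (for example a string kernel over tag sequences) could be swapped into `path_kernel` under the same normalization.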



Author information

Authors and Affiliations

Department of Industrial and Management Engineering, Pohang University of Science and Technology (POSTECH), San 31, Hyoja, Pohang, 790-784, South Korea
Buhwan Jeong, Daewon Lee & Hyunbo Cho
Manufacturing Engineering Laboratory, National Institute of Standards and Technology (NIST), 100 Bureau Dr., Gaithersburg, MD 20899, USA
Boonserm Kulvatunyou


Editor information

Hiroshi G. Okuno, Moonis Ali


About this paper

Cite this paper

Jeong, B., Lee, D., Cho, H., Kulvatunyou, B. (2007). A Kernel Method for Measuring Structural Similarity Between XML Documents. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science, vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_57


DOI: https://doi.org/10.1007/978-3-540-73325-6_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6
eBook Packages: Computer Science (R0)
