Information Extraction Using XPath

Okada, Masashi; Ishii, Naohiro; Torii, Ippei

doi:10.1007/978-3-642-15393-8_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6278)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

To improve the classification accuracy of documents, it is important to characterize not only the words but also the relations among them. Classification from this point of view requires a different approach to document analysis. In this paper, we first show how to find a pattern tree as an embedded sub-tree of an XML data tree by a simple application of XPath. This applies to searching for characteristic words and their relations in XML documents. The second problem is to determine which words and relations actually occur in the XML documents, that is, to find the most frequent patterns, often called the most frequent sub-trees in the XML domain. This second problem is also solved simply by applying XPath.
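
As a rough illustration of the two problems named in the abstract, the following Python sketch translates a pattern tree into an XPath expression built from descendant axes, checks whether the pattern is embedded in an XML tree, and counts how many documents contain each ancestor/descendant label pair. It is not the authors' algorithm: the `(label, children)` pattern encoding, the function names, and the restriction of the frequency count to two-node patterns are all assumptions made for this example. Because the standard-library `xml.etree.ElementTree` module supports only a subset of XPath, the nested predicates are evaluated here by a small hand-written matcher that mirrors the generated expression; with a full XPath engine the expression string could presumably be evaluated directly.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption for this sketch: a pattern tree is written as (label, [children]).


def pattern_to_xpath(pattern, top=True):
    """Translate a pattern tree into an XPath string using descendant axes."""
    label, children = pattern
    predicates = "".join(
        f"[{pattern_to_xpath(child, top=False)}]" for child in children
    )
    prefix = "//" if top else ".//"
    return f"{prefix}{label}{predicates}"


def embeds(element, pattern):
    """True if the pattern is embedded at this element.

    Each pattern edge is read as an ancestor/descendant relation, and
    sibling order is ignored, matching the predicates generated above.
    """
    label, children = pattern
    if element.tag != label:
        return False
    return all(
        any(
            embeds(desc, child)
            for desc in element.iter(child[0])
            if desc is not element  # iter() also yields the element itself
        )
        for child in children
    )


def find_embeddings(root, pattern):
    """Return every element of the tree at which the pattern is embedded."""
    return [e for e in root.iter(pattern[0]) if embeds(e, pattern)]


def frequent_pairs(roots, min_support):
    """Count, per document, the ancestor/descendant label pairs it contains
    and keep the pairs found in at least min_support documents."""
    support = Counter()
    for root in roots:
        pairs = set()
        for anc in root.iter():
            for desc in anc.iter():
                if desc is not anc:
                    pairs.add((anc.tag, desc.tag))
        support.update(pairs)  # each document contributes at most once per pair
    return {pair: n for pair, n in support.items() if n >= min_support}
```

For example, the pattern `("book", [("title", []), ("author", [])])` is translated to `//book[.//title][.//author]`, which selects `book` elements that have both a `title` and an `author` somewhere beneath them. Note that two sibling pattern nodes with the same label could be satisfied by a single data node under this translation, so a stricter notion of embedding would need an additional distinctness check.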

Author information

Authors and Affiliations

Aichi Institute of Technology, 1247 Yachigusa, Yakusacho, Toyota, Japan, 470-0392
Masashi Okada, Naohiro Ishii & Ippei Torii

Editor information

Editors and Affiliations

School of Engineering, Cardiff University, The Parade, CF24 3AA, Cardiff, UK
Rossitza Setchi
Dept. of Computer Science and Software Engineering, University of Portsmouth, Buckingham Building, Lion Terrace, PO1 3HE, Portsmouth, UK
Ivan Jordanov
KES International, 145-157 St. John Street, EC1V 4PY, London, UK
Robert J. Howlett
School of Electrical and Information Engineering, University of South Australia, Adelaide, Mawson Lakes Campus, 5095, SA, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Okada, M., Ishii, N., Torii, I. (2010). Information Extraction Using XPath. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15393-8_13

DOI: https://doi.org/10.1007/978-3-642-15393-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15392-1
Online ISBN: 978-3-642-15393-8
eBook Packages: Computer Science; Computer Science (R0)
