
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6278)


Abstract

To improve the classification accuracy of documents, it is important to characterize not only individual words but also the relations among them. A classification method built on this view requires a different approach to analyzing documents. This paper first develops a simple method, based on XPath, for finding a pattern tree that is embedded as a sub-tree in an XML data tree. The method can be applied to searching XML documents for characteristic words and their relations. The second problem is to determine which words and relations occur in the XML documents, that is, to find the most frequent patterns in the documents, which in the XML domain are often called the most frequent sub-trees. This second problem is also solved simply by applying XPath.
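Both problems can be illustrated with a minimal sketch. It assumes a toy pattern representation (a `(label, children)` tuple) and counts support as the number of documents that contain a pattern; the names `to_xpath`, `contains`, and `frequent_pairs` are hypothetical and are not taken from the paper. A pattern tree is rendered as an XPath 1.0 expression that uses the descendant axis `.//`, so one pattern edge may span several document edges, which is what makes the match an embedded rather than an induced sub-tree. Because Python's `xml.etree.ElementTree` implements only a limited XPath subset, `contains` evaluates the same descendant-axis semantics by hand; with a full XPath 1.0 engine (for example lxml's `xpath()` method) the generated string could be evaluated directly. Candidate generation is deliberately restricted to two-node patterns.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical representation (not from the paper): a pattern tree is a
# (label, [child patterns]) tuple, e.g. ("sec", [("para", []), ("term", [])]).


def to_xpath(pattern):
    """Render a pattern tree as an XPath 1.0 expression using the descendant
    axis, e.g. ("a", [("b", []), ("c", [])]) -> "//a[.//b][.//c]"."""
    def step(p):
        label, children = p
        return label + "".join("[.//" + step(c) + "]" for c in children)
    return "//" + step(pattern)


def _matches_at(elem, pattern):
    """Mirror the XPath predicate semantics at one element: the element carries
    the pattern root's label and every pattern child matches at some proper
    descendant (the `.//` axis), i.e. an embedded, unordered match."""
    label, children = pattern
    return elem.tag == label and all(
        # elem.iter(tag) yields elem itself first when the tags agree, so skip it.
        any(_matches_at(d, c) for d in elem.iter(c[0]) if d is not elem)
        for c in children
    )


def contains(root, pattern):
    """True if evaluating to_xpath(pattern) on this document gives a non-empty
    node set; root.iter() includes the root element, as `//` does."""
    return any(_matches_at(e, pattern) for e in root.iter(pattern[0]))


def frequent_pairs(docs, min_support):
    """Assumed simplification: candidates are only the two-node patterns
    //x[.//y]. Support is the number of documents containing the pattern."""
    labels = sorted({e.tag for doc in docs for e in doc.iter()})
    result = []
    for x, y in product(labels, repeat=2):
        pattern = (x, [(y, [])])
        support = sum(contains(doc, pattern) for doc in docs)
        if support >= min_support:
            result.append((to_xpath(pattern), support))
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(result, key=lambda r: (-r[1], r[0]))


# Toy documents invented for illustration only.
docs = [ET.fromstring(s) for s in (
    "<doc><sec><title/><para><term/></para></sec></doc>",
    "<doc><sec><para><term/></para></sec></doc>",
    "<doc><title/><para/></doc>",
)]

query = ("sec", [("title", []), ("term", [])])
print(to_xpath(query))                    # //sec[.//title][.//term]
print(contains(docs[0], query))           # True
print(contains(docs[1], query))           # False
print(frequent_pairs(docs, min_support=3))  # [('//doc[.//para]', 3)]
```

Like the XPath predicates it mirrors, this check treats the pattern as unordered and allows two sibling pattern nodes to match the same document node. It is therefore looser than the injective, order-preserving embedding used in some definitions of frequent-tree mining. A practical miner would also grow larger candidate patterns level by level instead of enumerating label pairs.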

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okada, M., Ishii, N., Torii, I. (2010). Information Extraction Using XPath. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15393-8_13

  • DOI: https://doi.org/10.1007/978-3-642-15393-8_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15392-1

  • Online ISBN: 978-3-642-15393-8

  • eBook Packages: Computer Science, Computer Science (R0)
