Abstract
To improve the classification accuracy of documents, it is important to characterize not only words but also the relations among them. Classification from this point of view requires a different approach to document analysis. This paper addresses two problems. First, it shows how a pattern tree can be found as an embedded sub-tree of an XML data tree simply by applying XPath. This is applicable to searching for characteristic words and their relations in XML documents. Second, it asks what kinds of words and relations exist in XML documents, that is, how to find the most frequent patterns in the documents, often called the most frequent sub-trees in the XML domain. This second problem is also solved simply by applying XPath.
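The two problems in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the pattern encoding `(label, [child patterns])` and the function names `embeds`, `occurrences`, and `support` are hypothetical, and the matching is simplified (it checks only ancestor-descendant relations via the XPath descendant step `.//label`, and does not require sibling pattern nodes to map to distinct or ordered data nodes). It uses the limited XPath subset of Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern encoding: (label, [child patterns]).
def embeds(node, pattern):
    """True if `pattern` is embedded at `node` via ancestor-descendant edges."""
    label, children = pattern
    if node.tag != label:
        return False
    # './/x' is the XPath descendant step: every x strictly below `node`.
    return all(
        any(embeds(d, child) for d in node.findall(".//" + child[0]))
        for child in children
    )

def occurrences(root, pattern):
    """All nodes of the document at which the pattern is embedded."""
    candidates = [root] + root.findall(".//" + pattern[0])
    return [n for n in candidates if embeds(n, pattern)]

def support(docs, pattern):
    """Number of documents containing at least one occurrence of the pattern."""
    return sum(1 for xml in docs if occurrences(ET.fromstring(xml), pattern))

# Made-up sample documents, for illustration only.
docs = [
    "<book><chapter><section><title/></section></chapter><author/></book>",
    "<book><title/><author/></book>",
    "<article><title/></article>",
]
pattern = ("book", [("title", []), ("author", [])])
print(support(docs, pattern))  # prints 2
```

In the first sample document `title` is a deep descendant of `book` rather than a child, yet the pattern still matches, which is what distinguishes an embedded sub-tree from an induced one. Ranking candidate patterns by `support` is one simple way to approximate the search for the most frequent sub-trees.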
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okada, M., Ishii, N., Torii, I. (2010). Information Extraction Using XPath. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science(), vol 6278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15393-8_13
DOI: https://doi.org/10.1007/978-3-642-15393-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15392-1
Online ISBN: 978-3-642-15393-8
eBook Packages: Computer Science, Computer Science (R0)