Information Retrieval of Sequential Data in Heterogeneous XML Databases

  • Conference paper
Adaptive Multimedia Retrieval: User, Context, and Feedback (AMR 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3877)

Abstract

XML is a W3C standard supported by both industry and the scientific community. As a result, the volume of information annotated in XML keeps growing, and so does its complexity. XML documents have evolved from plain structured-text representations into documents with complex, heterogeneous structures and contents, such as sequential or time-series data. In this article we introduce a retrieval scheme designed to manage sequential data in an XML context. The scheme relies on two levels of approximation: one on the structural localization and organization of the sequential data, and one on its content. To this end, we merge methods developed in two different research areas: XML information retrieval and sequence similarity search.
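
To make the idea of two levels of approximation concrete, the sketch below scores a candidate element by combining an approximate match on its location in the document tree with an approximate match on its sequential content. It is only an illustration under stated assumptions and not the scheme described in the paper: the function names (`edit_distance`, `similarity`, `score`), the use of plain Levenshtein distance at both levels, the slash-separated path format, and the weight `alpha` are all hypothetical choices made for this example.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def score(query_path: str, query_seq: str,
          doc_path: str, doc_seq: str, alpha: float = 0.5) -> float:
    """Combine structural and content similarity (hypothetical weighting)."""
    # Structural level: compare element paths step by step, so a missing
    # or extra ancestor costs one edit instead of ruling out the match.
    structural = similarity(query_path.strip("/").split("/"),
                            doc_path.strip("/").split("/"))
    # Content level: compare the sequential data symbol by symbol.
    content = similarity(query_seq, doc_seq)
    return alpha * structural + (1.0 - alpha) * content


if __name__ == "__main__":
    # Hypothetical example: a note sequence stored one level deeper than
    # the query expects, and differing from it by one symbol.
    print(score("/score/notes", "CDEFG", "/score/part/notes", "CDEG"))
```

Here whole sequences are compared for simplicity. A subsequence search over a large collection would normally rely on an index rather than on pairwise distance computations.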

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Popovici, E., Marteau, PF., Ménier, G. (2006). Information Retrieval of Sequential Data in Heterogeneous XML Databases. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback. AMR 2005. Lecture Notes in Computer Science, vol 3877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11670834_19

  • DOI: https://doi.org/10.1007/11670834_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32174-3

  • Online ISBN: 978-3-540-32175-0

  • eBook Packages: Computer Science, Computer Science (R0)
