Information Retrieval of Sequential Data in Heterogeneous XML Databases

Popovici, Eugen; Marteau, Pierre-François; Ménier, Gildas

doi:10.1007/11670834_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3877)

Included in the following conference series:

International Workshop on Adaptive Multimedia Retrieval

Abstract

XML is a W3C standard supported by both industry and the scientific community. As a result, the volume of information annotated in XML keeps growing, and so does its complexity: XML documents have evolved from plain structured text representations into documents with complex, heterogeneous structures and contents, such as sequential or time-series data. In this article we introduce a retrieval scheme designed to manage sequential data in an XML context, based on two levels of approximation: one on the structural localization and organization of the sequential data, and one on its content. To this end, we merge methods developed in two different research areas: XML information retrieval and sequence similarity search.
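
As an illustration only (the abstract does not specify the actual algorithms), the two levels of approximation could be sketched in Python by applying a unit-cost edit distance both to the element path that locates a sequence and to the sequence itself. The function names, the weighted-mean combination and the `weight` parameter are assumptions made for this sketch, not the authors' method.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Map the edit distance to [0, 1]; 1.0 means the sequences are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def score(query_path: str, query_seq: Sequence,
          doc_path: str, doc_seq: Sequence, weight: float = 0.5) -> float:
    """Hypothetical ranking score: weighted mean of the two similarity levels."""
    # Structural level: compare the element paths tag by tag.
    structural = similarity(query_path.strip("/").split("/"),
                            doc_path.strip("/").split("/"))
    # Content level: compare the sequential data symbol by symbol.
    content = similarity(query_seq, doc_seq)
    return weight * structural + (1.0 - weight) * content
```

For example, `score("/song/track/notes", "CDEFG", "/song/part/notes", "CDEFA")` penalizes both the differing element name (`track` versus `part`) and the differing last symbol, while an exact match on both path and content scores 1.0. The paths and note strings here are made-up sample data.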

Author information

Authors and Affiliations

Valoria Laboratory, University of South-Brittany, BP 573, 56017 Cedex, Vannes, France
Eugen Popovici, Pierre-François Marteau & Gildas Ménier

Editor information

Editors and Affiliations

Laboratoire d’Informatique de Paris 6, France
Marcin Detyniecki
Department of Computer Science, University of Glasgow, 17 Lilybank Gardens, G12 8QQ, Glasgow, UK
Joemon M. Jose
Fakultät für Informatik, Otto-von-Guericke Universität Magdeburg, Universitätsplatz 2, 39106, Germany
Andreas Nürnberger
Department of Computing Science, University of Glasgow, G12 8QQ, Glasgow, UK
C. J. van Rijsbergen

About this paper

Cite this paper

Popovici, E., Marteau, PF., Ménier, G. (2006). Information Retrieval of Sequential Data in Heterogeneous XML Databases. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback. AMR 2005. Lecture Notes in Computer Science, vol 3877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11670834_19

DOI: https://doi.org/10.1007/11670834_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32174-3
Online ISBN: 978-3-540-32175-0
eBook Packages: Computer Science, Computer Science (R0)
