Abstract
XML is a W3C standard supported by both industry and the scientific community, so the volume of information annotated in XML keeps growing and will continue to do so. Its complexity is growing as well: XML documents have evolved from plain structured-text representations into documents with complex, heterogeneous structure and content, such as sequential or time-series data. In this article we introduce a retrieval scheme designed to manage sequential data in an XML context. The scheme relies on two levels of approximation: one on the structural location and organization of the sequential data, the other on its content. To this end we merge methods developed in two different research areas: XML information retrieval and sequence similarity search.
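To make the idea of two levels of approximation concrete, here is a minimal sketch. It is not the authors' system. It assumes that both the element path and the sequence content are compared with plain Levenshtein edit distance, and that the two similarities are combined by a simple product. The function names (`edit_distance`, `similarity`, `iter_paths`, `search`) and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings, lists, tuples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize edit distance to [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def iter_paths(elem, prefix=()):
    """Yield (path of tag names from the root, element) for every element."""
    path = prefix + (elem.tag,)
    yield path, elem
    for child in elem:
        yield from iter_paths(child, path)


def search(xml_text, query_path, query_seq):
    """Rank text-bearing elements by structural and content similarity.

    Assumption: the combined score is the product of the two similarities.
    """
    root = ET.fromstring(xml_text)
    results = []
    for path, elem in iter_paths(root):
        text = (elem.text or "").strip()
        if not text:
            continue  # skip elements that carry no sequence data
        structural = similarity(path, tuple(query_path))
        content = similarity(text, query_seq)
        results.append((structural * content, "/".join(path), text))
    return sorted(results, reverse=True)


# Hypothetical heterogeneous collection: the same kind of data is stored
# under different element structures.
DOC = """<collection>
  <song><title>a</title><melody>CDEFGAB</melody></song>
  <record><track><notes>CDEFGAC</notes></track></record>
</collection>"""

ranked = search(DOC, ["collection", "song", "melody"], "CDEFGAB")
for score, path, text in ranked:
    print(f"{score:.3f}  {path}  {text}")
```

In this sketch the element whose path and content both match the query ranks first, while a near-identical sequence stored under a different structure still receives a nonzero score. A realistic system would replace the exhaustive scan with indexes and could use a different distance for time-series content.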
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Popovici, E., Marteau, PF., Ménier, G. (2006). Information Retrieval of Sequential Data in Heterogeneous XML Databases. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds) Adaptive Multimedia Retrieval: User, Context, and Feedback. AMR 2005. Lecture Notes in Computer Science, vol 3877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11670834_19
DOI: https://doi.org/10.1007/11670834_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32174-3
Online ISBN: 978-3-540-32175-0
eBook Packages: Computer Science, Computer Science (R0)