Article (DOI: 10.1145/1031171.1031269)

Database support for species extraction from the biosystematics literature: a feasibility demonstration

Published: 13 November 2004

ABSTRACT

Part of the biosystematics literature is currently being digitized and manually marked up with XML. Fast search over such documents should be feasible. But marking up such documents incurs high costs, and biologists would like to know the value of such an activity in advance. Deploying standard XML database technology in a straightforward way is not feasible because of two characteristics of biosystematics documents. The first is that descriptions of taxa are related, i.e., a more specific taxon should inherit from a more general one. Combining inheritance with information-retrieval mechanisms gives rise to difficulties that this article addresses. The second is the frequent occurrence of very specific technical terms in such documents, e.g., geographical information or biological terms. To investigate the characteristics of search in the presence of these difficulties, we have designed and implemented a corresponding system based on relational database technology. We use a collection of XML documents that mimics the characteristics of biosystematics documents, as we will explain. We propose two query-evaluation alternatives and compare them by means of performance experiments. It turns out that our techniques can administer the envisioned corpus of documents efficiently and cope with both problems at the same time.
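To make the inheritance issue concrete, the following is a minimal sketch of how a term search can take inherited descriptions into account on top of a relational store. It is not the system or either of the query-evaluation alternatives described in the article; the schema (`taxon`, `description_term`), the sample taxa, and the function names are hypothetical and chosen only for illustration. It assumes an SQLite build with recursive common table expressions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a tiny, made-up taxon hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER REFERENCES taxon(id)
        );
        CREATE TABLE description_term (
            taxon_id INTEGER REFERENCES taxon(id),
            term TEXT NOT NULL
        );
        """
    )
    # Hypothetical data: one genus with two species below it.
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?)",
        [(1, "Genus A", None), (2, "Species A1", 1), (3, "Species A2", 1)],
    )
    # Terms are stored only where they are stated in the text.
    conn.executemany(
        "INSERT INTO description_term VALUES (?, ?)",
        [(1, "winged"), (2, "alpine"), (3, "lowland")],
    )
    return conn


def taxa_matching(conn, term):
    """Return names of taxa whose own or inherited description has `term`."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(taxon_id, ancestor_id) AS (
            -- every taxon counts as its own ancestor
            SELECT id, id FROM taxon
            UNION ALL
            -- walk one step up the hierarchy per iteration
            SELECT l.taxon_id, t.parent_id
            FROM lineage AS l
            JOIN taxon AS t ON t.id = l.ancestor_id
            WHERE t.parent_id IS NOT NULL
        )
        SELECT DISTINCT tx.name
        FROM lineage AS l
        JOIN description_term AS d ON d.taxon_id = l.ancestor_id
        JOIN taxon AS tx ON tx.id = l.taxon_id
        WHERE d.term = ?
        ORDER BY tx.name
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(taxa_matching(conn, "winged"))
print(taxa_matching(conn, "alpine"))
```

In this sketch a term stated only for the genus also matches both species, while a term stated for one species matches that species alone. A real system would additionally need relevance ranking and handling of specialised vocabulary, which this example leaves out.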


Published in

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN: 1581138741
DOI: 10.1145/1031171

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
