
A Dynamic Window Based Passage Extraction Algorithm for Genomics Information Retrieval

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4994)


Passage retrieval is important for users of the biomedical literature, but extracting a passage from a natural paragraph is a challenging problem. In this paper, we analyze the gold standard of the TREC 2006 Genomics Track and simulate the distributions of the standard passages. Based on this analysis, we present an efficient dynamic window based algorithm with a WordSentenceParsed method to extract passages. The algorithm has two important characteristics. First, we obtain the criteria for passage extraction by learning from the gold standard, and then carry out a comprehensive study on the 2006 and 2007 Genomics datasets. Second, the proposed algorithm applies these criteria dynamically, so that it can adjust to the length of the passage. We find that the proposed dynamic algorithm with the WordSentenceParsed method significantly boosts passage-level retrieval performance on the 2006 and 2007 Genomics datasets.
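The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of a dynamic (variable-length) sentence window, not the authors' WordSentenceParsed method. The function names, the naive sentence splitter, the scoring rule (distinct query terms covered plus term density), and the `max_sentences` limit are all assumptions made for illustration; in the paper the extraction criteria are learned from the TREC gold standard rather than fixed by hand.

```python
import re


def split_sentences(paragraph):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for a real biomedical tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_passage(paragraph, query_terms, max_sentences=3):
    """Return the contiguous run of sentences with the best hypothetical score.

    The window is dynamic in the sense that every length from 1 to
    max_sentences is tried at every start position, so the extracted
    passage can grow or shrink with the content of the paragraph.
    """
    sentences = split_sentences(paragraph)
    query = {t.lower() for t in query_terms}
    best_span, best_score = None, 0.0

    for start in range(len(sentences)):
        for size in range(1, max_sentences + 1):
            end = start + size
            if end > len(sentences):
                break
            words = [w for s in sentences[start:end] for w in tokenize(s)]
            if not words:
                continue
            covered = query & set(words)
            density = sum(w in query for w in words) / len(words)
            # Hypothetical criterion: coverage first, density as a tie-breaker.
            score = len(covered) + density
            if score > best_score:
                best_span, best_score = (start, end), score

    if best_span is None:
        return ""  # no sentence contains any query term
    return " ".join(sentences[best_span[0]:best_span[1]])


paragraph = (
    "The weather was nice. BRCA1 mutations raise breast cancer risk. "
    "The gene BRCA1 repairs DNA. We had lunch."
)
print(extract_passage(paragraph, ["BRCA1", "cancer", "DNA"]))
```

In this toy paragraph the two middle sentences together cover all three query terms with the highest density, so the window settles on a length of two sentences instead of a fixed size.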






Editor information

Aijun An, Stan Matwin, Zbigniew W. Raś, Dominik Ślęzak


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, Q., Huang, X. (2008). A Dynamic Window Based Passage Extraction Algorithm for Genomics Information Retrieval. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg.


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68122-9

  • Online ISBN: 978-3-540-68123-6

  • eBook Packages: Computer Science, Computer Science (R0)
