Improving Text Retrieval in Medical Collections Through Automatic Categorization

  • Conference paper
String Processing and Information Retrieval (SPIRE 2003)

Abstract

Retrieving relevant medical information is a current and important research issue. While medical knowledge is expanding at an unprecedented rate, its diffusion is slow. One of the main reasons is the difficulty of locating relevant information in today's large medical text collections. In this work, we introduce a framework, based on Bayesian networks, that combines information derived from the text of medical documents with information on the diseases related to those documents (obtained by an automatic categorization method). This leads to a new ranking formula, which we evaluate using a medical reference collection, the OHSUMED collection. Our results indicate that this combination of evidence might yield considerable gains in retrieval performance. When the queries are strongly related to diseases, these gains might be as high as 84%. This shows that information generated by an automatic categorization procedure can be used effectively to improve the quality of the answers provided by an information retrieval (IR) system specialized in the medical domain.
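
The abstract does not state the ranking formula itself, so the following is only a rough sketch of the general idea of combining two sources of evidence in a ranking. It assumes a disjunctive (noisy-OR) combination of a text similarity and a disease-category similarity, both computed as cosine similarities over sparse weight vectors. The function names (`cosine`, `combined_score`, `rank`), the toy documents, and the category labels are hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(text_sim: float, category_sim: float) -> float:
    """Noisy-OR combination of two evidence scores in [0, 1].

    ASSUMPTION: the disjunctive operator is used here for illustration only;
    the paper's actual ranking formula is not given in the abstract.
    """
    return 1.0 - (1.0 - text_sim) * (1.0 - category_sim)


def rank(query_terms, query_categories, docs):
    """Return document ids sorted by decreasing combined score.

    `docs` is a list of (doc_id, term_weights, category_weights) tuples.
    """
    scored = []
    for doc_id, term_weights, category_weights in docs:
        text_sim = cosine(query_terms, term_weights)
        category_sim = cosine(query_categories, category_weights)
        scored.append((combined_score(text_sim, category_sim), doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Hypothetical toy collection: d1 and d2 match the query text equally well,
# but only d1 was categorized under the disease category of the query.
docs = [
    ("d1", {"asthma": 1.0, "children": 1.0}, {"respiratory_tract_diseases": 1.0}),
    ("d2", {"asthma": 1.0, "history": 1.0}, {}),
    ("d3", {"diabetes": 1.0}, {"endocrine_diseases": 1.0}),
]
query_terms = {"asthma": 1.0, "treatment": 1.0}
query_categories = {"respiratory_tract_diseases": 1.0}

ranking = rank(query_terms, query_categories, docs)
print(ranking)
```

In this toy setting, the text evidence alone cannot separate `d1` from `d2`; the category evidence is what moves `d1` ahead, which is the kind of effect the abstract attributes to queries that are strongly related to diseases.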

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vale, R.F., Ribeiro-Neto, B.A., de Lima, L.R.S., Laender, A.H.F., Junior, H.R.F. (2003). Improving Text Retrieval in Medical Collections Through Automatic Categorization. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2003. Lecture Notes in Computer Science, vol 2857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39984-1_15

  • DOI: https://doi.org/10.1007/978-3-540-39984-1_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20177-9

  • Online ISBN: 978-3-540-39984-1

  • eBook Packages: Springer Book Archive
