Abstract
A current and important research problem is the retrieval of relevant medical information. While medical knowledge is expanding at an unprecedented rate, its diffusion remains slow. One of the main reasons is the difficulty of locating relevant information in today's large medical text collections. In this work, we introduce a framework, based on Bayesian networks, that combines information derived from the text of medical documents with information on the diseases related to these documents (obtained from an automatic categorization method). This leads to a new ranking formula, which we evaluate on a medical reference collection, OHSUMED. Our results indicate that this combination of evidence might yield considerable gains in retrieval performance. When the queries are strongly related to diseases, these gains might be as high as 84%. This shows that information generated by an automatic categorization procedure can be used effectively to improve the quality of the answers provided by an information retrieval (IR) system specialized in the medical domain.
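The abstract does not state the ranking formula itself, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It assumes each document already has a text-based score (for example, a vector-model cosine similarity) and a hypothetical disease-category score, both normalized to [0, 1], and it joins them with a disjunctive (noisy-OR) operator, one common way of combining independent evidence in belief-network retrieval models. The names `combine_disjunctive` and `rank`, and all scores, are illustrative.

```python
from typing import Dict, List, Tuple


def combine_disjunctive(text_score: float, category_score: float) -> float:
    """Noisy-OR combination of two evidence scores in [0, 1].

    Assumption: the two sources are treated as independent, so the
    document is 'not relevant' only if neither source supports it.
    """
    for s in (text_score, category_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - (1.0 - text_score) * (1.0 - category_score)


def rank(text_scores: Dict[str, float],
         category_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents by the combined score (hypothetical helper).

    A document missing from one source gets 0.0 for that source,
    so its combined score equals its score from the other source.
    """
    docs = set(text_scores) | set(category_scores)
    combined = {
        d: combine_disjunctive(text_scores.get(d, 0.0),
                               category_scores.get(d, 0.0))
        for d in docs
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    text = {"d1": 0.40, "d2": 0.55, "d3": 0.10}  # toy text scores
    cats = {"d1": 0.50, "d3": 0.80}              # toy category scores; d2 has none
    for doc, score in rank(text, cats):
        print(doc, round(score, 2))
```

With these toy scores, d3 moves from last to first because strong category evidence offsets its weak text match, while d2, which has no category score, keeps its text score. The disjunctive form never lowers a score, which makes it a simple way to add a second, optional source of evidence on top of text matching.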
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vale, R.F., Ribeiro-Neto, B.A., de Lima, L.R.S., Laender, A.H.F., Junior, H.R.F. (2003). Improving Text Retrieval in Medical Collections Through Automatic Categorization. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2003. Lecture Notes in Computer Science, vol 2857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39984-1_15
DOI: https://doi.org/10.1007/978-3-540-39984-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20177-9
Online ISBN: 978-3-540-39984-1
eBook Packages: Springer Book Archive