Automatic Indexing from a Thesaurus Using Bayesian Networks: Application to the Classification of Parliamentary Initiatives

  • Conference paper
Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4724)

Abstract

We propose a method that, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method builds a Bayesian network to model the thesaurus and uses probabilistic inference to select the descriptors with a high posterior probability of being relevant given the available evidence (the document to be classified). We apply the method to the classification of parliamentary initiatives of the regional Parliament of Andalucía, Spain, using the Eurovoc thesaurus.
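The general idea of ranking thesaurus descriptors by posterior probability can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give the network structure or how probabilities are estimated, so this example assumes a two-layer network (term nodes pointing to descriptor nodes) with noisy-OR conditional probabilities. The toy thesaurus, its link weights, and the names `THESAURUS`, `posterior`, and `rank_descriptors` are all hypothetical.

```python
from math import prod

# Hypothetical toy thesaurus: descriptor -> {term: link weight}.
# The weights act as noisy-OR link probabilities and are invented for illustration.
THESAURUS = {
    "agricultural policy": {"agriculture": 0.7, "farm": 0.5, "subsidy": 0.3},
    "water management": {"water": 0.6, "irrigation": 0.6, "drought": 0.4},
    "public health": {"hospital": 0.7, "health": 0.6},
}


def posterior(descriptor_terms, document_terms):
    """Noisy-OR: P(relevant | doc) = 1 - product of (1 - w) over terms found in the doc."""
    active = [w for term, w in descriptor_terms.items() if term in document_terms]
    return 1.0 - prod(1.0 - w for w in active)


def rank_descriptors(document, thesaurus=THESAURUS, threshold=0.0):
    """Return (descriptor, probability) pairs above threshold, highest probability first."""
    terms = set(document.lower().split())
    scored = [(d, posterior(t, terms)) for d, t in thesaurus.items()]
    return sorted((s for s in scored if s[1] > threshold), key=lambda s: -s[1])


ranking = rank_descriptors("Drought aid for farm irrigation and water supply")
for descriptor, probability in ranking:
    print(f"{descriptor}: {probability:.3f}")
```

Because every term node is observed, the posterior of each descriptor here reduces to its noisy-OR conditional probability. A network that also models relations between descriptors (for example broader and narrower terms) would need genuine inference rather than this closed form.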

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Campos, L.M., Fernández-Luna, J.M., Huete, J.F., Romero, A.E. (2007). Automatic Indexing from a Thesaurus Using Bayesian Networks: Application to the Classification of Parliamentary Initiatives. In: Mellouli, K. (ed.) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2007. Lecture Notes in Computer Science, vol 4724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75256-1_75

  • DOI: https://doi.org/10.1007/978-3-540-75256-1_75

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75255-4

  • Online ISBN: 978-3-540-75256-1

  • eBook Packages: Computer Science, Computer Science (R0)
