Abstract
We propose a method that, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method builds a Bayesian network to model the thesaurus and uses probabilistic inference to select the descriptors that have a high posterior probability of being relevant given the available evidence (the document to be classified). We apply the method to the classification of parliamentary initiatives in the regional Parliament of Andalucía, Spain, using descriptors from the Eurovoc thesaurus.
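To make the idea of ranking descriptors by posterior probability concrete, here is a minimal sketch. It is not the model from the paper: it assumes a toy three-descriptor thesaurus, hand-picked term weights, and a noisy-OR combination of the terms found in the document. The names THESAURUS, posterior, and rank_descriptors, and the 0.5 threshold, are hypothetical choices made for illustration.

```python
from math import prod

# Hypothetical toy thesaurus (an assumption, not Eurovoc data):
# descriptor -> {term: weight}. Each weight is an illustrative probability
# that seeing the term alone makes the descriptor relevant.
THESAURUS = {
    "agricultural policy": {"agriculture": 0.6, "farm": 0.5, "subsidy": 0.3},
    "public health": {"health": 0.6, "hospital": 0.5, "vaccine": 0.4},
    "education policy": {"school": 0.6, "teacher": 0.5, "university": 0.5},
}


def posterior(doc_terms, term_weights):
    """Noisy-OR score: 1 - product of (1 - w) over the terms seen in the document.

    This combination rule is an assumption made for the sketch.
    """
    active = [w for term, w in term_weights.items() if term in doc_terms]
    return 1.0 - prod(1.0 - w for w in active)


def rank_descriptors(text, thesaurus=THESAURUS, threshold=0.5):
    """Return (descriptor, score) pairs at or above the threshold, best first."""
    doc_terms = set(text.lower().split())
    scored = [(d, posterior(doc_terms, tw)) for d, tw in thesaurus.items()]
    kept = [(d, p) for d, p in scored if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    doc = "The farm subsidy supports agriculture and one hospital"
    for descriptor, score in rank_descriptors(doc):
        print(f"{descriptor}: {score:.2f}")
```

The sketch treats each descriptor independently and ignores the hierarchical and associative relations a real thesaurus provides; a Bayesian network over the thesaurus, as the abstract describes, would let evidence propagate between related descriptors.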
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
de Campos, L.M., Fernández-Luna, J.M., Huete, J.F., Romero, A.E. (2007). Automatic Indexing from a Thesaurus Using Bayesian Networks: Application to the Classification of Parliamentary Initiatives. In: Mellouli, K. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2007. Lecture Notes in Computer Science, vol 4724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75256-1_75
DOI: https://doi.org/10.1007/978-3-540-75256-1_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75255-4
Online ISBN: 978-3-540-75256-1
eBook Packages: Computer Science (R0)