Abstract
Automatic attribute selection is a critical step when applying Formal Concept Analysis (FCA) to free-text document retrieval. Optimal attributes, used as document descriptors, should produce smaller, clearer and more browsable concept lattices with better clustering properties. In this paper we focus on the automatic selection of noun phrases as document descriptors for building an FCA-based information retrieval (IR) framework. We present three phrase selection strategies and evaluate them using two measures, the Lattice Distillation Factor and the Minimal Browsing Area. Noun phrases are shown to produce lattices with good clustering properties, with the advantage over simple terms of being better intensional descriptors from the user’s point of view.
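To make the setting concrete, the sketch below builds a toy formal context in which documents are the objects and noun phrases are the attributes, then enumerates its formal concepts by brute force. This is only an illustration of the standard FCA derivation operators, not the paper's selection strategies or evaluation measures; the documents, the phrases and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: documents (objects) x noun phrases (attributes).
context = {
    "d1": {"concept lattice", "information retrieval"},
    "d2": {"concept lattice", "noun phrase"},
    "d3": {"information retrieval", "noun phrase"},
    "d4": {"concept lattice", "information retrieval", "noun phrase"},
}


def extent(attrs):
    """Return the documents that contain every phrase in attrs."""
    return frozenset(d for d, phrases in context.items() if attrs <= phrases)


def intent(docs):
    """Return the phrases shared by every document in docs.

    For an empty set of documents this is the full attribute set.
    """
    shared = set().union(*context.values())
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def formal_concepts():
    """Enumerate (extent, intent) pairs by closing every attribute subset.

    Brute force is exponential in the number of attributes, so this is
    only suitable for tiny illustrative contexts.
    """
    attrs = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            e = extent(frozenset(subset))
            concepts.add((e, intent(e)))
    return concepts


concepts = formal_concepts()
for e, i in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(e), "<->", sorted(i))
```

Each printed pair is one node of the concept lattice: a set of documents together with the noun phrases they all share. Fewer, more discriminating attributes yield fewer nodes, which is why attribute selection affects how browsable the lattice is.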
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cigarrán, J.M., Peñas, A., Gonzalo, J., Verdejo, F. (2005). Automatic Selection of Noun Phrases as Document Descriptors in an FCA-Based Information Retrieval System. In: Ganter, B., Godin, R. (eds) Formal Concept Analysis. ICFCA 2005. Lecture Notes in Computer Science, vol 3403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32262-7_4
DOI: https://doi.org/10.1007/978-3-540-32262-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24525-4
Online ISBN: 978-3-540-32262-7
eBook Packages: Computer Science (R0)