Abstract
Automatic taxonomy generation organizes text documents into a labeled hierarchy that is not known in advance. The main issues are (i) how to identify documents that have similar content, (ii) how to discover the hierarchical structure of the topics and subtopics, and (iii) how to find appropriate labels for each of the topics and subtopics. In this paper, we review several approaches to automatic taxonomy generation to provide insight into the issues involved. We also describe how fuzzy hierarchies can overcome some of the problems associated with traditional crisp taxonomies.
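As a rough illustration of issue (i) and of the crisp versus fuzzy distinction, the following minimal sketch compares a document with two topic prototypes. It is not the method reviewed in the paper: the bag-of-words vectors, the cosine similarity measure, the normalization of similarities into membership degrees, and all names (`term_vector`, `fuzzy_memberships`, `crisp_assignment`, the toy topics) are assumptions made here for illustration only.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_memberships(doc, topics):
    """Illustrative membership degrees: similarities normalized to sum to 1.

    This normalization is an assumption for the sketch, not a specific
    fuzzy clustering algorithm.
    """
    sims = {name: cosine(doc, vec) for name, vec in topics.items()}
    total = sum(sims.values())
    if total == 0:
        return {name: 0.0 for name in topics}
    return {name: s / total for name, s in sims.items()}


def crisp_assignment(memberships):
    """A crisp taxonomy keeps only the single best-matching topic."""
    return max(memberships, key=memberships.get)


# Hypothetical topic prototypes and a document that overlaps both.
topics = {
    "sports": term_vector("football match team score league"),
    "finance": term_vector("market stock team investor score"),
}
doc = term_vector("team score market")

memberships = fuzzy_memberships(doc, topics)
best_topic = crisp_assignment(memberships)
```

In this toy example the document receives a membership of roughly 0.4 in "sports" and 0.6 in "finance". A crisp assignment keeps only "finance" and discards the partial overlap with "sports", which is the kind of information a fuzzy hierarchy can retain.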
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krishnapuram, R., Kummamuru, K. (2003). Automatic Taxonomy Generation: Issues and Possibilities. In: Bilgiç, T., De Baets, B., Kaynak, O. (eds) Fuzzy Sets and Systems — IFSA 2003. IFSA 2003. Lecture Notes in Computer Science, vol 2715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44967-1_5
DOI: https://doi.org/10.1007/3-540-44967-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40383-8
Online ISBN: 978-3-540-44967-6
eBook Packages: Springer Book Archive