Automatic Taxonomy Generation: Issues and Possibilities

  • Conference paper
Fuzzy Sets and Systems — IFSA 2003 (IFSA 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2715)

Abstract

Automatic taxonomy generation deals with organizing text documents into a labeled hierarchy that is not known in advance. The main issues here are (i) how to identify documents that have similar content, (ii) how to discover the hierarchical structure of the topics and subtopics, and (iii) how to find appropriate labels for each of the topics and subtopics. In this paper, we review several approaches to automatic taxonomy generation to provide insight into the issues involved. We also describe how fuzzy hierarchies can overcome some of the problems associated with traditional crisp taxonomies.
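The three issues named in the abstract can be illustrated with a minimal sketch. It is not the method of the paper or of any approach it reviews; it assumes bag-of-words term counts, cosine similarity for issue (i), a simple greedy bottom-up merge for issue (ii), and most-frequent-term labels for issue (iii). The function names `bag_of_words`, `cosine`, `label`, and `build_taxonomy` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector for one document (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Issue (i): cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label(vec, k=2):
    """Issue (iii): the k most frequent terms, ties broken alphabetically."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def make_node(vec, docs, children):
    """One taxonomy node: summed term counts, member documents, subtopics."""
    return {"vec": vec, "docs": docs, "children": children, "label": label(vec)}


def build_taxonomy(docs):
    """Issue (ii): repeatedly merge the two most similar nodes under a new parent.

    This yields a crisp binary hierarchy: every document sits in exactly one
    leaf. It is an assumed stand-in, not an algorithm taken from the paper.
    """
    nodes = [make_node(bag_of_words(d), [i], []) for i, d in enumerate(docs)]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]]["vec"], nodes[p[1]]["vec"]))
        a, b = nodes[i], nodes[j]
        parent = make_node(a["vec"] + b["vec"], a["docs"] + b["docs"], [a, b])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


if __name__ == "__main__":
    corpus = [
        "fuzzy clustering fuzzy sets",
        "fuzzy sets logic",
        "web search engine",
        "web search results",
    ]
    root = build_taxonomy(corpus)
    for child in root["children"]:
        print(child["label"], child["docs"])
```

Because each document ends up in exactly one branch, this sketch produces the kind of crisp taxonomy the abstract contrasts with fuzzy hierarchies; a fuzzy variant would instead let a document belong to several nodes with graded membership.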

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krishnapuram, R., Kummamuru, K. (2003). Automatic Taxonomy Generation: Issues and Possibilities. In: Bilgiç, T., De Baets, B., Kaynak, O. (eds) Fuzzy Sets and Systems — IFSA 2003. IFSA 2003. Lecture Notes in Computer Science, vol 2715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44967-1_5

  • DOI: https://doi.org/10.1007/3-540-44967-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40383-8

  • Online ISBN: 978-3-540-44967-6

  • eBook Packages: Springer Book Archive
