
Towards the Automatic Construction of Conceptual Taxonomies

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

In this paper we investigate the automatic construction of conceptual taxonomies and evaluate the achievable results. The hierarchy is built with Ward's algorithm, guided by the Goodman-Kruskal τ as the proximity measure. We then describe each cluster concisely by a representative keyword selected with PageRank.
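As an illustration of the proximity measure named above, the following is a minimal sketch of the asymmetric Goodman-Kruskal τ between two categorical variables, written as the proportional reduction in prediction error. The function name `goodman_kruskal_tau` and the convention of returning 0 when the target is constant are assumptions made for this example; the abstract does not state how τ is turned into a distance for Ward's algorithm, so that step is not shown.

```python
from collections import Counter


def goodman_kruskal_tau(x, y):
    """Goodman-Kruskal tau for predicting y from x (asymmetric).

    x and y are equal-length sequences of categorical values.
    Returns a value in [0, 1]: 0 means x gives no help in
    predicting y, 1 means x determines y completely.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    n = len(x)
    joint = Counter(zip(x, y))   # counts n_ij of each (x, y) pair
    x_counts = Counter(x)        # row marginals n_i.
    y_counts = Counter(y)        # column marginals n_.j

    # Expected error when guessing y from its marginal distribution only.
    err_y = 1.0 - sum((c / n) ** 2 for c in y_counts.values())
    if err_y == 0.0:
        # y is constant, so there is no error to reduce.
        # Returning 0.0 here is a convention assumed for this sketch.
        return 0.0

    # Expected error when guessing y from its distribution conditional on x.
    err_y_given_x = 1.0 - sum(
        c * c / (x_counts[xv] * n) for (xv, _), c in joint.items()
    )

    # Proportional reduction in error.
    return (err_y - err_y_given_x) / err_y
```

Note that τ is not symmetric: `goodman_kruskal_tau(x, y)` and `goodman_kruskal_tau(y, x)` generally differ, so any use as a proximity between pairs needs a choice of direction or a symmetrization.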

The resulting hierarchy has the same advantages, both descriptive and operative, as keyword indices that partition a set of documents with respect to their content.

We performed experiments on a real case: the abstracts of the papers published in ACM TODS, which have been manually classified into the ACM Computing Taxonomy (CT). We evaluated the generated hierarchy objectively with two methods, the Jaccard measure and entropy, and obtained good results with both. Finally, we evaluated how easily documents can be classified into the categories of the two taxonomies, showing that the generated hierarchy (KH) provides a greater facility than CT.
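The two evaluation measures mentioned here have standard textbook definitions, sketched below for reference. The abstract does not say exactly how they were applied to the hierarchy, so the function names (`jaccard`, `cluster_entropy`, `weighted_entropy`) and the way clusters are represented, as lists of reference class labels, are assumptions of this example rather than the authors' procedure.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets: treated as identical (assumed convention)
    return len(a & b) / len(a | b)


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one cluster.

    0 means the cluster is pure (a single class); higher values mean
    the cluster mixes several classes.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average entropy over a list of clusters.

    Each cluster is a list holding the reference class label of every
    document assigned to it.
    """
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)
```

For example, `jaccard({1, 2, 3}, {2, 3, 4})` gives 0.5, and a clustering that separates two classes perfectly has a weighted entropy of 0.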



Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ienco, D., Meo, R. (2008). Towards the Automatic Construction of Conceptual Taxonomies. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_31

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science (R0)
