Towards the Automatic Construction of Conceptual Taxonomies

Ienco, Dino; Meo, Rosa

doi:10.1007/978-3-540-85836-2_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

In this paper we investigate whether conceptual taxonomies can be constructed automatically and evaluate the achievable results. The hierarchy is built by Ward's algorithm, guided by Goodman-Kruskal τ as the proximity measure. We then describe each cluster concisely by a representative keyword selected by PageRank.
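As an illustration of the proximity measure named above, the following is a minimal sketch of the standard Goodman-Kruskal τ between two categorical variables: the proportional reduction in prediction error for one variable when the other is known. The function name `goodman_kruskal_tau` and the convention of returning 0 for a constant target are assumptions made here; the abstract does not say how τ is turned into a distance for Ward's algorithm, so the sketch stops at τ itself.

```python
from collections import Counter


def goodman_kruskal_tau(x, y):
    """Goodman-Kruskal tau for predicting y from x (asymmetric in x and y)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))

    # Expected error when guessing y from its marginal distribution alone.
    err_y = 1.0 - sum((c / n) ** 2 for c in count_y.values())
    if err_y == 0.0:
        # y is constant, so there is no error to reduce.
        # Returning 0.0 is a convention assumed for this sketch.
        return 0.0

    # Expected error when guessing y from its distribution conditioned on x.
    err_y_given_x = 1.0 - sum(
        (c / n) ** 2 / (count_x[xv] / n) for (xv, _), c in count_xy.items()
    )
    return (err_y - err_y_given_x) / err_y


if __name__ == "__main__":
    # x determines y exactly, but y only partly determines x.
    x = [0, 1, 2, 3]
    y = [0, 0, 1, 1]
    print(goodman_kruskal_tau(x, y))
    print(goodman_kruskal_tau(y, x))
```

Because τ is asymmetric, the two calls in the usage example give different values; any symmetric proximity built from it needs an additional choice that the abstract leaves unspecified.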

The resulting hierarchy offers the same advantages, both descriptive and operative, as keyword indices that partition a set of documents by content.

We performed experiments on a real case: the abstracts of the papers published in ACM TODS, which have been manually classified into the ACM Computing Taxonomy (CT). We evaluated the generated hierarchy objectively with two measures, the Jaccard measure and entropy, and obtained good results with both. Finally, we evaluated how easily documents can be classified into the categories of the two taxonomies, showing that the generated hierarchy (KH) makes classification easier than CT.
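For readers unfamiliar with the two evaluation measures, the sketch below shows their textbook definitions: the Jaccard measure between two sets, and the Shannon entropy of the class labels found inside one cluster. The function names and the example data are hypothetical, and the abstract does not specify how the authors aggregate these values over the whole hierarchy, so this is only an illustration of the underlying formulas.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard measure |A ∩ B| / |A ∪ B| between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: treated as identical (a convention assumed here).
        return 1.0
    return len(a & b) / len(union)


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the class labels assigned to one cluster."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    # Sum of p * log2(1/p) over the observed classes, with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


if __name__ == "__main__":
    # Hypothetical document ids in a generated cluster and in a manual category.
    generated = {"d1", "d2", "d3"}
    manual = {"d2", "d3", "d4"}
    print(jaccard(generated, manual))

    # Hypothetical manual class labels of the documents inside one cluster.
    print(cluster_entropy(["H.2", "H.2", "H.3", "H.3"]))
```

A Jaccard value close to 1 means the two sets nearly coincide, and an entropy close to 0 means a cluster is dominated by a single manual class.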

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Dino Ienco & Rosa Meo

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

About this paper

Cite this paper

Ienco, D., Meo, R. (2008). Towards the Automatic Construction of Conceptual Taxonomies. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_31

DOI: https://doi.org/10.1007/978-3-540-85836-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85835-5
Online ISBN: 978-3-540-85836-2
eBook Packages: Computer Science, Computer Science (R0)
