Abstract
In this paper, we investigate how to analyze the quality of clustering results using semantic annotations provided by experts. We propose a novel approach to constructing an evaluation measure, called SEE (Semantic Evaluation by Exploration), which improves on existing methods such as the Rand Index and Normalized Mutual Information. We illustrate the proposed evaluation method on freely accessible biomedical research articles from PubMed Central (PMC), many of which are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. We compare different semantic techniques for search result clustering using the proposed measure.
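For orientation, the Rand Index named above as a baseline can be sketched in a few lines. This is a minimal illustration of the standard pair-counting definition, not the SEE measure and not the authors' code. The function name `rand_index` and the toy labels are hypothetical, and the sketch assumes each document carries exactly one expert label, whereas MeSH-annotated articles typically carry several headings.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    together, or both labelings keep them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy data: cluster ids from an algorithm, one expert label per document.
clusters = [0, 0, 1, 1, 2]
expert = ["a", "a", "a", "b", "b"]
print(rand_index(clusters, expert))  # 6 of the 10 pairs agree -> 0.6
```

Because the index only compares flat partitions pair by pair, it cannot use the structure of a thesaurus or multiple annotations per document, which is the kind of limitation a semantic evaluation measure is meant to address.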
This work is partially supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic scientific research and experimental development program: “Inter-disciplinary System for Interactive Scientific and Scientific-Technical Information”.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, S.H., Świeboda, W., Nguyen, H.S. (2014). Semantic Evaluation of Text Clustering. In: van Do, T., Thi, H., Nguyen, N. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 282. Springer, Cham. https://doi.org/10.1007/978-3-319-06569-4_20
DOI: https://doi.org/10.1007/978-3-319-06569-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06568-7
Online ISBN: 978-3-319-06569-4
eBook Packages: Engineering, Engineering (R0)