Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 282))

Abstract

In this paper, we investigate the problem of analyzing the quality of clustering results using semantic annotations provided by experts. We propose a novel approach to constructing an evaluation measure, called SEE (Semantic Evaluation by Exploration), which improves on existing methods such as the Rand Index and Normalized Mutual Information. We illustrate the proposed evaluation method on freely accessible biomedical research articles from PubMed Central (PMC), many of which are annotated by experts using the Medical Subject Headings (MeSH) thesaurus. Using the proposed measure, we compare different semantic techniques for search result clustering.
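For orientation, the two baseline measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the SEE measure, which the abstract does not specify. The function names are hypothetical, and the NMI normalization (the arithmetic mean of the two entropies) is an assumption, since several normalizations are in common use. Both functions expect two label sequences of equal length with at least two items.

```python
from collections import Counter
from itertools import combinations
from math import log


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two labelings agree,
    i.e. both put the pair together or both keep it apart."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean of the two entropies
    (assumed normalization; other variants exist)."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    # Both labelings are a single cluster: treat them as identical.
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy data: algorithm output vs. expert-assigned categories.
clusters = [0, 0, 1, 1, 2, 2]
expert = ["a", "a", "a", "b", "b", "b"]
print(rand_index(clusters, expert))
print(normalized_mutual_information(clusters, expert))
```

Both measures compare a clustering against a single reference partition, so in this sketch each document carries exactly one expert label.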

This work is partially supported by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10, within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.

Author information

Corresponding author

Correspondence to Sinh Hoa Nguyen.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, S.H., Świeboda, W., Nguyen, H.S. (2014). Semantic Evaluation of Text Clustering. In: van Do, T., Thi, H., Nguyen, N. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 282. Springer, Cham. https://doi.org/10.1007/978-3-319-06569-4_20

  • DOI: https://doi.org/10.1007/978-3-319-06569-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06568-7

  • Online ISBN: 978-3-319-06569-4

  • eBook Packages: Engineering, Engineering (R0)
