Semantic Evaluation of Search Result Clustering Methods

Nguyen, Sinh Hoa; Świeboda, Wojciech; Jaśkiewicz, Grzegorz

doi:10.1007/978-3-642-35647-6_24

Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

This paper continues our research on designing and developing a dialog-based semantic search engine for the SONCA system, which is part of the SYNAT project. In previous papers we proposed several extensions of the Tolerance Rough Set Model (TRSM) that can be used to improve search result clustering algorithms. In this paper, we investigate the problem of analyzing the quality of those solutions. We propose semantic evaluation measures to estimate the quality of the proposed search result clustering methods. We illustrate the evaluation method using the Medical Subject Headings (MeSH) thesaurus and compare different clustering techniques on commonly accessed biomedical research articles from the PubMed Central (PMC) portal. The experimental results show the advantages of our new clustering methods, which are implemented in the SONCA system.

The authors are partially supported by grant N N516 077837 from the Ministry of Science and Higher Education of the Republic of Poland and by the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”. The research of the first author has been partially supported by the statutory research grant funded by the Polish-Japanese Institute of Information Technology.


Author information

Authors and Affiliations

Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Sinh Hoa Nguyen, Wojciech Świeboda & Grzegorz Jaśkiewicz
Polish-Japanese Institute of Information Technology, Koszykowa 86, 02008, Warszawa, Poland
Sinh Hoa Nguyen

Corresponding author

Correspondence to Sinh Hoa Nguyen.

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Robert Bembenik
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Lukasz Skonieczny
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Henryk Rybinski
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Marzena Kryszkiewicz
Interdisciplinary Centre for, University of Warsaw, Pawińskiego 5a bl. D, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Nguyen, S.H., Świeboda, W., Jaśkiewicz, G. (2013). Semantic Evaluation of Search Result Clustering Methods. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_24

DOI: https://doi.org/10.1007/978-3-642-35647-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)
