Abstract
In this article we propose a general framework for semantic indexing and search of texts within scientific document repositories, where the document representation may include, besides the content, additional document meta-data, citations and semantic information. Our idea is based on applying the Tolerance Rough Set Model, semantic information extracted from the source text, and a domain ontology to approximate the concepts associated with documents and to enrich the vector representation. We present an experiment performed on the freely accessible biomedical research articles from the PubMed Central (PMC) portal. The experimental results show the advantages of the proposed solution.
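As a rough illustration of how a tolerance rough set model can enrich a document representation, the sketch below builds tolerance classes from term co-occurrence and computes the upper approximation of a document's term set. It follows the commonly used co-occurrence formulation of the model and is not taken from this paper: the function names, the threshold `theta`, and the toy corpus are all assumptions made for the example, and the meta-data, citation and ontology components of the framework are not modelled.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Hypothetical helper; `theta` is an assumed co-occurrence threshold.)"""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class overlaps the document's terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Toy corpus, invented for illustration only.
docs = [
    ["rough", "set", "clustering"],
    ["rough", "set", "indexing"],
    ["semantic", "indexing", "ontology"],
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation(["rough", "ontology"], classes)
print(sorted(enriched))
```

Here "rough" and "set" co-occur in two documents, so a document containing only "rough" and "ontology" is extended with "set". In a vector-space setting the added terms would typically receive a lower weight than the original ones; that weighting is left out of this sketch.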
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, S.H., Nguyen, H.S. (2013). An Approach to Semantic Indexing Based on Tolerance Rough Set Model. In: Nguyen, N., van Do, T., le Thi, H. (eds) Advanced Computational Methods for Knowledge Engineering. Studies in Computational Intelligence, vol 479. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00293-4_26
DOI: https://doi.org/10.1007/978-3-319-00293-4_26
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00292-7
Online ISBN: 978-3-319-00293-4
eBook Packages: Engineering, Engineering (R0)