Abstract
This chapter presents a novel approach to keyword search in Information Retrieval based on the Tolerance Rough Set Model (TRSM). The bag-of-words representation of each document is extended with additional words, which are stored in the inverted index along with appropriate weights. These extension words are derived using different techniques (e.g. semantic information, word distribution) that are encapsulated in the model by a tolerance relation. Weights for the structural extension are then assigned by an unsupervised algorithm. This method, called TRSM-WL, allows us to improve retrieval effectiveness by returning documents that do not necessarily include words from the query. We compare the performance of these two algorithms in the keyword search problem over a benchmark data set.
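The document-extension idea can be illustrated with a minimal sketch. It assumes a co-occurrence threshold as the tolerance relation (a common choice in TRSM work, but not confirmed by this abstract) and a fixed, hypothetical `extension_weight` in place of the weights that TRSM-WL learns. All function names and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with in >= theta documents.

    Assumption: co-occurrence counting stands in for the tolerance relation.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares at least one term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def build_index(docs, theta, extension_weight=0.5):
    """Build an inverted index over extended documents.

    Original terms get weight 1.0; extension terms get the fixed,
    hypothetical extension_weight (TRSM-WL would learn these weights).
    """
    classes = tolerance_classes(docs, theta)
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        original = set(doc)
        for term in upper_approximation(doc, classes):
            index[term][doc_id] = 1.0 if term in original else extension_weight
    return index


def search(index, query):
    """Score documents by summing the indexed weights of the query terms."""
    scores = defaultdict(float)
    for term in query:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (made up for illustration only).
docs = [
    ["rough", "set", "retrieval"],
    ["rough", "set", "clustering"],
    ["rough", "set"],
    ["retrieval", "index"],
    ["rough", "approximation"],
]
index = build_index(docs, theta=2)
print(search(index, ["set"]))
```

In this toy corpus, "rough" and "set" co-occur in three documents, so they fall into each other's tolerance class. The last document never mentions "set", yet it is returned for the query "set" with the lower extension weight, which is the behaviour the abstract describes.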
The authors are supported by grant 2012/05/B/ST6/03215 from the Polish National Science Centre (NCN), and by grant SP/I/1/77065/10 within the framework of the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”, funded by the Polish National Centre for Research and Development (NCBiR).
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Świeboda, W., Meina, M., Nguyen, H.S. (2014). Weight Learning in TRSM-based Information Retrieval. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Studies in Computational Intelligence, vol 541. Springer, Cham. https://doi.org/10.1007/978-3-319-04714-0_5
DOI: https://doi.org/10.1007/978-3-319-04714-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04713-3
Online ISBN: 978-3-319-04714-0
eBook Packages: Engineering, Engineering (R0)