Weight Learning in TRSM-based Information Retrieval

Świeboda, Wojciech; Meina, Michał; Nguyen, Hung Son

doi:10.1007/978-3-319-04714-0_5


Part of the book series: Studies in Computational Intelligence (SCI, volume 541)


Abstract

This chapter presents a novel approach to keyword search in Information Retrieval based on the Tolerance Rough Set Model (TRSM). The bag-of-words representation of each document is extended with additional words, which are stored in the inverted index along with appropriate weights. These extension words are derived by different techniques (e.g. semantic information, word distribution) that are encapsulated in the model by a tolerance relation. Weights for this structural extension are then assigned by an unsupervised algorithm. This method, called TRSM-WL, allows us to improve retrieval effectiveness by returning documents that do not necessarily include words from the query. We compare the performance of the two algorithms, TRSM and TRSM-WL, in the keyword search problem over a benchmark data set.
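The document extension described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes one common choice of tolerance relation, in which two terms are related when they co-occur in at least `theta` documents, and it replaces the learned weights of TRSM-WL with a single fixed placeholder `ext_weight`. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class.

    Assumption: two terms are tolerant if they co-occur in at least
    `theta` documents. Every term is tolerant with itself.
    """
    cooc = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    vocab = {t for doc in docs for t in doc}
    classes = {t: {t} for t in vocab}  # reflexivity
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # symmetry
            classes[b].add(a)
    return classes


def extended_index(docs, theta, ext_weight=0.5):
    """Build an inverted index over the extended documents.

    A term is added to a document when its tolerance class intersects the
    document's own terms. Original terms get weight 1.0; extension terms
    get `ext_weight`, a placeholder for weights that TRSM-WL would learn.
    """
    classes = tolerance_classes(docs, theta)
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        terms = set(doc)
        for term, cls in classes.items():
            if term in terms:
                index[term][doc_id] = 1.0
            elif cls & terms:
                index[term][doc_id] = ext_weight
    return index


def search(index, query):
    """Rank documents by the summed weights of the query terms."""
    scores = defaultdict(float)
    for term in query:
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = [
    ["rough", "set", "model"],
    ["rough", "set", "theory"],
    ["rough", "approximation"],
    ["keyword", "search"],
]
index = extended_index(docs, theta=2)
print(search(index, ["set"]))  # document 2 is returned although it lacks "set"
```

In this toy collection "rough" and "set" co-occur in two documents, so the third document, which contains only "rough", is extended with "set" at the placeholder weight and ranks below the documents that contain the query word itself.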

The authors are supported by grant 2012/05/B/ST6/03215 from the Polish National Science Centre (NCN) and by grant SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”, funded by the Polish National Centre for Research and Development (NCBiR).



Author information

Authors and Affiliations

Institute of Mathematics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Wojciech Świeboda & Hung Son Nguyen
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Michał Meina


Corresponding authors

Correspondence to Wojciech Świeboda, Michał Meina or Hung Son Nguyen.

Editor information

Editors and Affiliations

Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Robert Bembenik
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Łukasz Skonieczny
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Henryk Rybiński
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Marzena Kryszkiewicz
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Warsaw, Poland
Marek Niezgódka


About this chapter

Cite this chapter

Świeboda, W., Meina, M., Nguyen, H.S. (2014). Weight Learning in TRSM-based Information Retrieval. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Studies in Computational Intelligence, vol 541. Springer, Cham. https://doi.org/10.1007/978-3-319-04714-0_5


DOI: https://doi.org/10.1007/978-3-319-04714-0_5
Published: 27 February 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04713-3
Online ISBN: 978-3-319-04714-0
eBook Packages: Engineering, Engineering (R0)
