Part of the book series: Studies in Computational Intelligence ((SCI,volume 541))

Abstract

This chapter presents a novel approach to keyword search in Information Retrieval based on the Tolerance Rough Set Model (TRSM). The bag-of-words representation of each document is extended with additional words, which are stored in the inverted index along with appropriate weights. These extension words are derived using different techniques (e.g. semantic information, word distribution) that are encapsulated in the model by a tolerance relation. Weights for the structural extension are then assigned by an unsupervised algorithm. This method, called TRSM-WL, allows us to improve retrieval effectiveness by returning documents that do not necessarily include words from the query. We compare the performance of these two algorithms in the keyword search problem over a benchmark data set.
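The idea of extending a document's bag of words through a tolerance relation can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes the commonly used co-occurrence tolerance relation (two terms are tolerant if they appear together in at least `theta` documents) and replaces the learned TRSM-WL weights with a fixed placeholder `extension_weight`. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms co-occurring with it in >= theta documents.

    Assumption: a co-occurrence tolerance relation; the chapter allows other
    sources such as semantic information.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # the relation is reflexive
    for (a, b), n in cooc.items():
        if n >= theta:  # and symmetric
            classes[a].add(b)
            classes[b].add(a)
    return classes


def extend_document(doc, classes, extension_weight=0.5):
    """Return term -> weight for the extended (upper-approximation) document.

    Original terms get weight 1.0. Extension terms get a fixed placeholder
    weight; TRSM-WL would learn these weights instead.
    """
    original = set(doc)
    weights = {t: 1.0 for t in original}
    for term, cls in classes.items():
        if term not in original and cls & original:
            weights[term] = extension_weight
    return weights


def build_inverted_index(docs, classes, extension_weight=0.5):
    """Build term -> {doc_id: weight} over the extended documents."""
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        extended = extend_document(doc, classes, extension_weight)
        for term, w in extended.items():
            index[term][doc_id] = w
    return index


def search(index, query):
    """Score documents by summing the weights of matching query terms."""
    scores = defaultdict(float)
    for term in query:
        for doc_id, w in index.get(term, {}).items():
            scores[doc_id] += w
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [
    ["rough", "set", "model"],
    ["rough", "set", "retrieval"],
    ["set", "tolerance"],
]
classes = tolerance_classes(docs, theta=2)
index = build_inverted_index(docs, classes)
results = search(index, ["rough"])
```

In this toy corpus the third document does not contain "rough", yet it is returned with a lower score because "rough" and "set" fall into the same tolerance class.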

The authors are supported by grant 2012/05/B/ST6/03215 from the Polish National Science Centre (NCN), and by grant SP/I/1/77065/10 within the strategic scientific research and experimental development program “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”, funded by the Polish National Centre for Research and Development (NCBiR).

Author information

Correspondence to Wojciech Świeboda, Michał Meina or Hung Son Nguyen.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Świeboda, W., Meina, M., Nguyen, H.S. (2014). Weight Learning in TRSM-based Information Retrieval. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Studies in Computational Intelligence, vol 541. Springer, Cham. https://doi.org/10.1007/978-3-319-04714-0_5

  • DOI: https://doi.org/10.1007/978-3-319-04714-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04713-3

  • Online ISBN: 978-3-319-04714-0

  • eBook Packages: Engineering, Engineering (R0)
