Information Retrieval in Wikipedia with Conceptual Directions

  • Conference paper
Distributed Computing and Internet Technology (ICDCIT 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8956)

Abstract

This paper describes our algorithm for retrieving textual information from Wikipedia. Experiments show that the algorithm improves typical measures of retrieval quality. The improvement is achieved with a two-phase approach. In the first phase, the algorithm extends the set of content indexed by the specified keywords, which increases Recall. In the second phase, the user is presented with so-called Conceptual Directions, and this interaction is used to purify the search results, which increases Precision. A preliminary evaluation on multi-sense test phrases indicates that the algorithm can increase Precision within the result set without loss of Recall. We also describe an additional method for extending the result set, based on creating cluster prototypes and finding the most similar content in the text repository that has not yet been retrieved. In our demo implementation, a web portal, clustering is used to present the search results organized in thematic groups instead of as a ranked list.
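The two-phase idea can be illustrated with a small Python sketch. It is not the paper's implementation: the toy corpus, the term-overlap expansion rule, the hand-assigned concept labels standing in for Conceptual Directions, and all function names are assumptions made for illustration. It only shows how expanding a result set can raise Recall and how a user's choice of direction can then raise Precision.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy corpus: title -> (concept label, index terms).
# The labels stand in for Conceptual Directions; they are assumed, not derived.
ARTICLES: Dict[str, Tuple[str, Set[str]]] = {
    "Jaguar (animal)": ("animal", {"jaguar", "cat", "predator"}),
    "Panthera onca": ("animal", {"cat", "predator", "rainforest"}),
    "Jaguar Cars": ("car", {"jaguar", "car", "manufacturer"}),
    "Jaguar E-Type": ("car", {"car", "sports", "manufacturer"}),
    "Rainforest": ("place", {"rainforest", "tree"}),
}


def keyword_search(query: str) -> Set[str]:
    """Return titles whose index terms contain the query keyword."""
    return {title for title, (_, terms) in ARTICLES.items() if query in terms}


def expand(results: Set[str], min_shared: int = 2) -> Set[str]:
    """Phase 1 (assumed rule): add articles sharing at least
    min_shared index terms with any initially retrieved article."""
    expanded = set(results)
    for title, (_, terms) in ARTICLES.items():
        if any(len(terms & ARTICLES[hit][1]) >= min_shared for hit in results):
            expanded.add(title)
    return expanded


def directions(results: Set[str]) -> Set[str]:
    """Concept labels present in the result set, offered to the user."""
    return {ARTICLES[title][0] for title in results}


def refine(results: Set[str], chosen: str) -> Set[str]:
    """Phase 2: keep only results matching the direction the user chose."""
    return {title for title in results if ARTICLES[title][0] == chosen}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, Recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"Jaguar (animal)", "Panthera onca"}  # the user means the animal
    initial = keyword_search("jaguar")
    expanded = expand(initial)
    refined = refine(expanded, "animal")
    for name, result in [("initial", initial), ("expanded", expanded), ("refined", refined)]:
        print(name, sorted(result), precision_recall(result, relevant))
```

In this toy setting the multi-sense query "jaguar" first retrieves one article per sense, expansion pulls in the related articles for both senses, and choosing the "animal" direction discards the car articles while keeping every relevant one.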

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Szymański, J. (2015). Information Retrieval in Wikipedia with Conceptual Directions. In: Natarajan, R., Barua, G., Patra, M.R. (eds) Distributed Computing and Internet Technology. ICDCIT 2015. Lecture Notes in Computer Science, vol 8956. Springer, Cham. https://doi.org/10.1007/978-3-319-14977-6_42

  • DOI: https://doi.org/10.1007/978-3-319-14977-6_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14976-9

  • Online ISBN: 978-3-319-14977-6

  • eBook Packages: Computer Science, Computer Science (R0)
