Information Retrieval in Wikipedia with Conceptual Directions

Szymański, Julian

doi:10.1007/978-3-319-14977-6_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8956)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

The paper describes our algorithm for retrieving textual information from Wikipedia. Experiments show that the algorithm improves typical evaluation measures of retrieval quality. The improvement is achieved with a two-phase approach. In the first phase, the algorithm extends the set of content indexed by the specified keywords and thus increases Recall. In the second phase, the algorithm interacts with the user by presenting so-called Conceptual Directions, and the search results are purified, which increases Precision. A preliminary evaluation on multi-sense test phrases indicates that the algorithm is able to increase Precision within the result set without loss of Recall. We also describe an additional method for extending the result set, based on creating cluster prototypes and finding the most similar content in the text repository that has not yet been retrieved. In our demo implementation, a web portal, clustering is used to present the search results organized in thematic groups instead of a ranked list.
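The two-phase idea can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the toy corpus, the category labels, and the helper names (`keyword_search`, `expand`, `refine`) are all hypothetical. Expansion through shared categories and filtering by a single user-chosen category are simplifying assumptions standing in for the paper's result-set extension and Conceptual Directions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus: article title -> category labels.
ARTICLES = {
    "Jaguar (animal)": {"Felines", "Mammals"},
    "Panthera": {"Felines", "Mammals"},
    "Jaguar Cars": {"Car manufacturers"},
    "Jaguar E-Type": {"Sports cars", "Car manufacturers"},
    "Atari Jaguar": {"Video game consoles"},
}


def keyword_search(query):
    """Baseline: titles that contain the query string (case-insensitive)."""
    return {title for title in ARTICLES if query.lower() in title.lower()}


def expand(results):
    """Phase 1 (assumed form): add articles sharing a category with any hit."""
    categories = set().union(*(ARTICLES[title] for title in results))
    return {title for title, cats in ARTICLES.items() if cats & categories}


def refine(results, chosen_category):
    """Phase 2 (assumed form): keep only results in the user-chosen category."""
    return {title for title in results if chosen_category in ARTICLES[title]}


# The user means the animal; "Panthera" is relevant but lacks the keyword.
relevant = {"Jaguar (animal)", "Panthera"}

baseline = keyword_search("jaguar")
expanded = expand(baseline)
refined = refine(expanded, "Felines")

for name, result in [("baseline", baseline), ("expanded", expanded), ("refined", refined)]:
    print(name, precision_recall(result, relevant))
```

On this toy data the expansion step raises Recall by pulling in "Panthera", and the category choice then raises Precision without dropping any relevant article, which mirrors the behaviour the abstract reports for multi-sense phrases.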

Author information

Authors and Affiliations

Department of Computer Systems Architecture, Gdańsk University of Technology, Poland
Julian Szymański

Editor information

Editors and Affiliations

School of Technology and Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Colaba, 400005, Mumbai, India
Raja Natarajan
Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, 781039, Guwahati, India
Gautam Barua
Department of Computer Science, Berhampur University, 760007, Berhampur, Odisha, India
Manas Ranjan Patra

About this paper

Cite this paper

Szymański, J. (2015). Information Retrieval in Wikipedia with Conceptual Directions. In: Natarajan, R., Barua, G., Patra, M.R. (eds) Distributed Computing and Internet Technology. ICDCIT 2015. Lecture Notes in Computer Science, vol 8956. Springer, Cham. https://doi.org/10.1007/978-3-319-14977-6_42

DOI: https://doi.org/10.1007/978-3-319-14977-6_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14976-9
Online ISBN: 978-3-319-14977-6
eBook Packages: Computer Science, Computer Science (R0)