Abstract
The paper describes our algorithm for retrieving textual information from Wikipedia. Experiments show that the algorithm improves typical evaluation measures of retrieval quality. The improvement is achieved with a two-phase approach. In the first phase, the algorithm extends the set of content indexed by the specified keywords and thus increases Recall. In the second phase, the search results are refined through interaction with the user, who is presented with so-called Conceptual Directions; this increases Precision. A preliminary evaluation on multi-sense test phrases indicates that the algorithm is able to increase Precision within the result set without loss of Recall. We also describe an additional method for extending the result set, based on creating cluster prototypes and finding the most similar content in the text repository that has not yet been retrieved. In our demo implementation, a web portal, clustering is used to present the search results organized in thematic groups instead of a ranked list.
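The two-phase idea can be illustrated with a toy sketch. It is not the paper's implementation: the abstract does not say how the result set is extended or how Conceptual Directions are derived. The sketch assumes that expansion follows links between articles and that a Conceptual Direction is simply an article category. The corpus, the link table, and all function names are hypothetical.

```python
# Toy corpus (hypothetical): title -> (index keywords, categories).
ARTICLES = {
    "Jaguar (animal)": ({"jaguar", "cat"}, {"Animals"}),
    "Panthera": ({"panthera", "cat"}, {"Animals"}),
    "Jaguar Cars": ({"jaguar", "car"}, {"Companies"}),
}

# Assumption: expansion follows article links; the paper may use another method.
LINKS = {
    "Jaguar (animal)": {"Panthera"},
    "Panthera": {"Jaguar (animal)"},
    "Jaguar Cars": set(),
}


def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def keyword_search(keyword):
    """Baseline: articles indexed directly by the keyword."""
    return {title for title, (keywords, _) in ARTICLES.items() if keyword in keywords}


def expand(results):
    """Phase 1: add linked articles that the keyword index missed."""
    expanded = set(results)
    for title in results:
        expanded |= LINKS.get(title, set())
    return expanded


def conceptual_directions(results):
    """Candidate directions to show the user (assumed here: categories)."""
    directions = set()
    for title in results:
        directions |= ARTICLES[title][1]
    return directions


def refine(results, direction):
    """Phase 2: keep only the results that match the chosen direction."""
    return {title for title in results if direction in ARTICLES[title][1]}


# The user searches for "jaguar" and means the animal.
relevant = {"Jaguar (animal)", "Panthera"}
baseline = keyword_search("jaguar")
expanded = expand(baseline)
refined = refine(expanded, "Animals")

for name, result in [("baseline", baseline), ("expanded", expanded), ("refined", refined)]:
    print(name, precision_recall(result, relevant))
```

In this toy case, expansion raises Recall by pulling in an article that the keyword index missed, at the cost of leaving an off-topic result in the set. Choosing the "Animals" direction then removes the off-topic result without dropping any relevant article.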
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Szymański, J. (2015). Information Retrieval in Wikipedia with Conceptual Directions. In: Natarajan, R., Barua, G., Patra, M.R. (eds) Distributed Computing and Internet Technology. ICDCIT 2015. Lecture Notes in Computer Science, vol 8956. Springer, Cham. https://doi.org/10.1007/978-3-319-14977-6_42
DOI: https://doi.org/10.1007/978-3-319-14977-6_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14976-9
Online ISBN: 978-3-319-14977-6
eBook Packages: Computer Science, Computer Science (R0)