Abstract
This work studies several semantic approaches to concept-based query expansion and re-ranking, and compares them with different ontology-based expansion methods for web document search and retrieval. We focus on concept-based query expansion schemes whose main goal is to provide users quickly with the most suitable query expansion, so as to increase the precision of web document retrieval and reduce users’ browsing time. Query expansion for web document retrieval involves two key tasks: finding the expansion candidates, i.e. the closest concepts in the web document domain, and ranking the expanded queries properly. The approach we propose aims to improve the expansion phase and thereby the precision of web document retrieval. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure that can be computed from statistics returned by a web search engine. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
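As a rough illustration of how expansion candidates might be ranked with a search-engine-based proximity measure, the sketch below uses the Normalized Google Distance (NGD) of Cilibrasi and Vitányi as a stand-in. It is not the PMING distance itself, whose definition is not given in this abstract. The hit counts, the index size `n`, and the function names `ngd` and `rank_candidates` are all hypothetical and serve only to show the shape of the computation.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from search-engine hit counts.

    fx, fy: number of documents matching each term alone
    fxy:    number of documents matching both terms
    n:      assumed total number of indexed documents
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never co-occur are treated as infinitely distant.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_candidates(query, candidates, hits, pair_hits, n):
    """Order candidate concepts from closest to farthest from the query term."""
    scored = [
        (ngd(hits[query], hits[c], pair_hits.get((query, c), 0), n), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored)]


# Toy hit counts (invented for illustration, not taken from any search engine).
hits = {"jaguar": 1000, "car": 10000, "cat": 5000, "banana": 8000}
pair_hits = {
    ("jaguar", "car"): 500,
    ("jaguar", "cat"): 100,
    ("jaguar", "banana"): 1,
}
ranking = rank_candidates("jaguar", ["banana", "cat", "car"], hits, pair_hits, 10**6)
print(ranking)
```

In a real system the counts would come from search engine queries, and the closest candidates would be the ones used to expand the query.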
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leung, C.H.C., Li, Y., Milani, A., Franzoni, V. (2013). Collective Evolutionary Concept Distance Based Query Expansion for Effective Web Document Retrieval. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39649-6_47
DOI: https://doi.org/10.1007/978-3-642-39649-6_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39648-9
Online ISBN: 978-3-642-39649-6
eBook Packages: Computer Science (R0)