Abstract
In this paper we propose BiCWS, a novel comparative web search system that mines cognitive differences from web search results in a multi-language setting. Given a topic represented by two queries in two languages, each a translation of the other, the system first retrieves the search results for both queries from a general web search engine and then mines the bilingual facets of the topic with a bilingual search results clustering algorithm. Semantics from Wikipedia are leveraged to improve the bilingual clustering performance. The semantic distributions of the search results over the mined facets are then presented visually, reflecting the cognitive differences between the two language communities. Experimental results demonstrate the effectiveness of the proposed system.
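The final step described above, comparing how the search results in each language are distributed over the mined facets, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the bilingual clustering has already assigned each search result to one facet, and the facet names, the sample assignments, and the functions `facet_distribution` and `compare_distributions` are all hypothetical.

```python
from collections import Counter


def facet_distribution(facet_labels, facets):
    """Return the share of search results assigned to each facet."""
    counts = Counter(facet_labels)
    total = len(facet_labels)
    return {f: (counts[f] / total if total else 0.0) for f in facets}


def compare_distributions(labels_a, labels_b):
    """Map each facet to (share in language A, share in language B, difference)."""
    facets = sorted(set(labels_a) | set(labels_b))
    dist_a = facet_distribution(labels_a, facets)
    dist_b = facet_distribution(labels_b, facets)
    return {f: (dist_a[f], dist_b[f], dist_a[f] - dist_b[f]) for f in facets}


# Hypothetical facet assignments: one label per retrieved search result.
english_labels = ["history", "history", "tourism", "economy"]
chinese_labels = ["tourism", "tourism", "economy", "economy"]

comparison = compare_distributions(english_labels, chinese_labels)
for facet, (share_en, share_zh, diff) in comparison.items():
    print(f"{facet}: en={share_en:.2f} zh={share_zh:.2f} diff={diff:+.2f}")
```

A large positive or negative difference for a facet marks an aspect of the topic that one language community covers far more than the other, which is the kind of contrast a visual presentation of the distributions would expose.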
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, X., Wan, X., Xiao, J. (2012). BiCWS: Mining Cognitive Differences from Bilingual Web Search Results. In: Wang, X.S., Cruz, I., Delis, A., Huang, G. (eds) Web Information Systems Engineering - WISE 2012. WISE 2012. Lecture Notes in Computer Science, vol 7651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35063-4_5
DOI: https://doi.org/10.1007/978-3-642-35063-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35062-7
Online ISBN: 978-3-642-35063-4
eBook Packages: Computer Science (R0)