ABSTRACT
As non-English content grows exponentially on the Web, the number of online non-English speakers who realize the importance of finding information in different languages is growing as well. However, the major general-purpose search engines, such as Google and Yahoo, have lagged behind in providing indexes and search features for non-English languages. Amharic, a member of the Semitic language family and the official working language of the federal government of Ethiopia, is one of the languages with rapidly growing content on the Web. As a result, the need for a bilingual search engine that handles the specific characteristics of queries in the user's native language (Amharic) and retrieves documents in both Amharic and English has become more apparent.
In this work, we designed a model for an Amharic-English search engine and, based on that model, developed a bilingual Web search engine that enables Web users to find the information they need in Amharic and English. In doing so, we identified the language-dependent query preprocessing components required for query translation. We also developed a bidirectional dictionary-based translation system that incorporates a transliteration component to handle proper names, which are often missing from bilingual lexicons. We used an Amharic search engine and an open-source English search engine (Nutch) as the underlying search engines for Web document crawling, indexing, searching, ranking, and retrieval.
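The idea of dictionary-based query translation with a transliteration fallback can be illustrated with a minimal sketch. This is not the system's implementation: the lexicon entries, the character table, and the function names `transliterate` and `translate_query` are hypothetical, chosen only to show how an out-of-vocabulary proper name might be handled, and only the Amharic-to-English direction is shown.

```python
# Toy Amharic-to-English lexicon (hypothetical entries, for illustration only).
LEXICON = {
    "ቤት": ["house", "home"],
    "ውሃ": ["water"],
}

# Tiny character-level transliteration table (hypothetical and incomplete).
TRANSLIT = {"ኦ": "o", "ባ": "ba", "ማ": "ma"}


def transliterate(term):
    """Map each Ethiopic character to a Latin string; keep unknown characters."""
    return "".join(TRANSLIT.get(ch, ch) for ch in term)


def translate_query(query):
    """Translate term by term; fall back to transliteration for missing terms."""
    translated = []
    for term in query.split():
        if term in LEXICON:
            # Dictionary hit: keep every candidate translation.
            translated.extend(LEXICON[term])
        else:
            # Dictionary miss: assume a proper name and transliterate it.
            translated.append(transliterate(term))
    return translated


if __name__ == "__main__":
    print(translate_query("ኦባማ ቤት"))
```

In this sketch every dictionary miss is treated as a proper name. A real system would need language-specific preprocessing, such as stemming and stop-word removal, before the lookup.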
To evaluate the effectiveness of our Amharic-English bilingual search engine, we measured precision on the top 10 retrieved Web documents. The experimental results showed that the Amharic-English cross-lingual retrieval engine achieved 74.12% of the performance of its corresponding English monolingual retrieval engine, and the English-Amharic cross-lingual retrieval engine achieved 78.82% of the performance of its corresponding Amharic monolingual retrieval engine. We also evaluated the benefit of bilingual search by comparing the system's results with those of general-purpose search engines. Overall, the evaluation results are promising.
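The two quantities used in this evaluation, precision on the top 10 results and cross-lingual performance relative to a monolingual baseline, can be sketched as follows. The relevance judgments below are made up for illustration and are not the paper's data. The function names are hypothetical, and dividing by k even when fewer than k documents are returned is an assumed convention.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the top-k results judged relevant (1 = relevant, 0 = not).

    Assumed convention: results missing from a short list count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(judgments[:k]) / k


def relative_performance(cross_lingual, monolingual):
    """Cross-lingual score expressed as a percentage of the monolingual score."""
    if monolingual == 0:
        raise ValueError("monolingual baseline must be non-zero")
    return 100.0 * cross_lingual / monolingual


if __name__ == "__main__":
    # Hypothetical relevance judgments for one query (not from the paper).
    cross = precision_at_k([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
    mono = precision_at_k([1, 1, 1, 1, 0, 1, 1, 0, 1, 1])
    print(f"P@10 cross-lingual: {cross:.2f}, monolingual: {mono:.2f}")
    print(f"relative performance: {relative_performance(cross, mono):.2f}%")
```

A figure such as 74.12% would be read in this sense, as the ratio of the cross-lingual score to the monolingual score. In practice the scores would be averaged over a set of test queries rather than taken from a single query.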
Index Terms
- Amharic-English bilingual web search engine