For more than a decade, many authors have noted the growth of non-English activity on the Web (Spink et al. 2002; Global Reach 2004; Yang 2005; Kwok 2006; Chung 2008; Miniwatts International 2009a, b), and there is no reason to expect the pace of this change to slacken. The pace is likely to increase, especially on continents that currently have low Internet penetration. The Web has become a dominant global multicultural and multilingual pool of data. Although search engines have improved their handling of non-English queries in recent years, studies show that many problems still exist and are worthy of further research.

This special issue aims to address the challenges and directions in non-English Web retrieval by providing insights into the existing problems and presenting specific solutions. The call for papers for this special issue was released in February 2008. Twenty-nine papers were received by June 2008, and each was reviewed by three independent reviewers. After the review process, nine papers were accepted for inclusion in the special issue. These studies address various aspects of the special issue topics and concern many non-English languages, such as Arabic, Polish, Spanish, Greek, Japanese, Amharic, and a few other European languages.

In the first paper, the special issue editors, Lazarinis, Vilares, Tait and Efthimiadis, provide an overview of the research on non-English search through an extensive literature review. The research issues discussed in these studies are categorized in order to identify the research questions and solutions proposed. Further research is proposed at the end of each section.

Eguchi and Croft apply a structured query approach that uses word-based units to capture compound words, as well as more general phrases, in a query. The paper discusses problems that arise in Japanese information retrieval, such as compound words and segmentation, and some research efforts to address them.

Knowledge-poor methods for tackling person name matching and lemmatization in Polish, a highly inflected language with a complex personal name declension paradigm, are discussed in Piskorski, Wieloch and Sydow.

Hammo presents a framework that uses query expansion techniques to enhance the retrieval effectiveness of search engines when searching diacritized and diacritic-less Arabic text. According to the results of the study, query expansion for searching Arabic text is promising.
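To make the diacritics problem concrete, the following minimal Python sketch shows one simple way a query could be expanded so that a diacritized Arabic term also matches its diacritic-less form. It is an illustration only and is not the framework presented by Hammo; the function names `strip_diacritics` and `expand_query` are hypothetical, and the sketch assumes that removing Unicode nonspacing marks is an acceptable approximation of removing Arabic diacritics.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include the Arabic
    short-vowel marks such as fatha, damma and kasra.

    Assumption: the text is not decomposed first, so precomposed letters
    (for example alef with hamza) are left unchanged.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def expand_query(query: str) -> list[str]:
    """Return the original query plus its diacritic-less variant, if different."""
    variants = [query]
    bare = strip_diacritics(query)
    if bare != query:
        variants.append(bare)
    return variants


# Hypothetical usage: a diacritized three-letter word and its bare form.
diacritized = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kaf, ta, ba, each with fatha
variants = expand_query(diacritized)
```

A search front end could submit every variant returned by `expand_query`, or apply `strip_diacritics` to both the index and the query. Which of the two is preferable depends on the engine and is not addressed by this sketch.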

The effect of multilingual queries on homepage finding, where the aim of the retrieval system is to return a specific homepage, is studied in Blanco and Lioma. The study reports that Latinized versions of the queries and the local adaptations of the search engines produce better results in many cases.

Efthimiadis, Malevris, Kousaridas, Lepeniotou and Loutas conducted an evaluation using Greek and Latinized homepage finding queries for known Greek organizations. The analysis showed that the global search engines ignore the characteristics of the Greek language, hence treating semantically similar Greek queries differently.

The information-seeking behaviour of non-English Web users is studied in Berendt and Kralisch. The study established that content and link creation behaviour leads to an under-representation of non-English languages on the Web. It also provides evidence that link-following behaviour leads to an under-utilization of non-English content.

Guzman, Montes-y-Gómez, Rosso and Villaseñor-Pineda study the use of the Web as a Spanish linguistic resource for text classification. They retrieved their initial data using Google and they were able to develop a self-training method, which makes use of the Web as a lexical support resource.

Classification of Amharic texts compiled from the Web is discussed in Asker, Argaw, Gambäck, Asfeha and Habte. The effect of operations like stemming or part-of-speech tagging on text classification was also investigated. The experiments indicated that stemming plays a less important role than expected in text classification for a highly inflected language like Amharic.

The main conclusion from the special issue papers is that there are still many open research issues for non-English Web search. The papers highlight the need for more research.