The Web continues to grow and evolve very fast, changing our daily lives. This activity of information production and consumption represents the collaborative work of the hundreds of millions of institutions and people that contribute content to the Web, as well as of the one and a half billion people that use it. Because of the huge amount of available data and the fast rate at which the Web changes, users find it very hard to locate or mine the information relevant to their specific needs.

In recent years, significant research efforts have addressed the challenge of improving the way we search the Web and extract the data and information relevant to our needs. One key approach to this challenge is the definition and use of Web mining techniques, that is, mining the content, structure and usage of the Web to improve search effectiveness. Web mining has provided effective solutions to many search problems, including query suggestion and reformulation, result clustering, query result diversification, and index layout and caching. Web mining has therefore become fundamental to the development of search.

This Special Issue focuses on Web mining to improve search. It collects six contributions that propose innovative Web mining techniques to improve tasks such as query and search advertising prediction, query reformulation, opinion mining and information filtering.

As witnessed by the many techniques related to Web usage, Web logs provide a very useful hint about the information that users are likely to find useful and interesting. With the aim of effectively representing the information available in Web logs for further processing, the paper titled “A Unified Representation of Web Logs for Mining Applications” defines a data structure that represents the collective search activity performed by the users of an external or internal search engine. This structured representation can be used in several Web mining tasks, such as query suggestion and Web page categorization.

Information filtering is another important task for providing users with information relevant to their needs. The paper titled “A Pattern Mining Approach for Information Filtering Systems” addresses the problem of clearly identifying the boundary between positive and negative streams in information filtering systems. It uses a pattern mining approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features.

Query reformulation is the main technique users apply to refine their search so that it better captures their intent. Understanding users’ query reformulation patterns through automatic analysis of query logs is of great help in predicting user intent and in defining automatic query reformulation and query suggestion techniques. In “Query Reformulation Mining: Models, Patterns and Applications” the authors propose a model for classifying user query reformulations into broad classes (generalization, specialization, error correction or parallel move). They also define query recommendation methods based on short random walks on the query-flow graph.
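To give a concrete flavour of recommendation by short random walks, the following is a minimal, generic sketch and not the authors' method. It assumes a query-flow graph stored as a dictionary whose edge weights are raw reformulation counts mined from a log; the functions `normalize`, `short_walk_scores` and `recommend`, the `restart` probability and the example queries are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def normalize(graph):
    """Turn raw reformulation counts into row-stochastic transition probabilities."""
    probs = {}
    for query, successors in graph.items():
        total = sum(successors.values())
        probs[query] = {r: c / total for r, c in successors.items()} if total else {}
    return probs


def short_walk_scores(graph, start, steps=3, restart=0.2):
    """Distribution over queries after a few random-walk steps with restart.

    Hypothetical scoring: at each step the walker jumps back to `start` with
    probability `restart`, otherwise follows an outgoing edge of the
    (normalized) query-flow graph. Mass at queries with no successors is
    returned to `start`, so the distribution always sums to 1.
    """
    probs = normalize(graph)
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        nxt[start] += restart  # teleport back to the starting query
        for query, p in dist.items():
            successors = probs.get(query, {})
            if not successors:
                nxt[start] += (1 - restart) * p  # dangling query: send mass home
                continue
            for r, w in successors.items():
                nxt[r] += (1 - restart) * p * w
        dist = dict(nxt)
    return dist


def recommend(graph, start, k=2, **walk_args):
    """Top-k queries other than `start`, ranked by short-walk probability."""
    scores = short_walk_scores(graph, start, **walk_args)
    ranked = sorted((q for q in scores if q != start), key=lambda q: (-scores[q], q))
    return ranked[:k]


# Toy query-flow graph with made-up reformulation counts.
graph = {
    "jaguar": {"jaguar car": 3, "jaguar animal": 1},
    "jaguar car": {"jaguar xf": 2},
    "jaguar animal": {},
}
print(recommend(graph, "jaguar", k=2, steps=1))
print(recommend(graph, "jaguar", k=2, steps=2))
```

Keeping the walk short restricts recommendations to queries a few reformulations away from the original one; longer walks would drift towards globally popular queries.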

It is well known that users’ browsing behaviour through link clicks is a useful source of information that can be exploited to improve search effectiveness. In “Incorporating Web Browsing Activities into Anchor Texts for Web Search” the browsing behaviour of Web users is exploited to inform anchor text weighting. As the authors point out, since the purpose of anchor texts is to help users browse the Web, an analysis of users’ browsing behaviour provides useful information for weighting them. To this end, the paper proposes a smoothing method for new anchor models that incorporate the browsing activities of Web users into anchor texts for Web search.

Another kind of click behaviour is represented by clicks on search advertisements, and search advertising has in recent years become a hot research topic. In the paper titled “The Sum of Its Parts: Reducing Sparsity in Click Estimation with Query Segments” the authors define relevance models for sponsored search based on mining the click behaviour associated with partial user queries.

Among the various techniques for mining information from texts, those aimed at extracting subjective opinions have attracted increasing interest in recent years. The contribution titled “Sentiment Classification: A Lexical Similarity Based Approach for Extracting Subjectivity in Documents” proposes a new approach to document sentiment classification based on a subjective feature extraction methodology.

Finally, as the guest editors of this special issue, we would like to express our gratitude to the authors for their contributions and to the reviewers for their invaluable help.