A review of ranking approaches for semantic search on Web

doi:10.1016/j.ipm.2013.10.004

Information Processing & Management

Volume 50, Issue 2, March 2014, Pages 416-425

https://doi.org/10.1016/j.ipm.2013.10.004

Highlights

• An exhaustive review of ranking approaches for semantic search on the Web.
• We identified three stages of ranking in the semantic search process.
• We identified criteria for comparing semantic search approaches.
• We examined some open issues relevant to efficient semantic search.

Abstract

With ever more information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, even the most renowned search engines commonly return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, in which the basic relevance criterion is primarily the presence of query keywords in the returned pages. This work explores different semantics-based relevance ranking approaches that are considered appropriate for retrieving relevant information. Various pilot projects and their outcomes are investigated with respect to the methodologies they adopt and their most distinctive ranking characteristics. An overview of selected approaches is presented, together with a comparison based on a set of classification criteria. This comparison helps identify some common concepts and outstanding features.

Introduction

Web search is a key application of the Web. Present search technologies rely on link analysis techniques that exploit the structure of the Web to determine which documents are important, and on simple term statistics to identify the documents most relevant to a query. Mark-up languages such as (X)HTML are designed primarily for documents whose content is meant to be interpreted by humans, and therefore focus on document structure and presentation. Little effort is devoted to representing the semantics of the content itself.

The growing availability of structured information on the Web opens new opportunities for information access. Semantically oriented search engines, especially those that use ontologies as an enabling technology, have attracted considerable interest in the last decade. The ever growing amount of ontology-based semantic mark-up on the Web provides an opportunity to work toward a new generation of open intelligent applications (Motta & Sabou, 2006). Efficient search is one such major envisioned application of this next-generation Web, popularly known as the Semantic Web (Berners-Lee, Hendler, & Lassila, 2001).

Current Web search techniques are not directly suited to indexing and retrieving semantic mark-up. A document is treated as a bag of words, in which words or word variants serve as indexing terms. Many search engines either ignore existing semantic mark-up for indexing purposes or do not process it in a way that lets the mark-up be distinguished from other text during search.
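
The keyword-only treatment described above can be illustrated with a minimal Python sketch. It is not taken from any surveyed search engine: the helper names (`tokenize`, `build_index`, `keyword_search`) and the sample documents are hypothetical. Mark-up tags, including any semantic annotation carried in their attributes, are stripped before indexing, and documents are scored only by how often they contain the query terms.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Strip mark-up tags, lower-case the text and split it into terms."""
    text = re.sub(r"<[^>]+>", " ", text)  # tags and their attributes are discarded
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Inverted index mapping each term to {doc_id: term frequency}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def keyword_search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Score documents by the summed frequency of query terms, highest first."""
    scores: Counter = Counter()
    for term in tokenize(query):
        scores.update(index.get(term, Counter()))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical documents whose mark-up names two different concepts.
docs = {
    "car": '<span about="ex:JaguarCars">Jaguar</span> unveils a new jaguar model',
    "cat": '<span about="ex:Panthera_onca">jaguar</span> habitat in the rainforest',
}
index = build_index(docs)
print(keyword_search(index, "jaguar"))  # [('car', 2), ('cat', 1)]
```

Because the `about` attributes are thrown away with the tags, a query for "jaguar" returns both the car document and the wildlife document, ordered only by raw term counts.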

Emerging Web search is no longer limited to matching query keywords against documents; instead, complex information needs can be expressed in a structured way, with precise, structured answers as results. Search in which the user’s information needs are addressed by considering the meaning of both the user’s query and the available resources is referred to as Semantic Search (Tran, Haase, & Studer, 2009).
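
A structured information need of this kind can be sketched as a conjunctive pattern over subject, predicate, object triples. The Python sketch below assumes a tiny in-memory triple list with hypothetical `ex:` identifiers; variables are marked with a leading `?`, and a query returns every consistent set of variable bindings. It mirrors, in miniature, the basic graph patterns of SPARQL, but it is only an illustration, not a query engine and not the method of any system reviewed here.

```python
from typing import Optional

Triple = tuple[str, str, str]
Bindings = dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Extend bindings so that pattern matches triple, or return None on conflict."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable
            if result.get(term, value) != value:
                return None  # already bound to a different value
            result[term] = value
        elif term != value:  # a constant that does not match
            return None
    return result


def query(triples: list[Triple], patterns: list[Triple]) -> list[Bindings]:
    """Conjunctive query: all patterns must match under one consistent binding."""
    solutions: list[Bindings] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical knowledge base.
triples = [
    ("ex:Alice", "ex:worksAt", "ex:Lab1"),
    ("ex:Bob", "ex:worksAt", "ex:Lab2"),
    ("ex:Lab1", "ex:locatedIn", "ex:Athens"),
    ("ex:Lab2", "ex:locatedIn", "ex:Boston"),
]
# "Who works at an organization located in Athens?"
print(query(triples, [("?person", "ex:worksAt", "?org"), ("?org", "ex:locatedIn", "ex:Athens")]))
# [{'?person': 'ex:Alice', '?org': 'ex:Lab1'}]
```

The answer is a precise binding (`?person` is `ex:Alice`) rather than a list of pages that happen to contain the words of the query.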

Owing to the ongoing shift from data to knowledge and the growing popularity of the Semantic Web vision, there is correspondingly growing interest and work in automatically extracting metadata and representing it as semantic annotations of documents and services on the Web (Shah, Finin, Joshi, Cost, & Mayfield, 2002). In this vision, each Web page would carry semantic annotations recording additional details about the page itself. Annotations are based on classes of concepts and the relations among them, and the “vocabulary” for the annotations is usually expressed by means of an ontology. The information contained in such an agreed-upon ontology is valuable for determining the relevance of retrieved documents on the basis of “known” facts, relationships, or other data. Table 1 compares the features of traditional keyword-based search and semantic-based search across various parameters.
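
One way to picture how ontology-based annotations can inform relevance is to represent each document by the weighted ontology concepts it is annotated with and compare that vector with the concepts of the query. The Python sketch below does this with cosine similarity. The concept identifiers, the weights and the (omitted) step that maps a query to concepts are assumptions made for illustration, and the scoring is a generic vector-space choice rather than the method of any particular system reviewed here.

```python
import math

Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse concept-weight vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_annotations(query: Vector, annotations: dict[str, Vector]) -> list[tuple[str, float]]:
    """Rank documents by similarity between query concepts and their annotations."""
    scored = [(doc, cosine(query, vec)) for doc, vec in annotations.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical annotations: ontology concept -> weight for each document.
annotations = {
    "car_review": {"ex:JaguarCars": 1.0, "ex:Automobile": 0.5},
    "wildlife_page": {"ex:Panthera_onca": 1.0, "ex:Rainforest": 0.5},
}
# Assume the query "jaguar habitat" has already been mapped to this concept.
query = {"ex:Panthera_onca": 1.0}
for doc, score in rank_by_annotations(query, annotations):
    print(f"{doc}: {score:.3f}")
```

A query about the animal, mapped to `ex:Panthera_onca`, ranks the wildlife page first and gives the car review a score of zero, even though the surface word "jaguar" could describe either.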

Two elements of an ontology are particularly significant from the point of view of relevant information access. The first is named entities, such as the names of persons, objects, countries, places, research articles, artists, and museums; techniques have been developed for entity-oriented search of documents (Aleman-Meza, Arpinar, Nural, & Sheth, 2010). The second is relationships, which give meaning to entities. The value of such relationships lies in the fact that they are named. As the Web continuously evolves, relationships play a vital role in relevant information access (Sheth & Ramakrishnan, 2007).
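
The role of named relationships can be sketched by storing entities and relationships as triples and searching for chains of relationships that connect two entities. In the minimal Python sketch below, the data, the `relationship_chains` helper and its `max_hops` limit are hypothetical, and ordering chains by length (shorter chains first) is only an assumed heuristic, not a ranking scheme drawn from the surveyed work.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def relationship_chains(
    triples: list[Triple], source: str, target: str, max_hops: int = 3
) -> list[list[Triple]]:
    """Return every simple chain of named relationships linking source to target.

    Chains are sorted shortest first; treating shorter chains as stronger
    associations is an assumption of this sketch.
    """
    # Index each triple under both endpoints so a chain may follow a
    # relationship in either direction.
    graph: dict[str, list[tuple[str, Triple]]] = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((o, (s, p, o)))
        graph[o].append((s, (s, p, o)))

    chains: list[list[Triple]] = []

    def dfs(node: str, visited: set[str], path: list[Triple]) -> None:
        if node == target and path:
            chains.append(list(path))
            return
        if len(path) == max_hops:
            return
        for nxt, triple in graph[node]:
            if nxt not in visited:  # keep chains free of cycles
                visited.add(nxt)
                path.append(triple)
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(source, {source}, [])
    return sorted(chains, key=len)


# Hypothetical entities linked by named relationships.
triples = [
    ("ex:Alice", "ex:authorOf", "ex:Paper1"),
    ("ex:Bob", "ex:authorOf", "ex:Paper1"),
    ("ex:Bob", "ex:memberOf", "ex:Lab1"),
    ("ex:Alice", "ex:memberOf", "ex:Lab1"),
    ("ex:Carol", "ex:memberOf", "ex:Lab2"),
]
for chain in relationship_chains(triples, "ex:Alice", "ex:Bob"):
    print(" / ".join(f"{s} {p} {o}" for s, p, o in chain))
```

Each returned chain is a readable explanation of how the two entities are linked (co-authorship of `ex:Paper1`, membership of `ex:Lab1`), which is the kind of information a relationship-ranking stage would then need to order.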


Motivation for ranking

Many users try to analyze information either by browsing the information space or by using a search engine. Search-engine-based systems generally locate documents by keywords. Although they do return documents containing the keywords entered by the user, many of the retrieved documents have little to do with the user’s needs. The onus is on the user to judge the relevance of the retrieved documents, using their own mental model, in order to obtain the desired information. Efforts are consistently being made

Classification of existing ranking approaches

The research on ranking approaches for semantic search on the Web has been broadly classified into three categories according to the stage at which ranking is applied. Some distinctive features have also been identified through careful analysis of contemporary approaches. A detailed comparison of the most distinct approaches, based on their most distinctive ranking characteristics, is presented in Table 2. Rather than discussing the varieties and the evolution of selected ideas, it has

Discussion

After the surveyed systems were compared by means of the classification criteria, several particular issues emerged that appear relevant to efficient semantic search. This subsection discusses these issues with a view to their potential for further research.

Conclusion

In this paper, a number of promising ranking approaches for semantic search on the Web have been presented and classified according to the stage at which ranking is applied. It is observed that, unlike classical IR-based search models, semantic search models involve ranking at three stages: first, Entity Ranking; second, Relationship Ranking; and finally, Semantic Document Ranking. Two entities are connected to each other by a single relationship or a chain of

References (29)

Aleman-Meza, B., et al. (2007). Swetodblp: Ontology of computer science publications. Journal of Web Semantics: Science, Services and Agents on the World Wide Web.
Kiryakov, A., et al. (2004). Semantic annotation, indexing and retrieval. Journal of Web Semantics: Science, Services and Agents on the World Wide Web.
Ning, X., et al. (2008). RSS: A framework enabling ranked search on the semantic web. Information Processing and Management.
Vechtomova, O., et al. (2012). A domain-independent approach to finding related entities. Information Processing and Management.
Wei, W., et al. (2011). Rational research model for ranking semantic entities. Information Sciences.
Aleman-Meza, B., Arpinar, I. B., Nural, M. V. & Sheth, A. P. (2010). Ranking documents semantically using ontological...
Aleman-Meza, B., et al. (2005). Ranking complex relationships on the semantic web. IEEE Internet Computing.
Anyanwu, K., Maduko, A. & Sheth, A. (2005). SemRank: Ranking complex relation search results on the semantic web. In...
Berners-Lee, T., Hendler, J. & Lassila, O. (2001). The semantic web. Scientific American (pp....
Castells, P., et al. (2007). An adaptation of the vector space model for ontology-based information retrieval. IEEE Transactions on Knowledge and Data Engineering.
Cost, R. S., Kallurkar, S., Majithia, H., Nicholas, C. & Shi, Y. (2002). Integrating distributed information sources...
Dali, L. & Fortuna, B. (2011). Learning to rank for semantic search. In: Proc. of fourth international Semantic Search...
Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R. S., Peng, Y., et al. (2004). Swoogle: A search and metadata engine...
Ding, L., Pan, R., Finin, T., Joshi, A., Peng, Y. & Kolari, P. (2005). Finding and ranking knowledge on the semantic...

Cited by (32)

Content and link-structure perspective of ranking webpages: A review
2021, Computer Science Review
Citation Excerpt :
However, not all the ranking algorithms work and behave the same way, their strengths and limitations vary and therefore, they should be studied to come with the most prominent ones. Several survey articles focusing on different aspects of the ranking algorithms have been published. [21] classify ranking algorithms into entity ranking, relationship, and semantic document ranking by focusing on the semantics of the entities on the web. [22]
The delivery of ranked relevant results is probably the most important factor in making a web search engine acceptable to its users. This inspiration has led the search engine engineers and researchers to conceive ranking algorithms that can provide the most relevant results (webpages) at the top of the Search Engines Results Page (SERP). To rank webpages, several features are exploited in research studies related to the content and link structure of the web. This article discusses and assesses the webpage ranking algorithms proposed in the domains of content-based and link-based rankings in the past two decades. The assessment of these algorithms is done using features extracted from the relevant literature. The strengths and limitations of these features as well as the ranking algorithms are highlighted and discussed. The findings of this work suggest that the link-based ranking factors are still the dominant force in ranking webpages but these alone are by no means enough to fulfill the information needs of the users. An acceptable solution must contain features from both the content-based and link-based ranking domains integrated with the temporal features and users’ behavior information. Possible future directions are also highlighted.
Supporting inter-topic entity search for biomedical Linked Data based on heterogeneous relationships
2017, Computers in Biology and Medicine
Citation Excerpt :
With a set of keywords as the input query, retrieved entities are sorted with a descending order of the relevancy, while excluding the need to understand the schema of data [2]. The relevancy is computed with two types of models [4,5]: (1) query-dependent models, such as Siren [6], RareRank [7], ObjectRank [8]; and (2) query-independent models, such as ReCibRabk [9], Swoogle [10]. In order to provide better results, two methods are used in general for: (1) query expansions, and (2) filters in advanced search.
The keyword-based entity search restricts search space based on the preference of search. When given keywords and preferences are not related to the same biomedical topic, existing biomedical Linked Data search engines fail to deliver satisfactory results. This research aims to tackle this issue by supporting an inter-topic search—improving search with inputs, keywords and preferences, under different topics.
This study developed an effective algorithm in which the relations between biomedical entities were used in tandem with a keyword-based entity search, Siren. The algorithm, PERank, which is an adaptation of Personalized PageRank (PPR), uses a pair of input: (1) search preferences, and (2) entities from a keyword-based entity search with a keyword query, to formalize the search results on-the-fly based on the index of the precomputed Individual Personalized PageRank Vectors (IPPVs).
Our experiments were performed over ten linked life datasets for two query sets, one with keyword-preference topic correspondence (intra-topic search), and the other without (inter-topic search). The experiments showed that the proposed method achieved better search results, for example a 14% increase in precision for the inter-topic search than the baseline keyword-based search engine.
The proposed method improved the keyword-based biomedical entity search by supporting the inter-topic search without affecting the intra-topic search based on the relations between different entities.
Ranking Search Results in Library Information Systems - Considering Ranking Approaches Adapted From Web Search Engines
2015, Journal of Academic Librarianship
Citation Excerpt :
Moreover, the ability of search engines to better understand search queries and user intent via semantic components will also influence relevance ranking, e.g. Google's Hummingbird algorithm (Sullivan, 2013). In fact, several approaches towards semantic search and ranking issues have already been illustrated (see for example Agrawal, Sharma, Kumar, Parshav, & Goudar, 2013; Jindal, Bawa, & Batra, 2014; Shepherd, 2007). In addition, the combination of natural language processing and artificial intelligence may replace conventional keyword searching over the long term.
For an information retrieval system to be successful, it must have the ability to rank search results. As web search engines are the most often used and — in terms of ranking functionality — the most advanced existing systems, the principles they are based on and the strategies they use can be advantageous when applied to the library context. We categorize ranking factors into six different groups: 1. text statistics, 2. popularity, 3. freshness, 4. locality and availability, 5. content properties and 6. user background. We discuss the basic concepts and assumptions these ranking factors involve and offer potential implementations in the library context. The practice recommended here is for libraries to not only apply selected ranking factors — as existing library information systems already do — but to systematically test for the ranking factors best suited to their systems. We argue for a user-centric view on ranking, because in the end, ranking should be for the benefit of the user, and user preferences may vary across different contexts.
Diversity-aware retrieval of medical records
2015, Computers in Industry
Citation Excerpt :
Considering the ambiguous queries, as users provide no more information for disambiguating their intents, a medical search engine should produce a set of diversified results that cover all possible intents implied by the given query, in order to enable users to find their interested medical information. From technical point of view, traditional IR technologies can be classified into two categories, i.e., content-based [25,27,28] and semantic-based [14,20,21,38,40,47] approaches. The former predicts the relevance of a document to the given query by considering only the document-inside content.
The widely adoption of Electronic Medical Records (EMRs) causes an explosive growth of the medical and clinical data. It makes the medical search technologies become critical to find useful patient information in the large medical dataset. However, the high quality medical search is a challenging task, in particular due to the inherent complexity and ambiguity of medical terminology. In this paper, by exploiting the uncertainty in ambiguous medical queries, we propose a novel semantic-based approach to achieve the diversity-aware retrieval of EMRs, i.e., both the relevance and novelty are considered for EMR ranking. With the support of medical domain ontologies, we first mine all the potential semantics (concepts and relations between them) from a user query and consume them to model the multiple query aspects. Then, we propose a novel diversification strategy, which considers not only the aspect importance but also the aspect similarity, to perform the diversity-aware EMR ranking. A real-world pilot study, which utilizes the proposed medical search approach to improve the second use of the EMRs, is reported. We believe that our experience can serve as an important reference for the development of similar applications in a medical data utilization and sharing environment.
Rank web documents based on multi-domain ontology
2024, Journal of Ambient Intelligence and Humanized Computing
Understanding User Intent Modeling for Conversational Recommender Systems: A Systematic Literature Review
2023, Research Square
