A review of ranking approaches for semantic search on Web

https://doi.org/10.1016/j.ipm.2013.10.004

Highlights

  • An exhaustive review of ranking approaches for semantic search on the Web.

  • We identified three stages of ranking in the semantic search process.

  • We identified criteria for the comparison of semantic search approaches.

  • We examined some open issues relevant to efficient semantic search.

Abstract

With an ever-increasing amount of information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, even the most renowned search engines commonly return result sets that include pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, in which the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different semantics-based relevance ranking approaches that are considered appropriate for retrieving relevant information. Various pilot projects and their outcomes are investigated in terms of the methodologies adopted and their most distinctive ranking characteristics. An overview of selected approaches is presented, together with a comparison by means of classification criteria. This comparison helps identify some common concepts and outstanding features.

Introduction

Web search is a key application of the Web. Present search technologies rely on link analysis techniques that exploit the structure of the Web to determine important documents. At the same time, they rely on simple term statistics to identify the documents most relevant to a query. Mark-up languages such as (X)HTML target documents whose content is meant to be interpreted by human readers, and hence focus on document structure and presentation. Little effort is devoted to representing the semantics of the content itself.
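The link analysis mentioned above can be illustrated with a minimal sketch of PageRank-style power iteration, one well-known link analysis technique; the source does not name a specific algorithm, so this choice is an assumption. The `pagerank` function, the damping factor of 0.85, and the three-page link graph are all hypothetical and chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by link structure using simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical link graph: pages "a" and "b" both link to "c"; "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, the page with the most incoming links ends up with the highest score, regardless of what the pages actually say. That independence from content is the limitation the paragraph above points to.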

The growing availability of structured information on the Web enables new opportunities for information access. Semantically oriented search engines, specifically those that use ontologies as enabling technologies, have gained considerable interest in the last decade. The ever-growing amount of ontology-based semantic mark-up on the Web provides an opportunity to start working towards a new generation of open intelligent applications (Motta & Sabou, 2006). Efficient search is one such major envisioned application of this next-generation Web, popularly known as the Semantic Web (Berners-Lee, Hendler, & Lassila, 2001).

Current Web search techniques are not directly suited to indexing and retrieving semantic mark-up. A document is treated as a bag of words, where words or word variants are recognized as indexing terms. Existing semantic mark-up is either simply ignored by many search engines for indexing purposes or not processed in a way that allows it to be distinguished from other text during search.
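A minimal sketch may help show why the bag-of-words view loses semantic mark-up. The page text, the `typeof="Car"` annotation, and the `bag_of_words` helper below are hypothetical, and the tag-stripping step is an assumed simplification of what a keyword indexer does.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Hypothetical page text, once without and once with a semantic annotation
# stating that "Jaguar" refers to a car.
plain = "Jaguar unveils a new model"
marked_up = '<span typeof="Car">Jaguar</span> unveils a new model'

# Assumed indexing step: tags are stripped before counting terms,
# so the annotation never reaches the index.
stripped = re.sub(r"<[^>]+>", " ", marked_up)

print(bag_of_words(stripped))
print(bag_of_words(plain) == bag_of_words(stripped))
```

Both versions of the page produce the same term counts, so a keyword index built this way cannot tell the annotated car from, say, the animal of the same name.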

The upcoming Web search is no longer limited to matching query keywords against documents; instead, complex information needs can be expressed in a structured way, with precise and structured answers as results. The kind of search in which the user’s information needs are addressed by considering the meaning of the user’s query as well as the available resources is referred to as Semantic Search (Tran, Haase, & Studer, 2009).

Due to the ever-increasing move from data to knowledge and the growing popularity of the Semantic Web vision, there is equally increasing interest and work in automatically extracting metadata and representing it as semantic annotations on the documents and services on the Web (Shah, Finin, Joshi, Cost, & Mayfield, 2002). It is envisaged that each Web page would possess semantic annotations that record additional details concerning the page itself. Annotations are based on classes of concepts and the relations among them. The “vocabulary” for the annotations is usually expressed by means of an ontology. The information contained in such an agreed-upon ontology is quite valuable for determining the relevance of retrieved documents based on “known” facts, relationships or other data. Table 1 shows a comparison of the features of traditional keyword-based search and semantic-based search across various parameters.

Two elements of the ontology are quite significant from the “relevant information access” point of view. The first element is named entities, such as the names of persons, objects, countries, places, research articles, artists, and museums. Techniques have been developed for entity-oriented search of documents (Aleman-Meza, Arpinar, Nural, & Sheth, 2010). The second element is relationships, which provide meaning to the entities. The value of such relationships lies in the fact that they are named relationships. Relationships play a vital role in relevant information access as the Web evolves continuously (Sheth & Ramakrishnan, 2007).
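The two elements described above, named entities and named relationships, can be sketched as subject-relation-object triples. This is only an illustrative representation; the `Triple` type, the helper functions, and the sample facts about a painting and a museum are hypothetical and do not come from any system surveyed here.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str   # a named entity
    relation: str  # a named relationship
    obj: str       # another named entity


# Hypothetical annotations, invented for illustration only.
triples = [
    Triple("Mona Lisa", "paintedBy", "Leonardo da Vinci"),
    Triple("Mona Lisa", "exhibitedAt", "Louvre"),
    Triple("Louvre", "locatedIn", "Paris"),
]


def entities(ts: List[Triple]) -> Set[str]:
    """Collect every entity that appears as a subject or an object."""
    return {t.subject for t in ts} | {t.obj for t in ts}


def relations_between(ts: List[Triple], a: str, b: str) -> List[str]:
    """Return the names of relationships that directly connect a and b."""
    return [t.relation for t in ts if {t.subject, t.obj} == {a, b}]


print(sorted(entities(triples)))
print(relations_between(triples, "Mona Lisa", "Louvre"))
```

Because each relationship carries a name, a query can ask not only whether two entities are connected but also how.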

Section snippets

Motivation for ranking

Many users try to analyze information either by browsing the information space or by using a search engine. Search engine based systems generally locate documents based on keywords. Although they do return documents containing the keywords entered by the user, many of the retrieved documents have very little to do with the user’s needs. The onus lies on users to judge the relevance of the retrieved documents using their mental model in order to obtain the desired information. Efforts are consistently being made

Classification of existing ranking approaches

Research on ranking approaches for semantic search on the Web has been broadly classified into three categories according to the stage of ranking. Some distinctive features have also been identified after careful analysis of contemporary approaches. A detailed comparison of most of the distinct approaches, based on their most distinctive ranking characteristics, is presented in Table 2. Rather than discussing the varieties and the evolution of selected ideas, it has

Discussion

After comparing the surveyed systems by means of the classification criteria, some particular issues have been identified that are thought to be relevant to efficient semantic search. In this subsection, these issues are discussed with the intention of reflecting their potential for further research.

Conclusion

In this paper, a number of promising ranking approaches for semantic search on the Web have been presented and classified according to their stage of ranking. It is observed that, unlike classical IR-based search models, semantic search models involve ranking at three stages: first, Entity Ranking; second, Relationship Ranking; and finally, Semantic Document Ranking. Two entities are connected to each other by a single relationship or a chain of
