skip to main content
10.1145/3539618.3591789acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
abstract

Conversational Bibliographic Search

Published:18 July 2023Publication History

ABSTRACT

In almost every area of research, it is necessary to find experts and publications on a topic. However, finding experts and publications is a difficult task not only for computers, but also for humans. For example, searching for experts, a user often enters a topic into a search engine, which then checks which people have published on that topic. A problem arises when a user does not make their query specific enough which can happen intentionally, e.g. when the user is doing a navigational search, or unintentionally, e.g., when the user lacks knowledge. As a result, the quality of the search results may not be very high and the best results may not be found. Current and widely used search engines for bibliographic metadata, such as dblp[2], ResearchGate, Google Scholar or Semantic Scholar allow only keyword-based searches. Kreutz et al.[1] presented SchenQL, a query language for bibliographic metadata that allows users to formulate their queries more easily and precisely than SQL. However, it requires training to understand the language and is not as easy for non-experts to use e.g. Google Scholar.

To address the limitations of insufficient attention to the user's search intent and lack of search support, we aim to develop a conversational retrieval system in the domain of bibliographic metadata. This conversational search system assists users in achieving their search intent through a natural language dialog. It should be possible not only to find experts, but also to search for bibliographic metadata with the help of the system without prior knowledge.

In this work, we aim to answer the research question: How beneficial is a conversational information retrieval system for searching bibliographic data? To address the research question, our contribution is threefold. First, we present an architecture for such a conversational information retrieval system for bibliographic metadata. Second, we will implement all the components of this system and evaluate our system by comparing it to existing bibliographic data search engines in terms of effectiveness, efficiency, and user satisfaction. Third, we will create and publish a dataset consisting of user queries that we will use to train our system.

The architecture we propose consists of five main components: i) user intent classification, ii) a keyword extractor, iii) a search module, iv) a conversational module and v) the conversation history.

i) The task of user intent classification is to determine the goal the user wants to achieve with their search query. The user intent classification consists of a set of corresponding user intent classifiers, each of which is responsible for one intent. If no classifier can match the user's query to their intent, or multiple classifiers conclude that the query matches their intents, the system asks the user to specify the query accordingly. ii) If the intent is correctly determined, a keyword extractor extracts the actual search term from the query. iii) After the intent and the search term have been determined, the actual search takes place in the search module. The user's query could be reformulated into a SchenQL query (Kreutz et al.[1] and sent to the database. iv) In the conversational module, the results are converted into a natural language response and the user is given suggestions for further queries related to the previous ones. v) The conversation history stores the user's queries and the system's responses, both to consider the entire session when determining intent and to improve the system's components. For example, new question formulations could improve the accuracy of the user intent classifiers as well as reveal what new intents a user of such a conversational search system might have that have not yet been implemented.

In first experiments, we defined four user intents and already evaluated the user intent classification. The four user intents are: (1) searching for persons/authors/experts on a topic, (2) searching for publications by author name, (3) searching for publications on a topic and (4) searching for similar topics of a topic. Our classifiers achieved an accuracy of 0.998 in correctly determining user intent.

In the future, we not only want to evaluate the individual components of a conversational information retrieval system for bibliographic data, but also want to work out the advantages and disadvantages of the conversational information retrieval system for bibliographic data, in comparison to already existing systems that do not support the user in their search process via natural language conversations.

Skip Supplemental Material Section

Supplemental Material

SIGIR23-doc0597.mp4

mp4

113.7 MB

References

  1. Christin Katharina Kreutz, MichaelWolz, Jascha Knack, BenjaminWeyers, and Ralf Schenkel. 2022. SchenQL: in-depth analysis of a query language for bibliographic metadata. Int. J. Digit. Libr. 23, 2 (2022), 113--132. https://doi.org/10.1007/s00799-021-00317-8Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Michael Ley. 2009. DBLP - Some Lessons Learned. Proc. VLDB Endow. 2, 2 (2009), 1493--1500. https://doi.org/10.14778/1687553.1687577Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Conversational Bibliographic Search

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN:9781450394086
        DOI:10.1145/3539618

        Copyright © 2023 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 July 2023

        Check for updates

        Qualifiers

        • abstract

        Acceptance Rates

        Overall Acceptance Rate792of3,983submissions,20%
      • Article Metrics

        • Downloads (Last 12 months)50
        • Downloads (Last 6 weeks)7

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader