Searching bibliographic data using graphs: A visual graph query interface
Introduction
Graph data are prevalent in the real world, as data from a variety of domains (e.g., physics, chemistry, biology, sociology, and computer science) can be represented by graph data models (Aggarwal & Wang, 2010). Graph data models can represent relational information and enable a number of applications by supporting efficient searching and mining (Cook & Holder, 2006). For this reason, a few studies have investigated ways of generating graphs from arbitrary data (e.g., Baeza-Yates, Brisaboa, & Larriba-Pey, 2010). Bibliographic data are inherently graph data because they can be represented as interconnected papers, authors, terms, sources, and organizations. Recent bibliometric studies, including those on searching bibliographic data (Zhu, Yan, & Song, 2016), measuring scholarly impact (Yan & Ding, 2009), and mining bibliographic networks (Sun, Barber, Gupta, Aggarwal, & Han, 2011), have taken advantage of this graph representation of bibliographic data. Regardless of the physical representation of graph data (e.g., relational databases), efficient searching of graph data is one of the primary tasks of the information retrieval community (e.g., Kacholia et al., 2005; Jiang, Wang, Yu, & Zhou, 2007; Yuan, Wang, Chen, & Wang, 2013). The user interface is an integral part of searching (Hearst, 2009), and a variety of user interfaces have been proposed to support searching graph data, including regular expression-based (Giugno & Shasha, 2002), keyword-based (Tran, Wang, Rudolph, & Cimiano, 2009), and natural language-based (Pradel, 2012) interfaces. A recent trend in efficient graph data searching is the adoption of graph queries (e.g., Zhang, Zhang, Tang, Rao, & Tang, 2010; Han, Finin, & Joshi, 2012). A graph query takes as input a graph pattern, that is, a set of nodes and edges with constraints on them, and retrieves the parts of the data graph that match the pattern, which makes it a natural fit for graph data (He & Singh, 2008).
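To make the idea of a graph query concrete, the following is a minimal Python sketch (not the system described in this paper) that treats a query as a pattern of labelled nodes with property constraints and directed, typed edges, and finds every match in a small data graph by backtracking. The class and function names, the `CITES` link type, the toy data, and the injective matching semantics (distinct query nodes bind to distinct data nodes) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Props = Dict[str, str]
Edge = Tuple[str, str, str]  # (source, edge type, target)


@dataclass
class PatternNode:
    var: str    # query variable, e.g. "a"
    label: str  # required node label, e.g. "Author"
    where: Callable[[Props], bool] = lambda props: True  # property constraint


@dataclass
class GraphPattern:
    nodes: List[PatternNode]
    edges: List[Edge]  # (source variable, edge type, target variable)


def match(pattern: GraphPattern, nodes: Dict[str, Tuple[str, Props]],
          edges: Set[Edge]) -> List[Dict[str, str]]:
    """Return every binding of query variables to distinct data nodes that
    satisfies all labels, property constraints, and directed typed edges."""
    results: List[Dict[str, str]] = []

    def consistent(binding: Dict[str, str]) -> bool:
        # Only pattern edges whose two endpoints are already bound can be checked.
        return all(
            (binding[s], t, binding[d]) in edges
            for s, t, d in pattern.edges
            if s in binding and d in binding
        )

    def extend(i: int, binding: Dict[str, str]) -> None:
        if i == len(pattern.nodes):
            results.append(dict(binding))
            return
        qn = pattern.nodes[i]
        for node_id, (label, props) in nodes.items():
            if label != qn.label or node_id in binding.values() or not qn.where(props):
                continue
            binding[qn.var] = node_id
            if consistent(binding):
                extend(i + 1, binding)
            del binding[qn.var]  # backtrack and try the next candidate

    extend(0, {})
    return results


# Hypothetical toy data: John wrote p1, which cites p2.
nodes = {
    "a1": ("Author", {"name": "John"}),
    "p1": ("Paper", {}),
    "p2": ("Paper", {}),
}
edges = {("a1", "WRITES", "p1"), ("p1", "CITES", "p2")}
pattern = GraphPattern(
    nodes=[PatternNode("a", "Author", lambda p: p["name"] == "John"),
           PatternNode("x", "Paper"),
           PatternNode("y", "Paper")],
    edges=[("a", "WRITES", "x"), ("x", "CITES", "y")],
)
print(match(pattern, nodes, edges))  # [{'a': 'a1', 'x': 'p1', 'y': 'p2'}]
```

Binding one query node at a time and checking only the edges whose endpoints are already bound keeps the sketch short; practical graph databases rely on indexes and join ordering rather than this exhaustive search.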
Graph queries are known to convey richer information than other forms of queries and thus to improve search performance (e.g., Zhou, Wang, Xiong, Wang, & Yu, 2008). Traditionally, systems with a graph query interface relied on textually represented graph queries (i.e., graph query languages). For example, He and Singh (2008) proposed a graph algebra-based query language to explore graph data, and Leser (2005) proposed a pathway query language for biological networks. Although textually represented graph queries are an effective way of searching graph data, writing them requires substantial effort and is a hindrance to users (Jayaram, Khan, Li, Yan, & Elmasri, 2015). As an alternative, a few studies have proposed searching graph data using visually represented graph queries (e.g., Ceri, Comai, Damiani, & Fraternali, 1999). These visual graph queries are considered more user-friendly because users do not need to remember the syntax of a textual graph query language (Ykhlef & Alqahtani, 2011).
Despite the improved performance of graph queries, searching bibliographic data still faces a critical challenge: the proliferation of data has made it increasingly burdensome to retrieve relevant literature. Major bibliographic search systems offer forms, keywords, and Boolean queries as their main interfaces for searching bibliographic data. In a typical search scenario, users go through multiple rounds of refinement after submitting the first query; even so, they usually end up with more search results than they can absorb. It is therefore imperative to enable queries that convey more explicit information and represent more complex information needs, so that only the most pertinent results are returned. The aforementioned search user interfaces are limited in their ability to represent such precise information needs. For example, they cannot directly represent a query such as “papers on information retrieval that were cited by John’s papers presented at SIGIR”. Such a query, on the other hand, can easily be represented as a visual graph query consisting of a set of nodes with constraints, connected by links. Visual graph query interfaces are thus seen as a reliable way for users to express precise and explicit information needs and to receive more relevant search results. With this motivation in mind, this study proposes a visual graph query interface for bibliographic information retrieval. Specifically, it addresses the following research questions:
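The example query can be read as two chained paths: an Author node constrained to John writes a Paper that is linked to the Source SIGIR, and that Paper cites a Paper linked to the Term “information retrieval”. The sketch below answers it over a hypothetical toy graph by following one link type at a time. Only `WRITES` is named in this paper; `PUBLISHED_IN`, `CITES`, and `HAS_TERM`, the function names, and exact name matching are assumptions for illustration. The point is that each hop carries its own constraint, which a single keyword or Boolean query cannot state directly.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, link type, target id)


def follow(edges: Set[Edge], start: Set[str], link_type: str) -> Set[str]:
    """Targets reachable from any node in `start` through one `link_type` link."""
    return {dst for src, t, dst in edges if t == link_type and src in start}


def named(names: Dict[str, str], value: str) -> Set[str]:
    """Ids of nodes whose name equals `value` (an exact-match node constraint)."""
    return {node_id for node_id, name in names.items() if name == value}


def cited_papers_on_term(names: Dict[str, str], edges: Set[Edge],
                         author: str, source: str, term: str) -> Set[str]:
    # Author(name = author) -WRITES-> Paper
    written = follow(edges, named(names, author), "WRITES")
    # Keep papers with a PUBLISHED_IN link to Source(name = source)
    venue = named(names, source)
    at_venue = {p for p in written if follow(edges, {p}, "PUBLISHED_IN") & venue}
    # Move to the papers they CITE
    cited = follow(edges, at_venue, "CITES")
    # Keep cited papers with a HAS_TERM link to Term(name = term)
    topic = named(names, term)
    return {q for q in cited if follow(edges, {q}, "HAS_TERM") & topic}


# Hypothetical toy graph.
names = {"a1": "John", "s1": "SIGIR", "s2": "KDD",
         "t1": "information retrieval", "t2": "data mining"}
edges = {
    ("a1", "WRITES", "p1"), ("a1", "WRITES", "p2"),
    ("p1", "PUBLISHED_IN", "s1"), ("p2", "PUBLISHED_IN", "s2"),
    ("p1", "CITES", "p3"), ("p1", "CITES", "p4"), ("p2", "CITES", "p5"),
    ("p3", "HAS_TERM", "t1"), ("p4", "HAS_TERM", "t2"), ("p5", "HAS_TERM", "t1"),
}
print(cited_papers_on_term(names, edges, "John", "SIGIR", "information retrieval"))  # {'p3'}
```

In this toy graph, p5 is excluded because the paper citing it appeared at KDD, and p4 because it lacks the requested term.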
- How can a visual graph query interface for searching bibliographic data be designed and implemented?
- What features does a visual graph query interface need in order to improve bibliographic data retrieval?
- How can a visual graph query interface be integrated with back-end databases to build a streamlined system?
This work builds on our previous work (Zhu et al., 2016), in which we proposed a framework for graph-based bibliographic information retrieval. In the present work, we focus on visual graph query formulation and processing, using the same graph schema proposed in that work.
Literature review
Earlier studies on visual graph queries were carried out with a specific data structure, XML, in mind (e.g., Ceri et al., 1999; Erwig, 2003; Ni & Ling, 2003; Ykhlef & Alqahtani, 2009). These studies proposed visual graph queries for querying and restructuring XML data. Because XML data are complex, with multiple nested structures, visual graph queries were seen as an efficient solution. Because the main goal of these studies was to build efficient languages of visual graph queries by
Bibliographic graph queries
Bibliographic data naturally form a directed graph of nodes and links. For example, a link named “WRITES” is a directed link whose source is an “Author” and whose target is a “Paper”. There are a variety of ways to model bibliographic data as graphs. Fig. 1 shows a typical schema of bibliographic data with five bibliographic entities. In the schema, “Source” denotes a journal or a conference in which authors publish or present papers. “Term” denotes a keyword, a
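One way a visual interface can use such a schema is to check each link a user draws against the allowed entity types and direction, and to offer only valid link types between two nodes. The sketch below encodes the five entities named in the text. Apart from `WRITES` (Author to Paper), the link names and their endpoints are hypothetical stand-ins, since the full schema of Fig. 1 is not reproduced here.

```python
from typing import Dict, List, Tuple

# The five bibliographic entities named in the text.
ENTITIES = {"Paper", "Author", "Term", "Source", "Organization"}

# Link type -> (source entity, target entity).
# WRITES comes from the text; the other four entries are hypothetical.
SCHEMA: Dict[str, Tuple[str, str]] = {
    "WRITES": ("Author", "Paper"),
    "CITES": ("Paper", "Paper"),
    "PUBLISHED_IN": ("Paper", "Source"),
    "HAS_TERM": ("Paper", "Term"),
    "AFFILIATED_WITH": ("Author", "Organization"),
}


def is_valid_link(link_type: str, source_label: str, target_label: str) -> bool:
    """True if a drawn link has a known type and runs in the schema's direction."""
    return SCHEMA.get(link_type) == (source_label, target_label)


def links_between(source_label: str, target_label: str) -> List[str]:
    """Link types an interface could offer when the user connects two nodes."""
    return sorted(t for t, ends in SCHEMA.items() if ends == (source_label, target_label))


print(is_valid_link("WRITES", "Author", "Paper"))  # True
print(is_valid_link("WRITES", "Paper", "Author"))  # False: wrong direction
print(links_between("Paper", "Source"))            # ['PUBLISHED_IN']
```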
Methods
In this section, we discuss the methods of processing visual graph queries. First, we present the system architecture we designed, showing its process flow and interconnected components. Then, we introduce the components of visual graph queries and discuss them with example queries.
The designed system
We implemented a web-based bibliographic information retrieval system with a visual graph query interface. The system is based on the Spring Framework, and the interface (i.e., front end) is implemented using the JavaScript libraries jQuery and D3.js. We used Neo4j, a graph database, to build the database layer. The example dataset used in the system is provided by Tang et al. (2008) and contains 629,814 papers, 595,775 authors, 12,609 sources, 291,109 terms, and 1000 organizations. Detailed
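Because the database layer is Neo4j, a drawn query must at some point be turned into Cypher. The function below is a hypothetical sketch of one such translation, not the authors' code. It assumes the front end sends node objects (`id`, `label`, optional `props`) and link objects (`source`, `type`, `target`), emits one `MATCH` pattern per link, and passes constraint values as Cypher parameters. Node ids, labels, link types, and property keys are assumed to be trusted identifiers and are not escaped; the link names other than `WRITES` are illustrative.

```python
from typing import Dict, List, Set, Tuple


def to_cypher(nodes: List[dict], links: List[dict],
              returns: List[str]) -> Tuple[str, Dict[str, str]]:
    """Translate a drawn query into a parameterised Cypher MATCH query."""
    labels = {n["id"]: n["label"] for n in nodes}
    declared: Set[str] = set()

    def ref(var: str) -> str:
        # Attach the label only where the variable first appears.
        if var in declared:
            return f"({var})"
        declared.add(var)
        return f"({var}:{labels[var]})"

    # One directed pattern per drawn link, then any node left unconnected.
    patterns = [f"{ref(link['source'])}-[:{link['type']}]->{ref(link['target'])}"
                for link in links]
    patterns += [ref(n["id"]) for n in nodes if n["id"] not in declared]

    # Property constraints become WHERE conditions over named parameters.
    params: Dict[str, str] = {}
    conditions: List[str] = []
    for n in nodes:
        for key, value in n.get("props", {}).items():
            name = f"v{len(params)}"
            params[name] = value
            conditions.append(f"{n['id']}.{key} = ${name}")

    query = "MATCH " + ", ".join(patterns)
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += " RETURN " + ", ".join(returns)
    return query, params


# The introduction's example query, drawn as five nodes and four links.
nodes = [
    {"id": "a", "label": "Author", "props": {"name": "John"}},
    {"id": "p", "label": "Paper"},
    {"id": "s", "label": "Source", "props": {"name": "SIGIR"}},
    {"id": "q", "label": "Paper"},
    {"id": "t", "label": "Term", "props": {"name": "information retrieval"}},
]
links = [
    {"source": "a", "type": "WRITES", "target": "p"},
    {"source": "p", "type": "PUBLISHED_IN", "target": "s"},
    {"source": "p", "type": "CITES", "target": "q"},
    {"source": "q", "type": "HAS_TERM", "target": "t"},
]
query, params = to_cypher(nodes, links, ["DISTINCT q"])
print(query)
print(params)
```

The query string and parameter map could then be executed with a Cypher-capable client, for example `session.run(query, params)` in the official Neo4j Python driver. Binding values as parameters rather than splicing them into the string avoids quoting problems with user-entered names.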
Discussion
While the proposed visual graph query interface is not designed for a specific group of users, as shown in the previous section, it will be beneficial to bibliometricians and researchers who have complex bibliographic information needs. Simple bibliographic queries can easily be formulated in form-based systems. For instance, common bibliometric tasks such as bibliographic coupling and author co-citation can be succinctly accomplished by the proposed interface while simple
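Both tasks correspond to small graph patterns: two papers are bibliographically coupled when they cite the same paper, and two authors are co-cited when one paper cites work by each of them. The sketch below counts these patterns over hypothetical in-memory citation and authorship maps. It counts every author of a cited paper (rather than first authors only) and counts each citing paper once per author pair; these are assumptions, as definitions vary across bibliometric studies.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def bibliographic_coupling(cites: Dict[str, Set[str]]) -> Counter:
    """Pattern (p1)-[:CITES]->(r)<-[:CITES]-(p2): the coupling strength of a
    paper pair is the number of references the two papers share."""
    strength: Counter = Counter()
    for p1, p2 in combinations(sorted(cites), 2):
        shared = len(cites[p1] & cites[p2])
        if shared:
            strength[(p1, p2)] = shared
    return strength


def author_cocitation(cites: Dict[str, Set[str]],
                      authors: Dict[str, Set[str]]) -> Counter:
    """Pattern (a1)-[:WRITES]->(r1)<-[:CITES]-(p)-[:CITES]->(r2)<-[:WRITES]-(a2):
    count, per author pair, the citing papers that reference work by both."""
    counts: Counter = Counter()
    for refs in cites.values():
        cited_authors: Set[str] = set()
        for r in refs:
            cited_authors |= authors.get(r, set())
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: citing paper -> cited papers, cited paper -> authors.
cites = {"p1": {"r1", "r2"}, "p2": {"r2", "r3"}, "p3": {"r1", "r2", "r3"}}
authors = {"r1": {"Alice"}, "r2": {"Bob"}, "r3": {"Carol"}}
print(bibliographic_coupling(cites))
print(author_cocitation(cites, authors))
```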
Conclusion
In this paper, we proposed a visual graph query interface for bibliographic information retrieval. We first introduced a visual graph query interface through which users can formulate bibliographic queries by drawing nodes and links. We introduced methods of interpreting graph queries and translating them into relational and graph database queries, and implemented a web-based bibliographic information retrieval system with a visual graph query interface. We designed and achieved several novel
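For a relational back end, one straightforward translation turns each drawn node into a table alias and each drawn link into a join through a link table. The sketch below assumes a hypothetical schema (one table per entity, named after its lower-cased label, with an `id` column, and one table per link type with `src_id` and `dst_id` columns) that is not necessarily the schema used in this work, and runs the generated SQL against a toy SQLite database. As written, two query nodes of the same type (here `p` and `q`) may match the same row; an extra `p.id <> q.id` condition would be needed to forbid that.

```python
import sqlite3
from typing import List, Tuple


def to_sql(nodes: List[dict], links: List[dict], returns: List[str]) -> Tuple[str, List[str]]:
    """Translate a drawn graph query into one SQL SELECT over the assumed schema."""
    tables = [f"{n['label'].lower()} AS {n['id']}" for n in nodes]
    conditions: List[str] = []
    params: List[str] = []
    for i, link in enumerate(links):
        alias = f"e{i}"  # each drawn link becomes a join through its link table
        tables.append(f"{link['type'].lower()} AS {alias}")
        conditions.append(f"{alias}.src_id = {link['source']}.id")
        conditions.append(f"{alias}.dst_id = {link['target']}.id")
    for n in nodes:
        for key, value in n.get("props", {}).items():
            conditions.append(f"{n['id']}.{key} = ?")  # value bound as a parameter
            params.append(value)
    sql = "SELECT DISTINCT " + ", ".join(f"{r}.id" for r in returns)
    sql += " FROM " + ", ".join(tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


nodes = [
    {"id": "a", "label": "Author", "props": {"name": "John"}},
    {"id": "p", "label": "Paper"},
    {"id": "s", "label": "Source", "props": {"name": "SIGIR"}},
    {"id": "q", "label": "Paper"},
    {"id": "t", "label": "Term", "props": {"name": "information retrieval"}},
]
links = [
    {"source": "a", "type": "WRITES", "target": "p"},
    {"source": "p", "type": "PUBLISHED_IN", "target": "s"},
    {"source": "p", "type": "CITES", "target": "q"},
    {"source": "q", "type": "HAS_TERM", "target": "t"},
]
sql, params = to_sql(nodes, links, ["q"])

# Hypothetical toy database in the assumed layout, used to run the generated SQL.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE author (id TEXT, name TEXT);
CREATE TABLE paper (id TEXT);
CREATE TABLE source (id TEXT, name TEXT);
CREATE TABLE term (id TEXT, name TEXT);
CREATE TABLE writes (src_id TEXT, dst_id TEXT);
CREATE TABLE published_in (src_id TEXT, dst_id TEXT);
CREATE TABLE cites (src_id TEXT, dst_id TEXT);
CREATE TABLE has_term (src_id TEXT, dst_id TEXT);
INSERT INTO author VALUES ('a1', 'John');
INSERT INTO paper VALUES ('p1'), ('p2'), ('p3'), ('p4');
INSERT INTO source VALUES ('s1', 'SIGIR'), ('s2', 'KDD');
INSERT INTO term VALUES ('t1', 'information retrieval');
INSERT INTO writes VALUES ('a1', 'p1'), ('a1', 'p2');
INSERT INTO published_in VALUES ('p1', 's1'), ('p2', 's2');
INSERT INTO cites VALUES ('p1', 'p3'), ('p2', 'p4');
INSERT INTO has_term VALUES ('p3', 't1'), ('p4', 't1');
""")
print(db.execute(sql, params).fetchall())  # [('p3',)]
```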
Author contributions
Conceived and designed the analysis: Yongjun Zhu and Erjia Yan.
Collected the data: Yongjun Zhu and Erjia Yan.
Contributed data or analysis tools: Yongjun Zhu and Erjia Yan.
Performed the analysis: Yongjun Zhu and Erjia Yan.
Wrote the paper: Yongjun Zhu and Erjia Yan.
Other contribution: Yongjun Zhu and Erjia Yan.
Acknowledgements
This project was made possible in part by the Institute of Museum and Library Services (Grant Award Number: RE-07-15-0060-15), for the project titled “Building an entity-based research framework to enhance digital services on knowledge discovery and delivery”.
References
- XML-GL: A graphical language for querying and restructuring XML documents. Computer Networks (1999).
- Xing: A visual XML query language. Journal of Visual Languages and Computing (2003).
- A survey of graphical query languages for XML data. Journal of King Saud University – Computer and Information Sciences (2011).
- Managing and mining graph data (2010).
- A model for automatic generation of multi-partite graphs from arbitrary data (2010).
- Mining graph data.
- BIBEX: A bibliographic exploration tool based on the DEX graph query engine. Proceedings of the 11th international conference on Extending database technology: Advances in database technology (2008).
- Graphgrep: A fast and universal method for querying graphs.
- GoRelations: An intuitive query system for DBpedia. The semantic web (2012).
- Graphs-at-a-time: Query language and access methods for graph databases.
- Search user interfaces.
- RDF-GL: A SPARQL-based graphical query language for RDF. Emergent web intelligence: advanced information retrieval.
- Performance of graph query languages: Comparison of cypher, gremlin and native access in neo4j.
- Querying knowledge graphs by example entity tuples. IEEE Transactions on Knowledge and Data Engineering.
- Gstring: A novel approach for efficient search in graph databases.