1 Introduction

Bioinformatics is an interdisciplinary and multidisciplinary area strongly driven by technological advances and data generation. It applies concepts and methodologies from computer science, engineering, and statistics to problems in the medical and biological areas [1]. Bioinformatics develops and applies computational algorithms and tools to execute costly and complex processes, using computers to store, organize, analyze, and visualize large, complex sets of biological data such as DNA, RNA, and protein sequences and three-dimensional structures [2].

Studies [3, 4] point out that bioinformatics is a complex field of study because it demands a range of skills: the ability to manage, interpret, and analyze large data sets; extensive knowledge of data analysis methodologies, including specialization in the most common bioinformatics software packages and algorithms; and familiarity with genetic and genomic data [5]. For example, it is estimated that GenBank [6] alone, a database of DNA sequences whose number of nitrogenous bases doubles every 18 months, stores approximately 10 trillion deposited bases. This enormous amount of data and its inherent complexity generate challenging usability [7] and accessibility [8] problems for its analysis [9]. Bolchini et al. [10] state that usability barriers consume researchers’ time with tasks that are not aligned with their ultimate goal.

There has been an increase in the search for methods and techniques that can make this data less complicated and more accessible [11, 12]. An example is the study of Shaer et al. [13], which reports an initiative to reduce the complexity of the data using an interactive visual interface. While visual interfaces are great at helping users understand the abstraction used to represent the data, they still impose restrictions on people who are not familiar with the abstraction in question, such as students.

Natural language interfaces to databases have been proposed by many authors as a solution for easy information retrieval, allowing their use by people who are not experts in computer programming languages and thus unlinking the formulation of a query from the specific programming language knowledge required to write it [10]. Although this type of interface has been known since the 1960s [14], only recently has it started to gain traction through its use in business intelligence, where it has provided good results. However, its configuration process is nontrivial. Moreover, the efficient exploration of biological data is also limited by the accessibility and usability difficulties of bioinformatics tools.

This article presents a conversational interface called Maggie for natural and easy retrieval of biological data by bioinformatics users. Maggie is based on a service-oriented architecture (SOA), and the name is a homage to Dr. Margaret Oakley Dayhoff (1925–1983), a bioinformatics pioneer. To this end, we studied a specific type of conversational interface, the chatbot, which connects to biological databases simply and intuitively through an orchestration service (via BioCatalogue [15] service mapping). Presenting the information in an integrated and automated form in a single interface is important, as it does not require tedious, low-level coding.

We performed usability assessments with 8 area specialists who compared the performance of tasks in standard interfaces such as GenBank [6], PDB (Protein Data Bank) [16], and PubMed [17] with the Maggie interface presented in this study. The tasks were selected within a defined usage context and represent a workflow commonly performed by bioinformatics users.

2 Background

2.1 Bioinformatics

Bioinformatics conceptualizes biology in terms of molecules in the physical-chemical sense and, using computational techniques, statistics, mathematics, and computer science, allows us to store, organize and understand, on a large scale, this biological data explosion [1]. The use of computer programs to analyze DNA, RNA, and protein information has become critical to life science research and development [12].

Early bioinformatics activities included building databases. Knowledge of the structure and content of primary databases, such as NCBI, is as essential as the ability to manipulate and process large amounts of data. As the “omics” fields produce more data with next generation sequencing (NGS) techniques, an even more significant amount of new data is expected [18].

Several factors contribute to the explosive growth of data: more affordable hardware, new computational concepts, and better software, to name a few. The problem has become more visible with the spread of the internet, where individuals, companies, governments, and non-governmental institutions are producing content, turning the world into a massive database updated by thousands of people. The field of bioinformatics is no different, because data are generated by scientists and researchers when they perform experiments and access computational services [19].

Bioinformatics is in full growth. Its most significant challenge is the management of vast amounts of data generated daily from the workbenches of researchers worldwide. The most advanced algorithms then process these data, analyze and visualize them from different perspectives. Finally, the knowledge generated is published for the benefit not only of the scientific community but also of the entire world population (through advances in medicine, the creation of new drugs) [9].

The bioinformatics databases emerged in response to a need - where to store experimental data - and their concept was thought through and executed rudimentarily (through punch cards) even before the technology that made it possible existed. Some databases even started as printed publications [20], serving the purpose of making information public. They then became modest technologies such as a simple backend to support some applications and quickly developed to the current level: complete platforms for global sharing of data generated by researchers, an ecosystem of tools that help in the tasks of analysis, processing, query, and visualization of the information.

Thanks to technological evolution, which has brought us electronic storage capacity, and to database technology itself, databases now occupy a prominent place among bioinformatics tools, since they are the standard (together with the internet) for sharing information with the entire scientific community. The journal Nucleic Acids Research annually publishes a Database Issue, which presents to the scientific community new databases created to share this biological information. In 2018 alone, 82 articles on new databases were published, along with 84 articles featuring updates to databases that appeared in previous editions [21].

With the growing number of databases, the question of how to access them and how to integrate their data becomes more relevant. The first response came in the form of specialization of bioinformatics databases. While the databases responsible for storing protein and DNA sequences or entire genomes are classified as primary databases, because they hold raw information coming from the bench, there are databases classified as secondary and tertiary, which deliver more enriched and specific information resulting from scientific workflows that process information from various sources [22]. The most recent response takes advantage of these databases being made available on the internet and providing access to data through web services. These are the service-oriented architectures (SOA), which allow researchers to integrate information from different sources through the composition of web services [23].

2.2 Service Oriented Architecture

Service-oriented architecture (SOA) is the new generation of distributed computing platforms, with its own paradigms and principles [24]. SOA is built on the technologies that preceded it (like the client-server architecture or CORBA) and adds new layers of design, governance considerations, and best-in-class implementation technologies. The purpose of this type of architecture is to enable the reuse of small portions of code (services) in a highly decoupled way, giving end users the ability to build more complex solutions from these small pieces.

The first mention of SOA was made in an article published by the IBM web services development team at developerWorks. Since then it has attracted great interest among researchers and has received strong acceptance in the software development market. Although the architecture originated in a web development team, it does not presuppose the use of web technology, being much more comprehensive.

Serman [25] defines SOA as a software architecture concept made up of components (services) available through generic interfaces and standardized, preferably license-free, protocols, designed to achieve the least possible dependence between the information systems that consume them and the technical details of their development, stimulating reuse and taking advantage of existing functionalities. In bioinformatics, SOA proposals focus on the standardization of interfaces; the query, validation, and analysis of biological data; and the integration of new tools [26,27,28]. These works are discussed in Sect. 3.

2.3 Natural Language Processing and Databases

For conversational interfaces to be able to understand human beings, they need what is now called a language understanding engine (LUE). The most well-known LUE is ALICE, based on AIML (Artificial Intelligence Markup Language) [29]. During our literature research on conversational interfaces, we observed that the interfaces mentioned in the articles used the same engine or a variation of it, and only one used a more modern engine (Telegram Bot) [30].

But this is not the only option: Google and Microsoft have made their LUEs, called Dialogflow and LUIS (Language Understanding Intelligent Service) respectively, available as services in the cloud. These engines have some advantages over AIML, such as the recognition of entities (variables, subjects, verbs) in sentences, a vast base of predefined entities, and greater ease of configuration. Their major disadvantage is that they are not free. During this research, we performed a comparative study of several natural language understanding engines to select the one that best fit our architectural proposal. Also, integrating an engine into a service-oriented architecture requires observing the characteristics and behavior of the other components that compose it.

Conversational interfaces are often referred to as bots or chatbots. A chatbot is software that interacts with a user through natural language dialogues, such as in English [31]. This technology began in the 1960s with the objective of verifying whether chatbots could pass for real humans in dialogues with users [32]. Thus appeared ELIZA [33], the first natural language processing program to simulate a conversation between human and machine, exhibiting human-like characteristics such as feelings.

Since then, chatbot development technologies have evolved, increasing their ability to dialogue with humans: starting with an algorithm that analyzed keywords and returned preprogrammed responses [33], evolving to natural language processing via semantic mapping using ontologies [34], and on to machine learning applications such as Recurrent Neural Networks (RNN) [35].

However, chatbots are not built merely to mimic human conversations and entertain users. They are used in different domains, such as customer service, information retrieval, e-commerce, contextual help for websites, and education [31, 32]. There are also personal assistants, like Siri, Google Assistant, and Cortana, which help their users perform daily tasks on computers and smartphones through voice commands.

The analysis and visualization of biological data through a suitable interface, which makes the data readily available to scientists, is a central factor for understanding biological research; besides the vast production of information, there is a technological gap in extracting the meaning and value of these data [36]. We believe that conversational interfaces are an emerging feature that can provide improved user interaction in bioinformatics systems.

2.4 Usability in Bioinformatics Tools

The technical characteristics of bioinformatics tools, coupled with the importance they have for the daily activities of researchers in the field, require alternative ways of thinking about how they are used and how to take better advantage of their results.

Usability is a feature that aims to ensure that interactive systems are easy to learn and remember, effective, efficient, safe, and enjoyable from the user’s perspective [37]. Javahery et al. [38] report that usability is a key feature for bioinformatics tools because biomedical research involves high-cost scientific personnel, laboratory experimentation to generate bioanalytical data, and techniques to analyze these data.

Researchers in the area approach the usability problems of web applications in the context of bioinformatics in different ways [10, 39, 40]. Some focus their attention on the context of use and the way these applications are used, relying on tests with users as they interact with them [8, 41, 42]. In another approach, experts apply inspection techniques to the usability problems of these applications and to how best to solve them [8]. For example, Bolchini et al. [10] report that usability problems compromise scientists’ ability to find the information they need for their daily research activities. Even considering that applications available on the web have certain advantages over their desktop versions, issues like incompatibilities between browsers are counterproductive, often preventing the creation of custom interaction components [43].

Therefore, providing resources with good usability, or improving the usability of existing bioinformatics resources, allows researchers to find, interact with, share, compare, and handle important information more effectively and efficiently. Hence, they gain improved insight into biological processes, with the ultimate potential to produce new scientific results [20].

The usability of a system is a qualitative metric that depends on two factors: the combination of the system interfaces and the ability of its users to pursue specific objectives for certain tasks [43]. Systems and their interfaces must be cognitive tools that facilitate perception, reasoning, memorization, and decision making [44].

3 Related Work

There are papers that describe methods to facilitate the integration of bioinformatics tools and access to various sources of biological data without requiring the researcher to have advanced knowledge in bioinformatics. Lemos [45] addresses the construction of systems (SGABios) that facilitate the bio-sequencing analysis phase and the use of workflows to compose bioinformatics processes. She proposed a framework that decomposes an SGAB into two sub-systems: a bioinformatics workflow management system, which helps researchers in the definition, validation, optimization, and execution of the workflows needed to perform the analyses; and a bioinformatics data management system, which deals with the storage and manipulation of the data involved in these analyses. Galaxy [26] is a framework that stands out for features such as a unified interface, validation of data types, the possibility of integrating new tools, graphical development of pipelines, and extensive documentation. CelOWS [27] is an SOA for the storage, reuse, composition, and execution of biological models expressed in CellML, and for their representation through an ontology, which provides a natural language for the semantic description of biological models. BioGraphBot [28] is a conversational interface based on the ALICE framework for accessing a bioinformatics graph database, BioGraphDB, through the Gremlin query language. It translates queries expressed in natural language into queries expressed in Gremlin, simplifying the interaction with BioGraphDB. The authors mention that the chatbot was incorporated into the BioGraphDB web interface, but it was not possible to verify the availability of the chat.

The differences between the Maggie architecture and the related works mentioned above lie in scope and degree of evolution. In scope, Maggie differs from the SGABios, CelOWS, and BioGraphBot architectures because they are restricted to specific topics: bio-sequencing, biological models, and graph databases, respectively. Maggie proposes to be more open and to provide a comprehensive range of services. To do so, instead of pre-selecting services, we propose to discover and map services based on a catalog (BioCatalogue).

Regarding the degree of evolution, when comparing Maggie to Galaxy, the main difference is that our architecture was designed to evolve continually by learning to use new services, which results in new knowledge. In addition, although Galaxy offers a considerable set of pre-installed tools, it requires the user to develop scripts/interfaces for new programs to be integrated into the framework [26]. Another point to consider is that Galaxy provides the structure to orchestrate services, but not the services themselves.

Also, it is important to note that while Maggie uses natural language to provide the resources and assist users in the search for information, the Galaxy interface requires specific programming knowledge to enable the services to be used.

Finally, BioGraphBot has some limitations in scope and technology. The content-related limitation is that one can only query the contents of the database. The technological limitation lies in the fact that the database must support graph storage, which requires specific knowledge so that the contents can be formatted appropriately.

The use of a SOA that integrates a conversational interface, using bots, addresses the problems discussed in Sect. 2.4, since the responsibility for capturing and delivering information is left to the search engine itself.

4 Materials and Methods

Based on the scientific literature presented in Sect. 2, bioinformatics scientists have a variety of databases and services available to create their own research workflows. In fact, there are so many available that it can be difficult for users to select the most suitable ones. With that in mind, we chose BioCatalogue [15] as our source of service metadata.

The next goal was to learn about tools that could use this catalog. The research identified the tools mentioned in our related work section: Galaxy, CelOWS, SGABios, and BioGraphBot. We experimented with them to identify their main features in order to build a more flexible solution that would be easy to use and could incorporate more data and services over time with minimum user intervention.

For the data collection, through the application of a questionnaire, we performed usability tests with 8 HCI and bioinformatics experts. The objective was to collect their opinions on the performance of bioinformatics tasks using standard interfaces and Maggie, in order to identify benefits, problems, and new requirements to further the development of Maggie.

The USE-based [46] questionnaire was composed of 8 closed questions: 4 on the respondent’s profile (age, gender, and academic background) and 4 on a 5-point Likert scale, covering usefulness, ease of use, ease of learning, and user satisfaction. We also included 3 open questions: (1) In your experience, what other tasks do you believe could be performed by Maggie?, (2) In which scenario would you use Maggie? Please describe (e.g., it is more suited for executing specific tasks or as a classroom resource), and (3) Please provide suggestions and list the main positive/negative points of Maggie.

In the usability evaluation, the participants performed 2 tasks using both the standard interfaces and Maggie. Table 1 summarizes the integrated tools, links and the activities performed for the tasks.

Table 1. List of tools and activities executed by respondents of the usability evaluation questionnaire.

In Sect. 6 we discuss the results of the data collection and what we have learned to apply in further development of the Maggie architecture.

5 Approach

Here we describe the Maggie conversational interface at the architectural and implementation levels.

5.1 Architecture

Our proposed architecture is composed of 5 components we built and 2 external components, distributed across 3 layers: front-end (F), middleware (M), and back-end (B), as seen in Fig. 1. The external components are the Language Understanding Engine and the BioCatalogue. The details of each component are as follows:

Fig. 1.
figure 1

The proposed SOA-based architecture for the Maggie conversational interface.

  • Conversational Interface (F): The entry point for users. This interface should allow the user to request biological data by typing or speaking natural language sentences and to receive the answers. It can be implemented in different formats: as a chatbot, as a search interface, or integrated with virtual assistants such as Google Assistant, Siri, or Cortana;

  • Language Understanding Engine (LUE) (M): Used by the conversational interface, it will be responsible for translating a natural language sentence into the intention that represents it, along with the entities involved. This is one of the external components chosen to take advantage of the machine learning capabilities to interpret the conversation. Right now, based on criteria evaluation, we have opted for the use of MS LUIS;

  • BioCatalogue (B): It is a catalogue of life science web services maintained by EMBL-EBI and it will be used as the source of our endpoints that will compose our services mapping layer;

  • BioCatalogue Endpoint Extractor (M): The endpoint extractor will navigate the BioCatalogue list of services to generate the mapping metadata required for the query language and the transcompilation process to work properly. This is where Maggie differs from the solutions cited in the related work: we will navigate through the catalogue and generate, based on its documentation, a representation of each web service, which will then be stored in our service map database and mapped against the intents and entities identified by the Language Understanding Engine;

  • BioCatalogue Service Map (B): This will be the database that will store the mapping data (extracted by the endpoint extractor) for use by the query language and the transcompilation service;

  • Transcompilation Service (M): This service will be the heart of Maggie, as it will receive the LUE output and translate it into the query language. We are using a semantic mapping table to do the mapping between intent and resources, as well as to transform the entities into parameters to query the resources;

  • Orchestration Service (M): The orchestration service will be the query language processor and will make calls to the required services according to the mapping metadata. It will also decide which resources to use and in which order they should be called.
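The flow through these layers can be sketched end to end. Every function below is a hypothetical stand-in for the corresponding component, and the GenBank example (built on the NCBI E-utilities efetch URL) is our assumption for illustration, not Maggie's actual service map:

```python
def interpret(sentence):
    """Stand-in for the Language Understanding Engine: returns the
    intent and entities recognized in a natural language sentence."""
    if "genbank" in sentence.lower():
        return {"intent": "GetGenBankSequence",
                "entities": {"accession": sentence.split()[-1]}}
    return {"intent": "None", "entities": {}}

def transcompile(lue_output):
    """Stand-in for the transcompilation service: maps the intent to a
    query-language call using a semantic mapping table."""
    table = {"GetGenBankSequence": "fetch_nucleotide_fasta"}
    return {"call": table.get(lue_output["intent"]),
            "params": lue_output["entities"]}

def orchestrate(query):
    """Stand-in for the orchestration service: resolves the call against
    the service map and builds the web service request."""
    if query["call"] == "fetch_nucleotide_fasta":
        return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
                "efetch.fcgi?db=nucleotide&rettype=fasta"
                f"&id={query['params']['accession']}")
    return None

# a sentence flows front-end -> LUE -> transcompiler -> orchestrator
url = orchestrate(transcompile(interpret("get the genbank entry NM_000546")))
print(url)
```

The point of the sketch is the decoupling: each stage only consumes the previous stage's output, so any component can be replaced (e.g., swapping the LUE) without touching the others.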

5.2 Implementation

The implementation of the proposed architecture was done using the following technologies:

  • Front-end: HTML5/CSS3 using frameworks Bootstrap 3 and AngularJS;

  • Middleware: Python 3.6, using framework Flask 0.10.2;

  • Application Container: Gunicorn server 19.9.0;

  • Language Understanding Engine: Microsoft LUIS with Programmatic API v2.0;

  • Back-end: Python 3.6 using SQLAlchemy and Marshmallow, with a SQLite database.

Since we were creating a prototype of the architecture, we did not split the solution into individual services: instead, we used the concept of blueprints in Flask to create individual subsets of endpoints within the same worker. This decision was taken to simplify the coding/testing approach. The final version will split the blueprints into different services, as proposed in the architecture.
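The blueprint approach can be sketched as follows: each architectural component becomes its own subset of endpoints inside a single Flask application. The route names and payloads are illustrative, not the prototype's actual endpoints:

```python
from flask import Flask, Blueprint, jsonify

# one blueprint per architectural component (illustrative subset)
transcompiler = Blueprint("transcompiler", __name__)
orchestrator = Blueprint("orchestrator", __name__)

@transcompiler.route("/transcompile/<intent>")
def transcompile(intent):
    # in the prototype, this would consult the semantic mapping table
    return jsonify({"component": "transcompiler", "intent": intent})

@orchestrator.route("/run/<call>")
def run(call):
    # in the prototype, this would invoke the mapped web services
    return jsonify({"component": "orchestrator", "call": call})

app = Flask(__name__)
app.register_blueprint(transcompiler, url_prefix="/api")
app.register_blueprint(orchestrator, url_prefix="/api")
```

Registering both blueprints on one app keeps everything in a single worker during prototyping; splitting them later into separate services only requires moving each blueprint into its own application.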

We will detail below how the 3 main components of our architecture are implemented: the language understanding engine, the transcompilation service, and the conversational interface.

Language Understanding Engine (LUE)

This component was implemented using Microsoft LUIS. In LUIS, we need to define 3 main concepts: intents, utterances, and entities. Once these are configured, a machine learning algorithm can receive a sentence in natural language and classify it against an intent, identifying the entities involved.

The intent is a class of utterances and is the label returned by the LUE. The utterances are example sentences, marked with a special notation to identify entities. For example, a Greeting intent could have utterances for the many ways we can greet someone: Hi, Hello, Howdy, etc. Entities are the variables or parameters that can appear in utterances and that we need to receive back so we can decide what action to execute. For example, we have an entity for the PDB ID (the PDB unique identifier); in every intent whose utterances contain words that represent a PDB ID, it will be tagged automatically.
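As an illustration of how the three concepts relate, an intent's configuration can be modeled as below. The intent names, utterances, and the `{pdb_id}` placeholder notation are examples of ours, not the exact LUIS format:

```python
# an intent groups example utterances under one label
greeting_intent = {
    "intent": "Greeting",
    "utterances": ["Hi", "Hello", "Howdy"],  # training examples
    "entities": [],                          # nothing to extract
}

# an intent whose utterances carry an entity to be extracted
get_pdb_intent = {
    "intent": "GetPdbFileIntent",
    # {pdb_id} marks where a PDB ID entity appears in each example
    "utterances": ["get the pdb file for {pdb_id}",
                   "download structure {pdb_id}"],
    "entities": ["pdb_id"],
}

print(get_pdb_intent["entities"])
```

Given enough labeled utterances per intent, the engine learns to classify new sentences and to return the entity values alongside the intent label.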

After all 3 elements were identified and configured, we trained the model and made it available as a REST service to be used from the middleware. When training the model, we aim to maximize prediction precision – the accuracy in classifying an utterance sent by the user against the intents. Because of this, we had to find a way to generate as many examples as possible without inputting them manually: we developed a script based on an Excel file that generates the MS LUIS configuration from patterns. Figure 2 shows an example of an intent worksheet with the list of utterances and the entities identified. The script merges the intent worksheet with the entities worksheet, generating, for each line in the intent worksheet, several utterances based on the entity samples.

Fig. 2.
figure 2

Configuration of an intent worksheet, with utterances and entities identified.
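The generation idea can be sketched as follows. Reading the actual Excel worksheets is omitted, and the patterns and entity samples below are illustrative stand-ins for the worksheet contents:

```python
from itertools import product

# stand-ins for the intent worksheet (patterns) and entities worksheet
patterns = ["get the pdb file for {pdb_id}",
            "show me the structure {pdb_id}"]
entity_samples = {"pdb_id": ["1TUP", "4HHB", "2LYZ"]}

def generate_utterances(patterns, entity_samples):
    """Expand each pattern with every combination of entity samples,
    producing many labeled training utterances from a few lines."""
    utterances = []
    for pattern in patterns:
        keys = [k for k in entity_samples if "{" + k + "}" in pattern]
        for values in product(*(entity_samples[k] for k in keys)):
            utterances.append(pattern.format(**dict(zip(keys, values))))
    return utterances

print(len(generate_utterances(patterns, entity_samples)))  # 6
```

Two patterns combined with three sample PDB IDs already yield six utterances; in the same way, a modest worksheet can produce enough examples to train the classifier.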

Transcompilation Service

The transcompilation service was implemented as a REST API and implements a semantic mapping table that receives the result of the LUE (intent and entities) and, based on this mapping, indicates what kind of execution should be done. For example, when it receives the GetPdbFileIntent and a PDB ID entity (Fig. 3), the semantic table indicates that it should be transcompiled into a call to the _return_pdb_file_url function, using the PDB ID as the parameter. The result is then passed to the front-end.

Fig. 3.
figure 3

Semantic table entry for GetPdbFileIntent
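A minimal sketch of such a semantic mapping table, using the _return_pdb_file_url example from the text. The table layout and the RCSB download URL pattern are our illustration, not the service's actual code:

```python
def _return_pdb_file_url(pdb_id):
    # assumed RCSB download URL pattern, used here only for illustration
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"

# semantic mapping table: intent name -> (handler, required entity)
SEMANTIC_TABLE = {
    "GetPdbFileIntent": (_return_pdb_file_url, "pdb_id"),
}

def transcompile(intent, entities):
    """Translate a LUE result (intent + entities) into an executable
    call, passing the required entity as the handler's parameter."""
    handler, required = SEMANTIC_TABLE[intent]
    return handler(entities[required])

print(transcompile("GetPdbFileIntent", {"pdb_id": "1tup"}))
```

Supporting a new intent then amounts to adding one table entry and its handler, which is what keeps the service extensible as the service map grows.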

Conversational Interface

The conversational interface was implemented as a single page app that represents a chat window (Fig. 4). The user can enter the sentences in the text box and the bot will answer back. The text box keeps the history of the current conversation, so the user can repeat sentences without having to type them again.

Fig. 4.
figure 4

Maggie’s screen.

6 Usability Evaluation

As mentioned in Sect. 4, we asked 8 specialists in HCI and/or bioinformatics to execute 2 tasks both in the traditional way (using a browser and other tools if needed) and using Maggie, the conversational interface. Afterward, they answered a questionnaire. In this section, we analyze and discuss their answers. We have divided our analysis into 2 subsections. In the respondents’ background, we detail their demographics to understand how much experience they have in bioinformatics. In the usability aspects, we describe their impressions of using Maggie to fulfill the requested tasks.

6.1 Respondents’ Background

The majority (75%) of our respondents were in the age range of 25 to 35 years (Fig. 5, left), with a few (12.5%) under and above that range. This is reflected in their educational background: most of them (62.5%) had at least a master’s degree (Fig. 5, right), and their overall experience with bioinformatics averaged approximately 5.7 years (Fig. 5, bottom). Only one of them had less than one year of experience.

Fig. 5.
figure 5

Demographics distribution charts by age range, education level, and experience with bioinformatics.

6.2 Usability Aspects

Usability was analyzed according to 4 aspects: usefulness, ease of use, ease of learning, and user satisfaction. For each aspect, we presented the respondents with statements, which they answered using a 5-point Likert scale ranging from Totally Disagree (1) to Totally Agree (5).

As we can see in Fig. 6, the trend established for Maggie is favorable in all usability aspects. We detail each one in the subsections below.

Fig. 6.
figure 6

Trend analysis for usability.

Usefulness

This aspect helps us understand whether the conversational interface is useful enough to be used continually. Most answers were favorable or neutral (Fig. 7), indicating that overall the respondents saw Maggie as useful to them. Moreover, they stated that Maggie can reduce the number of steps to accomplish the tasks, thus making it faster to obtain the desired information. However, they were not certain about the skills required to use it. This concern makes sense, as conversational interfaces are not common for the type of tasks they were executing.

Fig. 7.
figure 7

Trend analysis for usefulness.

Ease of Use

This aspect evaluates whether Maggie is easy to use. The results again show a favorable trend, but it is important to highlight the statement about the written instructions: Maggie has a help function that is activated when one types “help me” or “help”, but there is no indication that it is available. The respondents wrote that it was difficult to discover the help function and that examples of its use were missing (Fig. 8).

Fig. 8.
figure 8

Trend analysis for ease of use.

Ease of Learning

This aspect concerns how easy it is to learn to use Maggie. Figure 9 shows that most users do not regard Maggie as an intuitive tool, even though they consider it easy to remember how to use. Conversational interfaces are simple, since one only needs to talk with Maggie, which explains why it is easy to remember. Maggie’s current difficulties seem to be related to the way it starts and keeps the dialog.

Fig. 9.
figure 9

Trend analysis for ease of learning.

Satisfaction

This aspect measures the respondents’ satisfaction with Maggie. Despite the issues related to ease of use and ease of learning, the overall experience of using Maggie was considered positive. However, Fig. 10 shows that about 38% of the respondents had expectations that were not met. This is not surprising, given that they tested Maggie’s first version.

Fig. 10.
figure 10

Trend analysis for satisfaction.

7 Discussion

The questionnaire was enlightening and confirmed that the Maggie conversational interface can be helpful in bioinformatics, allowing easy retrieval of biological information from different data sources. Here, we discuss the positive and negative opinions reported by the respondents.

When asked how they would use Maggie, most of them answered that they would use it in a classroom context and in scientific workflows. This is aligned with their perceptions of the usability aspects: they found the interface simple enough to be used in multiple ways.

They also suggested content to be made available through the Maggie conversational interface, including sequence alignment, homology detection, access to drug-like small molecule databases, and visualization of 3D protein structures. These types of activities indicate that the respondents perceived the potential to integrate Maggie into their day-to-day activities, which is positive and very encouraging.

In contrast, most of the respondents indicated that although they can achieve their objectives interacting with Maggie, it is laborious to get started due to the lack of contextual help. This is related to one key feature that is missing from Maggie’s prototype: a dialog flow. Currently, the user types structured commands and Maggie answers back. There is as yet no dialog, no humor, and no context. Maggie cannot detect whether the respondent is having difficulties interacting with her and thus cannot provide an alternative. A tutorial, as mentioned by the respondents, may be a way to temporarily mitigate the lack of context in the conversation.

8 Conclusion

Conversational interfaces can be a good way to create interaction channels between users and the content they need, in a way as simple as writing a text. In fact, this type of interface has been used in many areas, from consumer service to virtual classrooms and, lately, as an analytical interface allowing the acquisition of business insights from large datasets. From a development perspective, a conversational interface may seem easy to create, but its complexities are not immediately apparent. It is necessary to have a proper dialog flow, which will offer contextual help, engage, and guide the user if needed. To address these issues appropriately, it is necessary to acknowledge usability characteristics that are not related to graphical interfaces, such as intelligent interpretation, contextualization, attitude, and vocabulary, to name a few.

The results obtained from the usability evaluation were encouraging and corroborated our assumption that a conversational interface can add value to the way bioinformatics users retrieve biological information. Moreover, Maggie’s prototype has revealed itself as a suitable example of both the upsides and downsides of applying conversational interfaces to a non-business domain such as bioinformatics. On the upside, Maggie engaged the respondents, allowed faster and easier execution of the tasks, and instigated their curiosity about how to use it in other contexts, pointing to possible new information that could be made available through it. On the downside, because Maggie currently lacks an intuitive help function and the ability to understand the context of a conversation, some respondents found it unsatisfying.

With the lessons learned during this work, we will continue to develop this SOA-based conversational interface to make it serviceable to bioinformatics users in their scientific workflows and classrooms. The next steps are to: (i) implement the endpoint extractor and the service mapping metadata database, so that more services are made available by Maggie; (ii) evolve the transcompilation service to improve Maggie’s intelligent interpretation; and (iii) implement context management to improve the contextualization and attitude features. We are also going to perform more evaluations, including with an undergraduate biological sciences classroom, to analyze how Maggie assists students in learning and using bioinformatics tools.