Graph based model for information retrieval using a stochastic local search
Introduction
The development of information and communication technologies and the increasing number of sectors of human activity have resulted in the production of an unprecedented volume of information: database sizes have grown in the past few years from a few gigabytes to a thousand exabytes [9]. In addition, the progressive interconnection of sites via large computer networks such as the Internet, as well as the standardization of the techniques used to access and produce this information (URL, HTML, XML, ...), makes documents available to any Internet user.
With so much information available, finding the relevant part of it becomes difficult. This difficulty of access has given rise to several information retrieval tools that aim to help users find the relevant information they are looking for. These include keyword search tools (search engines), thematic search tools, geographic search tools, and tools that combine several search engines (meta-engines).
Graphs have become increasingly important in modeling complicated structures and schemaless data such as social networks [1], chemical molecule structures [2] and XML documents [15]. A large number of such databases are available on the Web. Data mining and search methods for structured data are needed for users to quickly identify a small subset of relevant data for further analysis and experiments.
The classical graph query problem can be described as follows: given a graph database D = {G1, G2, ..., Gn} and a graph query q, find all the graphs in D that contain q as a subgraph. The core of the problem is the complexity of subgraph isomorphism [5]: a sequential scan is very costly because subgraph isomorphism is NP-complete [14].
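To make the cost of the sequential scan concrete, here is a minimal Python sketch of the naive baseline. It is not the method proposed in this paper. It assumes unlabeled, undirected graphs stored as adjacency dictionaries, and the names `is_subgraph` and `sequential_scan` are hypothetical. The backtracking test is exponential in the worst case, which is why running it against every graph in D is expensive.

```python
from typing import Dict, Hashable, List, Set

# Assumption: an undirected, unlabeled graph is a dict vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def is_subgraph(q: Graph, g: Graph) -> bool:
    """True if q is isomorphic to some (not necessarily induced) subgraph of g."""
    q_vertices = list(q)

    def extend(mapping: dict) -> bool:
        if len(mapping) == len(q_vertices):
            return True                      # every query vertex has been placed
        u = q_vertices[len(mapping)]         # next query vertex to place
        used = set(mapping.values())
        for v in g:
            # v must be unused and have at least as many neighbours as u
            if v in used or len(g[v]) < len(q[u]):
                continue
            # every already-mapped neighbour of u must map to a neighbour of v
            if all(mapping[w] in g[v] for w in q[u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]               # backtrack
        return False

    return extend({})


def sequential_scan(database: List[Graph], q: Graph) -> List[int]:
    """Positions of all database graphs that contain q, testing every graph."""
    return [i for i, g in enumerate(database) if is_subgraph(q, g)]


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
database = [triangle, path3, square]
```

With this toy database, `sequential_scan(database, triangle)` runs one isomorphism test per stored graph; an index is meant to avoid most of those tests.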
In this work, we propose a new model based on the stochastic local search (SLS) meta-heuristic [10]. The proposed SLS extracts from the graphs in the database certain subgraphs (frequent subgraphs [8]) that are used to build the index. The extraction process is based on the query size and the support of the subgraphs. A graph index is then built from the extracted subgraphs. Finally, for a given query, all the indexed subgraphs contained in the query are determined, and the index is looked up with these subgraphs to obtain a candidate set of graphs that contain them. The concept of query size is introduced to reduce the complexity of index construction. The proposed method for the graph querying problem is evaluated on the CACM collection.
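The index-then-filter pipeline described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' algorithm: the SLS extraction step is replaced here by exhaustive counting, and the only features considered are single labeled edges rather than general frequent subgraphs. The representation (a graph as a set of term-pair edges) and the names `edge`, `build_index` and `candidates` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]      # undirected edge between two labels, stored sorted
Graph = FrozenSet[Edge]     # assumption: a graph is just its set of labeled edges


def edge(a: str, b: str) -> Edge:
    """Canonical (sorted) form of an undirected labeled edge."""
    return (a, b) if a <= b else (b, a)


def build_index(database: List[Graph], min_support: int) -> Dict[Edge, Set[int]]:
    """Keep every edge whose support (number of graphs containing it) reaches
    min_support, mapped to the ids of the graphs that contain it."""
    postings: Dict[Edge, Set[int]] = defaultdict(set)
    for gid, g in enumerate(database):
        for e in g:
            postings[e].add(gid)
    return {e: ids for e, ids in postings.items() if len(ids) >= min_support}


def candidates(index: Dict[Edge, Set[int]], n_graphs: int, query: Graph) -> Set[int]:
    """Ids of graphs that contain every indexed feature found in the query."""
    result = set(range(n_graphs))
    for e in query:
        if e in index:          # only indexed features can prune the search space
            result &= index[e]
    return result


db = [
    frozenset({edge("graph", "index"), edge("index", "query")}),
    frozenset({edge("graph", "index")}),
    frozenset({edge("local", "search")}),
]
index = build_index(db, min_support=2)
hits = candidates(index, len(db), frozenset({edge("index", "graph")}))
```

The candidate set is only a filter: features of the query that are not indexed prune nothing, so each candidate still has to be checked against the full query afterwards.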
The paper is organized as follows. Section 2 defines the graph query problem. Section 3 describes the proposed method. Section 4 discusses the results of an experimental study. Finally, Section 5 provides conclusions and future work.
Section snippets
Definitions and problem formulation
In this section, we first give some basic definitions and then describe our graph query problem.
Proposed approach
In this paper, we propose an index construction method for the graph query problem. In the following, we detail our proposed method.
Experimental study
To evaluate the performance of our method, we implemented the proposed algorithm in Java and ran it on a Windows machine with an i5-4570 processor at 3.20 GHz and 4 GB of RAM.
The developed algorithm has been tested on 52 queries, and the results are compared to the relevance judgments given in the qrels.txt file attached to the CACM collection. The evaluation of the final result is based on the classical IR metrics, precision (p) and recall (r):
- Recall rate: measures the ability of an Information Retrieval System to
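For reference, the two classical metrics named above can be computed as in the following Python sketch. It uses the standard set-based definitions (precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved). The function name `precision_recall` and the document ids are hypothetical, and the code does not parse the actual qrels.txt format.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)          # relevant documents retrieved
    p = hits / len(retrieved_set) if retrieved_set else 0.0
    r = hits / len(relevant_set) if relevant_set else 0.0
    return p, r


# Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents are retrieved.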
Conclusion
In this study, we have presented an IR system built on a graph-based model. We proposed an index construction method based on frequent subgraphs and query size. It applies an SLS method to extract subgraphs that are then used to reduce the search space of the information retrieval process. Our method offers several advantages, such as saving running time and eliminating irrelevant information. Experiments on synthetic data show that the developed algorithm provides good
References (15)
- et al., Temporal and social network based blogging behavior prediction in blogspace, Proc. ICDM (2007)
- et al., Extraction and search of chemical formulae in text documents on the web, Proc. WWW (2007)
- et al., Graph-based data mining, IEEE Intell. Syst. (2000)
- Graph Theory (2005)
- The Graph Isomorphism Problem, Technical Report TR96-20 (1996)
- J. Savoy, D. Vrajitoru, Evaluation of learning schemes used in information retrieval,...
- et al., Query improvement in information retrieval using genetic algorithms: a report on the experiments of the TREC project, Proceedings of the 1st Text Retrieval Conference (TREC-1) (1993)
Cited by (10)
Document-level relation extraction via graph transformer networks and temporal convolutional networks
2021, Pattern Recognition Letters
Citation Excerpt: It can predict whether a relation is included in the given text or not, and which relation class is contained in the given ontology indicated by the text [7]. RE is an important task of information retrieval in natural language processing, which has attracted extensive attentions [8,21,23,24,33,44,45] and can be used for many applications including machine reading comprehension [4,30], question answering [32], and text generation [20]. Most existing studies for RE focused on extracting entity relationships from a single input sentence and have made great progress in improving the inference capability and anti-noise ability [21,44,46].
DeepCADRME: A deep neural model for complex adverse drug reaction mentions extraction
2021, Pattern Recognition Letters
Citation Excerpt: The task of extracting ADR mentions can be considered as a biomedical named entity recognition (BNER) problem [14]. The BNER has shown a growing interest in many text mining applications such as information retrieval [8] and question answering [22–26]. The detection of ADR mentions is one of the most important task of ADR systems as the overall performance of such systems is heavily depending on the effectiveness of the integrated ADR mentions extraction system: if an ADR mentions extraction system fails to identify ADR mentions, further processing steps to extract potential relationships between them will inevitably fail too.
Folksonomy-based user profile enrichment using clustering and community recommended tags in multiple levels
2018, Neurocomputing
Citation Excerpt: The precision of information retrieved by search engines starts decreasing as they are inefficient to handle such a big volume of data and satisfy user information need. Farhi and Boughaci [2] designed an approach for information retrieval where stochastic modeling was used to extract the desired subgraph from a large web graph at a comparatively low computational cost. But even after embedding the approach of Farhi and Boughaci to the search engines, key issue remains the same.
Information Retrieval in XML Document: State of the Art
2024, Lecture Notes in Networks and Systems
Optimization of the results of a multilingual search engine using a fuzzy recommendation approach
2023, Journal of Information and Organizational Sciences