Estimating searching cost of regular path queries on large graphs by exploiting unit-subqueries

Journal of Heuristics

Abstract

Regular path queries (RPQs) are widely used on graphs; the answer to an RPQ is the set of node tuples connected by paths that correspond to a given regular expression. The traditional automata-based approach to evaluating RPQs scales poorly as graphs grow, because graph searching becomes costly in both memory space and response time. Recently, a cost-based optimization technique using rare labels has proved effective on large graphs. However, there is still room for improvement, because rare labels in the graph and/or the query are coarse-grained information that cannot always guarantee the minimum searching cost. This motivates a new approach that uses fine-grained information to estimate the searching cost accurately, which helps improve the performance of RPQ evaluation. For example, with an estimated searching cost, an RPQ can be decomposed into small subqueries, or multiple RPQs can be separated into small batches of queries, in a way that suits parallel evaluation. In this paper, we present a novel approach for estimating the searching cost of RPQs on large graphs, with cost functions based on combinations of the searching costs of unit-subqueries (i.e., the smallest possible queries). We extensively evaluated our method on real-world datasets, including Alibaba, Yago, and Freebase, as well as on synthetic datasets. Experimental results show that our estimation method achieves high accuracy, approximately 87% on average. Moreover, comparisons with the automata-based and rare-label-based approaches demonstrate that our approach outperforms these traditional approaches.
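To make the notion of an RPQ and its searching cost concrete, the following is a minimal Python sketch of the traditional automata-based evaluation that the abstract uses as a baseline. It is not the estimation method of the paper, whose cost functions are not given in this preview. The toy graph, the hand-built automaton for `knows+ worksAt`, the function name `evaluate_rpq`, and the use of visited (node, state) pairs as a stand-in for searching cost are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy graph: edges are (source, label, target).
EDGES = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "worksAt", "x"),
    ("a", "worksAt", "y"),
]

# Hand-built NFA for the regular expression  knows+ worksAt :
# 0 --knows--> 1, 1 --knows--> 1, 1 --worksAt--> 2 (accepting).
NFA = {(0, "knows"): {1}, (1, "knows"): {1}, (1, "worksAt"): {2}}
START, ACCEPT = 0, {2}


def evaluate_rpq(edges, nfa, start, accept):
    """Breadth-first search over the product of graph and automaton.

    Returns the answer set of (origin, target) node pairs and the total
    number of (node, state) pairs visited. The second value is only an
    illustrative proxy for searching cost, not the paper's cost function.
    """
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
    nodes = {n for src, _, dst in edges for n in (src, dst)}

    answers = set()
    visited_total = 0
    for origin in sorted(nodes):
        seen = {(origin, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accept:
                answers.add((origin, node))
            for label, nxt in adjacency.get(node, []):
                for nxt_state in nfa.get((state, label), ()):
                    pair = (nxt, nxt_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
        visited_total += len(seen)
    return answers, visited_total


answers, visited = evaluate_rpq(EDGES, NFA, START, ACCEPT)
print(sorted(answers), visited)
```

On this toy graph the search starts from every node, which is one reason the automata-based approach becomes expensive as the graph grows.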

Notes

  1. http://neo4j.com/docs/developer-manual/current/cypher/syntax/patterns/.

Acknowledgements

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2016-0-00314) supervised by the IITP (Institute for Information & communications Technology Promotion). It was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2017R1A2B4012559).

Author information

Corresponding author

Correspondence to Kyungbaek Kim.

About this article

Cite this article

Nguyen, VQ., Huynh, QT. & Kim, K. Estimating searching cost of regular path queries on large graphs by exploiting unit-subqueries. J Heuristics 28, 149–169 (2022). https://doi.org/10.1007/s10732-018-9402-0

  • DOI: https://doi.org/10.1007/s10732-018-9402-0
