Efficient Large-Scale Multi-graph Similarity Search Using MapReduce

  • Conference paper
Web Information Systems and Applications (WISA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11817)

Abstract

A multi-graph is a set of multiple graphs. Multi-graph similarity search aims to find, in a multi-graph dataset, the multi-graphs that are similar to a given query multi-graph. It plays an important role in a wide range of applications, such as finding similar drugs and searching for similar molecule groups. However, existing multi-graph similarity search algorithms are memory-based, which makes them unsuitable for scenarios involving large numbers of multi-graphs. In this paper, we propose a parallel algorithm based on the MapReduce programming model to solve the problem of large-scale multi-graph similarity search. The proposed algorithm consists of two MapReduce jobs: one for indexing and the other for filtering and validation. In particular, we adopt a localization strategy to further improve the performance of the algorithm, which not only reduces the communication cost but also mitigates load imbalance. Extensive experimental results show that our algorithm is effective and efficient.
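The two-job structure described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' algorithm: the abstract does not specify the index, the filter, or the similarity measure, so everything below is an assumption made for illustration. In particular, `run_mapreduce` is a hypothetical single-process stand-in for a MapReduce job, each graph is assumed to be a set of labelled edges, the index is assumed to be an inverted list from edge to multi-graph identifiers, and validation is assumed to use Jaccard similarity over edge sets with a user-chosen threshold. The localization strategy is not modelled.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job:
    apply the mapper, group values by key (shuffle), apply the reducer."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def features(multigraph):
    """Assumed feature set: all labelled edges of all graphs in the multi-graph."""
    return {edge for graph in multigraph for edge in graph}


# Job 1 (indexing): build an inverted list feature -> multi-graph ids.
def index_map(record):
    mg_id, multigraph = record
    for feature in features(multigraph):
        yield feature, mg_id


def index_reduce(feature, mg_ids):
    yield feature, sorted(set(mg_ids))


# Job 2 (filtering and validation): keep only multi-graphs that share a
# feature with the query, then validate each candidate against a threshold.
def search(dataset, query, threshold):
    index = dict(run_mapreduce(dataset.items(), index_map, index_reduce))
    query_features = features(query)

    def filter_map(record):
        feature, mg_ids = record
        if feature in query_features:  # filtering step
            for mg_id in mg_ids:
                yield mg_id, feature

    def validate_reduce(mg_id, shared):
        candidate = features(dataset[mg_id])
        # Assumed similarity measure: Jaccard similarity of edge sets.
        similarity = len(candidate & query_features) / len(candidate | query_features)
        if similarity >= threshold:  # validation step
            yield mg_id, similarity

    return dict(run_mapreduce(index.items(), filter_map, validate_reduce))


# Toy data: each multi-graph is a list of graphs, each graph a set of edges.
dataset = {
    "m1": [{("C", "O"), ("C", "H")}, {("N", "H")}],
    "m2": [{("C", "O")}, {("S", "O")}],
    "m3": [{("P", "O")}],
}
query = [{("C", "O"), ("C", "H")}]
result = search(dataset, query, threshold=0.5)
print(result)
```

In this toy run, `m3` shares no edge with the query and is never generated as a candidate, `m2` is a candidate but fails validation, and only `m1` is returned. On a real cluster the two jobs would run as separate distributed jobs, with the index written to distributed storage between them.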

Acknowledgment

The work is partially supported by the National Natural Science Foundation of China (No. 61702381, No. 61872070, No. 61772124), the Hubei Natural Science Foundation (No. 2017CFB196), Guangdong Province Key Laboratory of Popular High Performance Computers (No. 2017B030314073), Liao Ning Revitalization Talents Program (XLYC1807158), the Scientific Research Foundation of Wuhan University of Science and Technology (2017xz015), and the Fundamental Research Funds for the Central Universities (N171605001).

Corresponding author

Correspondence to Jun Pang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pang, J., Yu, M., Gu, Y. (2019). Efficient Large-Scale Multi-graph Similarity Search Using MapReduce. In: Ni, W., Wang, X., Song, W., Li, Y. (eds) Web Information Systems and Applications. WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham. https://doi.org/10.1007/978-3-030-30952-7_23

  • DOI: https://doi.org/10.1007/978-3-030-30952-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30951-0

  • Online ISBN: 978-3-030-30952-7

  • eBook Packages: Computer Science, Computer Science (R0)
