Abstract
Discovering relationships between vertices in a secured information network is an important task in information network analysis. In a heterogeneous information network (HIN), a meta-path is a sequence of vertex types and edge types connecting two vertices. A path instance of a meta-path is a path in the HIN that satisfies the meta-path. The length of a meta-path is the number of relations (edges) in it. A meaningful meta-path is a meta-path with at least one path instance, and a shortest meaningful meta-path is a meaningful meta-path of minimum length. Recent work on meta-path discovery mainly focuses on in-memory algorithms that run on a single computer. In this chapter, we propose distributed algorithms to discover all shortest meaningful meta-paths between two vertices of a large HIN using Apache Spark. We employ a scalable implementation of the Distributed Breadth-First Search (D-BFS) algorithm as a baseline approach. Because finding all possible shortest paths in a large HIN can be time consuming, we propose a novel algorithm called shortest meaningful meta-path based search (S-MPS). S-MPS first searches for all shortest meta-path candidates between vertices in the graph of the network schema of the HIN. We conduct experiments on the DBLP data set to demonstrate the efficiency of the proposed S-MPS algorithm over D-BFS.
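The definitions in the abstract can be illustrated with a small single-machine sketch. It is not the chapter's S-MPS or D-BFS implementation and does not use Apache Spark. It only assumes one plausible reading of the idea: enumerate meta-path candidates on the network schema by increasing length, then keep those that have at least one path instance between the two given vertices. The toy schema, the vertex and edge data, the function names, and the `max_length` cutoff are all hypothetical.

```python
from collections import defaultdict

# Hypothetical DBLP-like network schema: (source type, relation, target type).
SCHEMA_EDGES = [
    ("Author", "writes", "Paper"),
    ("Paper", "written_by", "Author"),
    ("Paper", "published_in", "Venue"),
    ("Venue", "publishes", "Paper"),
]

# Hypothetical toy HIN: vertex types and typed edges.
VERTEX_TYPE = {
    "a1": "Author", "a2": "Author", "a3": "Author",
    "p1": "Paper", "p2": "Paper", "v1": "Venue",
}
EDGES = [
    ("a1", "writes", "p1"), ("p1", "written_by", "a1"),
    ("a3", "writes", "p1"), ("p1", "written_by", "a3"),
    ("a2", "writes", "p2"), ("p2", "written_by", "a2"),
    ("p1", "published_in", "v1"), ("v1", "publishes", "p1"),
    ("p2", "published_in", "v1"), ("v1", "publishes", "p2"),
]


def has_instance(meta_path, src, dst, adj, vertex_type):
    """Check whether at least one path instance of meta_path links src to dst."""
    frontier = {src}
    for relation, target_type in meta_path:
        frontier = {
            v
            for u in frontier
            for v in adj.get((u, relation), ())
            if vertex_type[v] == target_type
        }
        if not frontier:
            return False
    return dst in frontier


def shortest_meaningful_meta_paths(schema_edges, vertex_type, edges,
                                   src, dst, max_length=6):
    """Return all shortest meaningful meta-paths between src and dst.

    A meta-path is a list of (relation, target_type) steps that starts at
    the type of src. Candidates are grown level by level on the schema, so
    the first level containing a meaningful meta-path has minimum length.
    """
    schema = defaultdict(list)
    for s_type, relation, t_type in schema_edges:
        schema[s_type].append((relation, t_type))
    adj = defaultdict(set)
    for u, relation, v in edges:
        adj[(u, relation)].add(v)

    src_type, dst_type = vertex_type[src], vertex_type[dst]
    level = [[]]  # all candidate meta-paths of the current length
    for _ in range(max_length):
        next_level = []
        for path in level:
            last_type = path[-1][1] if path else src_type
            for step in schema[last_type]:
                next_level.append(path + [step])
        level = next_level
        found = [
            p for p in level
            if p[-1][1] == dst_type
            and has_instance(p, src, dst, adj, vertex_type)
        ]
        if found:
            return found
    return []  # nothing meaningful within max_length


if __name__ == "__main__":
    for pair in [("a1", "a3"), ("a1", "a2")]:
        print(pair, shortest_meaningful_meta_paths(
            SCHEMA_EDGES, VERTEX_TYPE, EDGES, *pair))
```

In this toy data, a1 and a3 share a paper, so the length-2 meta-path Author, Paper, Author is meaningful. a1 and a2 have no common paper, so that candidate has no path instance and the search continues to length 4, where only the meta-path through Venue has one. The sketch enumerates every schema walk at each level, which is acceptable for a small schema but is not meant to reflect how a distributed implementation would scale.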
Acknowledgements
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Do, P. (2022). Finding All Shortest Meaningful Meta-Paths Between Two Vertices of a Secured Large Heterogeneous Information Network Using Distributed Algorithm. In: Nedjah, N., Abd El-Latif, A.A., Gupta, B.B., Mourelle, L.M. (eds) Robotics and AI for Cybersecurity and Critical Infrastructure in Smart Cities. Studies in Computational Intelligence, vol 1030. Springer, Cham. https://doi.org/10.1007/978-3-030-96737-6_10
DOI: https://doi.org/10.1007/978-3-030-96737-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96736-9
Online ISBN: 978-3-030-96737-6
eBook Packages: Intelligent Technologies and Robotics (R0)