Abstract
User identification is an important safeguard for discovering insider threats and preventing unauthorized access in enterprise networks. However, most existing user identification approaches based on behavior analysis fail to capture latent correlations between multi-domain behavior records, either because they lack a panoramic view or because they cannot handle heterogeneous data. To address this, this paper presents HeteroUI, a framework based on heterogeneous information network embedding for user identification in enterprise networks. In our model, multi-domain heterogeneous behavior records are first transformed into a heterogeneous information network; the embeddings of entities are then trained iteratively according to a joint objective that combines local and global components for more accurate user identification. Experimental results on the CERT insider threat dataset r4.2 demonstrate that HeteroUI performs well in discovering user identities, with the mean average precision exceeding 98%. In addition, HeteroUI contributes to inferring potential insiders in a multi-user, multi-domain environment.
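As a rough illustration of the first step described above, turning multi-domain behavior records into a heterogeneous information network, the following minimal sketch builds typed node sets and typed edge sets from a handful of records. The record layout, the node types (user, pc, file, domain), the relation names, and the helper `build_hin` are all assumptions made for this example; the abstract does not specify the schema HeteroUI actually uses, and the embedding training step is not shown.

```python
from collections import defaultdict

# Hypothetical multi-domain behavior records: (behavior domain, user, pc, object).
# The field layout is an assumption for illustration, not the paper's schema.
records = [
    ("logon", "U1", "PC-1", None),
    ("file", "U1", "PC-1", "report.doc"),
    ("http", "U2", "PC-2", "example.com"),
    ("file", "U2", "PC-1", "report.doc"),
]

# Assumed mapping from a behavior domain to the node type of the object it touches.
OBJECT_TYPE = {"file": "file", "http": "domain"}


def build_hin(records):
    """Build a heterogeneous network as typed node sets and typed edge sets.

    nodes: node type -> set of node ids
    edges: (source type, relation, target type) -> set of (source id, target id)
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for domain, user, pc, obj in records:
        nodes["user"].add(user)
        nodes["pc"].add(pc)
        # Every record links the acting user to the PC it was observed on.
        edges[("user", "uses", "pc")].add((user, pc))
        obj_type = OBJECT_TYPE.get(domain)
        if obj is not None and obj_type is not None:
            nodes[obj_type].add(obj)
            # Domain-specific relation between the PC and the touched object.
            edges[("pc", domain, obj_type)].add((pc, obj))
    return dict(nodes), dict(edges)


nodes, edges = build_hin(records)
```

Keeping the edge sets keyed by (source type, relation, target type) preserves the type information that distinguishes a heterogeneous network from a plain graph, which is what a type-aware embedding method would need as input.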
Notes
- 1.
- 2. A query is a set of new behavior records for a certain PC to be inspected.
Acknowledgments
This research was supported by the National Key R&D Program of China (No. 2016YFB0801001). We thank our shepherd Shujun Li for his valuable feedback.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, M., Cai, L., Yu, A., Yu, H., Meng, D. (2020). HeteroUI: A Framework Based on Heterogeneous Information Network Embedding for User Identification in Enterprise Networks. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds) Information and Communications Security. ICICS 2019. Lecture Notes in Computer Science, vol 11999. Springer, Cham. https://doi.org/10.1007/978-3-030-41579-2_10
DOI: https://doi.org/10.1007/978-3-030-41579-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41578-5
Online ISBN: 978-3-030-41579-2
eBook Packages: Computer Science, Computer Science (R0)