skip to main content
10.1145/3357384.3357966acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Author Set Identification via Quasi-Clique Discovery

Published: 03 November 2019 Publication History

Abstract

Author identification based on heterogeneous bibliographic networks, which is to identify potential authors given an anonymous paper, has been studied in recent years. However, most of the existing works merely consider the relationship between authors and anonymous papers, while ignore the relationships between authors. In this paper, we take the relationships among authors into consideration to study the problem of author set identification, which is to identify an author set rather than an individual author related to an anonymous paper. The proposed problem has important applications to new collaborator discovery and group building. We propose a novel Author Set Identification approach, namely ASI. ASI first extracts a task-guided embedding to learn the low-dimensional representations of nodes in bibliographic network. And then ASI leverages the learned embedding to construct a weighted paper-ego-network, which contains anonymous paper and candidate authors. Finally, converting the optimal author set identification to the quasi-clique discovery in the constructed network, ASI utilizes a local-search heuristic mechanism under the guidance of the devised density function to find the optimal quasiclique. Extensive experiments on bibliographic networks demonstrate that ASI outperforms the state-of-art baselines in author set identification.

References

[1]
Yuichi Asahiro, Kazuo Iwama, Hisao Tamaki, and Takeshi Tokuyama. 2000. Greedily finding a dense subgraph. Journal of Algorithms, Vol. 34, 2 (2000), 203--221.
[2]
Devora Berlowitz, Sara Cohen, and Benny Kimelfeld. 2015. Efficient Enumeration of Maximal k-Plexes. In Acm Sigmod International Conference on Management of Data .
[3]
Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. 2010. Detecting high log-densities: an O (n $1/4$) approximation for densest k-subgraph. In Proceedings of TOC. ACM, 201--210.
[4]
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems. 2787--2795.
[5]
Jie Chen and Yousef Saad. 2012. Dense subgraph extraction with application to community detection. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, 7 (2012), 1216--1230.
[6]
Ting Chen and Yizhou Sun. 2017. Task-guided and path-augmented heterogeneous network embedding for author identification. In Proceedings of WSDM. ACM, 295--304.
[7]
Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, and Dhruv Batra. 2015. Reducing overfitting in deep networks by decorrelating representations. arXiv preprint arXiv:1511.06068 (2015).
[8]
Alessio Conte, Donatella Firmani, Caterina Mordente, Maurizio Patrignani, and Riccardo Torlone. 2017. Fast Enumeration of Large k-Plexes. In Acm Sigkdd International Conference on Knowledge Discovery & Data Mining .
[9]
Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of SIGKDD. ACM, 135--144.
[10]
Christos Giatsidis, Dimitrios M Thilikos, and Michalis Vazirgiannis. 2013. D-cores: measuring collaboration of directed graphs based on degeneracy. Knowledge and information systems, Vol. 35, 2 (2013), 311--343.
[11]
Andrew V Goldberg. 1984. Finding a maximum density subgraph .University of California Berkeley, CA.
[12]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of SIGKDD. ACM, 855--864.
[13]
Shawndra Hill and Foster Provost. 2003. The myth of the double-blind review?: author identification using only citations. Acm Sigkdd Explorations Newsletter, Vol. 5, 2 (2003), 179--184.
[14]
Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of SIGKDD. ACM, 895--904.
[15]
Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative metric learning. In Proceedings of WWW. 193--201.
[16]
Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, and Xiang Li. 2016. Meta structure: Computing relevance in large heterogeneous information networks. In Proceedings of SIGKDD. ACM, 1595--1604.
[17]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[18]
Jing Li, Feng Xia, Wei Wang, Zhen Chen, Nana Yaw Asabere, and Huizhen Jiang. 2014. Acrec: a co-authorship based random walk model for academic collaboration recommendation. In Proceedings of WWW. ACM, 1209--1214.
[19]
Xiaozhong Liu, Yingying Yu, Chun Guo, and Yizhou Sun. 2014. Meta-path-based ranking with pseudo relevance feedback on heterogeneous graph for citation recommendation. In Proceedings of CIKM. ACM, 121--130.
[20]
Yuanfu Lu, Chuan Shi, Linmei Hu, and Zhiyuan Liu. 2019. Relation Structure-Aware Heterogeneous Information Network Embedding. In Proceedings of AAAI .
[21]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119.
[22]
Mathias Payer, Ling Huang, Neil Zhenqiang Gong, Kevin Borgolte, and Mario Frank. 2015. What you submit is who you are: a multimodal approach for deanonymizing scientific publications. IEEE Transactions on Information Forensics and Security, Vol. 10, 1 (2015), 200--212.
[23]
Jian Pei, Daxin Jiang, and Aidong Zhang. 2005. On mining cross-graph quasi-cliques. In Proceedings of SIGKDD. ACM, 228--238.
[24]
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of SIGKDD. ACM, 701--710.
[25]
Xiang Ren, Jialu Liu, Xiao Yu, Urvashi Khandelwal, Quanquan Gu, Lidan Wang, and Jiawei Han. 2014. Cluscite: Effective citation recommendation by information network-based clustering. In Proceedings of SIGKDD. ACM, 821--830.
[26]
Chuan Shi, Binbin Hu, Xin Zhao, and Philip Yu. 2018. Heterogeneous Information Network Embedding for Recommendation. IEEE Transactions on Knowledge and Data Engineering (2018).
[27]
Chuan Shi, Xiangnan Kong, Yue Huang, S Yu Philip, and Bin Wu. 2014. HeteSim: A General Framework for Relevance Measure in Heterogeneous Networks. IEEE Transactions on Knowledge and Data Engineering, Vol. 26, 10 (2014), 2479--2492.
[28]
Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and S Yu Philip. 2017. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, Vol. 29, 1 (2017), 17--37.
[29]
Mauro Sozio and Aristides Gionis. 2010. The community-search problem and how to plan a successful cocktail party. In Proceedings of SIGKDD. ACM, 939--948.
[30]
Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: principles and methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery, Vol. 3, 2 (2012), 1--159.
[31]
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, Vol. 4, 11 (2011), 992--1003.
[32]
Jian Tang, Meng Qu, and Qiaozhu Mei. 2015a. Pte: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of SIGKDD. ACM, 1165--1174.
[33]
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015b. Line: Large-scale information network embedding. In Proceedings of WWW . 1067--1077.
[34]
Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. 2012. Cross-domain collaboration recommendation. In Proceedings of SIGKDD. ACM, 1285--1293.
[35]
Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In Proceedings of SIGKDD. ACM, 990--998.
[36]
Charalampos Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria Tsiarli. 2013. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of SIGKDD. ACM, 104--112.
[37]
Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of SIGKDD. ACM, 1225--1234.
[38]
Xiao Wang, Yiding Zhang, and Chuan Shi. 2019. Hyperbolic Heterogeneous Information Network Embedding. In Proceedings of AAAI .
[39]
Kilian Q Weinberger and Lawrence K Saul. 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, Vol. 10, Feb (2009), 207--244.
[40]
Chuxu Zhang, Chao Huang, Lu Yu, Xiangliang Zhang, and Nitesh V Chawla. 2018. Camel: Content-Aware and Meta-path Augmented Metric Learning for Author Identification. In Proceedings of WWW. 709--718.

Cited By

View all
  • (2021)Structure-Aware Parameter-Free Group Query via Heterogeneous Information Network Transformer2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00203(2075-2080)Online publication date: Apr-2021
  • (2021)Heterogeneous Graph Representation for RecommendationHeterogeneous Graph Representation Learning and Applications10.1007/978-981-16-6166-2_7(175-208)Online publication date: 5-Nov-2021

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. author set identification
  2. heterogeneous network construction
  3. meta path
  4. network embedding
  5. quasi-clique

Qualifiers

  • Research-article

Conference

CIKM '19
Sponsor:

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Structure-Aware Parameter-Free Group Query via Heterogeneous Information Network Transformer2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00203(2075-2080)Online publication date: Apr-2021
  • (2021)Heterogeneous Graph Representation for RecommendationHeterogeneous Graph Representation Learning and Applications10.1007/978-981-16-6166-2_7(175-208)Online publication date: 5-Nov-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media