Abstract
In this paper, given a set of research papers with only title and author information, a mining strategy is proposed to discover and organize communities of authors according to both their co-author relationships and the research topics of their published papers. The proposed method applies the CONGA algorithm to discover collaborative communities in the network constructed from the co-author relationships. To further group these collaborative communities by research interest, CiteSeerX is used as an external source to discover the hidden hierarchical relationships among the topics covered by the papers. To evaluate whether the constructed topic-based collaborative communities are semantically meaningful, the first part of the evaluation measures the consistency between the terms appearing in the papers published by a topic-based collaborative community and the terms in documents related to the specific topic retrieved from another external source. The experimental results show that 81.61% of the topic-based collaborative communities satisfy the consistency requirement. In addition, the accuracy of the discovered sub-concept relationships is verified against Wikipedia categories, which shows that 75.96% of the sub-concept terms are properly assigned in the concept hierarchy.
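The abstract does not describe how CONGA (Cluster-Overlap Newman Girvan Algorithm) is run on the co-author network. CONGA extends the divisive Girvan-Newman method with vertex splitting so that an author can belong to more than one community. The following is a minimal sketch of only the non-overlapping Girvan-Newman core, not the authors' implementation: it assumes an unweighted co-author graph (one edge per pair of authors who share at least one paper) and repeatedly removes the edge with the highest betweenness until the graph splits. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Map each author to the set of co-authors (assumption: edges are unweighted)."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def edge_betweenness(graph):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                key = frozenset((v, w))
                eb[key] = eb.get(key, 0.0) + c
                delta[v] += c
    # Every unordered pair of authors was counted once from each endpoint.
    return {k: val / 2 for k, val in eb.items()}


def components(graph):
    """Connected components, each returned as a set of authors."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(graph):
    """Remove highest-betweenness edges until the number of components grows."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    n0 = len(components(g))
    while len(components(g)) == n0 and any(g.values()):
        eb = edge_betweenness(g)
        a, b = tuple(max(eb, key=eb.get))
        g[a].discard(b)
        g[b].discard(a)
    return components(g)


if __name__ == "__main__":
    # Two hypothetical research groups linked by a single co-authored paper.
    papers = [["A", "B", "C"], ["D", "E", "F"], ["C", "D"]]
    parts = girvan_newman_split(build_coauthor_graph(papers))
    print([sorted(p) for p in parts])  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

In the toy example, the C-D edge lies on all nine shortest paths between the two groups, so it is removed first. CONGA would additionally consider splitting a high-betweenness author such as C into copies, allowing overlapping communities; that step is omitted here.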
This work was partially supported by the R.O.C. N.S.C. under Contract No. 98-2221-E-003-017 and NSC 98-2631-S-003-002.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, CL., Koh, JL. (2010). Hierarchical Topic-Based Communities Construction for Authors in a Literature Database. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13025-0_53
DOI: https://doi.org/10.1007/978-3-642-13025-0_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13024-3
Online ISBN: 978-3-642-13025-0
eBook Packages: Computer Science, Computer Science (R0)