Node similarity in the citation graph

  • Regular Paper
Knowledge and Information Systems

Abstract

Published scientific articles are linked together into a graph, the citation graph, through their citations. This paper explores the notion of similarity based on connectivity alone, and proposes several algorithms to quantify it. Our metrics take advantage of the local neighborhoods of the nodes in the citation graph. Two variants of link-based similarity estimation between two nodes are described, one based on the separate local neighborhoods of the nodes, and another based on the joint local neighborhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on a subgraph of the citation graph of computer science in a retrieval context. The results are compared with text-based similarity, and demonstrate the complementarity of link-based and text-based retrieval.
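
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea behind the first variant: comparing the separate local neighborhoods of two nodes. It assumes an undirected view of the citation graph, a fixed hop radius, and Jaccard overlap as the score. These choices, and the function names `undirected`, `neighborhood`, and `jaccard_similarity`, are illustrative assumptions and not the metrics proposed in the paper.

```python
from collections import deque


def undirected(citations):
    """Build an adjacency map from (citing, cited) pairs, ignoring direction."""
    graph = {}
    for citing, cited in citations:
        graph.setdefault(citing, set()).add(cited)
        graph.setdefault(cited, set()).add(citing)
    return graph


def neighborhood(graph, start, radius):
    """Return all nodes within `radius` hops of `start` (including `start`)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def jaccard_similarity(graph, a, b, radius=1):
    """Jaccard overlap of the two local neighborhoods (an assumed score).

    The two nodes themselves are excluded so that a direct citation between
    them does not inflate the overlap.
    """
    na = neighborhood(graph, a, radius) - {a, b}
    nb = neighborhood(graph, b, radius) - {a, b}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical toy data: p1 cites x and y; p2 cites y and z.
citations = [("p1", "x"), ("p1", "y"), ("p2", "y"), ("p2", "z")]
graph = undirected(citations)
score = jaccard_similarity(graph, "p1", "p2", radius=1)
```

In this toy graph the two papers share one of three distinct neighbors, so the score reflects partial overlap. A joint-neighborhood variant would instead grow a single neighborhood outward from both nodes at once, which this sketch does not attempt.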


Author information

Corresponding author

Correspondence to E. Milios.

Additional information

Wangzhong Lu holds a Bachelor's degree from Hefei University of Technology (1993), and a Master's degree from Dalhousie University (2001), both in computer science. From 1993 to 1999 he worked as a developer with China National Computer Software and Technical Service Corp. in Beijing. From 2001 to 2005 he held industrial positions as a senior software architect in Atlantic Canada. He is currently with DST Systems, Charlotte, NC, as a senior data architect.

Jeannette Janssen's research area is applied graph theory. She has worked on the problem of frequency assignment in cellular and digital broadcasting networks. Her current interest is in graph theory applied to the World Wide Web and other networked information spaces. Dr. Janssen did her Master's studies at Eindhoven University of Technology in the Netherlands, and her doctorate at Lehigh University, USA. She is currently an associate professor at Dalhousie University, Canada.

Evangelos Milios received a diploma in electrical engineering from the National Technical University of Athens, and Master's and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology. He held faculty positions at the University of Toronto and York University. He is currently a professor of computer science at Dalhousie University, Canada, where he was Director of the Graduate Program. He has served on the committees of the ACM Dissertation Award, and the AAAI/SIGART Doctoral Consortium. He has worked on the interpretation of visual and range signals for landmark-based positioning, navigation and map construction in single- and multi-agent robotics. His current research activity is centered on Networked Information Spaces, Web information retrieval, and aquatic robotics. He is a senior member of the IEEE.

Nathalie Japkowicz is an associate professor at the School of Information Technology and Engineering of the University of Ottawa. She obtained her Ph.D. from Rutgers University, her M.Sc. from the University of Toronto, and her B.Sc. from McGill University. Prior to joining the University of Ottawa, she taught at Ohio State University and Dalhousie University. Her area of specialization is Machine Learning, and her most recent research has focused on the class imbalance problem. She has made over 50 contributions in the form of journal articles, conference articles, workshop articles, magazine articles, technical reports, and edited volumes.

Yongzheng Zhang obtained a B.E. in computer applications from Southeast University, China, in 1997 and an M.S. in computer science from Dalhousie University in 2002. From 1997 to 1999 he was an instructor and undergraduate advisor at Southeast University. He also worked as a software engineer at Ricom Information and Telecommunications Co. Ltd., China. He is currently a Ph.D. candidate at Dalhousie University. His research interests are in the areas of Information Retrieval, Machine Learning, Natural Language Processing, and Web Mining, particularly centered on Web Document Summarization. A paper based on his Master's thesis received the best paper award at the 2003 Canadian Artificial Intelligence conference.

About this article

Cite this article

Lu, W., Janssen, J., Milios, E. et al. Node similarity in the citation graph. Knowl Inf Syst 11, 105–129 (2007). https://doi.org/10.1007/s10115-006-0023-9

  • DOI: https://doi.org/10.1007/s10115-006-0023-9
