Abstract
This paper investigates the utility of the Inclusion Index, the Jaccard Index, and the Cosine Index for calculating document similarities, as used for mapping science and technology. It is shown that, provided the same content is searched across the documents, the Inclusion Index generally delivers more exact results, in particular when the degree of similarity is computed from citation data. In addition, several methodologies are compared: co-word analysis, Subject-Action-Object (SAO) structures, bibliographic coupling, co-citation analysis, and self-citation links. We find that the first two tend to describe semantic similarities, which differ from the knowledge flows expressed by the citation-based methodologies.
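As a rough illustration of how the three indices differ, the following minimal sketch computes them for two documents represented as sets of features (for example, cited references or index terms). It assumes the common set-based definitions: intersection over union for Jaccard, intersection over the geometric mean of the set sizes for the cosine, and intersection over the smaller set size for the Inclusion Index. The abstract does not state the paper's exact formulas, so these definitions, the function names, and the sample data are assumptions made for illustration only.

```python
from math import sqrt


def jaccard_index(a, b):
    """Shared items divided by all distinct items (assumed definition)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_index(a, b):
    """Shared items divided by the geometric mean of the set sizes
    (set-based form of Salton's cosine; assumed definition)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def inclusion_index(a, b):
    """Shared items divided by the size of the smaller set (assumed definition)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical reference lists: doc_b cites a subset of what doc_a cites.
doc_a = {"r1", "r2", "r3", "r4"}
doc_b = {"r1", "r2"}

print(jaccard_index(doc_a, doc_b))    # 0.5
print(cosine_index(doc_a, doc_b))     # about 0.707
print(inclusion_index(doc_a, doc_b))  # 1.0
```

In this toy case the shorter reference list is fully contained in the longer one. Under the assumed definitions, the Inclusion Index reports complete overlap, while the Jaccard and cosine values are pulled down by the difference in list length.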
Cite this article
Sternitzke, C., Bergmann, I. Similarity measures for document mapping: A comparative study on the level of an individual scientist. Scientometrics 78, 113–130 (2009). https://doi.org/10.1007/s11192-007-1961-z
DOI: https://doi.org/10.1007/s11192-007-1961-z