Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors

Meyer-Brötz, Fabian; Schiebel, Edgar; Brecht, Leo

doi:10.1007/s11192-017-2366-2

Published: 01 April 2017

Volume 111, pages 1307–1325, (2017)

Scientometrics

Fabian Meyer-Brötz¹,
Edgar Schiebel² &
Leo Brecht¹

Abstract

The ongoing discussion in the bibliometric community about the best similarity measures has led to diverse insights. Although these insights sometimes contradict one another, one conclusion is very consistent: hybrid measures outperform their individual components applied alone. While this offers an initial answer to the question of which similarity measure is best, it also raises issues that have been resolved, in part, for conventional similarity measures. In this study we therefore investigate, in the context of hybrid similarities, the impact of the weighting factors, the appropriate level of edge cutting, the performance of first-order versus second-order similarities, and the interaction of these three parameters. Building on a dataset of over 8000 articles from the manufacturing engineering field, we calculated over 100 similarity matrices using different parameter settings. For each matrix we determined several cluster solutions at different resolution levels, ranging from 100 to 1000 clusters, and evaluated them quantitatively using a textual coherence value based on the Jensen-Shannon divergence. We found that second-order hybrid similarity measures, calculated with a weighting factor of 0.6 for the citation-based similarity and reduced to only the strongest values, yield the best clustering results. Furthermore, we found the assessed parameters to be highly interdependent: for example, hybrid first-order similarity outperforms second-order similarity when no edge cutting is applied. Our results can thus serve the bibliometric community as a guideline for the appropriate application of hybrid measures.
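The abstract does not spell out the formulas, so the following is only a minimal sketch of the three parameters it names, under stated assumptions. It assumes the hybrid similarity is a linear combination of a citation-based and a text-based similarity matrix, that second-order similarity is the cosine between two documents' first-order similarity profiles, and that edge cutting keeps a fixed number of strongest values per document. All function names and the `keep` parameter are hypothetical and are not taken from the article.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a matrix."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = features / norms
    return unit @ unit.T


def hybrid_similarity(citation_sim, text_sim, weight=0.6):
    """Assumed linear hybrid: weight * citation + (1 - weight) * text."""
    return weight * np.asarray(citation_sim) + (1.0 - weight) * np.asarray(text_sim)


def second_order(first_order_sim):
    """Assumed second-order similarity: cosine between similarity profiles."""
    return cosine_similarity_matrix(first_order_sim)


def cut_edges(sim, keep=1):
    """Keep each row's `keep` strongest off-diagonal values; zero the rest."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    work = sim.copy()
    np.fill_diagonal(work, -np.inf)  # never select self-similarity
    top = np.argsort(work, axis=1)[:, -keep:]
    mask = np.zeros(sim.shape, dtype=bool)
    mask[np.arange(n)[:, None], top] = True
    mask |= mask.T  # keep the matrix symmetric
    out = np.where(mask, sim, 0.0)
    np.fill_diagonal(out, 0.0)
    return out


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two count or probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nonzero = a > 0
        return np.sum(a[nonzero] * np.log2(a[nonzero] / b[nonzero]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Under these assumptions, a second-order hybrid matrix with edge cutting would be `cut_edges(second_order(hybrid_similarity(citation_sim, text_sim, 0.6)))`. How the article turns the Jensen-Shannon divergence into its textual coherence value is not stated in the abstract and is not reproduced here.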

Author information

Authors and Affiliations

University of Ulm, Ulm, Germany
Fabian Meyer-Brötz & Leo Brecht
AIT Austrian Institute of Technology GmbH, Vienna, Austria
Edgar Schiebel

Corresponding author

Correspondence to Fabian Meyer-Brötz.

About this article

Cite this article

Meyer-Brötz, F., Schiebel, E. & Brecht, L. Experimental evaluation of parameter settings in calculation of hybrid similarities: effects of first- and second-order similarity, edge cutting, and weighting factors. Scientometrics 111, 1307–1325 (2017). https://doi.org/10.1007/s11192-017-2366-2

Received: 08 April 2016
Published: 01 April 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11192-017-2366-2
