
A multi-faceted method for science classification schemes (SCSs) mapping in networking scientific resources

Scientometrics

Abstract

Science classification schemes (SCSs) categorize scientific resources (e.g., research publications and research projects) into disciplines to support research analytics and management. As the number of scientific resources held by distributed research institutions has grown rapidly in recent years, mapping between different SCSs has become increasingly challenging, particularly for heterogeneous SCSs that categorize different kinds of scientific resources. Such mappings are needed for information interoperability and for networking scientific resources. To map heterogeneous SCSs, we design a novel multi-faceted method that measures the similarity between two classes along three facets: descriptors, individuals, and semantic neighborhood. The proposed approach is a hybrid of statistical learning, semantic analysis, and structure analysis, and it draws on the symmetric Tversky index, the WordNet dictionary, and the Hungarian algorithm. The method was evaluated on two main SCSs that need to be mapped for information management and policy-making at the National Natural Science Foundation of China (NSFC), and it showed satisfactory results. Resolving interoperability among heterogeneous SCSs improves access to heterogeneous scientific resources and supports the development of appropriate research analytics policies.
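To make the descriptor facet and the final matching step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes one commonly used symmetric form of the Tversky index, S(X, Y) = |X ∩ Y| / (|X ∩ Y| + β(α·a + (1 − α)·b)), where a and b are the smaller and larger of |X − Y| and |Y − X|. The class names, descriptor sets, and the parameters `alpha` and `beta` are hypothetical. For brevity, the one-to-one assignment is found by exhaustive search, which reaches the same optimum as the Hungarian algorithm on small inputs but does not scale; the WordNet-based semantic analysis is omitted.

```python
from itertools import permutations


def symmetric_tversky(x, y, alpha=0.5, beta=1.0):
    """Symmetric Tversky index between two sets (one common variant, assumed here)."""
    x, y = set(x), set(y)
    common = len(x & y)
    only_x, only_y = len(x - y), len(y - x)
    a, b = min(only_x, only_y), max(only_x, only_y)
    denom = common + beta * (alpha * a + (1 - alpha) * b)
    # Two empty sets give a zero denominator; treat that case as no similarity.
    return common / denom if denom else 0.0


def best_one_to_one(sim):
    """Find the row-to-column assignment with maximum total similarity.

    Exhaustive stand-in for the Hungarian algorithm; only practical
    for small square similarity matrices.
    """
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(sim[i][p[i]] for i in range(n)),
    )
    return list(enumerate(best))


# Hypothetical classes from two schemes, each described by a set of descriptor terms.
scheme_a = {"A1": {"machine", "learning", "data"}, "A2": {"organic", "chemistry"}}
scheme_b = {"B1": {"chemistry", "organic", "synthesis"}, "B2": {"data", "mining", "learning"}}

names_a, names_b = list(scheme_a), list(scheme_b)
sim = [[symmetric_tversky(scheme_a[a], scheme_b[b]) for b in names_b] for a in names_a]
mapping = [(names_a[i], names_b[j]) for i, j in best_one_to_one(sim)]
print(mapping)
```

With these toy descriptor sets, the class sharing "learning" and "data" is paired across schemes, as is the class sharing "organic" and "chemistry". A real similarity matrix would combine all three facets before the assignment step.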

Notes

  1. http://www.nsfc.gov.cn/nsfc/cen/xmtj/

  2. http://isisn.nsfc.gov.cn/egrantweb/

  3. http://www.scholarmate.com/

Acknowledgments

The authors gratefully thank the Editor and all reviewers. The authors also acknowledge with gratitude the generous support of the University Grants Committee (UGC) of Hong Kong (CityU 148012), National Natural Science Foundation of China (71501057), and City University of Hong Kong.

Author information

Corresponding author

Correspondence to Wei Du.

Cite this article

Du, W., Lau, R.Y.K., Ma, J. et al. A multi-faceted method for science classification schemes (SCSs) mapping in networking scientific resources. Scientometrics 105, 2035–2056 (2015). https://doi.org/10.1007/s11192-015-1742-z

  • DOI: https://doi.org/10.1007/s11192-015-1742-z
