Abstract
Scientific domain vocabularies play an important role in academic communication and lean research management. Confronted with a dramatic increase in new keywords, a domain must develop its vocabulary continuously to remain viable in the scientific context. Current automatic methods, based on either statistical or linguistic approaches, can generate vocabularies that consist of popular keywords, but they fail to capture high-quality standardized terms because they lack human intervention. Manual methods make use of human knowledge, but they are both time-consuming and expensive. To overcome these deficiencies, this research proposes a novel social voting approach to constructing scientific domain vocabularies. It integrates an automatic system with human knowledge, based on the theory of linguistic arbitrariness, and selects a widely accepted, standardized set of keywords through social voting. A social voting system has been implemented to aid scientific domain vocabulary construction at the National Natural Science Foundation of China. Two experiments are conducted to demonstrate the effectiveness and validity of the system. The results show that the domain vocabulary constructed with this system covers a wide range of areas within a discipline and facilitates the standardization of scientific terminology.
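
The abstract does not give the voting rule, so the following is only a minimal sketch of how a weighted social vote over candidate keywords might work. It is not the authors' system: the function name `select_keywords`, the per-voter weights, the 0.5 acceptance threshold, and the sample keywords are all assumptions made for illustration.

```python
from collections import defaultdict


def select_keywords(votes, voter_weights, threshold=0.5):
    """Return keywords whose weighted approval share reaches the threshold.

    votes: dict mapping a voter id to the set of keywords that voter approves.
    voter_weights: dict mapping a voter id to a non-negative weight.
    threshold: hypothetical acceptance share in [0, 1].
    """
    total_weight = sum(voter_weights[voter] for voter in votes)
    if total_weight == 0:
        return []

    # Accumulate the weight of every voter who approved each keyword.
    score = defaultdict(float)
    for voter, approved in votes.items():
        for keyword in approved:
            score[keyword] += voter_weights[voter]

    accepted = [kw for kw, s in score.items() if s / total_weight >= threshold]
    # Highest score first; ties are broken alphabetically.
    return sorted(accepted, key=lambda kw: (-score[kw], kw))


# Hypothetical candidates, e.g. produced by an automatic extraction step.
votes = {
    "expert_a": {"data mining", "text mining"},
    "expert_b": {"data mining"},
    "expert_c": {"data mining", "knowledge discovery"},
}
weights = {"expert_a": 2.0, "expert_b": 1.0, "expert_c": 1.0}

selected = select_keywords(votes, weights)
print(selected)
```

In this sketch the automatic step only proposes candidates, and a term enters the vocabulary once enough weighted human approval accumulates, which mirrors the combination of an automatic system and human knowledge described in the abstract.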







Notes
Readers can refer to http://www.niso.org/schemas/iso25964.
Readers can refer to http://www.nsfc.gov.cn/nsfc/cen/daima/index.html.
Acknowledgments
This research was partially supported by the General Research Fund of the Hong Kong Research Grants Council (CityU 119611, CityU 148012), the National Natural Science Foundation of China (71371164), and a City University of Hong Kong Teaching Development Grant (6000201).
Cite this article
Jiang, H., Yang, C., Ma, J. et al. A social voting approach for scientific domain vocabularies construction. Scientometrics 108, 803–820 (2016). https://doi.org/10.1007/s11192-016-1990-6
DOI: https://doi.org/10.1007/s11192-016-1990-6