Abstract
The automatic construction of term taxonomies can enhance our ability to express science mapping. In this paper, we define the weighted co-occurring word pair and introduce a corresponding improved method of word co-occurrence analysis. We also discuss an application and evaluation of the proposed method in library and information science, covering how to obtain the expanded set of effective keywords, how to calculate the weights of keywords and their relations, and how to extract hierarchical structures and other relations such as synonymy. A visualization tool and a prototype search system are designed for browsing the identified term taxonomy. Finally, we report an evaluation and comparison experiment. The experimental results show that the proposed method is effective in helping users perform semantic searches and expand their search results, and that it can meet the requirements of some specific domains.
Acknowledgments
This work was supported by the Social Science Foundation of Jiangsu Province (2014SJB144, 2014) and the National Natural Science Foundation of China (71103081, 2011).
Cite this article
Li, S., Sun, Y. & Soergel, D. A new method for automatically constructing domain-oriented term taxonomy based on weighted word co-occurrence analysis. Scientometrics 103, 1023–1042 (2015). https://doi.org/10.1007/s11192-015-1571-0
DOI: https://doi.org/10.1007/s11192-015-1571-0