A new method for automatically constructing domain-oriented term taxonomy based on weighted word co-occurrence analysis

Scientometrics

An Erratum to this article was published on 12 January 2016

Abstract

Automatic construction of term taxonomies can enhance our ability to express science maps. In this paper, we define the weighted co-occurring word pair and introduce a correspondingly improved method of word co-occurrence analysis. We also discuss an application and evaluation of the proposed method in library and information science, covering how to obtain expanded effective keywords, how to calculate the weights of keywords and of their relations, and how to extract hierarchical structures and other relations such as synonymy. We designed a visualization tool and a prototype search system for browsing the identified term taxonomy. Finally, we report an evaluation and comparison experiment. The experimental results show that the proposed method is effective in helping users perform semantic searches and expand their search results, and that it can meet the requirements of some specific domains.
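
The abstract does not give the paper's formula for a weighted co-occurring word pair, so the following is only a minimal Python sketch under assumed definitions, not the authors' method. It assumes each keyword carries a hypothetical IDF-style weight, log(N/df) + 1, and that a pair's weight is the sum of w(a)·w(b) over the documents in which both keywords occur. For the hierarchy step it swaps in the well-known document-frequency subsumption heuristic (x is broader than y when P(x|y) ≥ threshold and P(y|x) < 1). The function names, the 0.8 threshold, and the toy corpus are all illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# ASSUMPTION: illustrative definitions only; the paper's own formulas are not
# given in the abstract. Each document is represented by its keyword list.


def keyword_weights(docs):
    """Hypothetical IDF-style keyword weight: log(N / df) + 1."""
    n = len(docs)
    df = Counter(kw for doc in docs for kw in set(doc))
    return {kw: math.log(n / count) + 1.0 for kw, count in df.items()}


def weighted_cooccurrence(docs, weights):
    """Pair weight = sum of w(a) * w(b) over documents containing both a and b."""
    pairs = Counter()
    for doc in docs:
        # sorted() gives each unordered pair a single canonical (a, b) key
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += weights[a] * weights[b]
    return pairs


def subsumption_edges(docs, threshold=0.8):
    """(broader, narrower) pairs via document-frequency subsumption:
    x is broader than y when P(x|y) >= threshold and P(y|x) < 1."""
    df = Counter(kw for doc in docs for kw in set(doc))
    co = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            co[(a, b)] += 1
    edges = []
    for (a, b), count in co.items():
        # test both directions of the unordered pair
        for x, y in ((a, b), (b, a)):
            if count / df[y] >= threshold and count / df[x] < 1:
                edges.append((x, y))
    return edges


if __name__ == "__main__":
    # Hypothetical toy corpus: one keyword list per document.
    docs = [
        ["information retrieval", "query expansion"],
        ["information retrieval", "query expansion", "co-word analysis"],
        ["information retrieval", "ontology"],
        ["co-word analysis", "science mapping"],
    ]
    w = keyword_weights(docs)
    for pair, score in weighted_cooccurrence(docs, w).most_common(3):
        print(pair, round(score, 3))
    print(subsumption_edges(docs))
```

On this toy corpus the subsumption step yields ("information retrieval", "query expansion"), ("information retrieval", "ontology") and ("co-word analysis", "science mapping"), because every document tagged with the narrower term also carries the broader one. Summing weight products rather than raw counts lets pairs of rarer, more specific keywords score higher than their raw co-occurrence counts alone would suggest.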

Figs. 1–6

Acknowledgments

This work was supported by the Social Science Foundation of Jiangsu Province (2014SJB144, 2014) and the National Natural Science Foundation of China (71103081, 2011).

Author information

Correspondence to Shuqing Li.

About this article

Cite this article

Li, S., Sun, Y. & Soergel, D. A new method for automatically constructing domain-oriented term taxonomy based on weighted word co-occurrence analysis. Scientometrics 103, 1023–1042 (2015). https://doi.org/10.1007/s11192-015-1571-0

  • DOI: https://doi.org/10.1007/s11192-015-1571-0
