Abstract
In recent years, text analysis has attracted growing attention in many fields. Most current methods use Word2vec, Naïve Bayes, or similar techniques to classify large numbers of texts. However, not all collected samples are useful for research with demanding requirements, and retrieving related samples with a single keyword is not enough. In this paper, we propose a novel model for secondary text filtering based on Chinese thesauri. It comprises five main steps: sample collection, thesaurus construction, word segmentation, word-frequency statistics, and text-relevance calculation. Its main purpose is to make the sample texts match the user's input keywords more accurately and to avoid unnecessary waste of time and space.
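The abstract does not give the details of the five steps, so the following is only a minimal sketch of how such a filter might be put together, not the authors' implementation. Everything in it is an assumption made for illustration: the `THESAURUS` entries and their weights, forward maximum matching as the segmenter, the relevance formula (weighted thesaurus-term frequency divided by token count), and the `threshold` value.

```python
from collections import Counter

# Hypothetical thesaurus: each user keyword maps to related terms with
# illustrative weights (1.0 for the keyword itself). Not from the paper.
THESAURUS = {
    "经济": {"经济": 1.0, "金融": 0.8, "市场": 0.6},
}


def segment(text, vocabulary):
    """Forward maximum matching (an assumed segmenter): at each position
    take the longest vocabulary entry that matches, else one character."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def relevance(text, keyword, thesaurus=THESAURUS):
    """Assumed relevance score: weighted count of thesaurus terms
    divided by the total number of tokens in the text."""
    terms = thesaurus.get(keyword, {keyword: 1.0})
    tokens = segment(text, set(terms))
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # word-frequency statistics
    score = sum(weight * counts[term] for term, weight in terms.items())
    return score / len(tokens)


def filter_texts(texts, keyword, threshold=0.1):
    """Second-pass filter: keep texts whose score reaches the threshold."""
    return [t for t in texts if relevance(t, keyword) >= threshold]


if __name__ == "__main__":
    samples = ["金融市场很好", "今天天气不错"]  # stand-in for collected samples
    print(filter_texts(samples, "经济"))
```

In this sketch a text that never mentions the keyword itself can still pass the filter when it contains related thesaurus terms, which is the kind of gain over single-keyword retrieval that the abstract describes.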
Acknowledgements
The research was supported in part by the National Science Foundation of China under Nos. 61672104, 61170209, 61502038, and U1509214; the Program for New Century Excellent Talents in University, No. NCET-13-0676; and the Key Program of BFSU 2011 Collaborative Innovation Center, No. BFSU2011-ZD04.
Copyright information
© 2017 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Chen, F., Liu, X., Xu, Y., Xu, M., Shi, G. (2017). A Method on Chinese Thesauri. In: Wang, S., Zhou, A. (eds) Collaborate Computing: Networking, Applications and Worksharing. CollaborateCom 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 201. Springer, Cham. https://doi.org/10.1007/978-3-319-59288-6_60
DOI: https://doi.org/10.1007/978-3-319-59288-6_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59287-9
Online ISBN: 978-3-319-59288-6
eBook Packages: Computer Science, Computer Science (R0)