Abstract
To support research on the security of Chinese and Tibetan networks, this paper proposes a method for classifying public opinion in Chinese and Tibetan texts. First, web pages are collected. Second, the pages are preprocessed to extract useful information. Third, a table of Chinese and Tibetan public opinion keywords is built. Finally, a text similarity calculation classifies each text according to the keyword table. A Chinese–Tibetan text-level alignment approach based on a Chinese–Tibetan translation dictionary is proposed, which matches word frequency and position. A sentence-level alignment algorithm is also studied. Alignment performance depends on the translation dictionary. A system for public opinion text classification and Chinese–Tibetan text alignment is developed. After a Chinese text has been classified, the alignment software finds the most similar Tibetan text and presents it to the user. This research can help identify Chinese and Tibetan public opinion texts and is relevant to information retrieval, text clustering, and Chinese–Tibetan machine translation.
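The classification and text-level alignment steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes texts are already segmented into tokens, uses cosine similarity over term frequencies as the similarity measure, and ignores the word-position matching the abstract mentions. The keyword table, the tokens, the threshold, and the function names `cosine`, `classify`, and `align` are all hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical keyword table: category -> public opinion keywords.
# The real table in the paper holds Chinese and Tibetan keywords.
KEYWORD_TABLE = {
    "security": ["attack", "riot", "police"],
    "economy": ["price", "market", "trade"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, table=KEYWORD_TABLE, threshold=0.1):
    """Return the best-matching category, or None if no score reaches the threshold."""
    doc = Counter(tokens)
    scores = {cat: cosine(doc, Counter(words)) for cat, words in table.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def align(zh_tokens, bo_docs, dictionary):
    """Pick the Tibetan document most similar to a Chinese one.

    Chinese tokens are mapped through a translation dictionary (assumed to be
    one-to-one here), then compared with each Tibetan document by frequency only.
    """
    translated = Counter(dictionary[t] for t in zh_tokens if t in dictionary)
    scores = {doc_id: cosine(translated, Counter(toks)) for doc_id, toks in bo_docs.items()}
    return max(scores, key=scores.get) if scores else None
```

In this sketch, tokens missing from the dictionary are simply skipped during alignment, which is one way to see why the abstract ties alignment performance to the coverage of the translation dictionary.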
Acknowledgements
This work was supported by the Beijing Social Science Foundation (No. 14WYB040), First class university, First class discipline construction funds of Minzu University of China (No. 2017MDYL12), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2014BAK10B03), and the National Natural Science Foundation of China (No. 61309012, No. 61331013).
Cite this article
Xu, G., Yao, H., Wu, D. et al. Public opinion classification and text alignment based on Chinese and Tibetan corpus. Cluster Comput 22 (Suppl 4), 10263–10274 (2019). https://doi.org/10.1007/s10586-017-1267-8
DOI: https://doi.org/10.1007/s10586-017-1267-8