Public opinion classification and text alignment based on Chinese and Tibetan corpus

Cluster Computing

Abstract

To support research on the security of Chinese and Tibetan networks, this paper proposes a method for classifying public opinion in Chinese and Tibetan texts. First, web pages are collected. Second, the pages are preprocessed to extract useful information. Third, a table of Chinese and Tibetan public opinion keywords is built. Finally, a text similarity calculation classifies each text according to the keyword table. A Chinese–Tibetan text-level alignment approach is also proposed; it uses a Chinese–Tibetan translation dictionary to match word frequency and position. Furthermore, a sentence-level alignment algorithm is studied. Alignment performance depends on the Chinese–Tibetan translation dictionary. A system for public opinion text classification and Chinese–Tibetan text alignment is developed. After a Chinese text has been classified, the alignment software finds the most similar Tibetan text and presents it to the user. This research can help identify Chinese and Tibetan public opinion texts and is relevant to information retrieval, text clustering, and Chinese–Tibetan machine translation.
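
As an illustration only, the two ideas in the abstract (classifying a text by its similarity to a keyword table, and using a translation dictionary to find the most similar text in the other language) might be sketched as follows. The abstract does not specify the similarity measure, so cosine similarity over word frequencies is an assumption here, and the position matching mentioned in the abstract is left out. All function names, the table layout, and the placeholder tokens are hypothetical; the input is assumed to be already segmented into words.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(tokens, keyword_table):
    """Return the category whose keyword list is most similar to the text.

    keyword_table maps a category name to a list of keywords (assumed layout).
    Returns None when the text shares no keyword with any category.
    """
    tf = Counter(tokens)
    scores = {cat: cosine(tf, Counter(words)) for cat, words in keyword_table.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def alignment_score(zh_tokens, bo_tokens, dictionary):
    """Map Chinese tokens through the dictionary, then compare the
    translated word frequencies with those of the Tibetan text."""
    translated = Counter(dictionary[t] for t in zh_tokens if t in dictionary)
    return cosine(translated, Counter(bo_tokens))


def most_similar(zh_tokens, candidates, dictionary):
    """Index of the candidate Tibetan text with the highest alignment score."""
    return max(
        range(len(candidates)),
        key=lambda i: alignment_score(zh_tokens, candidates[i], dictionary),
    )


if __name__ == "__main__":
    # Placeholder tokens stand in for segmented Chinese and Tibetan words.
    table = {"disaster": ["flood", "quake"], "economy": ["price", "market"]}
    print(classify(["flood", "rescue", "flood"], table))

    dictionary = {"zh_a": "bo_a", "zh_b": "bo_b"}
    candidates = [["bo_c", "bo_d"], ["bo_a", "bo_b", "bo_e"]]
    print(most_similar(["zh_a", "zh_b"], candidates, dictionary))
```

In this sketch the alignment score can only be as good as the dictionary's coverage, since words missing from it contribute nothing to the comparison; this is consistent with the abstract's remark that alignment performance depends on the translation dictionary.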

Acknowledgements

This work was supported by the Beijing Social Science Foundation (No. 14WYB040), the First-Class University and First-Class Discipline construction funds of Minzu University of China (No. 2017MDYL12), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2014BAK10B03), and the National Natural Science Foundation of China (Nos. 61309012 and 61331013).

Author information

Corresponding author

Correspondence to Guixian Xu.

About this article

Cite this article

Xu, G., Yao, H., Wu, D. et al. Public opinion classification and text alignment based on Chinese and Tibetan corpus. Cluster Comput 22 (Suppl 4), 10263–10274 (2019). https://doi.org/10.1007/s10586-017-1267-8

  • DOI: https://doi.org/10.1007/s10586-017-1267-8
