Semantic classification method for network Tibetan corpus

Xu, Gui-Xian; Wang, Chang-Zhi; Wang, Li-Hui; Zhou, Yu-Hong; Li, Wei-Kang; Xu, Hao; Huang, Qing

doi:10.1007/s10586-017-0742-6

Published: 19 January 2017

Volume 20, pages 155–165, (2017)

Cluster Computing

Gui-Xian Xu¹, Chang-Zhi Wang¹, Li-Hui Wang², Yu-Hong Zhou³, Wei-Kang Li¹, Hao Xu¹ & Qing Huang¹

Abstract

Tibetan web pages are appearing in large numbers, so it is worthwhile to apply information processing technology to extract useful knowledge from Tibetan web content. A Tibetan semantic ontology can enrich Tibetan digital resources and help improve information processing performance. This paper studies the semantic classification of a Tibetan network corpus. First, Tibetan web pages are collected. Second, preprocessing extracts the useful information from the web pages. Third, word segmentation and text representation are introduced. Finally, a text similarity classification algorithm is proposed to classify the texts. The experiments compare semantic classification with non-semantic classification. The results show that semantic classification clearly outperforms non-semantic classification, which indicates that making full use of ontology semantic relationships can substantially improve classification accuracy. The research contributes to the study of Tibetan semantic information processing.
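
The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of similarity-based classification with and without a semantic mapping. It assumes cosine similarity over term-frequency vectors and a flat term-to-concept dictionary standing in for an ontology; the function names, the toy ontology, and the English placeholder tokens are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def to_vector(tokens, ontology=None):
    """Count terms; if an ontology is given, map each term to its concept first."""
    if ontology is None:
        return Counter(tokens)
    return Counter(ontology.get(t, t) for t in tokens)


def classify(tokens, labeled_docs, ontology=None):
    """Return (label, score) of the most similar labeled document.

    Returns (None, 0.0) when no labeled document shares any term.
    """
    query = to_vector(tokens, ontology)
    best_label, best_score = None, 0.0
    for label, doc_tokens in labeled_docs:
        score = cosine(query, to_vector(doc_tokens, ontology))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical toy data: already-segmented tokens and a term-to-concept map.
TOY_ONTOLOGY = {
    "monastery": "RELIGION",
    "temple": "RELIGION",
    "school": "EDUCATION",
    "university": "EDUCATION",
}
LABELED = [
    ("religion", ["monastery", "prayer"]),
    ("education", ["school", "teacher"]),
]

# Without the ontology the query shares no surface term with any document.
plain = classify(["temple", "festival"], LABELED)
# With the ontology, "temple" and "monastery" map to the same concept.
semantic = classify(["temple", "festival"], LABELED, TOY_ONTOLOGY)
```

In this toy case the non-semantic run finds no overlap at all, while the semantic run matches the query to the "religion" document because two different surface terms share one concept. This illustrates only why an ontology can help; it says nothing about the size of the improvement reported in the paper.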

Author information

Authors and Affiliations

Information Engineering College, Minzu University of China, Beijing, China
Gui-Xian Xu, Chang-Zhi Wang, Wei-Kang Li, Hao Xu & Qing Huang
School of Electronic Engineering and Computer Science, Peking University, Beijing, China
Li-Hui Wang
College of Software Engineering, Zhejiang University, Hangzhou, Zhejiang Province, China
Yu-Hong Zhou

Corresponding author

Correspondence to Gui-Xian Xu.

About this article

Cite this article

Xu, GX., Wang, CZ., Wang, LH. et al. Semantic classification method for network Tibetan corpus. Cluster Comput 20, 155–165 (2017). https://doi.org/10.1007/s10586-017-0742-6

Received: 05 October 2016
Revised: 02 January 2017
Accepted: 09 January 2017
Published: 19 January 2017
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10586-017-0742-6