Abstract
With the rapid spread of the Internet, and of multimedia as a new mode of information transmission, people can not only find the information they want easily but also publish their own information to the world. At the same time, the introduction of tablet PCs, smartphones, and other network terminals, together with the emergence of a variety of social networks, has greatly accelerated the growth of information on the Internet. People update text, pictures, video, and other data in a variety of applications every day. Data show that the volume of information on the Internet is growing exponentially, and a news or media company will typically see hundreds or thousands of submissions every day; people now live in an age of information expansion. Faced with such huge information resources, how to manage them effectively, so that people can find target information more conveniently and quickly, has become a hot research topic. Text classification, a technique from text information mining, is an effective way to address this problem. We study mobile text classification based on the maximum entropy model and implement an automatic text classification system in cloud computing; through technical improvements, we provide a solution for handling the large number of documents on the network in a mobile environment. This paper introduces the text classification methods and features of the maximum entropy model, together with an improved information gain feature selection method, the preprocessing method, and the MapReduce programming method. The experimental results show good accuracy and recall for the classification of large amounts of text, meeting the requirements of practical application.
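As a point of reference for the feature selection step mentioned above, the following is a minimal sketch of the standard (textbook) information gain score for a term, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. It is not the improved variant proposed in the paper, which the abstract does not describe. The function names `entropy` and `information_gain`, the representation of documents as token sets, and the toy data `DOCS` and `LABELS` are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Standard information gain of `term` with respect to the class labels.

    docs   : list of sets of tokens (one set per document; hypothetical format)
    labels : list of class labels, aligned with docs
    """
    n = len(docs)
    if n == 0:
        return 0.0
    all_counts = Counter(labels)
    # Class counts among documents that contain / do not contain the term.
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy(list(all_counts.values()))
    h_conditional = (
        (n_with / n) * entropy(list(with_term.values()))
        + (n_without / n) * entropy(list(without_term.values()))
    )
    return h_class - h_conditional


# Toy corpus (illustrative only, not data from the paper).
DOCS = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
LABELS = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    for term in ("goal", "team"):
        print(term, information_gain(DOCS, LABELS, term))
```

In this toy corpus, "goal" separates the two classes perfectly and scores 1 bit, while "team" appears equally in both classes and scores 0, so a selector that keeps the highest-scoring terms would retain "goal" and discard "team".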
Acknowledgments
This work was funded by the National Natural Science Foundation of China (No. 61373134). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Jiangsu Key Laboratory of Meteorological Observation and Information Processing (No. KDXS1105), and the Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).
Cite this article
Yin, C., Xi, J. Maximum entropy model for mobile text classification in cloud computing using improved information gain algorithm. Multimed Tools Appl 76, 16875–16891 (2017). https://doi.org/10.1007/s11042-016-3545-5
DOI: https://doi.org/10.1007/s11042-016-3545-5