Abstract
This paper proposes combining a variety of Natural Language Processing techniques with different classification methods to build a mechanism that automatically predicts a phenomenon related to a text. Several preprocessing techniques are applied and evaluated in order to find the best-performing set. The underlying assumption is that this approach makes it possible to recognize the phenomenon to which a text is related. The research uses real input from big data systems: the raw text data comes from news website articles. The paper proposes new, promising methods of automatic data and content mining for big data systems. The reported accuracy is much better than the average accuracy of sentiment classification performed by humans.
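The idea of searching for the best set of preprocessing steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the preprocessing steps, the classifier, or the labels the authors used. Here three hypothetical steps (`lowercase`, `drop_stopwords`, `strip_plural`) are combined in every possible subset, each subset feeds a small multinomial Naive Bayes classifier, and the subsets are ranked by accuracy on a toy, hand-written data set.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math

# Hypothetical preprocessing steps; the abstract does not say which ones were used.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "to", "and", "as"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def drop_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

def strip_plural(tokens):
    # Very crude stand-in for stemming or lemmatization.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

STEPS = {"lowercase": lowercase,
         "drop_stopwords": drop_stopwords,
         "strip_plural": strip_plural}

def preprocess(text, step_names):
    """Whitespace-tokenize, then apply the named steps in order."""
    tokens = text.split()
    for name in step_names:
        tokens = STEPS[name](tokens)
    return tokens

def train_nb(docs):
    """docs: list of (tokens, label). Collects multinomial Naive Bayes counts."""
    word_counts, label_counts, vocab = defaultdict(Counter), Counter(), set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, tokens):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for t in tokens:
            # Laplace smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((word_counts[label][t] + 1)
                              / (total_words + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(step_names, train, test):
    """Accuracy of Naive Bayes when both sets go through the same steps."""
    model = train_nb([(preprocess(t, step_names), y) for t, y in train])
    hits = sum(predict_nb(model, preprocess(t, step_names)) == y for t, y in test)
    return hits / len(test)

def rank_pipelines(train, test):
    """Try every subset of STEPS and sort by accuracy, best first."""
    names = list(STEPS)
    results = [(evaluate(combo, train, test), combo)
               for r in range(len(names) + 1)
               for combo in combinations(names, r)]
    return sorted(results, key=lambda pair: -pair[0])

# Toy data invented for this sketch; it is not taken from the paper.
TRAIN = [("Shares rise as profits grow", "up"),
         ("Markets rally on strong earnings", "up"),
         ("Stocks fall after weak report", "down"),
         ("Prices drop as losses mount", "down")]
TEST = [("Profits grow and shares rally", "up"),
        ("Losses mount and stocks fall", "down")]

if __name__ == "__main__":
    for accuracy, combo in rank_pipelines(TRAIN, TEST):
        print(f"{accuracy:.2f}  {combo or ('no preprocessing',)}")
```

With real news articles, the whitespace tokenizer and the plural-stripping rule would normally be replaced by a proper tokenizer and lemmatizer, and accuracy would be measured with cross-validation rather than a single tiny test split.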
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Horecki, K., Mazurkiewicz, J. (2015). Natural Language Processing Methods Used for Automatic Prediction Mechanism of Related Phenomenon. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science, vol 9120. Springer, Cham. https://doi.org/10.1007/978-3-319-19369-4_2
DOI: https://doi.org/10.1007/978-3-319-19369-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19368-7
Online ISBN: 978-3-319-19369-4
eBook Packages: Computer Science (R0)