Natural Language Processing Methods Used for Automatic Prediction Mechanism of Related Phenomenon

Horecki, Krystian; Mazurkiewicz, Jacek

doi:10.1007/978-3-319-19369-4_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9120)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

The paper presents an idea for combining a variety of Natural Language Processing techniques with different classification methods as a tool for automatically predicting a phenomenon related to a text. Different preprocessing techniques are applied and evaluated in order to find the best combination of them. The assumption is that this approach makes it possible to recognize the phenomenon to which the text is related. The research uses real input from big data systems: articles from news websites serve as the source of raw text data. The paper proposes new, promising methods of automatic data and content mining for big data systems. The reported accuracy is much better than the average accuracy of sentiment classification performed by humans.
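The search for the best set of preprocessing techniques mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three preprocessing steps, the stopword list, the Naive Bayes classifier, the function names, and the toy headlines with their "rise"/"fall" labels are all assumptions chosen for illustration, since the abstract names neither the concrete techniques nor the predicted phenomenon. The sketch tries every subset of the steps, trains a small classifier on each variant, and keeps the subset with the highest accuracy.

```python
import itertools
import math
import string
from collections import Counter, defaultdict

# Hypothetical stopword list and preprocessing steps (not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def strip_punctuation(tokens):
    cleaned = [t.strip(string.punctuation) for t in tokens]
    return [t for t in cleaned if t]

def remove_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]

STEPS = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
    "remove_stopwords": remove_stopwords,
}

def preprocess(text, step_names):
    """Split on whitespace, then apply the named steps in order."""
    tokens = text.split()
    for name in step_names:
        tokens = STEPS[name](tokens)
    return tokens

def train_naive_bayes(docs, labels):
    """Collect per-label word counts for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, tokens):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        # The extra +1 reserves one slot for words never seen in training.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(step_names, train, test):
    """Train on `train` and score on `test` using one preprocessing variant."""
    docs = [preprocess(text, step_names) for text, _ in train]
    model = train_naive_bayes(docs, [label for _, label in train])
    hits = sum(predict(model, preprocess(text, step_names)) == label
               for text, label in test)
    return hits / len(test)

def best_preprocessing(train, test):
    """Evaluate every subset of STEPS and return the most accurate one."""
    names = list(STEPS)
    candidates = [combo for r in range(len(names) + 1)
                  for combo in itertools.combinations(names, r)]
    return max(candidates, key=lambda combo: accuracy(combo, train, test))

# Invented toy data; the labels are placeholders for the "related phenomenon".
TRAIN = [("Shares ROSE sharply.", "rise"), ("The index rose again", "rise"),
         ("Shares FELL sharply.", "fall"), ("The index fell again", "fall")]
TEST = [("shares rose", "rise"), ("Index fell!", "fall")]

if __name__ == "__main__":
    best = best_preprocessing(TRAIN, TEST)
    print(best, accuracy(best, TRAIN, TEST))
```

Exhaustive search over subsets is only practical for a handful of steps; with many candidate techniques, a greedy or sampled search would be the more realistic choice.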

Author information

Authors and Affiliations

Nokia Networks, Technology Center Wroclaw, Poland, ul. Strzegomska 36, 53-611, Wroclaw, Poland
Krystian Horecki
Department of Computer Engineering, Wroclaw University of Technology, ul. Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Jacek Mazurkiewicz

Corresponding author

Correspondence to Krystian Horecki.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Leszek Rutkowski
Czestochowa University of Technology, Czestochowa, Poland
Marcin Korytkowski
Czestochowa University of Technology, Czestochowa, Poland
Rafal Scherer
AGH University of Science and Technology, Krakow, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
Louisville, Kentucky, USA
Jacek M. Zurada

About this paper

Cite this paper

Horecki, K., Mazurkiewicz, J. (2015). Natural Language Processing Methods Used for Automatic Prediction Mechanism of Related Phenomenon. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2015. Lecture Notes in Computer Science (LNAI), vol 9120. Springer, Cham. https://doi.org/10.1007/978-3-319-19369-4_2

DOI: https://doi.org/10.1007/978-3-319-19369-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19368-7
Online ISBN: 978-3-319-19369-4
eBook Packages: Computer Science, Computer Science (R0)
