The Comparison of Effects of Relevant-Feature Selection Algorithms on Certain Social-Network Text-Mining Viewpoints

Žižka, Jan; Dařena, František

doi:10.1007/978-3-319-57261-1_35

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 573)

Included in the following conference series:

Computer Science On-line Conference

Abstract

This research addresses a well-known problem in text mining: the high computational complexity caused by many irrelevant features (terms, words), which act as noise from the classification point of view and make the time and memory requirements grow non-linearly. Using a set of real-world textual documents, freely written in English and represented by sentiment related to three selected and extensively tracked Internet sources, a group of available algorithms for discovering relevant features (Gain Ratio, Chi Square, Info Gain, Symmetrical Uncertainty, Winnow, One R, Relief F, Principal Components, SVM, LSA) was tested with 10,000, 25,000, and 50,000 social-network entries. All the algorithms provided very similar results in the search for relevant features; typically, only the feature significance rank differed slightly. Except for some slower algorithms, the term-preselection time ranged from seconds through minutes to a couple of hours. However, when only the relevant fraction of features was used instead of all of them, the entry length decreased by several orders of magnitude, particularly for larger data sets with very high dimensionality. Despite the extremely strong reduction in the number of words, the classification accuracy remained the same regardless of which relevant-feature selection algorithm was chosen.
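To illustrate what one of the listed rankers computes, the following is a minimal sketch of information-gain term ranking on a binary bag-of-words representation. It is not the authors' implementation: the toy reviews, the labels, the 0.5 cut-off, and all function names are assumptions made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(
        len(part) / n * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


def rank_terms(texts, labels):
    """Return (term, score) pairs sorted by decreasing information gain."""
    docs = [set(t.lower().split()) for t in texts]  # binary term presence
    vocabulary = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocabulary}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data, invented for illustration only.
texts = [
    "great room and great staff",
    "terrible room and rude staff",
    "great location",
    "terrible breakfast",
]
labels = ["pos", "neg", "pos", "neg"]

ranking = rank_terms(texts, labels)
# Keep only terms above an arbitrary, assumed threshold.
top_terms = [term for term, score in ranking if score > 0.5]
print(top_terms)
```

In this toy corpus only the terms that separate the two classes perfectly survive the cut-off, while words such as "room" or "staff", which occur equally in both classes, score zero. This mirrors in miniature the reduction in entry length that the abstract reports.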

Acknowledgments

This research was funded by the Czech Science Foundation, grant No. 16-26353S “Sentiment and its Impact on Stock Markets”.

Author information

Authors and Affiliations

Department of Informatics, FBE, Mendel University in Brno, Zemědělská 1, 613 00, Brno, Czech Republic
Jan Žižka & František Dařena

Corresponding author

Correspondence to Jan Žižka.

Editor information

Editors and Affiliations

Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Radek Silhavy, Roman Senkerik, Zuzana Kominkova Oplatkova, Zdenka Prokopova & Petr Silhavy

Cite this paper

Žižka, J., Dařena, F. (2017). The Comparison of Effects of Relevant-Feature Selection Algorithms on Certain Social-Network Text-Mining Viewpoints. In: Silhavy, R., Senkerik, R., Kominkova Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds) Artificial Intelligence Trends in Intelligent Systems. CSOC 2017. Advances in Intelligent Systems and Computing, vol 573. Springer, Cham. https://doi.org/10.1007/978-3-319-57261-1_35

DOI: https://doi.org/10.1007/978-3-319-57261-1_35
Published: 07 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57260-4
Online ISBN: 978-3-319-57261-1
eBook Packages: Engineering, Engineering (R0)