The Comparison of Effects of Relevant-Feature Selection Algorithms on Certain Social-Network Text-Mining Viewpoints

  • Conference paper
Artificial Intelligence Trends in Intelligent Systems (CSOC 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 573)

Abstract

This research addresses a well-known problem in text mining: the high computational complexity caused by many irrelevant features (terms, words), which can act as noise from the classification point of view and make time and memory requirements grow non-linearly. Using a set of real-world textual documents, namely freely written English-language sentiment entries from three selected and extensively tracked Internet sources, a group of available algorithms for discovering relevant features (Gain Ratio, Chi Square, Info Gain, Symmetrical Uncertainty, Winnow, One R, Relief F, Principal Components, SVM, LSA) was tested with 10,000, 25,000, and 50,000 social-network entries. All the algorithms gave very similar results when looking for the relevant features; typically, only the feature significance rank differed slightly. Except for some slower algorithms, the term-preselection time ranged from seconds through minutes to a couple of hours. However, when only the relevant fraction of features was used instead of all of them, the entry length decreased by several orders of magnitude, particularly for the larger data sets with very high dimensionality. Despite this extremely strong reduction in the number of words, the classification accuracy remained the same regardless of which relevant-feature selection algorithm was chosen.
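To make the idea of ranking terms by relevance concrete, the following is a minimal sketch of one of the listed criteria, Info Gain, computed for binary term presence. It is not the authors' implementation or data: the four toy reviews, the function names (`entropy`, `info_gain`, `rank_terms`), and the choice to keep the top two terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy remaining after splitting the documents on the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Return (term, score) pairs, best first; ties broken alphabetically."""
    vocab = sorted(set().union(*docs))
    scores = {t: info_gain(docs, labels, t) for t in vocab}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus (not taken from the paper's data sets).
raw = [
    ("great room and friendly staff", "pos"),
    ("great location and clean room", "pos"),
    ("dirty room and rude staff", "neg"),
    ("dirty bathroom and noisy location", "neg"),
]
docs = [set(text.split()) for text, _ in raw]
labels = [label for _, label in raw]

ranking = rank_terms(docs, labels)
top_terms = [term for term, _ in ranking[:2]]  # keep only the most relevant terms
reduced_docs = [sorted(d & set(top_terms)) for d in docs]
print(top_terms)
print(reduced_docs)
```

In this toy setting, a term that appears in every entry (such as "and") carries no information about the class and scores zero, while terms confined to one class score highest. Representing each entry by only the top-ranked terms is the kind of shortening the abstract describes, although on a vastly smaller scale.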

Acknowledgments

This research was funded by the Czech Science Foundation, grant No. 16-26353S “Sentiment and its Impact on Stock Markets”.

Corresponding author

Correspondence to Jan Žižka.

Copyright information

© 2017 Springer International Publishing AG

Cite this paper

Žižka, J., Dařena, F. (2017). The Comparison of Effects of Relevant-Feature Selection Algorithms on Certain Social-Network Text-Mining Viewpoints. In: Silhavy, R., Senkerik, R., Kominkova Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds) Artificial Intelligence Trends in Intelligent Systems. CSOC 2017. Advances in Intelligent Systems and Computing, vol 573. Springer, Cham. https://doi.org/10.1007/978-3-319-57261-1_35

  • DOI: https://doi.org/10.1007/978-3-319-57261-1_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57260-4

  • Online ISBN: 978-3-319-57261-1

  • eBook Packages: Engineering, Engineering (R0)