Abstract
The enormous number of articles published daily on the Internet by a diverse array of authors often contains misleading or unwanted information, making activities such as sports betting riskier. As a result, extracting meaningful and reliable information from these sources becomes a time-consuming and nearly impossible task. In this context, labeling articles as objective or subjective is not a simple natural language processing task because subjectivity can take several forms. With the rise of online sports betting driven by the revolution in Internet and mobile technology, an automated system capable of sifting through all these data and finding relevant sources in a reasonable amount of time is a desirable and marketable product. In this work, we present a framework for the classification of sports articles composed of three stages: the first stage extracts articles from web pages using text extraction libraries, parses the text, and then tags words using Stanford’s part-of-speech tagger; the second stage extracts unique syntactic and semantic features and reduces them using our modified cortical algorithm (CA), hereafter CA*; and the third stage classifies these texts as objective or subjective. Our framework was tested on a database containing 1000 articles, manually labeled using Amazon’s crowdsourcing tool, Mechanical Turk, and results are reported using CA, CA*, support vector machines (SVM), and one of their soft computing variants (LMSVM) as classifiers. A testing accuracy of 85.6% was achieved in a fourfold cross-validation with a 40% reduction in features using CA* trained with an entropy weight update rule and a cross-entropy cost function.
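The evaluation protocol named in the abstract (fourfold cross-validation and a cross-entropy cost) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce CA*. The function names `binary_cross_entropy` and `k_fold_indices` are hypothetical, the folds are assumed to be contiguous and unshuffled, and objective/subjective labels are assumed to be encoded as 0/1.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clamp the probability so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Assumed usage: 1000 labeled articles split into four folds of 250 each.
folds = list(k_fold_indices(1000, k=4))
```

With 1000 articles and `k=4`, each fold holds out 250 articles for testing and trains on the remaining 750; a classifier's predicted probabilities on a held-out fold could then be scored with `binary_cross_entropy`.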
Acknowledgements
This work was partially funded by Intel and the University Research Board at the American University of Beirut. We would also like to acknowledge the help of Professor Lina Choueiri from the Department of English at the American University of Beirut.
About this article
Cite this article
Hajj, N., Rizk, Y. & Awad, M. A subjectivity classification framework for sports articles using improved cortical algorithms. Neural Comput & Applic 31, 8069–8085 (2019). https://doi.org/10.1007/s00521-018-3549-3
DOI: https://doi.org/10.1007/s00521-018-3549-3