research-article

A Big Data framework based on Apache Spark for Industry-specific Lexicon Generation for Stock Market Prediction

Authors:
Simone Angioni

Department of Mathematics and Computer Science, University of Cagliari, Italy

Salvatore Carta

Department of Mathematics and Computer Science, University of Cagliari, Italy

Sergio Consoli

European Commission, Joint Research Centre (JRC), Italy

Diego Reforgiato Recupero

Department of Mathematics and Computer Science, University of Cagliari, Italy

Maria Madalina Stanciu

Department of Mathematics and Computer Science, University of Cagliari, Italy


ICFNDS 2021: The 5th International Conference on Future Networks & Distributed Systems, December 2021, Pages 616–624, https://doi.org/10.1145/3508072.3508196

Published: 13 April 2022

Published in

ICFNDS 2021: The 5th International Conference on Future Networks & Distributed Systems
December 2021
847 pages
ISBN: 9781450387347
DOI: 10.1145/3508072

Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Apache Spark
Big Data
Financial Technology
Machine Learning
Natural Language Processing
Stock Market Forecasting
Qualifiers
- research-article
- Research
- Refereed limited