DOI: 10.1145/3508072.3508196
research-article

A Big Data framework based on Apache Spark for Industry-specific Lexicon Generation for Stock Market Prediction

Published: 13 April 2022

References

  1. Mattia Atzeni, Amna Dridi, and Diego Reforgiato Recupero. 2018. Using frame-based resources for sentiment analysis within the financial domain. Progress in Artificial Intelligence 7, 4 (2018), 273–294. https://doi.org/10.1007/s13748-018-0162-8
  2. Mattia Atzeni and Diego Reforgiato Recupero. 2020. Multi-domain sentiment analysis with mimicked and polarized word embeddings for human-robot interaction. Future Generation Computer Systems 110 (2020), 984–999. https://doi.org/10.1016/j.future.2019.10.012
  3. Luca Barbaglia, Sergio Consoli, and Sebastiano Manzan. 2021. Exploring the Predictive Power of News and Neural Machine Learning Models for Economic Forecasting. In Mining Data for Financial Applications, Vol. 12591. Springer, Switzerland AG, 135–149.
  4. Luca Barbaglia, Sergio Consoli, Sebastiano Manzan, Diego Reforgiato Recupero, Michaela Saisana, and Luca Tiozzo Pezzoli. 2021. Data Science Technologies in Economics and Finance: A Gentle Walk-In. In Data Science for Economics and Finance: Methodologies and Applications. Springer Nature, Switzerland AG, 1–17.
  5. Johan Bollen, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2, 1 (2011), 1–8.
  6. Salvatore Carta, Sergio Consoli, Luca Piras, Alessandro Podda, and Diego Reforgiato Recupero. 2020. Dynamic Industry-Specific Lexicon Generation for Stock Market Forecast. In Lecture Notes in Computer Science, Vol. 12565. Springer Nature, Switzerland AG, 162–176.
  7. Salvatore Carta, Sergio Consoli, Luca Piras, Alessandro Podda, and Diego Reforgiato Recupero. 2021. Explainable Machine Learning Exploiting News and Domain-Specific Lexicon for Stock Market Forecasting. IEEE Access 9 (2021), 30193–30205.
  8. Salvatore Carta, Sergio Consoli, Luca Piras, Alessandro Sebastian Podda, and Diego Reforgiato Recupero. 2021. Event detection in finance using hierarchical clustering algorithms on news and tweets. PeerJ Computer Science 7 (2021), e438.
  9. Salvatore Carta, Sergio Consoli, Alessandro Podda, Diego Reforgiato Recupero, and Maria Madalina Stanciu. 2021. Ensembling and Dynamic Asset Selection for Risk-Controlled Statistical Arbitrage. IEEE Access 9 (2021), 29942–29959. https://doi.org/10.1109/ACCESS.2021.3059187
  10. Salvatore Carta, Andrea Medda, Alessio Pili, Diego Reforgiato Recupero, and Roberto Saia. 2019. Forecasting E-Commerce Products Prices by Combining an Autoregressive Integrated Moving Average (ARIMA) Model and Google Trends Data. Future Internet 11, 1 (2019), 5. https://doi.org/10.3390/fi11010005
  11. Sergio Consoli, Diego Reforgiato Recupero, and Milan Petkovic. 2019. Data Science for Healthcare: Methodologies and Applications. Springer, Switzerland AG.
  12. Sergio Consoli, Diego Reforgiato Recupero, and Michaela Saisana. 2021. Data Science for Economics and Finance: Methodologies and Applications. Springer Nature, Switzerland AG. https://doi.org/10.1007/978-3-030-66891-4
  13. Sergio Consoli, Luca Tiozzo Pezzoli, and Elisa Tosetti. 2021. Emotions in Macroeconomic News and their Impact on the European Bond Market. Journal of International Money and Finance 118 (2021), 102472.
  14. Sanjiv R Das and Mike Y Chen. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53, 9 (2007), 1375–1388.
  15. Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep learning for event-driven stock prediction. In Twenty-Fourth International Joint Conference on Artificial Intelligence. Proceedings IJCAI 2015, Buenos Aires, 2327–2333.
  16. Amna Dridi, Mattia Atzeni, and Diego Reforgiato Recupero. 2019. FineNews: fine-grained semantic sentiment analysis on financial microblogs and news. International Journal of Machine Learning and Cybernetics 10, 8 (2019), 2199–2207. https://doi.org/10.1007/s13042-018-0805-x
  17. Amna Dridi and Diego Reforgiato Recupero. 2019. Leveraging semantics for sentiment polarity detection in social media. International Journal of Machine Learning and Cybernetics 10, 8 (2019), 2045–2055. https://doi.org/10.1007/s13042-017-0727-z
  18. Ingrid E Fisher, Margaret R Garnsey, and Mark E Hughes. 2016. Natural language processing in accounting, auditing and finance: A synthesis of the literature with a roadmap for future research. Intelligent Systems in Accounting, Finance and Management 23, 3 (2016), 157–214.
  19. Sven S Groth and Jan Muntermann. 2011. An intraday market risk management approach based on textual analysis. Decision Support Systems 50, 4 (2011), 680–691.
  20. Michael Hagenau, Michael Liebmann, and Dirk Neumann. 2013. Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems 55, 3 (2013), 685–697.
  21. William L Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. 2016. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Conference on Empirical Methods in Natural Language Processing, Vol. 2016. NIH Public Access, Proceedings EMNLP 2016, Austin, US, 595–605.
  22. Clayton J Hutto and Eric Gilbert. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media. Proceedings ICWSM 2014, Ann Arbor, US, 216–225.
  23. J. Korst, V. Pronk, M. Barbieri, and S. Consoli. 2019. Introduction to classification algorithms and their performance analysis using medical examples. In Data Science for Healthcare: Methodologies and Applications. Springer Nature, Switzerland AG, 39–73.
  24. Victor Lavrenko, Matt Schmill, Dawn Lawrie, Paul Ogilvie, David Jensen, and James Allan. 2000. Language models for financial news recommendation. In Proceedings of the Ninth International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, US, 389–396.
  25. Andrew W Lo. 2004. The adaptive markets hypothesis. The Journal of Portfolio Management 30, 5 (2004), 15–29.
  26. Hassan H Malik, Vikas S Bhardwaj, and Huascar Fiorletta. 2011. Accurate information extraction for quantitative financial events. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, New York, US, 2497–2500.
  27. Burton G Malkiel and Eugene F Fama. 1970. Efficient capital markets: A review of theory and empirical work. The Journal of Finance 25, 2 (1970), 383–417.
  28. T. Matsubara, R. Akita, and K. Uehara. 2018. Stock price prediction by deep neural generative model of news articles. IEICE Transactions on Information and Systems E101D, 4 (2018), 901–908. https://doi.org/10.1587/transinf.2016IIP0016
  29. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. Proceedings NIPS 2013, Nevada, US, 3111–3119.
  30. Antonio Moreno-Ortiz and Javier Fernández-Cruz. 2015. Identifying polarity in financial texts for sentiment analysis: a corpus-based approach. Procedia - Social and Behavioral Sciences 198 (2015), 330–338.
  31. G. Moro, R. Pasolini, G. Domeniconi, A. Pagliarani, and A. Roli. 2019. Prediction and trading of Dow Jones from Twitter: A boosting text mining method with relevant tweets identification. Communications in Computer and Information Science 976 (2019), 26–42.
  32. Michael Nofer and Oliver Hinz. 2015. Using twitter to predict the stock market. Business & Information Systems Engineering 57, 4 (2015), 229–242.
  33. Nuno Oliveira, Paulo Cortez, and Nelson Areal. 2016. Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decision Support Systems 85 (2016), 62–73.
  34. Diego Reforgiato Recupero, Andrea Nuzzolese, Sergio Consoli, Valentina Presutti, Silvio Peroni, and Misael Mongiovì. 2015. Extracting knowledge from text using SHELDON, a semantic holistic framEwork for LinkeD ONtology data. In WWW 2015 Companion - Proceedings of the 24th International Conference on World Wide Web. ACM, New York, USA, 235–238. https://doi.org/10.1145/2740908.2742842
  35. Jianfeng Si, Arjun Mukherjee, Bing Liu, Qing Li, Huayi Li, and Xiaotie Deng. 2013. Exploiting topic based twitter sentiment for stock prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, New York, US, 24–29.
  36. Sahar Sohangir, Nicholas Petty, and Dingding Wang. 2018. Financial sentiment lexicon analysis. In 2018 IEEE 12th International Conference on Semantic Computing (ICSC). IEEE, US, 286–289.
  37. Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics 37, 2 (2011), 267–307.
  38. Paul C Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy. 2008. More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance 63, 3 (2008), 1437–1467.
  39. M.R. Vargas, C.E.M. Dos Anjos, G.L.G. Bichara, and A.G. Evsukoff. 2018. Deep Learning for Stock Market Prediction Using Technical Indicators and Financial News Articles. In 2018 International Joint Conference on Neural Networks (IJCNN), Vol. 2018-July. Proceedings IJCNN 2018, IEEE, Rio de Janeiro, Brazil, 8489208.
  40. Frank Z Xing, Erik Cambria, and Roy E Welsch. 2018. Natural language based financial forecasting: a survey. Artificial Intelligence Review 50, 1 (2018), 49–73.

            Published in

              ACM Other conferences
              ICFNDS 2021: The 5th International Conference on Future Networks & Distributed Systems
              December 2021
              847 pages
              ISBN: 9781450387347
              DOI: 10.1145/3508072

              Copyright © 2021 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Qualifiers

              • research-article
              • Research
              • Refereed limited
            Article Metrics

              • Downloads (last 12 months): 37
              • Downloads (last 6 weeks): 5