A Text Mining Approach to Extract and Rank Innovation Insights from Research Projects

  • Conference paper

Web Information Systems Engineering – WISE 2020

Abstract

Open innovation is a new paradigm embraced by companies to introduce transformations. It assumes that firms can and should use external as well as internal ideas to innovate. Recently, commercial and research projects have undergone exponential growth, leading to the open challenge of identifying insights into promising aspects to work on. The existing literature has focused on identifying goals, topics, and keywords within a single piece of text. However, insights have no clear structure and cannot be validated against a straightforward ground truth, which makes their identification particularly challenging. Besides extracting insights from previously existing initiatives, a second issue emerges: how to present them to a company as a ranking. To address these two issues, we present an approach that extracts insights from a large number of projects belonging to distinct domains by analyzing their abstracts. Our method then ranks these results to support project preparation, presenting the most relevant and timely (i.e., recent) insights first. Our evaluation on real data from all Horizon 2020 European projects shows the effectiveness of our approach in a concrete case study.
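The abstract names two steps, extracting candidate insights from project abstracts and ranking them so that relevant and recent ones come first, but does not specify the algorithms. The following is a hypothetical sketch of that two-step idea under stated assumptions, not the authors' method: it assumes RAKE-style candidate phrases (splitting text at stopwords and punctuation), uses the number of projects sharing a phrase as a stand-in for relevance, and applies an exponential recency decay controlled by an assumed `half_life` parameter. The names `STOPWORDS`, `candidate_phrases`, and `rank_insights`, and the sample project texts, are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list or a noun-phrase chunker instead.
STOPWORDS = {
    "a", "an", "and", "are", "by", "for", "from", "in", "is", "of",
    "on", "the", "to", "we", "with", "this", "that", "our", "new",
}


def candidate_phrases(text):
    """Split text into candidate phrases at stopwords and punctuation (RAKE-style)."""
    phrases = []
    # Punctuation (anything that is not a word char, whitespace, or hyphen) ends a phrase.
    for chunk in re.split(r"[^\w\s-]", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                # A stopword also ends the current phrase.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def rank_insights(projects, current_year, half_life=3.0, top_k=5):
    """Rank phrases by recency-weighted document frequency (assumed scoring).

    projects: iterable of (abstract, start_year) pairs.
    Each project adds 0.5 ** (age / half_life) to every distinct phrase in its
    abstract, so phrases shared by many recent projects rank first.
    """
    scores = defaultdict(float)
    for abstract, year in projects:
        weight = 0.5 ** ((current_year - year) / half_life)
        for phrase in set(candidate_phrases(abstract)):
            scores[phrase] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    projects = [
        ("Digital twins for smart manufacturing.", 2019),
        ("Predictive maintenance with digital twins in energy grids.", 2020),
        ("Predictive maintenance for railways.", 2015),
    ]
    for phrase, score in rank_insights(projects, current_year=2020):
        print(f"{score:.3f}  {phrase}")
```

With a half-life of three years, a project from five years ago contributes roughly 0.31 of the weight of a current one. Shortening `half_life` favours timely insights more strongly, while a very large `half_life` reduces the score to a plain count of how many projects mention a phrase.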

All authors contributed equally to this research.

Author information

Correspondence to Francesca Maridina Malloci.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Malloci, F.M., Penadés, L.P., Boratto, L., Fenu, G. (2020). A Text Mining Approach to Extract and Rank Innovation Insights from Research Projects. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_10

  • DOI: https://doi.org/10.1007/978-3-030-62008-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62007-3

  • Online ISBN: 978-3-030-62008-0

  • eBook Packages: Computer Science, Computer Science (R0)