DOI: 10.1145/3524842.3528494
short-paper

The general index of software engineering papers

Published: 17 October 2022

ABSTRACT

We introduce the General Index of Software Engineering Papers, a dataset of full-text-indexed papers from the most prominent scientific venues in the field of software engineering. The dataset includes both complete bibliographic information and indexed n-grams of length 1 to 5 (sequences of contiguous words after removal of stopwords and non-words, for a total of 577 276 382 unique n-grams in this release) for 44 581 papers retrieved from 34 venues over the 1971-2020 period.

The dataset serves use cases in the field of meta-research, allowing researchers to examine the output of software engineering research even when access to papers or scholarly search engines is not possible (e.g., for contractual reasons). The dataset also helps make such analyses reproducible and independently verifiable, unlike analyses conducted using third-party, non-open scholarly indexing services.

The dataset is available as a portable Postgres database dump and released as open data.
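
The n-gram definition given in the abstract (contiguous word sequences of length 1 to 5, taken after stopwords and non-words are removed) can be illustrated with a minimal sketch. This is not the authors' indexing pipeline: the stopword list, the tokenization rule, and the function names `tokenize` and `ngrams` are assumptions made only for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list; the list actually used
# to build the dataset is not stated in the abstract.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "is", "for"}


def tokenize(text):
    """Lowercase the text, keep purely alphabetic tokens, drop stopwords.

    Treating anything non-alphabetic as a "non-word" is an assumption.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def ngrams(tokens, min_n=1, max_n=5):
    """Yield every contiguous run of min_n..max_n tokens as a tuple."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


tokens = tokenize("The general index of software engineering papers")
grams = list(ngrams(tokens))
```

Because stopwords are removed before the n-grams are formed, words that were separated by a stopword in the original text (here "index" and "software") become adjacent and can appear together in the same n-gram.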

  • Published in

    MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
    May 2022
    815 pages
    ISBN: 9781450393034
    DOI: 10.1145/3524842

    Copyright © 2022 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States