DOI: 10.1145/3530019.3535308
research-article

Generated abstracts: evaluating automatic text summarization for blog posts in gray literature studies

Published: 13 June 2022

Abstract

Background: Researchers in software engineering have increasingly added gray literature (GL) to primary and, especially, secondary studies. Several reasons explain this decision, such as capturing practitioners’ views on the topic under study. However, using GL in research poses several challenges, such as the volume and unstructured nature of the data. The lack of automated tools and approaches to aid this task creates a bottleneck in selecting documents for inclusion. Aims: We investigate how summaries generated by PositionRank, an unsupervised text summarization approach, could support the inclusion analysis of documents in a GL study. Method: We evaluated the use of PositionRank to summarize documents analyzed in an ongoing study on software engineering. We compared the ratings of two raters in a cross-over setup using summaries and full-text documents. We calculated their agreement, as well as the precision and miss rate of decisions based on summaries against those based on the full text. The raters also discussed the documents on which they gave conflicting answers and derived categories of reasons to explain the disagreements. Results: The results indicate that some inclusion criteria, which might be positively determined by a few sentences, are susceptible to misclassification when using summaries. Conclusions: Our study analyzes the use of automatic summarization to support inclusion assessment in gray literature studies and discusses when this solution is viable. Our results could guide further studies in this direction.
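
The three measures named in the Method can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: agreement is computed here as Cohen's kappa, the full-text decision is treated as the reference label, and each decision is a binary include/exclude value. The function names and the example data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two sets of binary include/exclude decisions.

    Assumption: the abstract only says "agreement"; kappa is one common choice.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Share of documents on which both decisions coincide.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each side's inclusion rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both sides gave the same constant answer
    return (observed - expected) / (1 - expected)


def precision_and_miss_rate(
    summary: Sequence[bool], full_text: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and miss rate of summary-based decisions.

    Assumption: the full-text decision is the reference label.
    """
    pairs = list(zip(summary, full_text))
    tp = sum(s and f for s, f in pairs)      # included by both
    fp = sum(s and not f for s, f in pairs)  # included only via summary
    fn = sum(f and not s for s, f in pairs)  # relevant document missed via summary
    precision = tp / (tp + fp) if tp + fp else float("nan")
    miss_rate = fn / (fn + tp) if fn + tp else float("nan")
    return precision, miss_rate


# Hypothetical decisions for eight documents (True = include).
full_text_decisions = [True, True, True, False, False, False, True, False]
summary_decisions = [True, False, True, False, True, False, True, False]

kappa = cohen_kappa(summary_decisions, full_text_decisions)
precision, miss_rate = precision_and_miss_rate(summary_decisions, full_text_decisions)
print(f"kappa={kappa:.2f} precision={precision:.2f} miss_rate={miss_rate:.2f}")
```

In a GL screening setting the miss rate is arguably the more important of the two: a false inclusion costs some extra reading, whereas a missed document is lost to the study.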

Published In

EASE '22: Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering
June 2022
466 pages
ISBN:9781450396134
DOI:10.1145/3530019

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic summarization
  2. empirical software engineering
  3. gray literature
  4. natural language processing
  5. position rank

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE 2022

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 83
  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Feb 2025
