research-article

Generated abstracts: evaluating automatic text summarization for blog posts in gray literature studies

Authors:

Jorge Melegati,

Eduardo Guerra,

Igor Scaliante Wiese,

Xiaofeng Wang

EASE '22: Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering

Pages 282 - 287

https://doi.org/10.1145/3530019.3535308

Published: 13 June 2022

Abstract

Background: Software engineering researchers increasingly include gray literature (GL) in primary and, especially, secondary studies. Several reasons motivate this choice, such as capturing practitioners’ views on the topic under study. However, using GL in research poses several challenges, such as the volume and unstructured nature of the data. The lack of automated tools and approaches to support this work makes selecting documents for inclusion a bottleneck.

Aims: We investigate how summaries generated by PositionRank, an unsupervised text summarization approach, could support the inclusion analysis of documents in a GL study.

Method: We evaluated the use of PositionRank to summarize documents analyzed in an ongoing software engineering study. We compared the ratings of two raters in a cross-over setup using summaries and full-text documents. We calculated their agreement, as well as the precision and miss rate of summary-based ratings against full-text ratings. The raters also discussed the documents on which they disagreed and derived categories of reasons explaining the disagreements.

Results: The results indicate that some inclusion criteria, which may be met on the basis of only a few sentences, are susceptible to misclassification when summaries are used.

Conclusions: Our study analyzes the use of automatic summarization to support inclusion assessment in gray literature studies and discusses when this solution is viable. Our results could guide further studies in this direction.
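PositionRank, as published by its authors, is an unsupervised keyphrase extraction method: it builds a word co-occurrence graph and runs a PageRank biased toward words that occur early and often, S(v_i) = (1 − α)·p̃_i + α·Σ_j (w_ji / O(v_j))·S(v_j), where p̃_i is the normalized sum of inverse positions of word v_i and O(v_j) is the total edge weight of neighbor v_j. The minimal Python sketch below illustrates that scoring and then turns it into an extractive summary. It is hypothetical and not the procedure used in this study: the stopword list stands in for the original part-of-speech filter, and the window size, damping factor, sentence-scoring rule, and the names `position_rank` and `summarize` are all assumptions.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list replaces PositionRank's part-of-speech
# filter (the original keeps only nouns and adjectives as candidates).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "we", "with",
}


def tokenize(text):
    """Lowercased alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def position_rank(text, window=3, alpha=0.85, iterations=100):
    """Score candidate words with a PositionRank-style biased PageRank."""
    words = tokenize(text)
    candidates = [w for w in words if w not in STOPWORDS]

    # Position bias: each occurrence at (1-based) position i adds 1/i,
    # so words that appear early and often get a larger share.
    bias = defaultdict(float)
    for pos, w in enumerate(words, start=1):
        if w not in STOPWORDS:
            bias[w] += 1.0 / pos
    total = sum(bias.values())
    if total == 0:
        return {}
    p = {w: b / total for w, b in bias.items()}

    # Undirected co-occurrence graph: candidates at most window - 1 steps
    # apart in the candidate sequence are linked, weighted by count.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(candidates):
        for v in candidates[i + 1:i + window]:
            if v != w:
                weights[w][v] += 1.0
                weights[v][w] += 1.0
    out_weight = {w: sum(nbrs.values()) for w, nbrs in weights.items()}

    # Synchronous power iteration of S = (1 - alpha) * p + alpha * M^T S.
    scores = dict(p)
    for _ in range(iterations):
        scores = {
            w: (1 - alpha) * p[w] + alpha * sum(
                weights[v][w] / out_weight[v] * scores[v] for v in weights[w]
            )
            for w in p
        }
    return scores


def summarize(text, k=2, **rank_kwargs):
    """Hypothetical extractive step: keep the k sentences whose words carry
    the most PositionRank mass, in their original order."""
    scores = position_rank(text, **rank_kwargs)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def sentence_score(i):
        return sum(scores.get(w, 0.0) for w in tokenize(sentences[i]))

    top = sorted(range(len(sentences)), key=sentence_score, reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


post = (
    "Hypotheses engineering helps startups test ideas. "
    "We had lunch at noon. "
    "Startups test hypotheses with experiments before building features."
)
print(summarize(post, k=2))
```

On this toy post, the middle sentence contains only a few late, off-topic words and is expected to be the one dropped. The power iteration runs a fixed number of rounds for simplicity; stopping at a convergence threshold would be the usual refinement.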

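The Method paragraph mentions agreement, precision, and miss rate. The Python sketch below shows one way to compute them; it assumes Cohen's kappa as the agreement statistic (the abstract does not name one) and treats full-text inclusion decisions as ground truth, so precision = TP / (TP + FP) and miss rate = FN / (TP + FN) = 1 − recall. The labels are invented for illustration, not data from the study, and the function names are hypothetical.

```python
from collections import Counter
from math import nan


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each category's marginal proportions.
    expected = sum(
        count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()
    ) / (n * n)
    if expected == 1.0:  # both sequences use one and the same category
        return 1.0
    return (observed - expected) / (1.0 - expected)


def precision_and_miss_rate(summary_based, full_text):
    """Compare 'include' decisions (True) made from summaries against the
    full-text decisions, which are treated as ground truth."""
    pairs = list(zip(summary_based, full_text))
    tp = sum(1 for s, f in pairs if s and f)
    fp = sum(1 for s, f in pairs if s and not f)
    fn = sum(1 for s, f in pairs if not s and f)
    precision = tp / (tp + fp) if tp + fp else nan
    miss_rate = fn / (tp + fn) if tp + fn else nan  # equals 1 - recall
    return precision, miss_rate


# Invented example: inclusion decisions for eight blog posts.
summary_based = [True, True, False, True, False, False, True, False]
full_text = [True, False, False, True, True, False, True, False]

print(cohen_kappa(summary_based, full_text))              # 0.5
print(precision_and_miss_rate(summary_based, full_text))  # (0.75, 0.25)
```

Kappa corrects the raw agreement of 0.75 in this example for the 0.5 agreement expected by chance, which is why it comes out lower than the raw proportion.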


Published In

ACM Other conferences

EASE '22: Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering

June 2022

466 pages

ISBN: 9781450396134

DOI: 10.1145/3530019

Editors: Miroslaw Staron (Chalmers | University of Gothenburg, Sweden), Christian Berger (Chalmers | University of Gothenburg, Sweden), Jocelyn Simmonds (University of Chile, Chile), Rafael Prikladnicki (School of Technology at PUCRS University, Brazil)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

EASE 2022

EASE 2022: The International Conference on Evaluation and Assessment in Software Engineering 2022

June 13 - 15, 2022

Gothenburg, Sweden

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%


Article Metrics

Total citations: 0

Total downloads: 83 (11 in the last 12 months, 2 in the last 6 weeks)

Reflects downloads up to 17 Feb 2025
