Improving the Classification of Q&A Content for Android Fragmentation Using Named Entity Recognition

Rocha, Adriano Mendonça; de Almeida Maia, Marcelo

doi:10.1007/978-3-030-30244-3_60

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11805)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

Despite the large amount of high-quality information available on socio-technical sites, it is still challenging to filter the pieces of information relevant to a specific task at hand. Textual content classification has been used to retrieve only the information relevant to solving specific problems. However, such classifiers tend to perform poorly when the target classes have similar content. We aim to develop a Named Entity Recognizer (NER) model that recognizes entities related to technical elements, and to use the resulting model to improve textual classifiers for Android fragmentation posts from Stack Overflow. The proposed NER model was trained on the entities API version, device, hardware, API element, technology, and feature. The proposed classifiers were trained using the recognized entities as attributes. To evaluate the performance of these classifiers, we compared them with three other textual classifiers. The results show that the NER model can recognize entities efficiently and can discover new entities that were not present in the training data. The classifiers built using the NER model produced better results than the baseline classifiers. We suggest that NER-based classifiers be considered a better alternative to generic textual classifiers for classifying technical textual content.
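
The idea of using recognized entities as classifier attributes can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary lookup below is a hypothetical stand-in for a trained NER model, and the gazetteer entries, function names, and example post are all assumptions made for illustration. Only the six entity types come from the abstract.

```python
from collections import Counter

# The six entity types named in the abstract.
ENTITY_TYPES = ["API version", "device", "hardware",
                "API element", "technology", "feature"]

# Hypothetical toy gazetteer; a real system would use a trained NER model.
TOY_GAZETTEER = {
    "android 4.4": "API version",
    "galaxy s4": "device",
    "camera": "hardware",
    "surfaceview": "API element",
    "bluetooth": "technology",
}


def recognize_entities(text):
    """Return (term, entity type) pairs whose term occurs in the text."""
    lowered = text.lower()
    return [(term, etype) for term, etype in TOY_GAZETTEER.items()
            if term in lowered]


def entity_features(text):
    """Map a post to a vector holding, per entity type, the number of
    distinct gazetteer terms of that type found in the post."""
    counts = Counter(etype for _, etype in recognize_entities(text))
    return [counts[etype] for etype in ENTITY_TYPES]


post = "Camera preview in SurfaceView is stretched on Galaxy S4 with Android 4.4"
print(entity_features(post))  # [1, 1, 1, 1, 0, 0]
```

Vectors of this kind could then be passed to any standard classifier in place of, or alongside, bag-of-words features.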

Notes

1. http://mallet.cs.umass.edu

Acknowledgments

We acknowledge CAPES, FAPEMIG, and CNPq for partially funding this research.

Author information

Authors and Affiliations

Faculty of Computing, Federal University of Uberlândia, Campus Santa Mônica, Uberlândia, MG, 38400-902, Brazil
Adriano Mendonça Rocha & Marcelo de Almeida Maia

Corresponding author

Correspondence to Adriano Mendonça Rocha.

Editor information

Editors and Affiliations

INESC-TEC, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
Paulo Moura Oliveira
University of Minho, Braga, Portugal
Paulo Novais
LIACC/UP, University of Porto, Porto, Portugal
Luís Paulo Reis

Cite this paper

Rocha, A.M., de Almeida Maia, M. (2019). Improving the Classification of Q&A Content for Android Fragmentation Using Named Entity Recognition. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_60

DOI: https://doi.org/10.1007/978-3-030-30244-3_60
Published: 30 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30243-6
Online ISBN: 978-3-030-30244-3
eBook Packages: Computer Science, Computer Science (R0)
