Improving the Classification of Q&A Content for Android Fragmentation Using Named Entity Recognition

  • Conference paper
Progress in Artificial Intelligence (EPIA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11805)

Abstract

Despite the huge amount of high-quality information available on socio-technical sites, it is still challenging to filter the information relevant to a specific task at hand. Textual content classification has been used to retrieve only the information relevant to solving specific problems. However, such classifiers tend to perform poorly when the target classes have similar content. We aim to develop a Named Entity Recognizer (NER) model that recognizes entities related to technical elements, and to use the resulting model to improve textual classifiers for Android fragmentation posts from Stack Overflow. The proposed NER model was trained on the entities API version, device, hardware, API element, technology, and feature. The proposed classifiers were trained using the recognized entities as attributes. To evaluate the performance of these classifiers, we compared them with three other textual classifiers. The results show that the NER model can recognize entities efficiently and can discover new entities that were not present in the training data. The classifiers built using the NER model produced better results than the baseline classifiers. We suggest that NER-based classifiers be considered a better alternative to generic textual classifiers for classifying technical textual content.
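
As an illustration of what "using the recognized entities as attributes" could look like, the following minimal Python sketch turns the entities found in a post into a per-type count vector that a classifier could consume. It is not the authors' implementation: the regular-expression patterns are a hypothetical stand-in for a trained NER model, and the names `ENTITY_TYPES`, `PATTERNS`, `recognize`, and `entity_features`, as well as the example post, are assumptions made for this sketch. Only the six entity types come from the abstract.

```python
import re
from collections import Counter

# Entity types named in the abstract.
ENTITY_TYPES = ["API_VERSION", "DEVICE", "HARDWARE",
                "API_ELEMENT", "TECHNOLOGY", "FEATURE"]

# Hypothetical toy patterns (assumption): a trained NER model would
# replace this lookup. Only three of the six types are covered here.
PATTERNS = {
    "API_VERSION": re.compile(
        r"\b(?:API (?:level )?\d+|Android \d+(?:\.\d+)*)\b", re.IGNORECASE),
    "DEVICE": re.compile(
        r"\b(?:Galaxy S\d+|Nexus \d+|Pixel \d+)\b", re.IGNORECASE),
    "HARDWARE": re.compile(
        r"\b(?:camera|gps|accelerometer)\b", re.IGNORECASE),
}


def recognize(text):
    """Return (entity_type, surface_text) pairs found by the toy patterns."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


def entity_features(text):
    """Count recognized entities per type, in ENTITY_TYPES order."""
    counts = Counter(label for label, _ in recognize(text))
    return [counts.get(label, 0) for label in ENTITY_TYPES]


# Made-up example post (assumption), not taken from the paper's data set.
post = ("The camera preview crashes on Galaxy S4 running Android 4.4 "
        "but works on API level 19.")
features = entity_features(post)
print(dict(zip(ENTITY_TYPES, features)))
```

A vector of this kind could be passed to any standard classifier in place of, or alongside, bag-of-words features. Which classifier and which feature encoding the paper actually uses is not stated in the abstract.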

Notes

  1. http://mallet.cs.umass.edu

Acknowledgments

We acknowledge CAPES, FAPEMIG, and CNPq for partially funding this research.

Author information

Corresponding author

Correspondence to Adriano Mendonça Rocha.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Rocha, A.M., de Almeida Maia, M. (2019). Improving the Classification of Q&A Content for Android Fragmentation Using Named Entity Recognition. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_60

  • DOI: https://doi.org/10.1007/978-3-030-30244-3_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30243-6

  • Online ISBN: 978-3-030-30244-3

  • eBook Packages: Computer Science, Computer Science (R0)
