DOI: 10.1145/3299815.3314470
short-paper

ALPACA: Advanced Linguistic Pattern and Concept Analysis Framework for Software Engineering Corpora

Published: 18 April 2019

Abstract

Software engineering corpora often contain domain-specific topics and linguistic patterns. Popular text analysis tools are not specifically designed to accommodate such topics and patterns. In this paper, we introduce ALPACA, a novel, customizable text analysis framework. The main function of ALPACA is to analyze topics and their trends in a text corpus. It allows users to define a topic with a few initial domain-specific keywords and expand it into a much larger set of similar topic words. This new set of words can be further expanded into a set of self-contained phrases that describe the topic more precisely. ALPACA extracts those phrases by matching input sentences against linguistic patterns, which are long sequences of specific words and part-of-speech tags that appear frequently in the corpus. In this paper, we demonstrate how to use ALPACA to continue the analysis of CVE security reports and detect a new topic, mobile device vulnerabilities. YouTube link: https://www.youtube.com/watch?v=UTcMYb2o1pU
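
The phrase-extraction idea described in the abstract, matching sentences against patterns that mix specific words with part-of-speech tags, can be illustrated with a minimal Python sketch. This is not ALPACA's implementation or API. The function names (`matches_at`, `extract_phrases`), the pattern encoding (uppercase entries are Penn Treebank-style tags, lowercase entries are literal words), and the hand-tagged example sentence are all assumptions made for illustration.

```python
from typing import List, Tuple

# A tagged token is a (word, part-of-speech tag) pair.
Token = Tuple[str, str]


def matches_at(pattern: List[str], tagged: List[Token], start: int) -> bool:
    """Return True if `pattern` matches the tokens beginning at `start`.

    Assumed convention (hypothetical, not taken from the paper):
    an uppercase pattern element is a POS tag that must equal the
    token's tag; any other element is a literal word that must equal
    the token's word, compared case-insensitively.
    """
    if start + len(pattern) > len(tagged):
        return False
    for element, (word, tag) in zip(pattern, tagged[start:]):
        if element.isupper():
            if tag != element:
                return False
        elif word.lower() != element:
            return False
    return True


def extract_phrases(pattern: List[str], tagged: List[Token]) -> List[str]:
    """Collect every contiguous token span that matches `pattern`."""
    width = len(pattern)
    phrases = []
    for i in range(len(tagged) - width + 1):
        if matches_at(pattern, tagged, i):
            phrases.append(" ".join(word for word, _ in tagged[i:i + width]))
    return phrases


# Hand-tagged example sentence in the style of a vulnerability report.
# The tags are written by hand here; a real pipeline would use a tagger.
SENTENCE: List[Token] = [
    ("Buffer", "NN"), ("overflow", "NN"), ("in", "IN"), ("the", "DT"),
    ("kernel", "NN"), ("allows", "VBZ"), ("remote", "JJ"),
    ("attackers", "NNS"), ("to", "TO"), ("execute", "VB"),
    ("arbitrary", "JJ"), ("code", "NN"),
]

# A pattern mixing literal words ("allows", "to") with POS tags.
PATTERN = ["allows", "JJ", "NNS", "to", "VB", "JJ", "NN"]

if __name__ == "__main__":
    print(extract_phrases(PATTERN, SENTENCE))
```

With the pattern above, the sketch extracts the phrase "allows remote attackers to execute arbitrary code" from the example sentence. Mining which patterns occur frequently in a corpus, which the abstract attributes to ALPACA, is outside the scope of this sketch.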

Information

Published In

ACMSE '19: Proceedings of the 2019 ACM Southeast Conference
April 2019
295 pages
ISBN: 9781450362511
DOI: 10.1145/3299815
  • Conference Chair: Dan Lo
  • Program Chair: Donghyun Kim
  • Publications Chair: Eric Gamess
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Natural Language Processing
  2. Pattern
  3. Software Engineering

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

ACM SE '19: 2019 ACM Southeast Conference
April 18 - 20, 2019
Kennesaw, GA, USA

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%

Cited By

  • (2024) Xiaoqing: A Q&A model for glaucoma based on LLMs. Computers in Biology and Medicine 174 (108399). DOI: 10.1016/j.compbiomed.2024.108399. Online publication date: May-2024
  • (2023) Extraction of Phrase-based Concepts in Vulnerability Descriptions through Unsupervised Labeling. ACM Transactions on Software Engineering and Methodology 32:5 (1-45). DOI: 10.1145/3579638. Online publication date: 22-Jul-2023
  • (2021) Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (29-40). DOI: 10.1109/MSR52588.2021.00016. Online publication date: May-2021
  • (2021) Unsupervised labeling and extraction of phrase-based concepts in vulnerability descriptions. Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (943-954). DOI: 10.1109/ASE51524.2021.9678638. Online publication date: 15-Nov-2021
