Sentence extraction with topic modeling for question–answer pair generation

Wu, Chung-Hsien; Liu, Chao-Hong; Su, Po-Hsun

doi:10.1007/s00500-014-1386-6

Focus
Published: 30 July 2014

Soft Computing, Volume 19, pages 39–46 (2015)

Chung-Hsien Wu¹,
Chao-Hong Liu² &
Po-Hsun Su¹

Abstract

Automatic QA pair generation has recently become an essential technique for reducing human involvement in the construction of QA systems. In the big data era, a huge amount of information is produced every day, so it is important for QA systems to be able to respond to users with up-to-date information, e.g., to answer questions about recent blog posts. The major problem in building such systems is capturing relevant text sources for specific QA domains efficiently. In this study, topic modeling is used to determine efficiently whether an article belongs to the same topic as a specific domain of interest, e.g., the health domain used as the example in this paper. QA pairs are then generated from the selected articles using the proposed sentence extraction method. Experimental results show that the proposed method with topic modeling achieved a 7.3% improvement in the acceptance rate of the generated questions.
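
The topic-based article selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how topic agreement is measured, so everything here is an assumption made for illustration: topic mixtures are taken as already computed by some topic model, cosine similarity is used as the comparison, and the names `cosine_similarity` and `select_in_domain`, the threshold of 0.8, and the example values are all hypothetical. This is not the authors' method.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an all-zero vector carries no topic information
    return dot / (norm_p * norm_q)


def select_in_domain(articles, domain_topics, threshold=0.8):
    """Return ids of articles whose topic mixture is close to the domain's.

    `threshold` is an assumed cut-off, not a value taken from the paper.
    """
    return [
        article_id
        for article_id, topics in articles.items()
        if cosine_similarity(topics, domain_topics) >= threshold
    ]


# Made-up 3-topic mixtures, e.g. (health, sports, finance).
domain_topics = [0.8, 0.1, 0.1]
articles = {
    "blog_post_a": [0.7, 0.2, 0.1],  # mostly the domain topic
    "blog_post_b": [0.1, 0.1, 0.8],  # mostly a different topic
}
selected = select_in_domain(articles, domain_topics)
```

Under these assumptions only `blog_post_a` passes the filter, so only that article would be handed on to sentence extraction and QA pair generation.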

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Chung-Hsien Wu & Po-Hsun Su
ICL, Speech, Language and Audio Processing Department, Industrial Technology Research Institute, Hsinchu, Taiwan
Chao-Hong Liu

Corresponding author

Correspondence to Chung-Hsien Wu.

Additional information

Communicated by L. Xie.

About this article

Cite this article

Wu, CH., Liu, CH. & Su, PH. Sentence extraction with topic modeling for question–answer pair generation. Soft Comput 19, 39–46 (2015). https://doi.org/10.1007/s00500-014-1386-6

Published: 30 July 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s00500-014-1386-6
