Abstract
Automatic question-answer (QA) pair generation has become an essential technique for reducing human involvement in the construction of QA systems. In the big data era, a huge amount of information is produced every day, so it is important for QA systems to respond to users with up-to-date information, e.g., to answer questions about recent blog posts. The major problem in building such systems is efficiently capturing relevant text sources for a specific QA domain. In this study, topic modeling is used to determine efficiently whether an article belongs to the same topic as a specific domain of interest; the health domain is used as the example in this paper. QA pairs are then generated from the selected articles using the proposed sentence extraction method. Experimental results show that the proposed method with topic modeling achieved a 7.3% improvement in the acceptance rate of the generated questions.
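
The topic-based selection step can be illustrated with a minimal sketch. It assumes that a topic model (for example, LDA) has already produced a topic distribution for each article and for the domain of interest. The cosine similarity measure, the threshold of 0.8, the function names, and the example distributions are all hypothetical choices made for illustration; the abstract does not state how topic agreement is actually measured.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def is_on_topic(article_topics, domain_topics, threshold=0.8):
    """Return True if the article's topic mix is close to the domain's.

    The threshold is an assumed value, not one taken from the paper.
    """
    return cosine_similarity(article_topics, domain_topics) >= threshold


# Hypothetical topic distributions over four topics (each sums to 1).
health_domain = [0.70, 0.20, 0.05, 0.05]
articles = {
    "health_blog_post": [0.60, 0.30, 0.05, 0.05],
    "sports_blog_post": [0.05, 0.05, 0.10, 0.80],
}

# Keep only the articles whose topic mix matches the domain of interest.
selected = [
    name for name, topics in articles.items()
    if is_on_topic(topics, health_domain)
]
print(selected)
```

Only the articles that pass this filter would then be handed to sentence extraction and QA pair generation, so off-domain posts are discarded before the more expensive steps run.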

Communicated by L. Xie.
Cite this article
Wu, CH., Liu, CH. & Su, PH. Sentence extraction with topic modeling for question–answer pair generation. Soft Comput 19, 39–46 (2015). https://doi.org/10.1007/s00500-014-1386-6
DOI: https://doi.org/10.1007/s00500-014-1386-6