Abstract
The segmentation subtask of the NTCIR-14 QA Lab-PoliInfo task is to find the segment of text in assembly minutes that corresponds to a given summary sentence. We divided the subtask into two steps: segmentation and search. In the segmentation step, cue phrases proved effective for detecting segment boundaries. We compared five methods for detecting these boundaries: a rule-based method, three supervised learning methods, and a novel semi-supervised learning method. The supervised models were trained on Japanese minutes data that we had segmented ourselves. In the search step, contiguous segments were concatenated to form larger segments, and the concatenated segment that maximized a scoring formula was selected as the answer. We compared the proposed formula with the conventional BM25 formula. Despite its simplicity, our method achieved the highest F-measure in the NTCIR-14 formal run.
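The two-step pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cue phrases are hypothetical English stand-ins (the actual system works on Japanese minutes), the function names and the `max_len` limit are assumptions, and because the abstract does not state the proposed formula, the score used here is the conventional BM25 baseline, computed with each segment treated as a document.

```python
import math
from collections import Counter

# Hypothetical cue phrases; the real system targets Japanese assembly minutes.
CUE_PHRASES = ("next,", "moving on")


def split_into_segments(sentences, cues=CUE_PHRASES):
    """Rule-based step: open a new segment at each sentence starting with a cue."""
    segments = []
    for sentence in sentences:
        starts_segment = sentence.lower().startswith(tuple(cues))
        if starts_segment or not segments:
            segments.append([sentence])
        else:
            segments[-1].append(sentence)
    return segments


def tokenize(text):
    """Whitespace tokenizer with light punctuation stripping (an assumption)."""
    tokens = (word.strip(".,?!").lower() for word in text.split())
    return [token for token in tokens if token]


def bm25_score(query, doc, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Conventional BM25 score of a token list `doc` against `query` tokens."""
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        if tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def best_span(segments, summary, max_len=3):
    """Search step: score every run of up to `max_len` contiguous segments
    and return the (start, end) indices of the highest-scoring run."""
    seg_tokens = [tokenize(" ".join(seg)) for seg in segments]
    n = len(seg_tokens)  # assumed to be at least 1
    doc_freq = Counter(t for tokens in seg_tokens for t in set(tokens))
    avg_len = sum(len(tokens) for tokens in seg_tokens) / n
    query = tokenize(summary)
    best = (float("-inf"), 0, 0)
    for start in range(n):
        span = []
        for end in range(start, min(start + max_len, n)):
            span = span + seg_tokens[end]  # concatenate the next segment
            score = bm25_score(query, span, doc_freq, n, avg_len)
            if score > best[0]:
                best = (score, start, end)
    return best[1], best[2]
```

Given a list of sentences from one speech, `split_into_segments` yields the candidate segments, and `best_span(segments, summary)` returns the inclusive index range of the concatenation that best matches the summary. Because BM25 normalizes by length, adding a segment that shares no terms with the summary lowers the score, which is what keeps the selected span tight.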
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kanasaki, K., Yong, J., Kawamura, S., Naitoh, S., Shinomiya, K. (2019). Cue-Phrase-Based Text Segmentation and Optimal Segment Concatenation for the NTCIR-14 QA Lab-PoliInfo Task. In: Kato, M., Liu, Y., Kando, N., Clarke, C. (eds) NII Testbeds and Community for Information Access Research. NTCIR 2019. Lecture Notes in Computer Science, vol 11966. Springer, Cham. https://doi.org/10.1007/978-3-030-36805-0_7
DOI: https://doi.org/10.1007/978-3-030-36805-0_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36804-3
Online ISBN: 978-3-030-36805-0
eBook Packages: Computer Science, Computer Science (R0)