DOI: 10.1145/3322640.3326711
research-article

A Reliable and Accurate Multiple Choice Question Answering System for Due Diligence

Published: 17 June 2019

Abstract

The problem of answering multiple choice questions based on the content of documents has been studied extensively in the machine learning literature. We pose the due diligence problem, in which lawyers study legal contracts and assess the risk in potential mergers and acquisitions, as a multiple choice question answering problem based on the text of the contract. Existing frameworks for question answering are not suitable for this task because the legal contract data available for training are inherently scarce and imbalanced. We propose a question answering system that first identifies the excerpt in the contract that potentially contains the answer to a given question, and then builds a multi-class classifier to choose the answer based on the content of this excerpt. Unlike existing question answering systems, the proposed system explicitly handles the imbalance in the data by generating synthetic instances of the minority answer categories using the Synthetic Minority Oversampling Technique (SMOTE). This ensures that the number of instances in each class is roughly equal, leading to more accurate and reliable classification. We demonstrate that the proposed question answering system outperforms existing systems with a minimal amount of training data.
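
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python rendering of the general SMOTE idea (interpolating between a minority instance and one of its nearest minority neighbours), and the function names `smote` and `balance`, the default `k=5`, and the toy feature vectors are all assumptions made for illustration. How excerpts are turned into feature vectors is not covered here.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority instances to interpolate")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority points by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(points) for points in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, points in by_class.items():
        need = target - len(points)
        if need > 0:
            X_out.extend(smote(points, need, k, rng))
            y_out.extend([label] * need)
    return X_out, y_out


# Toy data (hypothetical): six "no" answers and two "yes" answers.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2],
     [5.0, 5.0], [6.0, 7.0]]
y = ["no"] * 6 + ["yes"] * 2
X_bal, y_bal = balance(X, y)
```

After `balance` runs, both answer categories have the same number of instances, and every synthetic "yes" vector lies between the two original "yes" vectors. A multi-class classifier would then be trained on `X_bal` and `y_bal` instead of the raw, imbalanced data.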


        Published In

        ICAIL '19: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law
        June 2019
        312 pages
        ISBN:9781450367547
        DOI:10.1145/3322640
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Sponsors

        In-Cooperation

        • Univ. of Montreal: University of Montreal
        • AAAI
        • IAAIL: Intl Asso for Artificial Intel & Law

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Due Diligence
        2. Imbalance Handling
        3. Question Answering

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        ICAIL '19

        Acceptance Rates

        Overall Acceptance Rate 69 of 169 submissions, 41%

        Cited By

        • (2023) A survey on legal question–answering systems. Computer Science Review 48:C. DOI: 10.1016/j.cosrev.2023.100552. Online publication date: 1-May-2023
        • (2022) On the Effectiveness of Pre-Trained Language Models for Legal Natural Language Processing: An Empirical Study. IEEE Access 10 (75835-75858). DOI: 10.1109/ACCESS.2022.3190408. Online publication date: 2022
        • (2022) An Approach for Multiple Choice Question Answering System. Nature of Computation and Communication (76-82). DOI: 10.1007/978-3-030-92942-8_7. Online publication date: 3-Jan-2022
        • (2021) Using transformers to improve answer retrieval for legal questions. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law (245-249). DOI: 10.1145/3462757.3466102. Online publication date: 21-Jun-2021
