Exploring Prompting Approaches in Legal Textual Entailment

Article published in The Review of Socionetwork Strategies.

Abstract

We report explorations into prompt engineering with large pre-trained language models that were not fine-tuned to solve the legal entailment task (Task 4) of the 2023 COLIEE competition. Our most successful strategy used simple text-similarity measures to retrieve articles and queries from the training set. We report on our efforts to optimize performance with both OpenAI’s GPT-4 and Flan-T5. We also used an ensemble approach to find the best combination of models and prompts. Finally, we analyze our results and suggest ideas for future improvements.
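As a minimal sketch of the retrieval idea above (not the paper's actual implementation; the similarity measure, example data, and function names are illustrative assumptions), similar training examples can be selected for a few-shot prompt by ranking candidates with a simple token-overlap score:

```python
import re

def jaccard(a: str, b: str) -> float:
    """Token-overlap (Jaccard) similarity between two texts, in [0, 1]."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def retrieve_examples(query, training_set, k=3):
    """Return the k training examples whose queries best match the new query."""
    return sorted(training_set,
                  key=lambda ex: jaccard(query, ex["query"]),
                  reverse=True)[:k]

# Hypothetical (query, gold Y/N entailment label) training pairs.
train = [
    {"query": "A minor may rescind a contract.", "label": "Y"},
    {"query": "A gift requires delivery to be effective.", "label": "N"},
    {"query": "A minor's contract can be cancelled by a guardian.", "label": "Y"},
]
shots = retrieve_examples("Can a minor cancel a contract?", train, k=2)
```

The retrieved pairs would then be formatted as in-context examples ahead of the target query in the prompt sent to the model.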

Notes

  1. https://openai.com/research/gpt-4.

  2. https://platform.openai.com/docs/api-reference/chat.

  3. https://platform.openai.com/docs/guides/chat.

  4. https://huggingface.co/google/flan-t5-xxl.

  5. https://huggingface.co/bigscience/T0pp.

  6. https://huggingface.co/declare-lab/flan-alpaca-xxl.

  7. Because sampling was disabled, the temperature does not affect these models’ predictions.

  8. As we suspected our GPT-4 submission would likely be disqualified, we chose not to use this model in the ensemble.

  9. We refer the reader to the respective papers for details on how each model was trained. Note that, at the time of publication, OpenAI had released no details on the data GPT-4 was trained on.

  10. Although the exact number of parameters in GPT-4 is unknown, it is likely to be on the order of hundreds of billions, given the known size of GPT-3.

  11. ‘gpt-3.5-turbo’.

  12. https://github.com/Advancing-Machine-Human-Reasoning-Lab/COLIEE-2023-Task4.
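The ensemble mentioned in the abstract and in note 8 can be sketched, under the assumption (ours, not necessarily the paper's) that it reduces to majority voting over the Y/N predictions of several model-and-prompt configurations:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine Y/N predictions from several model+prompt configurations.
    Ties break toward 'Y'; this tie rule is an illustrative choice."""
    counts = Counter(predictions)
    return "Y" if counts["Y"] >= counts["N"] else "N"
```

For example, `majority_vote(["Y", "N", "Y"])` collapses three configurations' votes into a single entailment label.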


Author information

Corresponding author

Correspondence to Onur Bilgin.

Ethics declarations

Conflict of Interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bilgin, O., Fields, L., Laverghetta, A. et al. Exploring Prompting Approaches in Legal Textual Entailment. Rev Socionetwork Strat 18, 75–100 (2024). https://doi.org/10.1007/s12626-023-00154-y
