Beyond Readability with RateMyPDF: A Combined Rule-based and Machine Learning Approach to Improving Court Forms

Authors:
Quinten Steenhuis

Legal Innovation and Technology Lab, Suffolk University Law School, Boston, Massachusetts USA

ORCID: 0009-0001-0110-064X

Bryce Willey

Legal Innovation and Technology Lab, Suffolk University Law School, Boston, Massachusetts USA

ORCID: 0000-0003-1775-2869

David Colarusso

Legal Innovation and Technology Lab, Suffolk University Law School, Boston, Massachusetts USA

ORCID: 0009-0003-6287-9284

ICAIL '23: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, June 2023, Pages 287–296. https://doi.org/10.1145/3594536.3595146

Published: 07 September 2023

ABSTRACT

In this paper, we describe RateMyPDF, a web application that helps authors measure and improve the usability of court forms. It offers a score together with automated suggestions for improving the form, drawn from both traditional machine learning approaches and the general-purpose GPT-3 large language model. We worked with form authors and usability experts to determine the set of features we measure, and we validated them by gathering and analyzing a dataset of approximately 24,000 PDF forms from 46 U.S. states and the District of Columbia. Our tool and automated measures allow a form author, or a court tasked with improving a large library of forms, to work at scale.
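
The abstract does not specify how individual measurements are combined into one score. As a hypothetical illustration only, the sketch assumes a weighted sum of per-feature sub-scores; the four features (`avg_sentence_words`, `passive_fraction`, `fields_per_page`, `min_font_pt`), their targets, and their weights are invented for this example and are not RateMyPDF's actual feature set or formula. It models only a simple rule-based side; the machine learning and GPT-3 suggestions are not represented.

```python
from dataclasses import dataclass

@dataclass
class FormStats:
    """Hypothetical per-form measurements; not RateMyPDF's real features."""
    avg_sentence_words: float  # mean words per sentence
    passive_fraction: float    # share of sentences in passive voice (0..1)
    fields_per_page: float     # fillable fields divided by page count
    min_font_pt: float         # smallest font size found in the form

# Invented targets and weights, for illustration only.
TARGETS = {"avg_sentence_words": 15.0, "passive_fraction": 0.05,
           "fields_per_page": 20.0, "min_font_pt": 12.0}
WEIGHTS = {"avg_sentence_words": 0.3, "passive_fraction": 0.2,
           "fields_per_page": 0.3, "min_font_pt": 0.2}
HIGHER_IS_BETTER = {"min_font_pt"}

def feature_score(name: str, value: float) -> float:
    """Map one measurement to a sub-score in [0, 1]."""
    target = TARGETS[name]
    if name in HIGHER_IS_BETTER:
        # Full credit at or above the target, proportional credit below it.
        return min(value / target, 1.0)
    # Full credit at or below the target, decaying as the value grows.
    return 1.0 if value <= target else target / value

def form_score(stats: FormStats) -> tuple[float, list[str]]:
    """Return a 0-100 weighted score plus a suggestion for each weak feature."""
    total = 0.0
    suggestions = []
    for name, weight in WEIGHTS.items():
        value = getattr(stats, name)
        sub = feature_score(name, value)
        total += weight * sub
        if sub < 1.0:
            suggestions.append(f"Improve {name} (currently {value})")
    return round(100 * total, 1), suggestions

score, tips = form_score(FormStats(22.0, 0.05, 40.0, 9.0))
print(score)
for tip in tips:
    print("-", tip)
```

With these made-up inputs the form scores 70.5 and receives three suggestions (sentence length, field density, font size). A weighted sum keeps each feature's contribution visible, which makes it straightforward to turn a low sub-score into a concrete suggestion.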

This paper describes the features that we find improve form usability, the results of our analysis of the large form dataset, details of the tool, and the tool's implications for access to justice for self-represented litigants. We found that the RateMyPDF score correlates significantly with the scores given by expert reviewers.
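
The abstract does not name the correlation statistic used. As one plausible way to check agreement between tool scores and expert ratings, the sketch computes a Spearman rank correlation in plain Python. The scores are made-up numbers, not data from the study, and the helper names are hypothetical. Rank correlation only assumes that both scales order forms the same way, which suits expert ratings on a coarse ordinal scale.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: tool scores (0-100) and expert ratings (1-5) for five forms.
tool_scores = [72, 55, 90, 61, 48]
expert_scores = [4, 3, 5, 3, 2]
print(round(spearman(tool_scores, expert_scores), 3))  # prints 0.975
```

The sketch omits a significance test; `scipy.stats.spearmanr` returns both the coefficient and a p-value if SciPy is available.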

While the current version of the tool supports automated analysis of Microsoft Word and PDF court forms, our findings apply equally to the growing number of automated, wizard-driven legal applications that replace paper forms with interactive websites.

Published in
ICAIL '23: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law
June 2023
499 pages
ISBN: 9798400701979
DOI: 10.1145/3594536
Conference Chair: Francisco Andrade, University of Minho, Portugal
Program Chair: Matthias Grabmair, Technical University of Munich, Germany
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Accessibility
Administrative Burden
Automated Analysis
Court Forms
Law
Readability
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 69 of 169 submissions, 41%