ABSTRACT
Clinical trial protocols are complex documents that must be manually translated into a usable form for trial execution and management. We have developed a system that automatically transforms a schedule of activity (SOA) table in a PDF document into a machine-interpretable form. Our system combines semantic, structural, and natural language processing (NLP) approaches with a "human in the loop" for verification. It first determines which cells contain activity or temporal information and then interprets what those cells represent. Using a training and test set of 20 protocols, we assess how accurately the system identifies specific types of SOA elements. This work is the first stage of a larger effort to use artificial intelligence techniques to extract procedural logic from clinical trial documents and to build a knowledge base of protocols that supports insights and comparisons across studies.
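The abstract does not spell out the classification rules, so the following Python sketch is only a rough illustration of the two-step idea (classify cells, then interpret them), not the authors' method. It assumes a simple table layout: timepoints across the header row, activities down the first column, and "X"-style marks in body cells. The regular expressions stand in, hypothetically, for the semantic, structural, and NLP components, and a `needs_review` flag marks mappings a human verifier should check. The names `classify_cell`, `interpret_soa`, and `ScheduledActivity` are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for the real semantic/NLP rules.
TEMPORAL_RE = re.compile(
    r"^(day|week|month|visit|screening|baseline|follow[- ]?up|end of)\b",
    re.IGNORECASE,
)
# A mark such as "X", optionally followed by a footnote digit or letter ("X1", "Xa").
MARKER_RE = re.compile(r"^\s*x\s*(\d+|[a-z])?\s*$", re.IGNORECASE)


@dataclass
class ScheduledActivity:
    activity: str
    timepoint: str
    needs_review: bool  # True when a human should verify this mapping


def classify_cell(text: str) -> str:
    """Assign a cell to one of: empty, marker, temporal, activity."""
    t = text.strip()
    if not t:
        return "empty"
    if MARKER_RE.match(t):
        return "marker"
    if TEMPORAL_RE.match(t):
        return "temporal"
    return "activity"  # fallback; inconsistent cases are flagged below


def interpret_soa(rows: list[list[str]]) -> list[ScheduledActivity]:
    """Turn each marked body cell into an (activity, timepoint) record.

    Assumes row 0 holds timepoints and column 0 holds activity labels.
    """
    header, body = rows[0], rows[1:]
    results = []
    for row in body:
        label = row[0].strip()
        for j, cell in enumerate(row[1:], start=1):
            if classify_cell(cell) != "marker":
                continue
            timepoint = header[j].strip() if j < len(header) else ""
            # Flag mappings whose header or row label does not look as expected.
            suspicious = (
                classify_cell(timepoint) != "temporal"
                or classify_cell(label) != "activity"
            )
            results.append(ScheduledActivity(label, timepoint, suspicious))
    return results


if __name__ == "__main__":
    table = [
        ["Procedure", "Screening", "Day 1", "Week 4"],
        ["Informed consent", "X", "", ""],
        ["Vital signs", "X", "X", "X"],
        ["Blood draw", "", "X1", ""],
    ]
    for record in interpret_soa(table):
        print(record)
```

Real SOA tables frequently use multi-row headers, merged cells, and footnote references, none of which this sketch handles.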
Index Terms
- What Happens When?: Interpreting Schedule of Activity Tables in Clinical Trial Documents