SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes
- ORNL
- Stanford University
One of the key emerging challenges that connects the Big Data and AI domains is the availability of sufficient volumes of training data for AI and machine learning tasks. SynthNotes is a framework for generating standards-compliant, realistic mental health progress report notes at very large, population-level scale, and in a strictly privacy-preserving manner. Our framework, inspired by the need to explore, evaluate, and train computational methods for the emerging mental health crisis in the US, is useful for benchmarking, optimizing, and training biomedical natural language processing, information extraction, and machine learning systems intended to operate at "Big Data" scale (billions of notes). The free-text notes generated by SynthNotes are based on the literature and on public statistical models, allowing for a realistic, natural-language representation of a patient and his or her mental health characteristics. Additionally, SynthNotes can partially simulate the stylistic, grammatical, and expressive characteristics of a licensed mental health professional. SynthNotes is modular and flexible, allowing for the representation of a variety of conditions, the incorporation of alternative foundational models, and the parametrization of the variability in the structure, content, and size of the synthetically generated corpus. In this paper, we report on the initial use and performance characteristics of the SynthNotes framework and on ongoing work to incorporate content planning and deep learning-based generative methods trained on real data.
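The abstract gives no implementation details, so the following is only a minimal sketch of one way a generator of this kind could work, assuming a simple "sample a patient profile, then fill a note template" design. It illustrates two ideas named in the abstract: notes driven by statistical distributions, and parametrized variability (random template choice stands in for stylistic variation; `n_notes` controls corpus size). All names (`CONDITIONS`, `TEMPLATES`, `PatientProfile`, `sample_profile`, `render_note`, `generate_corpus`) are hypothetical, and the condition weights are placeholders, not real prevalence figures. This is not the SynthNotes implementation, and it does not model the content-planning or deep learning methods the paper describes as ongoing work.

```python
import random
from dataclasses import dataclass

# Hypothetical, illustrative weights only; NOT real prevalence statistics.
# A real generator would derive such distributions from the literature
# and public statistical models.
CONDITIONS = {
    "major depressive disorder": 0.40,
    "generalized anxiety disorder": 0.35,
    "post-traumatic stress disorder": 0.25,
}
MOODS = ["euthymic", "depressed", "anxious", "irritable"]

# Hypothetical templates; choosing among them mimics stylistic variability
# between clinicians.
TEMPLATES = [
    "Patient ({sex}, age {age}) seen for follow-up of {condition}. "
    "Mood today is {mood}.",
    "{age} y/o {sex} with a history of {condition}; "
    "reports {mood} mood since last visit.",
]


@dataclass
class PatientProfile:
    age: int
    sex: str
    condition: str
    mood: str


def sample_profile(rng: random.Random) -> PatientProfile:
    """Draw one synthetic patient from the placeholder distributions above."""
    condition = rng.choices(list(CONDITIONS), weights=list(CONDITIONS.values()))[0]
    return PatientProfile(
        age=rng.randint(18, 80),  # inclusive bounds
        sex=rng.choice(["male", "female"]),
        condition=condition,
        mood=rng.choice(MOODS),
    )


def render_note(profile: PatientProfile, rng: random.Random) -> str:
    """Fill a randomly chosen template with the profile's fields."""
    template = rng.choice(TEMPLATES)
    return template.format(**vars(profile))


def generate_corpus(n_notes: int, seed: int = 0) -> list[str]:
    """Generate n_notes synthetic notes; a fixed seed makes the corpus reproducible."""
    rng = random.Random(seed)
    return [render_note(sample_profile(rng), rng) for _ in range(n_notes)]


if __name__ == "__main__":
    for note in generate_corpus(3, seed=42):
        print(note)
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps each corpus reproducible and independent of other code that touches the global random state, which matters when the same corpus must be regenerated for benchmarking.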
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1507868
- Resource Relation: Conference: 2018 IEEE International Conference on Big Data, Seattle, Washington, United States of America, 8/10/2018 to 8/13/2018
- Country of Publication: United States
- Language: English