Program Information

Medicine-Specific Natural Language Processing for Information Aggregation in Radiotherapy

D Ruan*, C Felix, D Veruttipong, V Basehart, M Steinberg, D Low, P Kupelian, J Weidhass, University of California, Los Angeles, Los Angeles, CA

Presentations

TU-C3-GePD-J(A)-4 (Tuesday, August 1, 2017) 10:30 AM - 11:00 AM Room: Joint Imaging-Therapy ePoster Lounge - A

Purpose: Computational methods and informatics technologies are useful tools for understanding outcomes and responses in radiotherapy. Their success depends on accurate quantification of hybrid information, including medical history, treatment regimen, and follow-up, which is often distributed as notes across various departments in multiple EMRs. This project reports our effort to adapt natural language processing (NLP) methods to establish a quantitative information hub in radiation oncology.

Methods: General-purpose NLP aims to understand the theme or topic of a document, and discounts common words such as “no” because of their high frequency. In medical informatics, by contrast, it is crucial to catch negations and medical modifiers. We developed a method that modifies the “frequency inversion” approaches for pattern recognition in NLP by (1) incorporating a set of medicine-specific keywords, specified both by hand-crafting and by regular expressions, (2) hierarchically modeling the context to modify the probability of a medically relevant modifier, and (3) adaptively learning to enhance the keyword pool and refine the hierarchical context model as more notes are processed. For a germline study, we applied our approach to extract chemotherapy information (from the medical oncology department) using queried notes from an Epic system. A senior research coordinator performed the same task independently, and the retrieved results were compared.
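As an illustration of why negation handling matters, the following is a minimal Python sketch of regular-expression keyword matching combined with a simple look-back check for negation cues. It is not the method described above: the drug patterns, the negation list, the five-token window, and the function name extract_chemo are all hypothetical choices made for this example, and the hierarchical context model and adaptive learning steps are not represented.

```python
import re

# Hypothetical keyword pool: illustrative drug-name patterns only.
DRUG_PATTERNS = {
    "cisplatin": re.compile(r"\bcis-?platin(um)?\b", re.IGNORECASE),
    "carboplatin": re.compile(r"\bcarboplatin\b", re.IGNORECASE),
    "paclitaxel": re.compile(r"\b(paclitaxel|taxol)\b", re.IGNORECASE),
}

# Hypothetical negation cues; a real system would need a much richer set.
NEGATION = re.compile(r"(no|not|never|denies|without|declined)", re.IGNORECASE)

WINDOW = 5  # assumed number of preceding tokens searched for a negation cue


def extract_chemo(note):
    """Map each mentioned drug to True if any mention is not negated."""
    results = {}
    # Split on sentence-ending punctuation so a negation cannot leak
    # from one sentence into the next.
    for sentence in re.split(r"(?<=[.;!?])\s+", note):
        for drug, pattern in DRUG_PATTERNS.items():
            for match in pattern.finditer(sentence):
                preceding = sentence[:match.start()].split()[-WINDOW:]
                negated = any(
                    NEGATION.fullmatch(token.strip(",:()")) for token in preceding
                )
                # In this sketch one affirmed mention outweighs negated ones.
                results[drug] = results.get(drug, False) or not negated
    return results


note = "Patient received carboplatin and Taxol in 2015. No cisplatin was given."
print(extract_chemo(note))
```

A frequency-based topic model would discount the word "No" in the second sentence, whereas here it is the only evidence that cisplatin was not administered. A fixed token window is a crude stand-in for context modeling and will misfire on long or nested sentences.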

Results: Results were highly consistent between our method and the human reviewer. Upon further review, our method caught a few items the human had missed (12 out of 720 patients), but it failed to incorporate information from scanned-in notes for which electronic data were unavailable. The nominal time for a human to process a record was on the order of 5 to 10 minutes, versus seconds for the automated processor.

Conclusion: General-purpose NLP needs to be modified to catch medically crucial information. Our preliminary results show promise for automating information extraction from notes. Further QA is needed before wide use.

Funding Support, Disclosures, and Conflict of Interest: This project is supported in part by a research agreement with Varian.
