Program Information

Evaluation of Segmentation Methods as a Function of the Quality of Input Images

G Pednekar¹*, J Udupa², D McLaughlin³, X Wu⁴, D Odhner⁵, Y Tong⁶, C Simone⁷, J Camaratta⁸, D Torigian⁹, (1) Quantitative Radiology Solutions, Philadelphia, PA, (2) University of Pennsylvania, Philadelphia, PA, (3) Quantitative Radiology Solutions, Philadelphia, PA, (4) University of Pennsylvania, Philadelphia, PA, (5) University of Pennsylvania, Philadelphia, PA, (6) University of Pennsylvania, Philadelphia, PA, (7) University of Maryland, Baltimore, MD, (8) Quantitative Radiology Solutions, Philadelphia, PA, (9) University of Pennsylvania, Philadelphia, PA

Presentations

WE-RAM1-GePD-J(B)-5 (Wednesday, August 2, 2017) 9:30 AM - 10:00 AM Room: Joint Imaging-Therapy ePoster Lounge - B

Purpose: Many data sets, performance metrics, and methods exist for evaluating image segmentation algorithms. However, there is currently no way to quantify performance as a function of input image quality. Consequently, reported segmentation performance cannot be separated from the vagaries of individual input images whose quality is unknown, and no holistic picture of performance can be presented. We present a novel methodology to overcome this hurdle.

Methods: We retrospectively created a database of CT images and dosimetrist-drawn contours from 200 cancer studies of the head and neck (H&N) and thorax. We developed precise definitions of key organs at risk (OARs), 11 in H&N and 12 in thorax, by extending object definitions from recent guidelines, and modified the contour data to fulfill these definitions. We devised a set of key quality criteria that influence segmentation: some (e.g., streak artifacts, image noise) are image-wise (global), and others (e.g., shape distortion, severity of pathology) are object-specific (local). A trained reader assigned a grade to each object for each criterion in each study. We developed algorithms based on logical predicates that determine a numeric quality score from 1 to 10 for each object and each image from the reader-assigned quality grades. The performance of any segmentation method, for any given metric, can then be described as a distribution of that metric over the entire quality score scale.
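The scoring and reporting steps described in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the actual predicates, grade scale, or score mapping, so everything specific here is an assumption made for illustration: the 0 to 2 grade scale, the criterion names used as dictionary keys, the rule thresholds in RULES, and the function names object_quality_score and metric_distribution.

```python
from collections import defaultdict
from statistics import mean

# Assumed grade scale (not from the abstract): 0 = absent, 1 = mild, 2 = severe.
# Ordered (predicate, score) rules; the first predicate that holds decides
# the score. The thresholds are hypothetical placeholders.
RULES = [
    (lambda g: all(v == 0 for v in g.values()), 10),
    (lambda g: max(g.values()) == 1 and sum(g.values()) == 1, 8),
    (lambda g: max(g.values()) == 1, 6),
    (lambda g: sum(1 for v in g.values() if v == 2) == 1, 4),
    (lambda g: sum(1 for v in g.values() if v == 2) == 2, 2),
]
DEFAULT_SCORE = 1  # used when no rule matches (three or more severe grades)


def object_quality_score(grades):
    """Map one object's reader-assigned grades to a 1-10 quality score.

    `grades` maps criterion name -> grade and may mix global criteria
    (e.g. streak artifacts) with local ones (e.g. shape distortion).
    """
    for predicate, score in RULES:
        if predicate(grades):
            return score
    return DEFAULT_SCORE


def metric_distribution(records):
    """Summarize a segmentation metric separately at each quality score.

    `records` is an iterable of (quality_score, metric_value) pairs, one
    per segmented object. Returns {score: summary} ordered by score.
    """
    by_score = defaultdict(list)
    for score, value in records:
        by_score[score].append(value)
    return {
        score: {
            "n": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for score, values in sorted(by_score.items())
    }
```

Keeping the rules as an ordered list of predicates means the mapping can be swapped for another application without touching the scoring function. The per-score summary is one simple way to present a metric as a distribution over the quality scale; a histogram or quantiles per score would serve equally well.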

Results: The mean object quality score over all objects was 3.9 for H&N, with scores for 3 objects in the upper and 8 in the lower quartile, and 5.7 for thorax, with scores for 7 objects in the upper and 5 in the lower quartile. H&N objects thus had a lower mean quality score than thoracic objects.

Conclusion: The logical predicates can be adapted to the requirements of the application. The proposed holistic assessment of performance may allow for selection of segmentation systems that are optimally suited to the image quality distribution underlying a given application or imaging center.

Funding Support, Disclosures, and Conflict of Interest: This work is supported by STTR grants from the National Science Foundation and National Cancer Institute. Drs. Udupa and Torigian are cofounders of Quantitative Radiology Solutions LLC.
