Keys to Avoiding Statistical Pitfalls of Small Datasets in Radiomics
A Chatterjee1,2*, M Vallieres1,2, A Dohan2, I Levesque1,2, Y Ueno3, V Bist4, S Saif2, C Reinhold2, J Seuntjens1,2, (1) McGill University, Montreal, QC, (2) McGill University Health Centre, Montreal, QC, (3) Kobe University, Kobe, Hyogo Prefecture, (4) Venkateshwar Hospital, New Delhi
Presentations
TU-C2-GePD-J(A)-4 (Tuesday, August 1, 2017) 10:00 AM - 10:30 AM Room: Joint Imaging-Therapy ePoster Lounge - A
Purpose: Radiomic studies, where correlations are drawn between patients’ medical image features and patient outcomes, often deal with small datasets. Consequently, results can suffer from lack of reproducibility and stability. This study establishes a novel methodology to assess and reduce the impact of statistical fluctuations that may occur in small datasets. Such fluctuations can lead to false positives, particularly when applying feature selection or machine learning (ML) methods commonly used in the radiomics literature.
Methods: Monte Carlo models were used to illustrate the limitations of small datasets, highlighting the difference between true correlation and measured correlation. Two feature selection methods were created: one for choosing single predictive features, and another for obtaining feature sets that could be combined in a predictive model. The features were combined using ML tools less affected by overfitting (Naïve Bayes, Logistic Regression, and linear Support Vector Machines). Only three features were combined at a time, further limiting overfitting. This methodology was applied to MR images from two small datasets: metastatic liver disease (69 samples) and primary uterine adenocarcinoma (93 samples). The outcome studied for liver metastases was desmoplasia; the outcomes studied for uterine tumors were lymphovascular space invasion (LVSI), cancer staging (FIGO), and tumor grade. For the uterine cancer outcomes, the predictive models were tested on independent subsets.
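The gap between true and measured correlation can be illustrated with a short simulation. The sketch below is not the authors' Monte Carlo model; it assumes a bivariate normal feature/outcome pair with a hypothetical true correlation of 0.3, and the function name `measured_correlations` is illustrative. It draws many datasets of a given size and reports how widely the measured Pearson correlation scatters around the true value.

```python
import numpy as np

def measured_correlations(true_rho, n_samples, n_trials=2000, seed=0):
    """Simulate n_trials datasets of size n_samples and return the
    Pearson correlation measured in each one.

    Assumption: the two variables are bivariate normal with unit
    variance and correlation true_rho (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, true_rho], [true_rho, 1.0]]
    # draws has shape (n_trials, n_samples, 2)
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=(n_trials, n_samples))
    x = draws[..., 0]
    y = draws[..., 1]
    # Center each simulated dataset, then apply the Pearson formula row-wise
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    num = (xc * yc).sum(axis=1)
    den = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return num / den

if __name__ == "__main__":
    # 69 and 93 are the dataset sizes in this study; 1000 is for contrast
    for n in (69, 93, 1000):
        r = measured_correlations(true_rho=0.3, n_samples=n)
        low, high = np.percentile(r, [2.5, 97.5])
        print(f"n={n}: mean r={r.mean():.3f}, 95% of trials in [{low:.3f}, {high:.3f}]")
```

Under these assumptions, the interval of measured correlations is much wider at the study's sample sizes than at n = 1000, which is the effect the Monte Carlo models are meant to expose.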
Results: For the combined predictive feature approach: for LVSI, AUC = 0.87 ± 0.07 and accuracy = 0.84 ± 0.09 in the testing set. For FIGO, AUC = 0.81 ± 0.03 and accuracy = 0.79 ± 0.08. For tumor grade (high grade), AUC = 0.76 ± 0.05 and accuracy = 0.70 ± 0.08.
Conclusion: Despite using a large set (~10⁴) of texture features, our methodology avoided false positives while discovering promising results. Using this methodology should lead to more statistically stable results in radiomic studies involving small datasets.
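To see why a feature set of this size invites false positives in a small dataset, consider the following sketch. It is an illustration under stated assumptions, not the authors' procedure: every feature is pure Gaussian noise, the binary outcome is random, and the |r| > 0.3 cutoff is an arbitrary illustrative threshold. The function name `null_feature_scan` is hypothetical.

```python
import numpy as np

def null_feature_scan(n_samples=69, n_features=10_000, seed=0):
    """Correlate n_features pure-noise features with a random binary
    outcome and return the Pearson correlation of each feature.

    Assumption: features and outcome are independent by construction,
    so every apparent correlation is a statistical fluctuation."""
    rng = np.random.default_rng(seed)
    features = rng.standard_normal((n_samples, n_features))
    outcome = rng.integers(0, 2, size=n_samples).astype(float)
    # Center columns and outcome, then compute Pearson r per feature
    fc = features - features.mean(axis=0)
    oc = outcome - outcome.mean()
    num = (fc * oc[:, None]).sum(axis=0)
    den = np.sqrt((fc ** 2).sum(axis=0) * (oc ** 2).sum())
    return num / den

if __name__ == "__main__":
    r = null_feature_scan()
    n_spurious = int((np.abs(r) > 0.3).sum())
    print(f"largest |r| among noise features: {np.abs(r).max():.3f}")
    print(f"noise features with |r| > 0.3: {n_spurious} of {r.size}")
```

Even though no feature carries information here, some of them appear correlated with the outcome, which is why selection over ~10⁴ features needs safeguards of the kind described above.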
Funding Support, Disclosures, and Conflict of Interest: This work was supported in part by the CREATE Medical Physics Research Training Network grant of the Natural Sciences and Engineering Research Council (Grant number: 432290), by the Strategic Training in Transdisciplinary Radiation Science for the 21st Century (STARS21) program, and by the Canadian Institutes of Health Research Foundation Grant FDN-143257.