Multi-Parameterized Machine Learning for Individualized Cancer Risk Prediction
D Roffman*, I Ali, F Guo, J Deng, Yale University, New Haven, CT
Presentations
TU-L-GePD-J(A)-3 (Tuesday, August 1, 2017) 1:15 PM - 1:45 PM Room: Joint Imaging-Therapy ePoster Lounge - A
Purpose: The goal of this work is to build multi-parameterized cancer prediction models via mining of big health data. With validation and testing, we anticipate that our models can be used for cancer risk prediction for individuals, hence contributing to early cancer detection and prevention.
Methods: Both a conditional probability analysis (CPA) and an artificial neural network (ANN) were tested for our multi-parameterized model. A total of 555,183 people in the 1997-2015 CDC datasets were included for data extraction. The extracted parameters were age, gender, ethnicity, alcohol consumption, diabetes, family history, tobacco use, and heart disease. In the CPA approach, the primary quantity of interest was the probability of acquiring cancer given multiple parameters, P(Cancer|X), from which multi-parameterized correlations can be deduced via data mining. For the ANN model, a sample of 18,000 people, half of whom had cancer, was used to train the model. Twenty percent of that sample was used to test and validate the model.
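As an illustration of the CPA quantity P(Cancer|X), the following is a minimal sketch that estimates it by counting, for each observed combination of parameter values, the fraction of people with cancer. The function name, the record layout, and the toy records are assumptions made for this example; they are not the authors' code or the CDC data.

```python
from collections import defaultdict


def conditional_cancer_probability(records, parameters):
    """Estimate P(Cancer | X) for each observed combination of parameter values.

    `records` is a list of dicts with a boolean "cancer" field (hypothetical layout).
    `parameters` names the fields that make up X.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [cancer cases, total people]
    for record in records:
        key = tuple(record[p] for p in parameters)
        if record["cancer"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: cases / total for key, (cases, total) in counts.items()}


# Made-up toy records, for illustration only.
records = [
    {"smoker": True, "diabetes": True, "cancer": True},
    {"smoker": True, "diabetes": True, "cancer": True},
    {"smoker": True, "diabetes": True, "cancer": False},
    {"smoker": True, "diabetes": False, "cancer": True},
    {"smoker": True, "diabetes": False, "cancer": False},
    {"smoker": False, "diabetes": False, "cancer": False},
    {"smoker": False, "diabetes": False, "cancer": False},
    {"smoker": False, "diabetes": False, "cancer": True},
    {"smoker": False, "diabetes": False, "cancer": False},
]

probabilities = conditional_cancer_probability(records, ["smoker", "diabetes"])
for key, p in sorted(probabilities.items()):
    print(key, round(p, 3))
```

Conditioning on more parameters at once splits the data into more groups, so each estimate rests on fewer people; this is one plausible reason a counting approach is limited to a handful of parameters at a time.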
Results: With the CPA method, we found that the combined effect of smoking, diabetes, and age was far greater than the effect of each factor considered alone. The impact of smoking on cancer risk was more significant in young women than in young men. For the ANN, prediction accuracy was plotted as a function of the number of training examples. An average accuracy of 78% was obtained, which we expect to improve as more data are incorporated.
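The idea of tracking accuracy against the number of training examples can be sketched as follows. This is not the authors' ANN: as a stand-in it uses a single logistic unit trained by stochastic gradient descent, on synthetic balanced data, with 20% of the sample set aside for testing. All names, the data generator, and the hyperparameters are assumptions chosen for the example.

```python
import math
import random


def make_balanced_sample(n, rng):
    """Synthetic stand-in data: half label 1, half label 0, two noisy features."""
    data = []
    for i in range(n):
        label = i % 2
        features = [rng.gauss(label, 1.0), rng.gauss(label, 1.0)]
        data.append((features, label))
    rng.shuffle(data)
    return data


def train_logistic(train, epochs=20, lr=0.1):
    """Train a single logistic unit with plain stochastic gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in train:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    weights, bias = model
    correct = sum(
        1
        for x, y in data
        if (bias + sum(w * xi for w, xi in zip(weights, x)) > 0) == (y == 1)
    )
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = make_balanced_sample(1000, rng)
split = int(0.8 * len(sample))  # 80% for training, 20% for testing
train_set, test_set = sample[:split], sample[split:]

# Accuracy on the fixed test set as the training subset grows.
learning_curve = [
    (n, accuracy(train_logistic(train_set[:n]), test_set))
    for n in (50, 100, 200, 400, 800)
]
for n, acc in learning_curve:
    print(n, round(acc, 3))
```

Evaluating every training size on the same test set keeps the points of the curve comparable with one another.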
Conclusion: Our preliminary study indicates that the CPA model can be used for individualized cancer risk prediction with up to 5 parameters considered at a time. As more mined data are included, the ANN model has the potential for more predictive power than the CPA model. Further work is underway to build a robust multi-parameterized model for individualized cancer risk prediction so that cancer detection and intervention can be initiated at an early stage.