The AAPM CT Metal Artifact Reduction (CT-MAR) challenge has concluded. The challenge provided 14,000 CT training datasets generated with a hybrid data simulation that combined different anatomies (lung, abdomen, liver, head, and pelvis) with different metal materials. Each dataset includes five components: CT sinograms (uncorrected and labels), reconstructed CT images (uncorrected and labels), and metal masks. For the final evaluation, 29 clinical datasets with metal artifacts were provided in both the sinogram and image domains. The inserted metals included surgical clips, dental fillings, and hip prostheses, among others. The participants' submitted images were evaluated using our scoring metrics.
Of the 105 registered institutions, 26 participants completed all phases of the competition.
The top five performing teams were:
- Team name: JLAB
Members: Yi Guo, Jianhua Ma, Yongbo Wang, Zhaoying Bian, and Dong Zeng
Institution: Southern Medical University, China
Final score = 0.96
- Team name: NIMS
Members: Hyoung Suk Park and Kiwan Jeon
Institution: National Institute for Mathematical Sciences, South Korea
Final score = 0.98
- Team name: fanstan
Members: Fuxin Fan and Mareike Thies
Institution: Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Final score = 0.99
- Team name: FDMAR
Members: Zilong Li, Chenglong Ma, and Hongming Shan
Institution: Fudan University, China
Final score = 1.05
- Team name: MIR-MAR
Members: Da-in Choi, Sungho Yun, and Subong Hyun
Institution: Korea Advanced Institute of Science and Technology, South Korea
Final score = 1.06
Eight metrics were used to evaluate the submitted MAR images against the ground truth images: RMSE, noise, image sharpness, streak amplitude, SSIM, metal integrity, bone integrity, and proton beam range for radiotherapy. For each metric, a fractional score between 0 (good) and 4 (bad) was assigned. A score of 0 corresponds to no relevant difference from the ground truth, a score of 2 corresponds to the MAR capability of a state-of-the-art NMAR algorithm (where applicable), and a score of 4 corresponds to no improvement. The overall score was computed as the average of the eight metric scores over all cases, so a lower overall score indicates better performance.
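As a rough illustration of the averaging step only, the overall score could be computed as in the sketch below. The metric key names, the input layout (one dictionary of per-metric scores per case), and the equal weighting of metrics and cases are assumptions made for this example. The mapping from raw measurements to the 0 to 4 fractional scores is not described here, so the sketch takes those scores as given.

```python
from statistics import fmean

# Hypothetical key names for the eight metrics listed above.
METRICS = (
    "rmse",
    "noise",
    "sharpness",
    "streak_amplitude",
    "ssim",
    "metal_integrity",
    "bone_integrity",
    "proton_range",
)


def overall_score(case_scores):
    """Average the eight fractional metric scores over all cases.

    case_scores: list of dicts, one per case, mapping each metric name
    to a fractional score in [0, 4] (0 = good, 4 = bad).
    Assumes every metric and every case is weighted equally.
    """
    if not case_scores:
        raise ValueError("at least one case is required")

    per_case_means = []
    for index, scores in enumerate(case_scores):
        missing = [m for m in METRICS if m not in scores]
        if missing:
            raise ValueError(f"case {index} is missing metrics: {missing}")
        values = [float(scores[m]) for m in METRICS]
        if any(v < 0.0 or v > 4.0 for v in values):
            raise ValueError(f"case {index} has a score outside [0, 4]")
        per_case_means.append(fmean(values))

    # Every case has the same eight metrics, so the mean of the per-case
    # means equals the mean over all case/metric pairs.
    return fmean(per_case_means)
```

For example, a submission that matches the ground truth on one case (all scores 0) and performs at the NMAR reference level on another (all scores 2) would receive an overall score of 1.0 under these assumptions.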