Policy number | Policy name | Policy date | Sunset date
PUB 3-C | Instructions to Authors and Reviewers: Dataset and Software Article | 5/16/2026 | 12/31/2030

A. SCOPE AND SUBMISSION GUIDELINES

Dataset & Software Articles (DS) describe scientifically or clinically valuable open-access datasets and/or software (hereafter referred to as material) with high potential for contributing to the research of medical physicists and other investigators working on related problems. In contrast to Research Articles, DS should not include hypothesis testing or data analyses supporting generalizable conclusions. DS should provide detailed descriptions of high-impact and reusable research/clinical practice material, including its value, scope, conditions of acquisition/use, known limitations, and curation and quality assurance processes. For datasets, the amount and format of the data should be specified. For software, the format of the input and output should be specified. Comprehensive descriptive analysis is required, and graphical visualizations are encouraged to help readers understand the material. DS should focus on helping others reuse data and/or software rather than presenting new interpretations, methods, or in-depth analyses. Examples of use cases beyond the original application are also encouraged.

As a condition of submission and acceptance of a DS, the authors must, prior to submission, place the dataset or software (optionally including the corresponding code) files in a recognized and stable data archive that makes the material available to other investigators with limited restrictions for commercial use but none for scientific purposes. As with all AAPM Journal articles, submitted DS manuscripts will be peer-reviewed for possible publication. Editorial evaluation and peer review of DSs will consider their novelty, importance to the field, completeness and quality of the dataset or software, reusability, and accessibility. The material must be made publicly available without restriction if the DS is accepted for publication. DSs describing dataset collections of images derived from obsolete imaging devices; images or data acquired under poorly controlled conditions; or datasets lacking sufficient annotations (e.g., physician-drawn contours, segmentation landmarks, clinical diagnoses, or clinical outcomes) to support hypothesis-driven research will likely be rejected. Software that is only applicable to obsolete technologies or whose validation has not been sufficiently documented will likely be rejected. It is essential that the authors make a convincing case for the value and the potential reusability of the material.

A.1 Data acquisition policies and selection of a repository

It is AAPM Journals’ policy that all key datasets (including computational data, curated data, and data acquired via an experimental or observational procedure) and/or all software (including compiled executables and/or software code and all necessary input or accessory files) described by a DS manuscript should be placed in an appropriate external repository prior to submission of the manuscript. We believe that this is the best means of making these materials discoverable, reproducible, and reusable, and we will work with our authors to identify the most appropriate location(s) for them.

For datasets, authors should provide the data in the ’rawest’ form that will permit substantial reuse. It may be advantageous to release some types of data at multiple levels to enable their broadest reuse. For example, CT sinogram data may best be released both as ’raw’ readings, including only detector response and background corrections, and as more processed sinograms including flood fielding and water-equivalent beam hardening corrections. Authors may also submit supplementary information files (including code, models, workflows, and summary tables) via secondary repositories or as supplementary files linked below the Conclusion section, to be excluded from the PDF version of the article. However, primary data should not be submitted as supplementary information. Data, including headers and metadata, derived from patient studies must be fully anonymized and free of protected health information (PHI). All human or animal research studies from which the dataset was derived must have adhered to U.S. Department of Health and Human Services (HHS) regulations and all other applicable regulations and ethical guidelines, with any necessary institutional review board (IRB) permissions.
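As an illustration of the requirement that headers and metadata be free of PHI, the following is a minimal sketch of a pre-submission check. It assumes the header has already been read into a plain Python dictionary keyed by attribute name. The keyword list uses DICOM attribute keywords but is only an illustrative subset chosen for this example, and the function name `find_phi_fields` is hypothetical. It is not a complete de-identification procedure and does not replace a dedicated tool.

```python
# Illustrative subset of header keywords that commonly carry PHI.
# Assumption: this list is an example only, not a complete de-identification profile.
PHI_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
    "AccessionNumber",
}


def find_phi_fields(header):
    """Return the sorted names of PHI-bearing fields that still hold a value."""
    flagged = []
    for keyword, value in header.items():
        if keyword in PHI_KEYWORDS and str(value).strip():
            flagged.append(keyword)
    return sorted(flagged)


if __name__ == "__main__":
    header = {
        "PatientName": "",                      # already blanked
        "PatientID": "SUBJ-0001",               # pseudonym, still flagged for manual review
        "Modality": "CT",                       # not in the PHI list
        "InstitutionName": "Example Hospital",  # hypothetical value
    }
    print(find_phi_fields(header))
```

A check of this kind only reports fields for review; deciding whether a flagged value (for example, a study-assigned pseudonym) is acceptable remains a manual step.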

For software, authors are encouraged, but not required, to provide the underlying code when posting compiled executables. However, the software should be accompanied by all data/parameter/input files required for its use, as well as sufficient guides or "read me" files to clarify how the software is used, the computing environment(s) required, and the format of the expected output.

All code and information needed for the Referees to access and manipulate the material must be provided at submission.

Finding appropriate digital repositories for large and complex datasets and software files is challenging. Currently, neither the American Association of Physicists in Medicine (AAPM) nor our publisher is able to provide archiving and curation for large datasets. A trusted archive must:

  • Be broadly supported and recognized within the scientific community
  • Ensure long-term persistence and preservation of datasets (> 10 years) in their published form including backup and web-based hosting capability
  • Provide the dataset with a Digital Object Identifier (DOI) that is referenced in the DS
  • Allow record versioning and file management
  • Support any restrictions on data access required to protect human research subjects, as practiced by TCIA/TCGA. Such restrictions require Editor approval.
  • Provide curation that enables datasets to be stored in internationally recognized data formats (preferably DICOM, if applicable) with appropriately linked metadata

Currently, we cannot review a DS until the material has been archived in its final form, which must be done prior to submission. If the authors would prefer another repository or foresee delays in finalizing the submission to the designated repository, they must discuss this with the Editor-in-Chief and the managing Deputy Editor.

Repository fees are the author’s responsibility. The NIH provides a resource list of repository options by file type and indexing/versioning criteria: (NIH Repository Chart). A generalist repository will meet most authors’ needs.

B. DATA ARTICLE FORMAT

DSs should be limited to 10 published pages. At the Editor’s discretion, additional pages may be published provided the authors are willing to assume excess page charges.

B.1 Structured Abstract (<300 words)

  • Purpose: Brief summary description of the dataset or software, including purpose, scope, target audience, and potential applications
  • Acquisition or Development and Validation Methods: For datasets, briefly identify population or phenomenon characterized and data acquisition, processing, and validation procedures. For software, briefly describe underlying algorithm and validation procedures.
  • Data Format and Usage Notes: For datasets, provide data types, number of subjects, population, formats. For software, describe implementation details, operating system(s) if software is only provided as a compiled executable, minimum/recommended computational requirements, and included input and output examples. For both types of manuscript, describe method of access and link to repository.
  • Potential Applications: Brief description of proposed scientific and/or clinical applications of the dataset or software and important limitations

B.2 Manuscript structure

B.2.1 Introduction:

Summary of the scientific and clinical background of the material, including a succinct summary of its novelty and expected clinical and/or scientific reuse and impact. Any prior publications that used the data, or that describe the algorithm and/or underlying scientific methods used by the software, should be described and cited here. Include a list of possible use cases for the material. In the case of software, describe the typical applications it is suited for, whether for research, clinical practice, and/or educational purposes.

Include a link to the material and relevant documentation at the end of the introduction to guide the reader toward accessing and using it.

B.2.2 Acquisition and Validation Methods:

B.2.2.1: Datasets

The Methods should include a detailed description of the experimental and/or computational procedures used in producing the data, including full descriptions of the experimental design, data acquisition assays, and any computational processing (e.g. normalization, image feature extraction). If the data are derived from observations of animal or human subjects, appropriate regulatory approvals should be cited and the protocol for subject selection and study design summarized, including anonymization procedure. Related methods should be grouped under corresponding subheadings where possible, and the methods should be described in enough detail so that other researchers can interpret and repeat, if required, the full study. Specific data outputs should be explicitly referenced via data citation (see Data Records and Data Citations, below).

For studies using computational tools in the generation or processing of datasets, a statement must be included in the Methods section, under the subheading "computational tools", indicating whether and how the associated code can be accessed, including any restrictions to access. This section should also include information on the versions of any software used, if relevant, and any specific variables or parameters used to generate, test, or process the current dataset. For example, if a condensed history code is used to generate particle tracks, the transport model used must be fully identified, along with all relevant parameter settings (step size, energy cutoffs, etc.) and cross section libraries used. While model and algorithm details may be referenced, a general overview should be given. Any previous publications using the dataset should be highlighted.
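One way to keep the software versions and parameter settings described above alongside the data is to store them in a small machine-readable record in the repository. The sketch below is only an illustration under stated assumptions: the function names, the record layout, the code name, the version string, the parameter names, and the library name are all hypothetical placeholders and are not prescribed by this policy.

```python
import json
import platform


def build_provenance(code_name, code_version, parameters, libraries):
    """Collect the settings needed to regenerate a dataset into one record."""
    return {
        "code": {"name": code_name, "version": code_version},
        "parameters": dict(parameters),
        "cross_section_libraries": list(libraries),
        "python_version": platform.python_version(),
    }


def to_json(record):
    """Serialize with sorted keys so the file compares cleanly between versions."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    record = build_provenance(
        code_name="example-mc-code",          # hypothetical code name
        code_version="1.2.3",                 # hypothetical version
        parameters={"step_size_cm": 0.01, "electron_cutoff_keV": 10.0},
        libraries=["example-xs-library-v8"],  # hypothetical library name
    )
    print(to_json(record))
```

The same values would still need to be reported in the manuscript; the file simply gives reusers an unambiguous copy of them.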

The Data Validation subsection should present any experiments or analyses that are needed to support the technical quality of the dataset. This section may be supported by figures and tables, as needed. This is a required section; authors must provide information to justify the reliability of their data.

Possible content may include:

  • Experiments that support or validate the data-collection procedure(s), e.g. benchmarking of computer codes or instruments against known standards.
  • Statistical analyses of experimental error and variation, e.g., reproducibility of mammogram reader scores, measurement of inter-operator segmentation variability, landmark localization error of a nonrigid registration code.
  • Consistency checks, e.g., showing that reconstructed CT images from a documented sinogram dataset match vendor-reconstructed images.
  • General discussions of any procedures used to ensure reliable and unbiased data production, such as blinding and randomization, sample tracking systems, etc.
  • Known limitations or uncertainties associated with the dataset
  • Any other information needed for assessment of technical rigor by the referees
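As one illustration of the variability analyses listed above, inter-operator segmentation agreement is often summarized with the Dice similarity coefficient, 2|A∩B| / (|A| + |B|). The sketch below is an example chosen here, not a metric required by this policy. It assumes the two segmentations are supplied as flattened binary masks of equal length, and the function name is hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as equal-length 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (a convention, stated here
        # as an assumption; other conventions exist).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    operator_1 = [0, 1, 1, 1, 0, 0]  # flattened contour mask from one operator
    operator_2 = [0, 0, 1, 1, 1, 0]  # same structure drawn by a second operator
    print(round(dice_coefficient(operator_1, operator_2), 3))
```

Reporting the distribution of such a score across cases and operator pairs is descriptive and falls within the validation content, whereas comparing competing algorithms with it would fall under the excluded hypothesis testing.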

Generally, this should not include:

  • Follow-up experiments aimed at testing or supporting an interpretation of the data
  • Statistical hypothesis testing (e.g. tests of statistical significance, assessing performance of competing algorithms, trend analysis, etc.)
  • Exploratory computational analyses like clustering and annotation enrichment, unless such features are part of the dataset

B.2.2.2: Software

For manuscripts describing software, this section should be broken down into three components: algorithm, implementation, and validation.

General algorithm/underlying theory

This section should provide a brief overview of the underlying theory or model used by the software. Assuming the algorithm/theory itself is already well-established and published elsewhere, this section should focus on briefly referencing the relevant work and explaining the core principles on which the software is based.

Implementation

This section should provide an in-depth explanation of the software architecture and functionality, so that the end user can:

  • Gain an overview of the software structure, i.e., what each included file does and how the files interact with each other.
  • Clearly understand how the software works, including its features and limitations.
  • If the source code is provided, interact with it, e.g., modify or add features to the software as needed.

The specific contents will depend on the architecture and whether source code is included. For software designed with an object-oriented architecture, this section should describe the key classes, methods, and modules, explaining how they interact to achieve the software’s intended functionality.

A README file should be included in the software repository to cover, at a minimum, software installation and user I/O. This Implementation section is therefore not intended to be a "user’s manual"; it should focus on the inner workings and implementation details of the software.

Validation

The validation section should describe the tests and measurements performed to ensure the accuracy and reliability of the software. If experimental validation has been carried out, provide details on the experimental setup and how the software results were compared to real-world data.

If the software has been validated against pre-existing models or benchmarks (e.g., previously validated software), this section should clearly explain the validation methodology and the criteria for success.

The focus here is to demonstrate that the software’s results are trustworthy and reliable within its intended scope.

Additional recommendations:

  • Use a collaborative software development platform that allows for management of software code changes, and supports public access capabilities when possible.
  • Assign a clear version number to your software.
  • Intellectual property (IP) concerns and restrictions exist for software in both public and private sectors. Assign a license that describes terms of software reuse and access. Check with your institution and/or sponsor for guidance on choosing an appropriate software license.
  • If software code cannot be publicly shared due to IP and/or licensing considerations, include a reference to a publication that describes the underlying logic and methods of software source code, if possible.

B.2.3 (for datasets only) Data Format and Usage Notes:

The Data Format section should open with an overview of the data file structures and their format, including the repository where this information is stored and the methodology for accessing the data. The authors should strive to make the data presentation follow the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines; publishing as a DS will aid this process.

A detailed description should be given of the format of each data record associated with this work and the physical identity and units associated with each data field. For large, complex datasets, tables should be used to succinctly describe data record format and content and should clearly indicate the samples and subjects, their provenance, and the experimental manipulations performed on each with clear references to the methods described in the previous section.

This section should contain brief instructions to assist other researchers with access and manipulation of the data. This may include discussion of software packages that are suitable for analyzing the data, suggested downstream processing steps (e.g., normalization), or tips for integrating or comparing the data records with other datasets. Authors are highly encouraged to provide code, programs, or data-processing workflows if they may help others understand or reuse the data. Any specialized software tools needed to access the dataset or transform it into an accessible data format, such as binary data readers, interpolation code, or code for recovering data from linear combinations of basis functions, should be provided in the data repository or linked from it as open-source code. See the software guidance in Section B.2.2.2.

B.2.4 (for software only) Results:

Step-by-step example

This section should provide a guided example, featuring detailed explanations. The example should illustrate the main use case for the software, ensuring that users can follow along and understand how the software operates in practice.

All corresponding files (inputs, outputs, and relevant code) should be included in the software repository, with clear links or references to these files provided within the article.

Validation results

Present and discuss validation results. This section can also include figures or tables to help visualize the comparison between the measured data and the model’s outputs, allowing readers to clearly see the software’s performance.

Other resources and use cases

If additional files or examples are provided with the software, describe them here. These examples can be presented as proofs of concept that showcase the potential versatility of the software, even if they are not fully validated or outside the initial scope. For instance, describe particular cases or possible future applications of the software, or discuss scenarios where users might adapt the software beyond the core functionality (if applicable).

B.2.4 Discussion

This section should discuss future applications, analyses, and hypotheses, or potential dataset/software extensions, enabled by making the authors’ material available. Limitations of the proposed material with respect to future scientific uses and comparison to competing material of the same type should be addressed in this section.

For software, highlight any significant findings or observations from the validation process and how they support the intended applications. Briefly discuss the next steps, such as future improvements or extensions that could be made, or how the software could be adapted to different use cases in the future.

B.2.5 Conclusion

A succinct summary description of the material, its contribution to the literature, and potential applications

Appendices

Additional descriptions of files or parameters. Derivation of mathematical expressions used in the software, if no references exist.

B.2.6 Acknowledgements

In addition to acknowledging funding sources and contributions from non-authors, any potential conflicts of interest should be specified

B.2.7 References

Follow the formatting requirements outlined in the full Instructions to Authors

B.3 Author Checklist

  • The expected future utility, e.g., with use cases, is clearly described and illustrated if feasible.
  • Quality assurance and validation processes are succinctly described.
  • The use of any standards should be clearly described. It is highly desirable that existing ontologies and other standards are used where available.
  • For datasets, the manuscript should include a table that lists the key data types, key metadata, and the fraction of data for which each element is available. Preferably 100% of all key data fields are available, but this is often not possible with clinical data and needs to be clarified.
  • The process for locating and accessing the material is clearly described. The material must be identified via a Digital Object Identifier (DOI).
  • Material should be completely freely available. Exceptions will be granted only in cases where privacy, e.g., protection of personal health information (PHI), or public safety is a concern. For such cases, user registration and/or execution of a data sharing agreement may be required as a condition of access.
  • Note that a single DS could describe several related datasets that are useful within the same context of investigation. For instance, in the case of radiogenomics studies, related imaging, molecular biomarker, and clinical datasets may be compiled into one study dataset for completeness.
  • The adoption of any standards for nomenclature, semantic interoperability, or other ontologies (https://bioportal.bioontology.org) should be well described.
  • For datasets, if software systems were used to generate the data, they should be described in full detail, including version numbers. Any deviation from this should be justified. If the software is not generally available, and it has an open-source license, it should be archived with the dataset.
  • Any filtering or other selection process used to select and generate the final dataset should be clearly described, potentially in a flowchart diagram.
  • Datasets must be de-identified. Upon request, the DS Review Team can provide specific guidance on tools to use, handling of the many fields in DICOM files where PHI can lurk, approaches for handling of dates/delta dates, age ranges, etc.
  • DS authorship guidelines:
    • All authors must have read and approved submission of the final version of the manuscript
    • Each author must have contributed substantially to the data collection/software development process, the design of the data collection/software development process, or to the data stewardship processes or the software validation processes.
    • Providing funding or resources, clinical access to patient subjects/information, or purely technical contributions alone do not qualify one for co-authorship
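The checklist above asks for a table listing the key data types and the fraction of data for which each element is available. A minimal sketch of how such fractions could be computed is shown below. It assumes each subject is represented as a dictionary with missing elements stored as absent keys, `None`, or empty strings. The field names and function names are hypothetical examples, not fields required by this policy.

```python
def field_availability(records, fields):
    """Fraction of records in which each field is present and non-empty."""
    total = len(records)
    result = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = present / total if total else 0.0
    return result


def format_table(availability):
    """Render the fractions as a plain-text table of percentages."""
    lines = ["{:<20} {:>10}".format("Field", "Available")]
    for field, fraction in availability.items():
        lines.append("{:<20} {:>9.0f}%".format(field, 100 * fraction))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-subject records; the values are placeholders.
    subjects = [
        {"ct_image": "s1.dcm", "gtv_contour": "s1.rts", "survival_months": 14},
        {"ct_image": "s2.dcm", "gtv_contour": "s2.rts", "survival_months": None},
        {"ct_image": "s3.dcm", "gtv_contour": "s3.rts"},
        {"ct_image": "s4.dcm", "gtv_contour": "", "survival_months": 31},
    ]
    fields = ["ct_image", "gtv_contour", "survival_months"]
    print(format_table(field_availability(subjects, fields)))
```

Fields with less than full availability are the ones the manuscript would need to explain.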

B.4 Instructions to Referees:

  1. The Review Team must include an expert in the research domain addressed by the material. This referee should address the following issues
    1. material novelty and impact
    2. material relevance and importance to addressing significant and current research questions in this domain
    3. Adequacy and quality of the material is sufficient to contribute to targeted research applications whether for validation purposes or initiating new questions
  2. The Review Team must include an expert in material access and curation. This referee should address the following issues
    1. Verify existence/accessibility of material via supplied DOI or other link
    2. Verify that material can be successfully downloaded
    3. Verify the software tools supplied or referenced can open/process the dataset/execute the software without errors
    4. Verify number and type of files, data structures, metadata supplied
    5. Graphically/qualitatively evaluate the accuracy/integrity/consistency of one randomly selected data structure
    6. Evaluate the ease of use of the software, especially compared to the description of an example use case in the manuscript.
    7. If a repository has already performed a curation process on the material (e.g., the TCIA) this should be noted. Otherwise, alternative curation methods should be highlighted
    8. Confirm the usability of the material for the intended purposes in the article.
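The download and file-inventory checks in the list above can be partly automated. The sketch below is one possible approach under stated assumptions: it assumes the authors publish a manifest of relative file paths and SHA-256 checksums with the material, which this policy does not require, and all function names are hypothetical. It builds such a manifest for a local directory, counts files by extension, and compares two manifests.

```python
import hashlib
import os
import tempfile
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its SHA-256 checksum."""
    manifest = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha256_of(full)
    return manifest


def count_by_extension(manifest):
    """Count files per lowercase extension; files without one are grouped as '(none)'."""
    return Counter(os.path.splitext(p)[1].lower() or "(none)" for p in manifest)


def compare(expected, actual):
    """Return (missing, unexpected, altered) lists of relative paths."""
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    altered = sorted(p for p in expected if p in actual and expected[p] != actual[p])
    return missing, unexpected, altered


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "images"))
        with open(os.path.join(root, "images", "scan_001.dcm"), "wb") as f:
            f.write(b"placeholder image bytes")
        with open(os.path.join(root, "README.txt"), "wb") as f:
            f.write(b"placeholder readme")
        published = build_manifest(root)   # stands in for the authors' manifest
        downloaded = build_manifest(root)  # stands in for the referee's download
        print(dict(count_by_extension(downloaded)))
        print(compare(published, downloaded))
```

Checksums confirm only that the downloaded files match what was archived; the qualitative checks of data integrity and usability in the list still require inspection by the referee.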

Policy version history

Policy number | Policy name | Policy date | Sunset date | Active?
PUB 3-A | Instructions to Authors and Reviewers: Medical Physics Dataset Article | 3/20/2018 | 3/20/2023 | Inactive
PUB 3-B | Instructions to Authors and Reviewers: Medical Physics Dataset Article | 3/21/2023 | 3/20/2028 | Inactive
PUB 3-C | Instructions to Authors and Reviewers: Dataset and Software Article | 5/16/2026 | 12/31/2030 | Active