Paper
16 March 2020 Artificially augmenting data or adding more samples? A study on a 3D CNN for lung nodule classification
Panagiotis Gonidakis, Bart Jansen, Jef Vandemeulebroucke
Abstract
Convolutional neural networks are known to require large amounts of data to achieve optimal performance. In addition, data is commonly augmented computationally, using a variety of geometric and intensity transformations, to further extend the set of training samples. In medical imaging, annotated data is often scarce or costly to obtain, and there is considerable interest in methods to reduce the amount of data needed. In this work, we investigate the relative benefit of increasing the amount of original data, compared with computationally augmenting the training samples, for the case of false positive reduction of lung nodule candidates. To this end, we have implemented a previously published topology for classification, shown to achieve state-of-the-art results on the publicly available LUNA16 dataset. Numerous models were trained using different amounts of unique training samples and different degrees of data augmentation involving rotations and translations, and their performance was compared. Results indicate that, in general, better performance is achieved when increasing the amount of data or augmenting the data more extensively, as expected. Surprisingly, however, we observed that after reaching a certain number of unique training samples, data augmentation leads to significantly better performance than adding the same number of new samples to the training dataset. We hypothesize that the augmentation has aided in learning more general (rotation- and translation-invariant) features, leading to improved performance on unseen data. Future experiments include a more detailed characterization of this behavior, and relating it to the topology and the number of parameters to be trained.
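As an illustration of the kind of augmentation described above, the following is a minimal sketch and not the authors' implementation. The abstract states only that rotations and translations were used; the 90-degree rotation steps, the integer voxel shifts, the patch sizes, and the function names `augment_patch` and `augment_dataset` are all assumptions made for this example.

```python
import numpy as np


def augment_patch(volume, crop_size, max_shift, rng):
    """Return one randomly rotated and translated cubic crop of a 3D volume.

    Assumption: rotations are multiples of 90 degrees and translations are
    integer voxel shifts; the abstract does not specify either.
    """
    # Rotation: a random number of 90-degree turns in a randomly chosen plane.
    planes = [(0, 1), (0, 2), (1, 2)]
    axes = planes[int(rng.integers(len(planes)))]
    rotated = np.rot90(volume, k=int(rng.integers(4)), axes=axes)

    # Translation: move the crop centre by up to max_shift voxels per axis.
    shape = np.array(rotated.shape)
    shift = rng.integers(-max_shift, max_shift + 1, size=3)
    start = shape // 2 + shift - crop_size // 2
    if np.any(start < 0) or np.any(start + crop_size > shape):
        raise ValueError("volume too small for the requested crop and shift")
    z, y, x = (int(s) for s in start)
    return rotated[z:z + crop_size, y:y + crop_size, x:x + crop_size].copy()


def augment_dataset(volumes, copies_per_sample, crop_size=32, max_shift=2, seed=0):
    """Create copies_per_sample augmented crops for every input volume."""
    rng = np.random.default_rng(seed)
    return [
        augment_patch(v, crop_size, max_shift, rng)
        for v in volumes
        for _ in range(copies_per_sample)
    ]
```

Under these assumptions, the comparison in the abstract corresponds to training on N unique candidates with `copies_per_sample = k`, versus training on a larger set of unique candidates with less augmentation, so that both training sets contain the same total number of samples.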
© (2020) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Panagiotis Gonidakis, Bart Jansen, and Jef Vandemeulebroucke "Artificially augmenting data or adding more samples? A study on a 3D CNN for lung nodule classification", Proc. SPIE 11314, Medical Imaging 2020: Computer-Aided Diagnosis, 113142F (16 March 2020); https://doi.org/10.1117/12.2549810
KEYWORDS
Lung
Data modeling
Medical imaging
3D modeling
Computed tomography
Computing systems
Performance modeling