Deep convolutional neural network based medical image classification for disease diagnosis

Yadav, Samir S.; Jadhav, Shivajirao M.

doi:10.1186/s40537-019-0276-2

Research
Open access
Published: 17 December 2019

Journal of Big Data volume 6, Article number: 113 (2019)

Abstract

Medical image classification plays an essential role in clinical treatment and teaching tasks. However, traditional methods have reached their performance ceiling. Moreover, using them requires much time and effort for extracting and selecting classification features. The deep neural network is an emerging machine learning method that has proven its potential for different classification tasks. Notably, the convolutional neural network dominates with the best results on various image classification tasks. However, medical image datasets are hard to collect because labeling them requires a lot of professional expertise. Therefore, this paper researches how to apply convolutional neural network (CNN) based algorithms to a chest X-ray dataset to classify pneumonia. Three techniques are evaluated through experiments: a linear support vector machine classifier with local rotation and orientation free features; transfer learning on two convolutional neural network models, Visual Geometry Group (VGG16) and InceptionV3; and a capsule network trained from scratch. Data augmentation is a data preprocessing method applied to all three methods. The results of the experiments show that data augmentation is generally an effective way to improve the performance of all three algorithms. Also, transfer learning is a more useful classification method on a small dataset than a support vector machine with oriented FAST and rotated binary robust independent elementary features (ORB) or a capsule network. In transfer learning, retraining specific features on the new target dataset is essential to improve performance. The second important factor is a network complexity that matches the scale of the dataset.

Introduction

Effectively classifying medical images plays an essential role in aiding clinical care and treatment. For example, X-ray analysis is the best approach to diagnose pneumonia [1], which causes about 50,000 deaths per year in the US [2], but classifying pneumonia from chest X-rays requires professional radiologists, who are a rare and expensive resource in some regions.

Traditional machine learning methods, such as support vector machines (SVMs), have long been used in medical image classification. However, these methods have the following disadvantages: their performance is far from the practical standard, and their development has been quite slow in recent years. Also, feature extraction and selection are time-consuming and vary according to different objects [3]. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), are widely used in various image classification tasks and have achieved significant performance since 2012 [4]. Some research on medical image classification by CNN has achieved performance rivaling human experts. For example, CheXNet, a CNN with 121 layers trained on a dataset with more than 100,000 frontal-view chest X-rays (ChestX-ray 14), achieved better performance than the average performance of four radiologists. Moreover, Kermany et al. [3] propose a transfer learning system to classify 108,309 optical coherence tomography (OCT) images, and its weighted average error is equal to the average performance of 6 human experts.

Medical images are hard to collect, as the collecting and labeling of medical data is confronted with both data privacy concerns and the requirement for time-consuming expert explanations. There are two general directions for resolving this. One is to collect more data, for example through crowdsourcing [5] or by digging into existing clinical reports [6]. The other is to study how to increase performance on a small dataset, which is very important because the knowledge gained from that research can migrate to research on big datasets. In addition, the largest published chest X-ray image dataset (ChestX-ray 14) is still far smaller than the biggest general image dataset, ImageNet, which had reached 14,197,122 instances in 2010 [7, 8].

CNN-based methods have various strategies to increase the performance of image classification on small datasets. One method is data augmentation [9,10,11,12]. Wang and Perez [13] researched the effectiveness of data augmentation in image classification. The authors found that traditional transform-based data augmentation performs better than generative adversarial network (GAN) and other neural network-based methods. Another method is transfer learning [3, 12, 14, 15]. Kermany et al. [3] achieved 92% accuracy on a small pneumonia X-ray image dataset by transfer learning. The third method is the capsule network. Sabour et al. [16] invented a new neural network structure, the capsule network, which achieves state-of-the-art performance on the Modified National Institute of Standards and Technology (MNIST) database [17] and also the best performance on other small datasets. Afshar et al. [18] utilized a capsule network to detect brain tumors and got 86.56% accuracy.
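To make "transform-based" augmentation concrete, the following is a minimal sketch in plain Python. It treats an image as a list of pixel rows and applies a random horizontal flip and a small horizontal shift. The function names, the choice of transforms, and the shift range are illustrative assumptions and are not the augmentation settings used in this paper.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    if pixels <= 0:
        return [row[:] for row in image]
    return [
        [fill] * min(pixels, len(row)) + row[:max(len(row) - pixels, 0)]
        for row in image
    ]


def augment(image, rng, max_shift=1):
    """Return a randomly transformed copy of `image` (hypothetical policy)."""
    out = image
    if rng.random() < 0.5:  # flip half of the time
        out = horizontal_flip(out)
    return shift_right(out, rng.randint(0, max_shift))


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # fixed seed so runs are repeatable
augmented_batch = [augment(image, rng) for _ in range(4)]
```

Each call yields a new variant with the same shape and label as the input, which is how such transforms enlarge a small training set without collecting new images. Whether a given transform (for example, mirroring a chest X-ray) is clinically sensible has to be judged per dataset.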

However, some gaps remain. A limitation of Kermany's research is that they use the InceptionV3 model and do not retrain the convolutional layers of InceptionV3 because of overfitting. Therefore, other models and the effects of retraining the convolutional layers will be evaluated in this research. Moreover, Afshar et al. [18] did not compare the performance of the capsule network with other methods. Therefore, the contributions of this report include:

Performance comparison of three different classification methods: SVM classifier with oriented FAST and rotated binary robust independent elementary features (ORB), transfer learning of VGG16 and InceptionV3, and training a capsule network from scratch.
An analysis of the effects of data augmentation, network complexity, fine-tuned convolutional layers, and other overfitting-prevention mechanisms on the classification of a small chest X-ray dataset by transfer learning of CNN.

This article conducts four groups of experiments. The SVM with ORB runs on a standard machine. The convolutional neural network (CNN) related analyses are all run on a virtual machine with an Nvidia Tesla K80 graphics card in Google Cloud [19].

The remainder of the article is organized as follows: the "Literature review" section reviews the related literature on medical image classification. The "Experimental design" section describes the design of the experiments. The "Experimental results" section presents the results of the experiments, and the "Discussion" section discusses them. Finally, the conclusion is drawn and future work is described, followed by the references.

Literature review

Medical image classification is a sub-field of image classification, and many image classification techniques can also be applied to it, such as the many image enhancement methods that strengthen the discriminable features for classification [20]. However, as CNN is an end-to-end solution for image classification, it learns the features by itself. Therefore, the literature about how to select and enhance features in medical images will not be reviewed. The review mainly focuses on the application of traditional methods, CNN-based transfer learning, and the capsule network in medical image related papers, to investigate which factors in those models are essential to the final result and which gaps they have not covered in their work.

ORB and SVM application on medical image classification

Paredes et al. [21] use small patches of medical images as local features and k-nearest neighbor (k-NN) to classify the category of the whole medical image, achieving state-of-the-art accuracy. Parveen and Sathik [22] researched detecting pneumonia from X-rays. The authors extracted features by discrete wavelet transform (DWT), wavelet frame transform (WFT), and wavelet packet transform (WPT), and used fuzzy C-means to detect pneumonia. Caicedo et al. [23] use scale-invariant feature transform (SIFT) as a local feature descriptor and support vector machine (SVM) classifiers to classify medical images, reaching state-of-the-art precision at 67%. However, SIFT is a patented algorithm. Thus, Rublee et al. [24] propose a free, faster local feature descriptor, oriented FAST and rotated binary robust independent elementary features (ORB), which has the same performance as SIFT and even better performance than SIFT under some conditions. SVM is also a high-performance classification algorithm, widely used in different medical image classification tasks by other researchers, where it achieves excellent performance [25, 26]. Therefore, this report uses ORB and SVM as the representative of the traditional methods.

CNN on medical image classification

Since different CNN-based deep neural networks were developed and achieved significant results on the ImageNet challenge, which is the most significant image classification and segmentation challenge in the image analysis field [27], CNN-based deep neural systems have been widely used in medical classification tasks. CNN is an excellent feature extractor; therefore, utilizing it to classify medical images can avoid complicated and expensive feature engineering. Qing et al. [28] presented a customized CNN with shallow ConvLayers to classify image patches of lung disease. The authors also found that the system can be generalized to other medical image datasets. Other research also found that CNN-based systems can be trained on large chest X-ray (CXR) film datasets and achieve state-of-the-art results with high accuracy and sensitivity on those datasets, such as the Stanford Normal Radiology Diagnostic Dataset, containing more than 400,000 CXRs, and a new CXR database (ChestX-ray8), which consists of 108,948 frontal-view CXRs [29].

However, limited data makes it hard to train an adequate model. Therefore, transfer learning of CNNs is widely used in medical image classification tasks. Kermany et al. [3] use InceptionV3 with ImageNet-trained weights and transfer learning on a medical image dataset containing 108,312 optical coherence tomography (OCT) images. They got an average accuracy of 96.6%, with a sensitivity of 97.8% and a specificity of 97.4%. The authors also compared the results with six human experts. Most of the experts got high sensitivity but low specificity, while the CNN-based system got high values on both sensitivity and specificity. Moreover, on the average weighted error measure, the CNN-based system exceeds two human experts. The authors also verified their system on a small pneumonia dataset of about five thousand images and achieved an average accuracy of 92.8%, with a sensitivity of 93.2% and a specificity of 90.1%. Such a system may ultimately help accelerate the diagnosis and referral of patients and thereby enable earlier treatment, resulting in an increased cure rate.

Moreover, Vianna [30] also studied how to utilize transfer learning to build an X-ray image classification system, which is the critical component of a computer-aided-diagnosis system. The authors found that a fine-tuned transfer learning system with data augmentation effectively alleviates the overfitting problem and yields a better result than two other models: training from scratch and a transfer learning model with only a retrained last classification layer.
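Because the studies above are compared on accuracy, sensitivity, and specificity, it may help to spell out how these three measures follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from any cited paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp: diseased cases predicted as diseased (true positives)
    fp: healthy cases predicted as diseased (false positives)
    tn: healthy cases predicted as healthy (true negatives)
    fn: diseased cases predicted as healthy (false negatives)
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of diseased cases that were caught
        "specificity": tn / (tn + fp),      # share of healthy cases that were cleared
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=40, fn=10)
```

A reader with high sensitivity but low specificity, as reported for most of the human experts above, rarely misses disease but raises many false alarms; the three numbers together show that trade-off, which accuracy alone hides.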

Capsule neural network on medical image classification

As mentioned in the previous section, the CapsNet was invented in 2017 [16]. Therefore, the research on it is not yet as extensive as that on CNNs. However, there is still some research applying it to different datasets and various fields because of its key feature, equivariance. This means the spatial relationships of objects in an image are kept and, at the same time, the result is not affected by the object's orientation and size. Afshar et al. [18] applied CapsNet to classifying brain tumors on Magnetic Resonance Imaging (MRI) images and got 86.56% prediction accuracy with a modified CapsNet that reduces the feature maps from the original 256 to 64.
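The general notion of equivariance, as opposed to invariance, can be illustrated with a toy one-dimensional example. This is not a capsule network and makes no claim about how CapsNet is implemented; it only shows, under the assumption of a wrap-around (circular) signal, that a sliding filter preserves where a pattern is, while a global maximum throws that information away. All names and values are illustrative.

```python
def circular_correlate(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the end."""
    n = len(signal)
    return [
        sum(w * signal[(i + k) % n] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def circular_shift(signal, steps):
    """Move every element `steps` positions to the right, wrapping around."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, 2, 1]

features = circular_correlate(signal, kernel)
features_of_shifted = circular_correlate(circular_shift(signal, 2), kernel)

# Equivariant: shifting the input shifts the feature map by the same amount,
# so the position of the pattern is still recoverable from the output.
is_equivariant = features_of_shifted == circular_shift(features, 2)

# Invariant: a global maximum gives the same value before and after the shift,
# so the position of the pattern is lost.
is_invariant_after_pooling = max(features_of_shifted) == max(features)
```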

Moreover, Tomas and Robertas [31] presented a CapsNet-based solution to classify four types of breast tissue biopsies from breast cancer histology images. They achieved 87% accuracy with the same high sensitivity. Jimenez-Sanchez et al. [5] evaluated the CapsNet on medical image challenges. The authors selected a CNN with three ConvLayers as the baseline and compared CapsNet's performance with LeNet and the baseline on four datasets, MNIST, Fashion-MNIST, mitosis detection (TUPAC16) and diabetic retinopathy detection (DIARETDB1), under three conditions: a partial subset of the dataset, an imbalanced subset of the dataset, and data augmentation. The final result shows CapsNet performed better than the other two networks on a small, imbalanced dataset. Beşer et al. [32] implemented a sign language recognition system with CapsNet and achieved 94.2% validation accuracy. Moreover, some researchers studied the internal mechanics by varying network structures under different conditions. Xi et al. [33] studied the impact of different network structures on a complex dataset, CIFAR10. The authors evaluated the following options:

1. Increase the number of primary capsule layers.
2. Increase the capsule number in primary capsule layer.
3. Assemble multiple models and average the result.
4. Adjust the scaling factor of reconstruction loss.
5. Add more ConvLayer.
6. Evaluate other activation function.
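The third option, assembling several models and averaging their results, can be sketched as follows. This minimal example assumes each model outputs one probability per class for the same input; the function names and the probabilities are hypothetical and do not reproduce the cited experiments.

```python
def average_ensemble(per_model_probs):
    """Average class probabilities across models for a single input.

    per_model_probs: list with one entry per model, each a list of
    class probabilities in the same class order.
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def ensemble_predict(per_model_probs):
    """Return the index of the class with the highest averaged probability."""
    averaged = average_ensemble(per_model_probs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Three hypothetical models scoring one image over two classes.
outputs = [
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
]
averaged = average_ensemble(outputs)
predicted_class = ensemble_predict(outputs)
```

Here the first model alone would pick class 0, but the averaged vote picks class 1, which is how an ensemble can smooth over the errors of an individual model.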

Finally, the authors found that adding more ConvLayers and assembling more models had the greatest effect on improving the final accuracy. They achieved the highest result with a 7-model assembled CapsNet with more ConvLayers than Sabour's original version. Furthermore, the CapsNet that Tomas and Robertas used to classify breast cancer increased the number of ConvLayers to five. On the other hand, Afshar et al. [18] also evaluated different options of CapsNet. They fine-tuned the input size, number of feature maps, number of ConvLayers, capsule number in the primary CapsLayer, dimension number in the primary capsule, and the neuron number in the reconstruction layers. The authors got the best results with a CapsNet having a \(64\times 64\) input image (the original is \(28\times 28\)) and fewer feature maps, reduced to 64 from the original 256. Also, the authors found that increasing the routing iteration number beyond three does not improve the performance on the four datasets: MNIST, Fashion-MNIST, the Street View House Numbers (SVHN) dataset, and the Canadian Institute for Advanced Research 10 (CIFAR10) dataset.

From the previous reviews, it can be seen that the traditional method (SVM with ORB features), CNN-based transfer learning, and the capsule network can all be used on medical image datasets. Judging only by the accuracy values on different datasets, CNN-based transfer learning appears to perform better than the other two methods. However, they have not been compared on the same dataset. Therefore, this paper will compare their performance on the same dataset, the pneumonia dataset.

Moreover, there are many different options when fine-tuning the parameters of those methods. The traditional method has many features and classification algorithms that could be evaluated. They cannot all be covered in this paper due to limited time. As the baseline, the traditional method uses ORB as the feature and linear SVM as the classifier. As data augmentation is a data preprocessing method that can be applied to all three methods, it will also be evaluated on the traditional method. For CNN-based transfer learning, the number of retrained ConvLayers, the complexity of the classification layers, and the dropout rate have significant effects on the final result. Therefore, they will be evaluated in this research. Based on the same reviewed research, the critical factors in the capsule network, namely the number of feature maps, the number of capsules, and the channels of the capsule, will also be evaluated in this report.

Experimental design

Data neural network on medical image classification

The dataset comes from the work of Kermany et al. [34]. It contains two kinds of chest X-ray images, NORMAL and PNEUMONIA, which are stored in two folders. In the PNEUMONIA folder, two specific types of pneumonia can be recognized by the file name: BACTERIA and VIRUS. Table 1 describes the composition of the dataset. The training dataset contains 5232 X-ray images, while the testing dataset contains 624 images. In the training dataset, the NORMAL class accounts for only one-fourth of all data. In the testing dataset, the PNEUMONIA class makes up 62.5% of all data, which means the accuracy on the testing data should be higher than 62.5%, because a classifier that always predicts PNEUMONIA already reaches that value.
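The 62.5% floor mentioned above is the accuracy of a majority-class baseline, and the arithmetic can be checked with a short sketch. The per-class counts below are derived from the stated total of 624 test images and the stated 62.5% share, on the assumption that the percentage is exact; the function and variable names are illustrative.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


TEST_IMAGES = 624        # size of the testing dataset stated in the text
PNEUMONIA_SHARE = 0.625  # share of PNEUMONIA images stated in the text

# Derived counts (assumption: the stated share is exact, not rounded).
pneumonia_images = round(TEST_IMAGES * PNEUMONIA_SHARE)
test_counts = {
    "PNEUMONIA": pneumonia_images,
    "NORMAL": TEST_IMAGES - pneumonia_images,
}

baseline = majority_baseline_accuracy(test_counts)
```

Any model whose test accuracy does not clearly exceed `baseline` has learned nothing beyond the class imbalance, which is why sensitivity and specificity are worth reporting alongside accuracy on this dataset.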

Table 1 The composition of chest X-ray dataset
