Medical image classification is a subfield of image classification, so many general image classification techniques also apply to it, for example image enhancement methods that strengthen the discriminative features used for classification [20]. However, because a CNN is an end-to-end solution that learns features by itself, the literature on selecting and enhancing features in medical images is not reviewed here. The review focuses on the application of traditional methods, CNN-based transfer learning, and capsule networks to medical images, in order to identify which factors in those models are essential to the final result and which gaps the existing work leaves open.
ORB and SVM application on medical image classification
Paredes et al. [21] used small patches of medical images as local features and k-nearest neighbor (k-NN) to determine the category of the whole medical image, achieving state-of-the-art accuracy. Parveen and Sathik [22] studied the detection of pneumonia from X-rays. The authors extracted features with the discrete wavelet transform (DWT), wavelet frame transform (WFT), and wavelet packet transform (WPT), and used fuzzy C-means to detect pneumonia. Caicedo et al. [23] used the scale-invariant feature transform (SIFT) as a local feature descriptor and support vector machine (SVM) classifiers to classify medical images, obtaining a state-of-the-art precision of 67%. However, SIFT is a patented algorithm. Rublee et al. [24] therefore proposed a free and faster local feature descriptor, Oriented FAST and Rotated BRIEF (binary robust independent elementary features), abbreviated ORB, which performs comparably to SIFT and, under some conditions, better. SVM is also a high-performance classification algorithm that other researchers have used widely in medical image classification tasks with excellent results [25, 26]. Therefore, this report uses ORB and SVM to represent the traditional methods.
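One reason binary descriptors such as ORB are fast to work with is that two descriptors can be compared by counting differing bits (the Hamming distance) instead of computing a floating-point distance. The following is a minimal sketch of that comparison in plain Python. The function names and the toy 4-byte descriptors are illustrative assumptions, not part of any cited work or library.

```python
from typing import Sequence


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those 1 bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def nearest_descriptor(query: bytes, candidates: Sequence[bytes]) -> int:
    """Return the index of the candidate closest to the query descriptor."""
    return min(range(len(candidates)),
               key=lambda i: hamming_distance(query, candidates[i]))


# Toy 4-byte descriptors for illustration (hypothetical values).
query = bytes([0b10101010, 0b11110000, 0b00001111, 0b00000000])
candidates = [
    bytes([0b01010101, 0b00001111, 0b11110000, 0b11111111]),  # every bit flipped
    bytes([0b10101011, 0b11110000, 0b00001111, 0b00000000]),  # one bit flipped
]

distances = [hamming_distance(query, c) for c in candidates]
best = nearest_descriptor(query, candidates)
print(distances, best)
```

In a full pipeline, descriptors matched or aggregated in this way would be turned into a fixed-length feature vector per image before being passed to the SVM. That aggregation step is not specified in the text above and is left out of the sketch.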
CNN on medical image classification
As various CNN-based deep neural networks have been developed and have achieved significant results on the ImageNet challenge, which is the most significant image classification and segmentation challenge in the image analysis field [27], CNN-based deep neural networks have become widely used in medical image classification. A CNN is an excellent feature extractor, so using it to classify medical images avoids complicated and expensive feature engineering. Qing et al. [28] presented a customized CNN with a shallow ConvLayer to classify image patches of lung disease and found that the system can be generalized to other medical image datasets. Other research found that CNN-based systems can be trained on large chest X-ray (CXR) datasets and achieve state-of-the-art accuracy and sensitivity on them, for example the Stanford Normal Radiology Diagnostic Dataset, which contains more than 400,000 CXRs, and a new CXR database (ChestX-ray8), which consists of 108,948 frontal-view CXRs [29]. With limited data, however, it is hard to train an adequate model, so transfer learning with CNNs is widely used in medical image classification tasks. Kermany et al. [3] used InceptionV3 with ImageNet-trained weights and applied transfer learning to a medical image dataset containing 108,312 optical coherence tomography (OCT) images. They obtained an average accuracy of 96.6%, with a sensitivity of 97.8% and a specificity of 97.4%. The authors also compared the results with those of six human experts. Most of the experts achieved high sensitivity but low specificity, while the CNN-based system achieved high values for both. Moreover, on the average weighted error measure, the CNN-based system outperformed two of the human experts. The authors also verified their system on a small pneumonia dataset of about five thousand images and achieved an average accuracy of 92.8%, with a sensitivity of 93.2% and a specificity of 90.1%.
Such a system may ultimately help accelerate the diagnosis and referral of patients and thereby enable earlier treatment, resulting in an increased cure rate. Moreover, Vianna [30] studied how to use transfer learning to build an X-ray image classification system, which is the critical component of a computer-aided diagnosis system. That study found that a fine-tuned transfer learning system with data augmentation effectively alleviates overfitting and yields better results than two other models: one trained from scratch and a transfer learning model in which only the last classification layer is retrained.
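The accuracy, sensitivity, and specificity values quoted in this section are all derived from the same four confusion-matrix counts of a binary classifier. The sketch below shows the standard definitions. The function name and the counts are hypothetical and do not come from any of the cited studies.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: diseased cases classified correctly/incorrectly.
    tn/fp: healthy cases classified correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # true positive rate (recall)
        "specificity": tn / negatives,   # true negative rate
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

With these counts the classifier finds 90 of 100 diseased cases (sensitivity 0.90) but clears only 80 of 100 healthy cases (specificity 0.80), which is the kind of imbalance between the two measures described for the human experts above.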
Capsule neural network on medical image classification
As mentioned in the previous section, CapsNet was introduced in 2017 [16], so the research on it is not yet as extensive as that on CNNs. Nevertheless, it has been applied to various datasets and fields because of one attractive property: equivariance. This means that the spatial relationships between objects in an image are preserved, while the result is not affected by an object's orientation and size. Afshar et al. [18] applied CapsNet to brain tumor classification on magnetic resonance imaging (MRI) images and obtained 86.56% prediction accuracy with a modified CapsNet whose number of feature maps is reduced from the original 256 to 64.
Moreover, Tomas and Robertas [31] presented a CapsNet-based solution to classify four types of breast tissue biopsies from breast cancer histology images. They achieved 87% accuracy with equally high sensitivity. Jimenez-Sanchez et al. [5] evaluated CapsNet on medical image challenges. The authors selected a CNN with three ConvLayers as the baseline and compared CapsNet's performance with LeNet and the baseline on four datasets, MNIST, Fashion-MNIST, mitosis detection (TUPAC16), and diabetic retinopathy detection (DIARETDB1), under three conditions: a partial subset of the dataset, an imbalanced subset of the dataset, and data augmentation. The results show that CapsNet performed better than the other two networks on small, imbalanced datasets. Beşer et al. [32] implemented a sign language recognition system based on CapsNet and achieved 94.2% validation accuracy. Some researchers have also studied CapsNet's internal mechanics by varying the network structure under different conditions. Xi et al. [33] studied the impact of different network structures on a more complex dataset, CIFAR10. The authors evaluated the following options:
1. Increase the number of primary capsule layers.
2. Increase the number of capsules in the primary capsule layer.
3. Ensemble multiple models and average the results.
4. Adjust the scaling factor of the reconstruction loss.
5. Add more ConvLayers.
6. Evaluate other activation functions.
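Option 3 in the list above, averaging the results of several models, can be sketched as follows. This is a minimal illustration that assumes each model outputs one score per class and that the scores are averaged before the final class is chosen. The function names and the scores are hypothetical and are not taken from Xi et al. [33].

```python
from typing import List, Sequence


def average_scores(model_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class scores produced by several models for one image."""
    if not model_outputs:
        raise ValueError("at least one model output is required")
    n_classes = len(model_outputs[0])
    if any(len(scores) != n_classes for scores in model_outputs):
        raise ValueError("all models must score the same number of classes")
    n_models = len(model_outputs)
    return [sum(scores[c] for scores in model_outputs) / n_models
            for c in range(n_classes)]


def predict_class(model_outputs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged score."""
    averaged = average_scores(model_outputs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical scores from three models for a single three-class image.
outputs = [
    [0.6, 0.3, 0.1],  # this model alone would pick class 0
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
averaged = average_scores(outputs)
predicted = predict_class(outputs)
print(averaged, predicted)
```

In this toy case the first model on its own would choose class 0, but the averaged scores favor class 1, which illustrates how an ensemble can smooth out the errors of an individual model.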
Finally, the authors found that adding ConvLayers and ensembling more models had the largest effect on the final accuracy. They achieved their best result with a 7-model ensemble of CapsNets that has an additional ConvLayer compared with the original version by Sabour et al. Furthermore, the CapsNet that Tomas and Robertas used to classify breast cancer increased the number of ConvLayers to five.
Afshar et al. [18] also evaluated different CapsNet options. They tuned the input size, the number of feature maps, the number of ConvLayers, the number of capsules in the primary CapsLayer, the number of dimensions of the primary capsules, and the number of neurons in the reconstruction layers. The authors obtained the best results with a CapsNet that takes a \(64\times 64\) input image (the original is \(28\times 28\)) and uses fewer feature maps, reduced from the original 256 to 64. The authors also found that increasing the number of routing iterations beyond three does not improve the performance on four datasets: MNIST, Fashion-MNIST, the Street View House Numbers (SVHN) dataset, and the Canadian Institute for Advanced Research 10 (CIFAR10) dataset.
The review above shows that the traditional method (SVM with ORB features), CNN-based transfer learning, and the capsule network can all be applied to medical image datasets. Judging only by the accuracy values reported on different datasets, CNN-based transfer learning appears to perform better than the other two methods. However, the three methods have not been compared on the same dataset. Therefore, this paper compares their performance on the same dataset, the pneumonia dataset.
Moreover, there are many options when tuning the parameters of these methods. The traditional approach offers many features and classification algorithms that could be evaluated, but they cannot all be explored in this paper because of limited time. As the baseline, the traditional method uses ORB as the feature and a linear SVM as the classifier. Because data augmentation is a preprocessing step that can be applied to all three methods, it is also evaluated on the traditional method. For CNN-based transfer learning, the number of retrained ConvLayers, the complexity of the classification layers, and the dropout rate have significant effects on the final result, so they are evaluated in this research. Likewise, based on the research reviewed above, the critical factors in the capsule network, namely the number of feature maps, the number of capsules, and the number of capsule channels, are also evaluated in this report.
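The factors named above define a small search space for each model family. The sketch below shows one way to enumerate such a space in plain Python. All parameter names and candidate values are hypothetical placeholders chosen for illustration; they are not the settings used in this report.

```python
from itertools import product
from typing import Dict, List, Sequence


def expand_grid(grid: Dict[str, Sequence]) -> List[dict]:
    """Return every combination of the candidate values as one dict per run."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


# Hypothetical candidate values for CNN-based transfer learning.
cnn_grid = {
    "retrained_conv_layers": [0, 2, 4],  # how many ConvLayers are unfrozen
    "dense_units": [128, 512],           # stand-in for classifier complexity
    "dropout_rate": [0.2, 0.5],
}

# Hypothetical candidate values for the capsule network.
capsnet_grid = {
    "feature_maps": [64, 256],
    "primary_capsules": [16, 32],
    "capsule_channels": [8, 16],
}

cnn_configs = expand_grid(cnn_grid)
capsnet_configs = expand_grid(capsnet_grid)
print(len(cnn_configs), len(capsnet_configs))
```

Because the number of runs is the product of the number of values per factor, adding one more candidate value to a factor multiplies the training cost, which is why only a few factors per method can be explored in limited time.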