CEU-Net: ensemble semantic segmentation of hyperspectral images using clustering

Soucy, Nicholas; Sekeh, Salimeh Yasaei

doi:10.1186/s40537-023-00718-3

Research
Open access
Published: 12 April 2023


Nicholas Soucy¹ &
Salimeh Yasaei Sekeh¹

Journal of Big Data volume 10, Article number: 43 (2023)


Abstract

Most semantic segmentation approaches of big data hyperspectral images use and require preprocessing steps in the form of patching to accurately classify diversified land cover in remotely sensed images. These approaches use patching to incorporate the rich spatial neighborhood information in images and exploit the simplicity and segmentability of the most common datasets. In contrast, most landmasses in the world consist of overlapping and diffused classes, making neighborhood information weaker than what is seen in common datasets. To combat this common issue and generalize the segmentation models to more complex and diverse hyperspectral datasets, in this work, we propose a novel flagship model: Clustering Ensemble U-Net. Our model uses the ensemble method to combine spectral information extracted from convolutional neural network training on a cluster of landscape pixels. Our model outperforms existing state-of-the-art hyperspectral semantic segmentation methods and gets competitive performance with and without patching when compared to baseline models. We highlight our model’s high performance across six popular hyperspectral datasets including Kennedy Space Center, Houston, and Indian Pines, then compare them to current top-performing models.

Introduction

Between climate change, invasive species, and logging enterprises, it is important to know which ground types are where on a large scale. Recently, due to the widespread use of satellite imagery, big data hyperspectral images (HSI) are available to be utilized on a grand scale in ground-type semantic segmentation [1,2,3,4].

Ground-type semantic segmentation is a challenging problem in HSI analysis and the remote sensing domain. Ground types in a natural forest environment are overlapping, diverse, similar, and diffused. In contrast, the two most common datasets, Indian Pines and Salinas [5], are small and land-separated. Due to the already segmented nature of farmland and the small sample size, the techniques that apply to these datasets do not translate well to large, complex natural forests. At the same time, recent advancements in remote sensing imaging have greatly increased spectral resolution, which affects the segmentation models’ performance significantly [6]. Therefore, models that exploit the rich spectral information more efficiently can see higher accuracy without a large performance cost.

Patching

Patching is a practical preprocessing technique that often increases the overall test accuracy of a semantic segmentation model by using spatial neighborhood information via overlapping patches [6,7,8,9,10]. Patching is implemented by three approaches: exclusive, majority, and center pixel classification. Examples of these patching techniques are described in Fig. 1. Despite patching improving the performance of segmentation models with particular datasets like Indian Pines and other farmland datasets [10], it is often not as useful in datasets that have diverse overlapping classes as shown in Tables 5, 6, 7, and 8. Due to the limited number of labeled samples and the nature of individual pixel classification, exclusive and majority patching are rarely used in hyperspectral semantic segmentation models because these techniques would further reduce the dataset size. In addition, exclusive and majority patching would not be possible in datasets with diverse and overlapping classes. Therefore, we focus on center pixel classification in the following sections.

Center pixel classification (CPC) is used in more recent works, including one of our baseline models, HybridSN [6]. The CPC method is implemented by taking a patch of size n × n around each pixel in the dataset as input to the model, capturing the spatial neighbors of the pixel. This substantially increases the time complexity of training and testing, because each sample is an n × n × w patch, where w is the number of spectral bands, instead of a single pixel with w spectral bands; the input per sample grows by a factor of n². This technique can work for many datasets where exclusive and majority patching will not, because the size of the dataset is not reduced and datasets with overlapping classes can still be classified. However, it can lead to a dramatic increase in time complexity with diminishing returns in test accuracy if the neighborhood information is not useful. Figure 1 (bottom left) visually demonstrates CPC.
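The CPC preprocessing described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `extract_center_patches`, the nested-list representation of the image cube, and the zero-padding at image borders are all assumptions made for this example. Plain Python lists are used to keep the sketch dependency-free; an array library would normally be used in practice.

```python
def extract_center_patches(cube, n):
    """Return one n x n x w patch per pixel of an H x W x w cube (nested lists).

    Border pixels are zero-padded, which is an assumption of this sketch.
    """
    if n % 2 == 0:
        raise ValueError("patch size n must be odd so the patch has a center pixel")
    height, width, bands = len(cube), len(cube[0]), len(cube[0][0])
    half = n // 2
    zero_pixel = [0.0] * bands
    patches = []
    for row in range(height):
        for col in range(width):
            patch = []
            for dr in range(-half, half + 1):
                patch_row = []
                for dc in range(-half, half + 1):
                    r, c = row + dr, col + dc
                    inside = 0 <= r < height and 0 <= c < width
                    patch_row.append(list(cube[r][c]) if inside else list(zero_pixel))
                patch.append(patch_row)
            patches.append(patch)
    return patches


# Toy 3 x 3 image with w = 2 bands; pixel (r, c) holds the spectrum [r, c].
cube = [[[float(r), float(c)] for c in range(3)] for r in range(3)]
patches = extract_center_patches(cube, n=3)

# Each sample now carries n * n * w values instead of w values.
values_per_patch = sum(len(pixel) for patch_row in patches[0] for pixel in patch_row)
```

In this toy case every pixel still yields one sample, so the dataset size is unchanged, but each sample holds 18 values rather than 2, which is the n² growth in input size noted above.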

Recent works have focused on neighborhood information instead of spectral information to further increase semantic segmentation accuracy in the popular HSI datasets Indian Pines, Salinas, and Pavia University [8, 9]. For example, due to industrial farming techniques, corn is grown in a single patch; therefore, a pixel of corn will be accompanied by other corn pixels. This ensures that the neighbors of each pixel in a single class are all similar. In addition, other datasets like Pavia University and Houston focus on areas of man-made structures that are also easily segmentable. This information can be used in the classification network to much success. However, since most of the land types HSI researchers are interested in have diverse overlapping classes, neighborhood information is weak. The Kennedy Space Center (KSC) dataset is the closest example of this phenomenon and contains classes that are more spectrally similar, like different tree types (see Fig. 1). The KSC dataset is often left out of many works due to its small labeled sample size. However, we include KSC to compare and contrast the performance of existing patching-based methods and highlight the weakness in their assumptions.

In this paper, we provide an extensive and systematic discussion on both the benefits and drawbacks of patching and validate our analysis with experimental results.

Feature reduction

The uniqueness of HSI data in remote sensing is the rich spectral features for each pixel. Due to a large number of features, a reduction is often necessary to reduce training runtimes [12, 13]. This is a common practice within HSI semantic segmentation and classification in general.

In HSI semantic segmentation, papers [10, 14] focus on semantic segmentation and/or feature reduction while using neighborhood information. In addition, the top feature reduction methods often use random forest or support vector machine classifiers instead of neural network-based semantic segmentation methods in their research [7]. In this paper, we explore dimensionality reduction techniques that can select pertinent spectral features in the data for later classification in traditional and cutting-edge neural network-based classifiers without neighborhood information, thereby reducing runtime complexity and storage size for classification while maximizing overall classification accuracy.

In this paper, we experimentally determine that deep neural network feature reduction techniques, like autoencoders, do not beat projection-based feature reduction techniques when using spectral information only.

Semantic segmentation

In remote sensing semantic segmentation, techniques have been focusing on higher dimensional convolutional neural networks (CNNs) to better incorporate neighborhood information. Older techniques used 1D CNNs to use only the spectral information in a given pixel, however, 2D and 3D CNNs have seen greater success with only spectral information included in the training process [15]. Recent works in HSI semantic segmentation have been focusing on using these 2D and 3D CNNs to incorporate neighborhood information in the form of patching [6, 10, 16]. This recent research has mostly ignored the development of spectral-only semantic segmentation, which has notably faster runtimes.

In recent works outside of remote sensing, semantic segmentation has seen great strides in the medical field with the introduction of a novel deep neural network (DNN) architecture called U-Net [17,18,19]. The idea of using U-Net for semantic segmentation in HSI has, to our knowledge, been explored only once, in the AeroRIT paper [10]. The novelty in their U-Net architecture adds complexity via a custom squeeze and excitation block. However, with a high number of features, their custom layer increases the time complexity exponentially. In addition, AeroRIT did not include studies on other datasets, and they used neighborhood information in the form of patching as a preprocessing step.

To combat these issues in HSI semantic segmentation, we increase the effectiveness of U-Net with our novel Clustering Ensemble U-Net (CEU-Net) by using an ensemble method to create separate parallel models that are trained in subsets of pixels for better overall classification accuracy.

Our goal in this paper is to develop a semantic segmentation model that is more dataset independent and provides competitive performance versus baselines with and without implementing patching as a preprocessing step.

Ensemble methods

Ensemble learning aims to create a collection of individual classifiers to increase the accuracy of classification/semantic segmentation models. There are three general approaches to ensemble learning: Bagging, Boosting, and Stacking [20, 21].

1. Bagging: Bagging is an ensemble technique that extracts subsets of the dataset to train sub-classifiers. Each sub-classifier and subset are independent of one another and can therefore be trained in parallel. The result of the overall bagging method can be determined through a majority vote or a concatenation of the sub-classifier outputs [20].
2. Boosting: Boosting was first developed with the well-known AdaBoost algorithm [22]. In boosting, the complete dataset is used to train each sub-classifier; after each iteration, the weights are adjusted for the overall ensemble network to improve classification accuracy [20].
3. Stacking: Stacking differs most from the other ensemble methods because, instead of running the networks in parallel like bagging and boosting, sub-classifiers are stacked on top of each other in a linear fashion, so that the output of one sub-classifier becomes the input of another to create a whole ensemble stacking model [21].
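As a small illustration of how bagging can combine its sub-classifiers through a majority vote, consider the sketch below. The function name `majority_vote`, the toy predictions, and the tie-breaking rule (smallest label wins) are assumptions made for this example and are not taken from the cited works.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample by majority vote."""
    n_samples = len(predictions_per_classifier[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        best_count = max(votes.values())
        # Ties are broken by the smallest label, an arbitrary choice for this sketch.
        combined.append(min(label for label, count in votes.items() if count == best_count))
    return combined


# Three hypothetical sub-classifiers, each predicting labels for the same four pixels.
sub_predictions = [
    [1, 2, 3, 1],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
]
combined = majority_vote(sub_predictions)
```

Voting applies when every sub-classifier sees every sample. When each sub-classifier instead handles a disjoint subset of samples, the outputs are concatenated rather than voted on.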

In the HSI domain, large data volumes are common, leading to rapidly increasing running times. Methods like boosting and stacking can therefore be very costly in running time. Methods like bagging, however, could be implemented to increase accuracy while decreasing runtime. Therefore, in this paper, we aim to create a bagging ensemble method to increase classification accuracy and reduce runtime complexity.

To summarize, our contributions in this paper are,

1. Debuting Clustering Ensemble U-Net, CEU-Net, for HSI semantic segmentation to get more competitive accuracies with and without neighborhood information.
2. Empirical analysis on the common preprocessing technique of patching and focusing more on spectral information instead of neighborhood information to make our model, CEU-Net, more data independent.
3. Experimental analysis on deep neural network-based feature reduction techniques while using only spectral information.

Related works

Current machine learning (ML) based solutions employing neural networks focus on semantic segmentation. Due to the lack of sufficient labeled samples in popular HSI datasets, this is often treated as a pixel classification problem.

Recent works have been using 2D and 3D CNNs in both feature reduction and semantic segmentation techniques to implement neighborhood information in addition to spectral information [6, 16]. The works that focus on 2D CNN architectures [23] are older; more recent works have focused on 3D CNN architectures or 2D-3D hybrids with greater success [6, 16].

Several works including [6,7,8] employ a combination of three datasets: Indian Pines, Salinas, and Pavia University due to their well-labeled nature and easy access. We will be focusing on these datasets in addition to Kennedy Space Center, Botswana [5] and Houston [24].

Neighborhood information

The use of neighborhood information is not new in HSI semantic segmentation; almost all of the CNN models for HSI semantic segmentation use neighborhood information in the form of patching as a preprocessing step [6, 23, 25]. Models use neighborhood information due to the nature of the most popular HSI datasets: Indian Pines, Salinas, and Pavia University. These flagship datasets are popular due to their consistent use and the number of labeled pixels. However, the vast majority of hyperspectral images are of dense forest areas with diverse ground types and are not labeled [7, 26].

In [11], the authors discuss patching and its shortcomings by demonstrating how patching only exploits the local spatial information and results in high noise in the data when classes overlap frequently. They propose a full patching network called SPNet with an end-to-end deep learning architecture to do the spectral patching instead of manual analysis. However, SPNet is still a network-based approach that adds significant runtime to semantic segmentation over the common patching method CPC. Further, this work shows how patching is not always the best approach to semantic segmentation. Therefore, we do not include this in this paper, as we focus on improving solely spectral information in our semantic segmentation network for datasets that are more complex like tree species data.

Feature reduction

Certain bands of light in hyperspectral images might not be as important for classification based on the labeled ground types. Since deep neural network algorithms are quite computationally expensive, reducing the number of input features would decrease runtime dramatically. In addition to shorter runtimes, fewer input features often correspond to fewer parameters in the classification model. A model with too many parameters is prone to overfitting issues. Our goal is to improve training runtime and overcome overfitting challenges in semantic segmentation models for HSI by reducing the feature size.

One paper [27] uses feature selection to reduce HSI feature size. The top-performing feature selection method in the paper was a sequential feature selector (SFS). SFSs work by removing or adding one feature at a time, then performing classification on that feature subset until the feature subset is of the desired size. A drawback of SFSs is that they are supervised and rely on a greedy search. Other feature selection algorithms, such as Random Forest and Support Vector Machines (SVMs), were also explored in [27]. However, most of these methods are outperformed by neural network approaches [14]. The optimal feature selector from [27], SFS, guarantees that we get the optimal feature subset as it goes through each permutation of the feature space, but it is prohibitively computationally expensive. It has been shown that Principal Component Analysis (PCA) can reduce the feature size and incorporate information on the original features, all while being unsupervised and computationally less expensive than SFSs and other feature selectors [28].
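To make the projection idea behind PCA concrete, the sketch below finds the first principal component of a handful of pixel spectra and projects each pixel onto it. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `first_principal_component` is hypothetical, power iteration is swapped in for a full eigendecomposition to keep the code dependency-free, and only one component is extracted.

```python
import math


def first_principal_component(pixels, iterations=200):
    """Return (component, scores): the top PCA direction and each pixel's projection."""
    n, w = len(pixels), len(pixels[0])
    means = [sum(p[b] for p in pixels) / n for b in range(w)]
    centered = [[p[b] - means[b] for b in range(w)] for p in pixels]
    # Sample covariance matrix of the spectral bands (w x w).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(w)]
           for a in range(w)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    vec = [1.0 / math.sqrt(w)] * w
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(w)) for a in range(w)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # no variance along the current direction; stop early
        vec = [v / norm for v in nxt]
    scores = [sum(row[b] * vec[b] for b in range(w)) for row in centered]
    return vec, scores


# Toy pixels with w = 3 bands that vary only along the direction (1, 2, 0).
pixels = [[t, 2.0 * t, 5.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
component, scores = first_principal_component(pixels)
```

Here the three bands collapse to a single score per pixel without using any labels, which is the unsupervised property noted above. Power iteration can stall if the starting vector happens to be orthogonal to the leading direction, so a library eigensolver or SVD routine would be the usual choice in practice.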

To increase the selection of pertinent features, many works now focus on neural network-based feature reduction techniques. Self-Organizing Maps (SOMs) [29] are similar to neural networks as they employ neurons, but their architecture is quite different. Rather than a series of connected layers, SOMs are composed of a single-layer linear 2D grid of neurons. Each node on the grid is connected to the input vector, but not one another. None of these nodes knows the weights of the other nodes. The grid acts as the map that organizes itself at each iteration based on the input data. Each node has its 2D coordinate that allows the calculation of the Euclidean distance between each node. In [30], the authors propose an unsupervised method for the dimensionality reduction of hyperspectral images based on Kohonen’s self-organized maps. However, SOMs have dramatically increased runtime when compared to projection-based methods like PCA.

In addition to semantic segmentation, one can learn the feature representations using convolutional networks. For example, in [31] the authors proposed a model called CNNiN that has two parts, a feature learning section and a semantic segmentation section, that are attached linearly. In the feature learning part, they use a general convolutional network that acts like a U-Net or autoencoder. They have a contracting path that embeds the features into a smaller space, then an expansion path that embeds the desired feature size for classification. Since they have their feature reduction and classification connected in one network, this feature reduction approach is supervised. In addition, the CNNiN method uses neighborhood spatial information in the form of patching and does not get competitive results when compared to other deep learning feature reduction approaches like autoencoders [31, 32].

Recurrent Neural Networks (RNNs) were developed to tackle many recursive machine learning problems, including feature reduction. One RNN-based HSI feature reduction approach is the long short-term memory (LSTM) network, whose cells address the vanishing/exploding gradient problem in backpropagation and can effectively capture contextual information of adjacent data. However, like most deep learning approaches in hyperspectral feature reduction, spatial information is the focus. In [33], the authors focus on a solely spatial LSTM feature reduction approach. The work in [34] unifies spatial and spectral information by combining spectral LSTM and spatial LSTM networks for feature reduction. However, RNNs appear to be outperformed by other deep learning approaches like convolutional autoencoders [32,33,34].

A more popular deep learning technique in current literature that uses unsupervised approaches for feature reduction is convolutional autoencoders (CAEs). However, autoencoders have more recently been used to exploit the spatial information of the data rather than the spectral information. 2-Dimensional Convolutional Autoencoders (2D-CAEs) are developed to exploit the spatial information, while 3-Dimensional Convolutional Autoencoders (3D-CAEs) are developed to exploit both the spatial and spectral information available. Current research shows greater semantic segmentation accuracy among 3D-CAE results when incorporating spectral information rather than 2D-CAEs that only use spatial information [14, 32, 35, 36]. The work by [14] introduces an unsupervised spatial-spectral feature learning strategy for HSIs using a 3D-CAE. 3D-CAEs consist of 3D or element-wise operations only (3D convolution, 3D pooling, and 3D batch normalization) to maximally explore spatial-spectral structure information for feature reduction, rather than spatial only. A companion 3D convolutional decoder network is also designed to reconstruct the input patterns to the 3D-CAE method for full unsupervised learning. Papers [32, 35, 36] create a more complex autoencoder architecture that uses variational autoencoders in their feature reduction structure. Variational autoencoders are similar to autoencoders except their latent space vector is calculated based on the mean and standard deviation of the previous layer. In traditional autoencoders, the latent space vector is simply a layer in the network. Furthermore, the work in [14, 32, 35, 36] relies heavily on spatial information for feature extraction and therefore uses patching as a preprocessing technique. The features are thus selected based on both spatial and spectral information instead of solely spectral information. In addition, these papers often use PCA as a preprocessing step before their deep learning feature reduction, making the success of their feature reduction method dependent on PCA. In this paper, we provide an analysis of autoencoders without patching to determine their effectiveness in selecting pertinent features in the spectral domain only.
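The difference between a plain autoencoder bottleneck and a variational one can be illustrated with the usual sampling step, in which the latent vector is drawn from a Gaussian defined by a predicted mean and standard deviation. The sketch below is a generic illustration and not the architecture of the cited papers: the function name `sample_latent`, the log-variance parameterization, and the toy values are assumptions.

```python
import math
import random


def sample_latent(mean, log_variance, rng):
    """Draw z = mean + std * eps with eps ~ N(0, 1), one value per latent dimension."""
    latent = []
    for mu, log_var in zip(mean, log_variance):
        std = math.exp(0.5 * log_var)  # convert log-variance to standard deviation
        latent.append(mu + std * rng.gauss(0.0, 1.0))
    return latent


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mean = [0.5, -1.0, 2.0]

# A very negative log-variance gives a standard deviation near zero,
# so the sample stays essentially at the mean (close to a plain autoencoder).
z_narrow = sample_latent(mean, [-60.0, -60.0, -60.0], rng)

# Unit variance spreads the samples around the mean.
z_wide = sample_latent(mean, [0.0, 0.0, 0.0], rng)
```

In a traditional autoencoder the bottleneck activations would be used directly, which corresponds to the near-zero-variance case here.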

In this paper, our main goal is to focus on semantic segmentation without patching. However, feature extraction is a necessary step in the process to improve accuracy and computational complexity. Therefore, we provide brief experiments on feature reduction techniques to determine the effectiveness of deep learning feature reduction techniques without patching. For this experiment, we choose autoencoders and compare them to PCA to determine if the success of deep learning feature reduction methods in HSI is dependent on spatial information.

Semantic segmentation

HybridSN model: Among several segmentation models [11, 14], to our knowledge the most successful CNN-based model for popular HSI datasets is HybridSN [6]. Instead of using exclusively 3D-CNNs and sacrificing runtime, or using exclusively 2D-CNNs and sacrificing accuracy, they propose a hybrid spectral CNN (HybridSN) for hyperspectral semantic segmentation [6].

HybridSN is a spectral-spatial 3D-CNN model followed by a spatial 2D-CNN. The 3D-CNN facilitates the joint spatial-spectral feature representation from a stack of spectral bands. The 2D-CNN on top of the 3D-CNN further learns more abstract-level spatial representation via neighborhood information. Moreover, the use of hybrid CNNs reduces the number of parameters in the model compared to the use of 3D-CNN alone. This creates a faster semantic segmentation technique while getting state-of-the-art accuracy scores.

However, since HybridSN relies heavily on neighborhood information for its semantic segmentation network, it is unknown if it is a strong spectral classifier. Some other networks, including CEU-Net, work better for classifying solely via spectral information and/or with smaller patch sizes.

U-Net model: AeroRIT included a U-Net architecture that added complexity via a custom squeeze and excitation block. This is a common practice in the RGB image domain. It works by scaling network responses by modeling channel-wise attention weights, similar to the residual layer in ResNet [19]. The authors use this on large-scale hyperspectral data, however, with the number of channels (bands) that are in hyperspectral compared to RGB, the time complexity increases exponentially. In addition, AeroRIT did not include studies on other datasets, and neighborhood information is used in the form of patching as a preprocessing step.
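The channel-wise attention idea behind a squeeze and excitation block can be sketched in a few lines. This follows the generic squeeze, excite, and scale pattern and is not AeroRIT's custom block: the function name `squeeze_excite`, the flat-list layout of the feature map, and the weight values are hypothetical.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Rescale each channel of feature_map by a gate value in (0, 1).

    feature_map: list of channels, each a flat list of spatial values.
    w_reduce: (hidden x channels) weights; w_expand: (channels x hidden) weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(channel) / len(channel) for channel in feature_map]
    # Excitation: bottleneck layer with ReLU, then expansion with a sigmoid gate.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[gate * v for v in channel] for gate, channel in zip(gates, feature_map)]
    return gates, scaled


# Toy input: 4 channels with 3 spatial positions each; all weights are made up.
feature_map = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [4.0, 4.0, 4.0], [-1.0, 1.0, 0.0]]
w_reduce = [[0.5, 0.0, 0.25, 0.0], [0.0, 1.0, 0.0, 1.0]]      # 4 channels -> 2 hidden units
w_expand = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, 0.0]]  # 2 hidden units -> 4 gates
gates, scaled = squeeze_excite(feature_map, w_reduce, w_expand)
```

In the standard formulation the two fully connected layers hold roughly 2C²/r weights for C channels and reduction ratio r, which is one reason the block becomes expensive when C is the number of hyperspectral bands rather than three RGB channels.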

Ensemble methods

Ensemble methods in HSI semantic segmentation are not new. However, most ensemble methods either do not focus on the bagging ensemble method or do not use CNN architectures.

In [37], an ensemble boosting method is used to increase the overall accuracy of a rotation forest (RoF) classifier. However, boosting is a costly ensemble method that requires multiple training sessions. In addition, the RoF classifier has been shown to underperform compared to CNN techniques.

Deep neural network techniques in HSI semantic segmentation include [38] and [39]. The authors in [39] do a boosting ensemble method called Deep CNN Ensemble where they take the top performing models, HybridSN and ResNet, for their submodels. However, the boosting method increases the running time exponentially because of training multiple models on the same pixels. In addition, they use patching as a preprocessing step to use neighborhood information in their model, which leads to a further increase in running time.

In [38], a bagging ensemble method is used called EECNN, but this method applies a random sampling technique on the feature space to obtain the data subsets for each submodel. However, there are clustering models that cluster data in an unsupervised fashion and outperform random sampling. Moreover, random sampling can lead to too much class disparity between sub-models leading to a reduction in classification accuracy due to the small number of pixels for training. The work in [40] also uses a bagging ensemble method called TCNN-E-ILS. However, they do not have any intelligent way of discriminating what data goes to which network and they have a large number of ensemble classifiers. In this paper, we compare our ensemble method against EECNN and TCNN-E-ILS baselines.

In summary, the ensemble techniques in HSI semantic segmentation use the ensemble method to integrate multiple successful networks/techniques together so they can work together and get higher performance. However, as discussed earlier in this section, boosting is a very costly method that results in higher runtime. The papers that leverage the bagging ensemble technique to reduce computational complexity while increasing classification accuracy do not use an intelligent sampling system, such as clustering, to determine which samples are input to which sub-classifier [38, 40]. Therefore we propose the CEU-Net model to improve the bagging ensemble semantic segmentation technique by focusing on bagging with an intelligent sampling technique for subset division.

Clustering ensemble U-Net (CEU-Net)

A single U-Net is a strong architecture for semantic segmentation. However, without neighborhood information, it is difficult to get competitive accuracy versus models that use it. Our solution to this challenge is the proposed CEU-Net model. In machine learning, the ensemble technique is used to improve the accuracy and stability of learning models, especially for the generalization ability on complex datasets. The overview of our Clustering Ensemble U-Net model is demonstrated in Fig. 2. We propose to separate dissimilar pixels by performing unsupervised clustering on pixels via their spectral signature.

Previous ensemble works, like [38, 40], use a parameter called ensemble size, often denoted as T. In our work, since we use clustering as our intelligent method for determining the number of ensemble networks, we refer to this ensemble size as the "cluster number" and denote it as k.

We formalize our model as follows:

Notation

Consider an HSI semantic segmentation problem conditioned on an observed image ${\textbf{x}}\in {\mathbb {R}}^{N\times w}$ with N pixels and w spectral bands. The objective is to learn the true posterior distribution $p({\textbf{y}}|{\textbf{x}})$, where ${\textbf{y}}\in \{1,\ldots ,m\}^N$, and $1,\ldots ,m$ are land type labels. Throughout the paper we use the notations below:

$\{x_i,y_i\}_{i=1}^N$: Training data where $x_i$ is a pixel and $y_i$ is label, $y_i\in \{1,\ldots ,m\}$.
Classifier F: A function mapping the input space ${\mathcal {X}}$ to a set of labels ${\mathcal {Y}}$, i.e. $F: {\mathcal {X}}\mapsto {\mathcal {Y}}$. In this paper, this map function is a U-Net model, $F=F^{U-Net}$.
${\mathcal {L}}_{\theta _j}(F_j({\textbf{x}}),y)$: Loss function with parameter set $\theta _j$. In the U-Net model, $F_j^{U-Net}$, the parameter set $\theta _j$ is the network’s weight matrix and offsets.
$\mathbf {\omega }$: Ensemble weight vector, $\mathbf {\omega }=[\omega _1,\ldots ,\omega _k]^T$, where k is the number of clusters.

Methodology

Training a classifier is performed by minimizing a loss function:

$$\begin{aligned} {\varvec{\Theta }}=\arg \min \limits _{\theta } {\mathcal {L}}_\theta (F({\textbf{x}}),y). \end{aligned} \tag{1}$$

In an ensemble approach with k classifiers, $F_1({\textbf{x}}),\ldots ,F_k({\textbf{x}})$, and weight vector $\omega =[\omega _1,\ldots ,\omega _k]^T$, where $\omega _j \ge 0$ and $\sum \limits _{j=1}^k \omega _j=1$, we find the optimal parameter set ${\varvec{\Theta }}$ as follows:

$$\begin{aligned} {\varvec{\Theta }}=\arg \min \limits _{\theta , k,\omega } \sum \limits _{j=1}^k \omega _j {\mathcal {L}}_{\theta _j}(F_j({\textbf{x}}),y), \end{aligned} \tag{2}$$

where $\theta _j$ is the parameter set of classifier $F_j$. Our proposed CEU-Net architecture extends (2) by utilizing a clustering method: let $C_1({\textbf{x}}),\ldots ,C_k({\textbf{x}})$ be the results of partitioning the training data $\{x_i,y_i\}_{i=1}^N$ into k clusters, with label sets $y_{C_1},\ldots ,y_{C_k}$, respectively. CEU-Net optimizes the parameter set $\theta$ by

$$\begin{aligned} {\varvec{\Theta }}=\arg \min \limits _{\theta , k,\omega } \sum \limits _{j=1}^k \omega _j {\mathcal {L}}_{\theta _j}(F_j(C_j({\textbf{x}})),y_{C_j}). \end{aligned} \tag{3}$$

Note that in the CEU-Net model, $F_j$ is a single U-Net model, i.e. $F_j=F_j^{U-Net}$. In this work, we consider k as a hyperparameter and do not learn it under optimization problem (2). The pseudocode of our CEU-Net model is illustrated in Algorithm 1.
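For fixed weights and a fixed clustering, the quantity being minimized in the clustered objective is a weighted sum of per-cluster losses. The sketch below evaluates that sum for categorical cross-entropy. The function names, the 0-based class indices, and the toy probabilities are assumptions made for this illustration; the probabilities stand in for the outputs of trained sub-models.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean categorical cross-entropy; labels are 0-based class indices."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(probs[label])
    return total / len(labels)


def ensemble_objective(weights, cluster_probabilities, cluster_labels):
    """Evaluate sum_j w_j * L_j, where L_j is the loss of sub-model j on cluster j."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    losses = [cross_entropy(p, y) for p, y in zip(cluster_probabilities, cluster_labels)]
    return sum(w * loss for w, loss in zip(weights, losses))


# Two clusters (k = 2) with hypothetical sub-model outputs over m = 2 classes.
cluster_probabilities = [
    [[0.5, 0.5], [0.5, 0.5]],  # F_1 is uncertain on both of its pixels
    [[1.0, 0.0]],              # F_2 is certain and correct on its single pixel
]
cluster_labels = [[0, 1], [0]]
objective = ensemble_objective([0.5, 0.5], cluster_probabilities, cluster_labels)
```

With these toy values the first cluster contributes a loss of ln 2 and the second contributes 0, so the weighted objective is 0.5 ln 2. Each term depends only on its own cluster's pixels and parameters, which is what allows the sub-models to be trained independently.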

In the practical implementation of Algorithm 1 the value of weights $\omega =[\omega _1,\omega _2,\ldots ,\omega _k]^T$ is determined experimentally. We then take the training data and use an unsupervised clustering method that separates the pixels into k clusters. Both k and the clustering method will be tuned for each dataset. We then send the training pixels from each cluster into k separate sub-U-Nets for separate training in a supervised fashion with categorical cross entropy as the loss function. This way, each sub-U-Net becomes an expert in its given cluster and is trained in parallel with its corresponding pixel cluster. After each sub-U-Net is trained, we use the same clustering method to cluster the testing data into k clusters. Then we predict the labels for each cluster using the corresponding trained sub-U-Net for each cluster. Finally, the sub-U-Nets’ predicted labels are concatenated and we compare them to the ground truths for overall testing accuracy. Each sub-U-Net is identical to the single U-Net architecture using the configuration presented in Table 2.
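The cluster, train, route, and predict flow described above can be sketched end to end on toy data. This is an illustration only: a nearest-class-mean classifier is substituted for each sub-U-Net, k-means with a fixed initialization (the first k pixels) is assumed as the clustering method, and all function names and data values are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def mean_vector(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(pixels, k, iterations=20):
    """Plain k-means; centroids start at the first k pixels (an assumption of this sketch)."""
    centers = [list(p) for p in pixels[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        centers = [mean_vector(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def fit_sub_model(pixels, labels):
    """Stand-in for one sub-U-Net: store the mean spectrum of each class in the cluster."""
    by_class = {}
    for p, y in zip(pixels, labels):
        by_class.setdefault(y, []).append(p)
    classes = sorted(by_class)
    return classes, [mean_vector(by_class[c]) for c in classes]


def train_ensemble(train_pixels, train_labels, k):
    centers = kmeans(train_pixels, k)
    assignments = [nearest(p, centers) for p in train_pixels]
    models = []
    for j in range(k):
        members = [i for i, a in enumerate(assignments) if a == j]
        models.append(fit_sub_model([train_pixels[i] for i in members],
                                    [train_labels[i] for i in members]))
    return centers, models


def predict_ensemble(centers, models, pixels):
    """Route each pixel to its cluster, then ask that cluster's sub-model for a label."""
    predictions = []
    for p in pixels:
        classes, class_means = models[nearest(p, centers)]
        predictions.append(classes[nearest(p, class_means)])
    return predictions


# Toy spectra with w = 2 bands: two spectral clusters, each holding two classes.
train_pixels = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0],
                [0.2, 0.1], [10.1, 10.2], [0.1, 0.9], [9.9, 11.1]]
train_labels = [1, 3, 2, 4, 1, 3, 2, 4]
test_pixels = [[0.1, 0.0], [0.0, 0.8], [10.0, 10.1], [10.0, 11.2]]
test_labels = [1, 2, 3, 4]

centers, models = train_ensemble(train_pixels, train_labels, k=2)
predictions = predict_ensemble(centers, models, test_pixels)
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Each stand-in sub-model only has to separate the two classes inside its own cluster, which mirrors the idea of each sub-U-Net becoming an expert on its cluster. The sketch does not handle empty clusters, and the sub-models are fitted one after another rather than in parallel.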

Experimental results

The experimental results section is divided into two main parts. The first discusses the performance of CEU-Net in the context of state-of-the-art semantic segmentation algorithms and illustrates key insights into the expected behavior of CEU-Net. The second part emphasizes the efficiency improvement of CEU-Net and hyper-parameter tuning.

We briefly outline the datasets, feature reduction, U-Net architecture, configuration, clustering methods, and metrics used across our experiments.

Datasets

In this experiment, we choose six datasets: Indian Pines, Salinas, Pavia University, Kennedy Space Center, Botswana, and Houston [5]. HSI data is infamously difficult to label due to the professional and time requirements necessary to label ground-types [41]. These well-known HSI datasets are well labeled and will provide good testing data for our semantic segmentation techniques.

These datasets, while used extensively in the ML hyperspectral community, have several flaws.

1. Easily Segmentable: Indian Pines and Salinas are farmland datasets, while Pavia University and Houston cover man-made structures. These land areas are quite easily separable spatially: grass is often next to other grass, tar is next to other tar, and so on. This makes training an easy task in just the pixel domain.
2. Not Representative of Most Land Areas: A vast majority of land in the world is forest, and most hyperspectral remote sensing is done in these areas [26]. Therefore, the existing semantic segmentation models for HSI in remote sensing are not transferable to other landscapes due to the unavailability of labeled samples. Kennedy Space Center is the closest dataset to represent these more difficult datasets; however, since it is in a desert biome, the ground-truth labels are still easily spatially clustered.
3. Small Amount of Labeled Samples: Due to the difficulty of labeling HSI data, the number of pixels in a dataset is often not a good indicator of its usable size. All the datasets we use here have a labeled pixel percentage under 50% as shown in Table 1. This could lead to over-fitting when presented with complex architectures.
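The labeled pixel percentage mentioned in the last item can be computed directly from a ground-truth map. The sketch below assumes that the value 0 marks an unlabeled pixel; the function name `labeled_percentage` and the toy map are hypothetical and do not reproduce any value from Table 1.

```python
def labeled_percentage(ground_truth, unlabeled_value=0):
    """Percentage of pixels in a 2D ground-truth map that carry a class label."""
    total = sum(len(row) for row in ground_truth)
    labeled = sum(1 for row in ground_truth for v in row if v != unlabeled_value)
    return 100.0 * labeled / total


# Toy 3 x 4 ground-truth map; 0 marks an unlabeled pixel (assumed convention).
ground_truth = [
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [0, 0, 0, 3],
]
percent = labeled_percentage(ground_truth)
```

In this toy map 5 of the 12 pixels are labeled, so fewer than half of the pixels are available for supervised training and testing.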

Table 1 Information on the more popular datasets in HSI semantic segmentation used in this paper, SP stands for Spectral Bands [5]
