
De-occlusion and recognition of frontal face images: a comparative study of multiple imputation methods

Abstract

Increasingly, automatic face recognition algorithms have become necessary with the development and extensive use of face recognition technology, particularly in the era of machine learning and artificial intelligence. However, the presence of unconstrained environmental conditions degrades the quality of acquired face images and may deteriorate the performance of many classical face recognition algorithms. Against this backdrop, many researchers have given considerable attention to image restoration and enhancement mechanisms, but with minimal focus on occlusion-related and multiple-constrained problems. Although occlusion-robust face recognition modules via sparse representation have been explored, they require a large number of features to achieve correct computations and to maximize robustness to occlusions. Therefore, such an approach may become deficient in the presence of random occlusions of relatively moderate magnitude. This study assesses the robustness of Principal Component Analysis and Singular Value Decomposition using Discrete Wavelet Transformation for preprocessing and city block distance for classification (DWT-PCA/SVD-L1) face recognition module to image degradations due to random occlusions of varying magnitudes (10% and 20%) in test images acquired with varying expressions. Numerical evaluation of the performance of the DWT-PCA/SVD-L1 face recognition module showed that the use of the de-occluded faces for recognition significantly enhanced the performance of the study recognition module at each level (10% and 20%) of occlusion. The algorithm attained its highest recognition rates of 85.94% and 78.65% at 10% and 20% occlusion respectively, when the MICE de-occluded face images were used for recognition.
With the exception of entropy, where the MICE de-occluded face images attained the highest average value, MICE and RegEM produced images of similar quality as measured by their absolute mean brightness error (AMBE) and peak signal-to-noise ratio (PSNR). The study therefore recommends MICE as a suitable imputation mechanism for de-occlusion of face images acquired under varying expressions.

Introduction

The ability of humans to detect, identify and classify faces and other attributes of faces (gender, race, emotion) under variable conditions with some apparent efficiency derives from a network of brain regions (fusiform face area and anterior inferotemporal cortex) highly tuned to face information [1, 2]. The study of how machines are able to perform the same tasks in real-time has attracted much attention among researchers due to rising security concerns and the fast-paced evolution of supportive technologies. The field of automatic face recognition concerns the use of machines to recognize the identity of persons or individuals from a database of stored face images.

Several studies [3,4,5,6,7,8] have shown that the performances of automatic face recognition modules are affected by the quality of the face images acquired and used for recognition. In most instances, image quality is problematic due to the acquisition of images from unconstrained environments. Image quality is often eroded due to imbalanced illumination effects, wild poses, noise, occlusions and varying facial expressions. When occlusions are the underlying cause of image degradation, the problem becomes more intractable as occlusions obscure salient features of the face needed for training recognition algorithms, thus creating larger intra-subject variability compared with inter-subject variability such that images of different individuals appear more similar than images of the same individual [9]. This may be further compounded by the wide range of forms of occlusions, which may include randomly occurring occlusions of different magnitudes. Notwithstanding this, only a few studies have focused on how to resolve occlusion-related problems in face recognition.

References [4, 10, 11] have shown that enhancing the quality of acquired images prior to recognition improves the performance of face recognition algorithms. However, choosing the right image enhancement mechanism is often a challenging task. This is because the choice of enhancement mechanism is contingent on knowledge of the underlying cause of image degradation which, in most instances, is limited. Also, a combination of more than one enhancement mechanism may be required to attain optimal results, but specifying the right combination of enhancement mechanisms is challenging and continues to be a gap in the literature that requires more research. To deal with occlusions in face images, some researchers have advocated the use of occlusion-invariant features or the non-occluded portions (sparse representation) of the face for recognition, many of which have attained remarkable successes [12, 13]. However, these approaches sometimes become deficient for some classes of occlusions, especially when there is significant loss of facial features or pixels. De-occlusion techniques, therefore, become indispensable in these situations. These methods may leverage the inherent topology of the face [14] to reconstruct missing facial components or operate from the premise that missing facial pixels can be inferred from the observed facial pixels in order to restore the occlusions in face images [15]. A key stage in the de-occlusion process is the choice of de-occlusion mechanism. This is very crucial in automatic face recognition because inappropriate handling of occlusions could further degrade the quality of images and may lead to a significant drop in the performance of an otherwise well-performing face recognition module. Chan and Shen [16] considered the problem of image restoration using a diffusion-based approach. Diffusion-based methods are considered optimal in filling small patches in an image [17].
The use of exemplar-based methods has also been explored in the literature [18]. Exemplar-based methods are known to be optimal for filling larger texture areas. However, the approach sometimes decreases the connectivity of structure and clearness of texture while increasing the time complexity [19]. Some authors leveraged the advantages of both diffusion-based and texture-synthesis techniques by first dividing the image into structure and texture layers, then using a diffusion-based method to in-paint the structure layer and a texture synthesis technique to in-paint the texture layer. According to [19], this approach helps overcome the smoothing disadvantage of the diffusion-based in-painting algorithm, but it remains very difficult to recover larger missing structures. Refer to [20,21,22] for more information on occlusion-aware systems.

Occlusions in face images can be classified as a missing-data problem. Therefore, de-occlusion as used in this study refers to any process that attempts to restore missing pixels in face images. In general missing value problems, the use of multiple imputation methods has been widely explored [4]. Such methods aim at finding plausible values for the missing data and are known to give unbiased results and can also account for the uncertainty in the imputations. This gives multiple imputation methods an edge over single imputation methods [23]. Despite the aforementioned advantages of multiple imputation methods, some researchers object to their use in handling missing values in datasets, arguing that imputation methods only synthesize numerical (non-real) values for the missing data [24]. Other researchers [25] on the other hand assert that imputation methods aim not to re-create the missing values in a dataset, but are a means of handling missing data in order to arrive at the proper statistical inferences under a given missingness mechanism. The Multiple Imputation by Chained Equations (MICE) [26], MissForest [27] and the Expectation Maximization (EM)-based methods (Regularized Expectation-Maximization (RegEM)) [28] are among the most successful contemporary multiple imputation methods in practice. These methods impute missing values in datasets based on multiple regression models. The MICE uses the conditional distributions of variables with missing data, is based on Markov Chain Monte Carlo (MCMC) and attains imputations via Gibbs sampling; the MissForest draws imputations via random forest models, while the RegEM is a likelihood-based method [4].

In this study, we assess the robustness of DWT-PCA/SVD-L1 face recognition module to image degradations due to random occlusions of varying magnitudes (10% and 20%) in test images acquired with varying expressions. The study also helps identify the appropriate image restoration mechanism when dealing with moderately low levels of occlusions in face images acquired under varying expressions.

The rest of the paper is organised as follows: Section Methods and materials discusses the data acquisition, the mathematical underpinnings of the adopted imputation mechanisms, the recognition modules and their implementation. In section Results and discussion, we evaluate the recognition modules under the adopted imputation mechanisms and conclude by summarising the overall achievements of the study with some recommendations and directions for future developments in section Conclusion and recommendation.

Materials and methods

Data acquisition

The study used two standard face image datasets to benchmark the performance of the study algorithm.

Dataset 1 The Japanese Female Facial Expression (JAFFE) dataset is homogeneous in terms of race and gender. It contains face images of ten (10) female Japanese subjects captured along seven universally accepted principal emotions (neutral, angry, disgust, fear, sad, surprise and happy).

Dataset 2 The Cohn Kanade AU-Coded Facial Expression (CKFE) dataset is heterogeneous with regard to race and gender. It contains face images of twenty-two (22) subjects of mixed race and gender also captured along the above seven universally accepted principal emotions.

The neutral expressions of subjects in the two datasets (totaling 32) were captured into the train-image database for training the study algorithm after face detection and cropping. Figure 1 depicts the face images of subjects in the train-image database.

Fig. 1
figure 1

Train-image database

All the other face images of subjects acquired under varying expressions (sad, happy, disgust, surprise, angry and fear) in each dataset were synthetically occluded (10% and 20% missingness or degradation) after face detection and cropping. Figure 2 (test-image database 1) and Fig. 3 (test-image database 2) contain expression-variant face images with 10% and 20% occlusions respectively.

Fig. 2
figure 2

Sample of face images with 10% occlusion acquired under varying expressions (test-image database 1)

Fig. 3
figure 3

Sample of face images with 20% occlusion acquired under varying expressions (test-image database 2)

The multiple-constrained (occlusions, varying expressions) face images in Figs. 2 and 3 were subsequently reconstructed using the MICE, MissForest and RegEM imputation techniques respectively and captured into separate test-image databases.
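The synthetic occlusion step described above can be sketched as follows; this is a minimal NumPy illustration where missing pixels are marked with NaN, and the function name `occlude_random` and the toy image are ours, not from the study.

```python
import numpy as np

def occlude_random(image, rate, seed=None):
    """Synthetically occlude a face image: set a random fraction `rate`
    of its pixels to NaN (NaN marks a missing/occluded pixel)."""
    rng = np.random.default_rng(seed)
    occluded = image.astype(float).copy()
    mask = rng.random(image.shape) < rate   # True where a pixel is dropped
    occluded[mask] = np.nan
    return occluded, mask

# Example: 20% random missingness on a toy 8x8 "face"
face = np.arange(64, dtype=float).reshape(8, 8)
degraded, mask = occlude_random(face, rate=0.20, seed=0)
```

In the study's setting, the same routine would be applied at rates 0.10 and 0.20 to every expression-variant test image after face detection and cropping.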

De-occlusion via imputation methods

Reconstructive methods seek to restore missing components or pixel information for the purposes of completeness, good visual effects, as well as providing relevant features for subsequent feature extraction.

Multiple imputation methods have been successfully used to deal with missing data problems in many applications. In this work, we use the MICE, MissForest and RegEM imputation methods to remove occlusions in test faces based on the assumption that such methods can find plausible pixel values to replace missing components or pixels using information from the existing pixels (non-occluded portions of the face).

Imputation algorithms

Let \(Y_{n \times p} = (Y_1, Y_2,\dots , Y_p)\) be the image matrix of an occluded face image. For each column (variable) \(Y_j, j\in \{1,2,\dots ,p\}\) that contains missing pixels, Y is divided into four parts indicated below: \(Y_{n\times p} = \begin{bmatrix} y_{1,1} & y_{1,2} & \dots & y_{1,j}^{(obs)} & \dots & y_{1,p-1} & y_{1,p}\\ y_{2,1} & y_{2,2} & \dots & y_{2,j}^{(obs)} & \dots & y_{2,p-1} & y_{2,p}\\ \vdots & \vdots & & \vdots & & \vdots & \vdots \\ y_{n-1,1} & y_{n-1,2} & \dots & y_{n-1,j}^{(mis)} & \dots & y_{n-1,p-1} & y_{n-1,p} \\ y_{n,1} & y_{n,2} & \dots & y_{n,j}^{(mis)} & \dots & y_{n,p-1} & y_{n,p} \end{bmatrix},\) where

  • \(Y_j^{(obs)}\) \(\longrightarrow\) Observed pixels of \(Y_j\) and \(k_j^{obs} \in \{1,2,\dots ,n\}\) is the corresponding row index set of the observed pixels.

  • \(Y_j^{(mis)}\) \(\longrightarrow\) Unobserved pixels of \(Y_j\) and \(k_j^{mis} \in \{1,2,\dots ,n\}\) is the corresponding row index set of the missing pixels. Note that \(k_j^{mis} = \{1,2,\dots ,n\}\setminus k_j^{obs}\).

  • \(Y_{-j}^{k(j,obs)}\) \(\longrightarrow\) The part of all other columns other than the jth column \(Y_{j}\) with row index set same as \(k_j^{obs}\).

  • \(Y_{-j}^{k(j,mis)}\) \(\longrightarrow\) The part of all other columns other than \(Y_{j}\) with row index set same as \(k_j^{mis}\).
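The four-part partition above can be sketched directly in NumPy; the function name `partition` is ours, and NaN is used to mark missing pixels.

```python
import numpy as np

def partition(Y, j):
    """Split column Y_j of an occluded image matrix (NaN = missing pixel)
    into the four parts defined above."""
    missing = np.isnan(Y[:, j])
    k_obs = np.flatnonzero(~missing)        # row indices of observed pixels of Y_j
    k_mis = np.flatnonzero(missing)         # row indices of missing pixels of Y_j
    others = np.delete(np.arange(Y.shape[1]), j)
    Yj_obs = Y[k_obs, j]                    # Y_j^(obs)
    Y_obs_rows = Y[np.ix_(k_obs, others)]   # Y_{-j}^{k(j,obs)}
    Y_mis_rows = Y[np.ix_(k_mis, others)]   # Y_{-j}^{k(j,mis)}
    return Yj_obs, k_obs, k_mis, Y_obs_rows, Y_mis_rows
```

Each imputation method below fits a model of \(Y_j^{(obs)}\) on \(Y_{-j}^{k(j,obs)}\) and predicts \(Y_j^{(mis)}\) from \(Y_{-j}^{k(j,mis)}\).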

Multiple imputation with chain equations (MICE)

Given the feature matrix of an occluded face image, the MICE algorithm imputes missing pixels using univariate conditional distributions for each feature given all other features [26]. It is assumed that the face image feature matrix has a full multivariate distribution from which the conditional distribution of each feature is obtained; this joint distribution need not be explicitly specified as long as the conditional distribution of each feature is stated [29], and it may not even exist [30, 31].

The MICE algorithm is an iterative method which imputes missing values based on the fitted conditional (regression) models until a stopping/termination criterion is met and uses the Gibbs sampler to generate multiple imputations.

Algorithm 1
figure a

MICE
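The chained-equations loop can be sketched as below; this is a minimal, single-chain illustration (the function name `chained_impute` is ours) that uses ordinary least squares for each conditional model and replaces missing entries with fitted values. Full MICE instead draws imputations from the conditional posterior via Gibbs sampling and produces several completed datasets.

```python
import numpy as np

def chained_impute(Y, n_iter=10):
    """Minimal chained-equations loop: each column with missing pixels (NaN)
    is regressed on all other columns, and its missing entries are replaced
    by the fitted values; the sweep over columns is repeated n_iter times."""
    Y = Y.astype(float).copy()
    mis = np.isnan(Y)
    col_means = np.nanmean(Y, axis=0)
    Y[mis] = np.take(col_means, np.where(mis)[1])   # initial fill: column means
    for _ in range(n_iter):
        for j in np.flatnonzero(mis.any(axis=0)):
            others = np.delete(np.arange(Y.shape[1]), j)
            X = np.column_stack([np.ones(len(Y)), Y[:, others]])
            rows_obs = ~mis[:, j]
            beta, *_ = np.linalg.lstsq(X[rows_obs], Y[rows_obs, j], rcond=None)
            Y[mis[:, j], j] = X[mis[:, j]] @ beta   # refresh missing pixels
    return Y
```

With pixel intensities, each "variable" is one column of the image matrix, so the conditional models exploit correlations between neighbouring columns of the face.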

MissForest

The MissForest [27] is a non-parametric multiple imputation technique based on random forests [32]. Unlike MICE, the MissForest algorithm specifies a random forest model for each variable with missing pixels and uses the other variables to predict the missing values. As in the case of MICE, this process is done iteratively for all missing pixels until a stopping criterion is met. The advantage of using random forest models is that they provide much flexibility, address complex non-linear interactions [27], require little tuning and provide internally cross-validated error estimates [33].

Algorithm 2
figure b

MissForest
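A MissForest-style loop can be sketched as below, assuming scikit-learn is available; the function name `missforest_impute` is ours, and a fixed iteration budget stands in for the published stopping rule (which halts when the change between successive imputations starts to increase).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def missforest_impute(Y, n_iter=5, seed=0):
    """MissForest-style loop: each column with missing pixels (NaN) is
    predicted by a random forest trained on the other columns; the sweep
    is repeated for a fixed number of iterations."""
    Y = Y.astype(float).copy()
    mis = np.isnan(Y)
    Y[mis] = np.take(np.nanmean(Y, axis=0), np.where(mis)[1])  # mean start
    for _ in range(n_iter):
        for j in np.flatnonzero(mis.any(axis=0)):
            others = np.delete(np.arange(Y.shape[1]), j)
            rf = RandomForestRegressor(n_estimators=30, random_state=seed)
            rf.fit(Y[~mis[:, j]][:, others], Y[~mis[:, j], j])
            Y[mis[:, j], j] = rf.predict(Y[mis[:, j]][:, others])
    return Y
```

Because the forests are non-parametric, no distributional assumption on pixel intensities is needed, in contrast with the Gaussian assumption used by RegEM below.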

Regularized expectation-maximization (RegEM)

The expectation-maximization (EM) imputation algorithm is an iterative optimization technique for estimating the parameter set (\(\vartheta\)) of a probability model with incomplete data based on the notion of maximum likelihood [34]. Parameter estimation via EM-based methods is done first by estimating the parameters of the data distribution through the existing data and then imputing the missing data based on the estimated distribution [35]. The EM algorithm encompasses two steps: obtaining a probability distribution over all possible complete versions of the incomplete data given the current parameter estimate (E-step) and re-estimating the underlying parameter set using these completions (M-step). In practice, however, one need not specify this probability distribution explicitly, but need only compute expected sufficient statistics over these completions. The EM algorithm attempts to find the parameter set \(\vartheta ^*\) that maximizes the log-likelihood of the observed pixel intensities by casting it as a prediction problem [36].

Assuming that the distribution of pixels is multivariate normal with parameter set \(\vartheta =\left[ {\mu ,\Sigma }\right]\), the missing pixel values can be imputed using a regression model. Although the normality assumption is plausible in many application areas, it can be replaced with other more complex densities, such as mixtures of simpler ones [31].

In the presence of occlusions, the feature matrix and associated design matrix become ill-conditioned (as a result of missing pixel values). As a result, ordinary regression estimates (such as least squares) and their standard errors can be highly unreliable and can affect the stability of such models as well as the quality of predictions amidst multi-collinearity [37]. Under these circumstances, a penalized regression method (ridge regression) is recommended for ill-conditioned design matrices instead of least squares estimators. Specifically, given a linear regression model

$$Y =X\beta + \varepsilon ;$$
(1)

\(Y_{n\times 1}\) vector of observations; \(X_{n\times p}\) design matrix of rank p, \(\beta _{p\times 1}\) vector of unknown parameters and \(\varepsilon _{n\times 1}\) vector of unobserved errors, the ridge regression estimate \({\hat{\beta }}\) of \(\beta\) is

$${\hat{\beta }} = (X^{T}X + \gamma {\mathbb {I}})^{-1}X^{T}{Y},$$
(2)

where \(\gamma\) is the ridge parameter to be selected and \({\mathbb {I}}\) is the \(p\times p\) identity matrix. This can be obtained as the solution to the constrained least squares problem

$${\hat{\beta }} = \mathop {\textrm{argmin}}\limits _{\beta \in {\mathbb {R}}^{p},\, ||\beta ||\le \tau } \left[ ({Y}-X \beta )^{T}({Y}-X \beta )\right] ,$$
(3)

where \(\tau \ge 0\).

The visual quality of the de-occluded image depends on the regularization parameter \(\gamma\). The method of generalized cross validation has been shown by [38] to give a better estimate of \(\gamma\) compared to the method of maximum likelihood. In generalized cross-validation, the estimate \(\dot{\gamma }\) of \(\gamma\) is obtained as a minimizer of the generalized cross-validation function

$$V(\gamma ) = \dfrac{\frac{1}{n}||({\mathbb {I}} - A(\gamma ))Y||^{2}}{\left[ \frac{1}{n}\textrm{Trace}({\mathbb {I}}-A(\gamma ))\right] ^{2}},$$
(4)

where \(A(\gamma ) = X(X^{T}X + n\gamma {\mathbb {I}})^{-1}X^{T}\).
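Equations (2) and (4) can be illustrated numerically. The sketch below is ours: it assumes a generic well-conditioned design matrix and selects \(\gamma\) from a small grid rather than by a full minimization of the GCV function.

```python
import numpy as np

def ridge(X, Y, gamma):
    """Ridge estimate of Eq. (2): (X^T X + gamma I)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X + gamma * np.eye(X.shape[1]), X.T @ Y)

def gcv(X, Y, gamma):
    """Generalized cross-validation score V(gamma) of Eq. (4)."""
    n = len(Y)
    A = X @ np.linalg.solve(X.T @ X + n * gamma * np.eye(X.shape[1]), X.T)
    resid = Y - A @ Y
    return (resid @ resid / n) / (np.trace(np.eye(n) - A) / n) ** 2

# Illustrative selection of gamma over a small grid
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = X @ np.ones(5) + 0.1 * rng.normal(size=40)
grid = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
best_gamma = min(grid, key=lambda g: gcv(X, Y, g))
```

As \(\gamma \to 0\) the ridge estimate approaches the ordinary least squares solution; larger \(\gamma\) trades bias for stability, which is exactly what the ill-conditioned occluded feature matrices require.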

According to [38], using the generalized cross-validation approach to estimate \(\gamma\) does not require knowledge of the noise variance \(\sigma ^2\), making it a natural choice for solving regression-like problems where the design matrix is ill-posed since in such cases there is no way of estimating \(\sigma ^2\) from the data. The regularized EM using multiple ridge regression is carried out, starting with initial estimates of the mean \(\mu\) and covariance matrix \(\Sigma\), as follows:

  • For each row of the feature matrix with missing values, obtain the multiple ridge regression parameters by regressing columns with missing pixel values on the columns with observed pixel values using the mean and covariance matrix.

  • Fill in the missing pixel values with their conditional expectation values, where the conditional expectation values are obtained as the product of the available pixel values and the estimated ridge regression coefficients \(\hat{\beta _{r}}\).

  • Re-estimate the mean and covariance matrix, where the mean is obtained as the mean of the completed feature matrix and the covariance matrix is obtained as the sum of the covariance matrix of the feature matrix and an estimate of the conditional covariance matrix of the imputation error.

Algorithm 3
figure c

RegEM
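The three steps above can be sketched as a single, heavily simplified EM pass; the function `regem_step` is ours and uses a fixed ridge parameter, whereas the full RegEM also adds the conditional covariance of the imputation error to \(\Sigma\) and chooses the ridge parameter by generalized cross-validation.

```python
import numpy as np

def regem_step(Y, mu, Sigma, gamma=1e-3):
    """One simplified EM pass: for each row with missing pixels (NaN),
    ridge-regress the missing coordinates on the observed ones via the
    covariance blocks, fill with the conditional expectation, then
    re-estimate the mean and covariance from the completed matrix."""
    Y = Y.copy()
    for i in range(Y.shape[0]):
        m = np.isnan(Y[i])
        o = ~m
        if not m.any():
            continue
        # Ridge-regularized regression coefficients of missing on observed
        B = np.linalg.solve(Sigma[np.ix_(o, o)] + gamma * np.eye(o.sum()),
                            Sigma[np.ix_(o, m)])
        # Conditional expectation of the missing pixels
        Y[i, m] = mu[m] + (Y[i, o] - mu[o]) @ B
    return Y, Y.mean(axis=0), np.cov(Y, rowvar=False)
```

Iterating this step until the imputed values stabilize yields the completed (de-occluded) feature matrix.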

The test-image database 3 contains the MICE, MissForest and RegEM reconstructed images of test-image database 1.

Fig. 4 shows the reconstructed face images for some subjects under 10% degradation for the JAFFE and CKFE data sets.

Fig. 4
figure 4

Sample of reconstructed face images with 10% occlusion acquired under varying expressions (test-image database 3)

The test-image database 4 contains the MICE, MissForest and RegEM reconstructed images of test-image database 2.

Fig. 5 shows the reconstructed face images for some subjects under 20% degradation for the JAFFE and CKFE data sets.

Fig. 5
figure 5

Sample of reconstructed face images with 20% occlusion acquired under varying expressions (test-image database 4)

Research design

When face images are sent to the recognition module, they are preprocessed through mean centering and Discrete Wavelet Transformation (DWT) mechanisms. The train images are the first to be preprocessed this way. Afterward, the preprocessed images are sent to the feature extraction unit where the PCA/SVD algorithm extracts discriminative features. The extracted features are then stored in memory as the knowledge base for recognition.

As mentioned before, four test datasets were used in this study. It is worth noting that only one of the adopted imputation mechanisms is used for de-occlusion at a time in a database before recognition. The test images are also preprocessed using mean centering and Discrete Wavelet Transformation (DWT) mechanisms and their discriminative features are also extracted using the PCA/SVD algorithm for recognition.

The discriminative features are passed on to the classifier/recognition unit, where they are matched with the stored knowledge created from the train images; the closest match is defined in terms of minimum recognition distance. We note that only one test image is passed to the recognition module along with the train images at a time. The design of the study recognition module is shown in Fig. 6.

Fig. 6
figure 6

Research design
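The matching step can be sketched with a minimal nearest-neighbour rule under the city-block (L1) distance; the function name `recognize`, the toy feature vectors and the subject labels below are illustrative, not the study's data.

```python
import numpy as np

def recognize(test_feature, train_features, labels):
    """Match a test feature vector to the training subject with the
    smallest city-block (L1) recognition distance."""
    dists = np.abs(train_features - test_feature).sum(axis=1)  # L1 distances
    best = int(np.argmin(dists))
    return labels[best], dists[best]

# Toy example: two training subjects in a 2-D feature space
train = np.array([[0.0, 0.0], [5.0, 5.0]])
who, d = recognize(np.array([0.4, -0.2]), train, ["subject_A", "subject_B"])
```

In the actual module the feature vectors are the PCA/SVD projections of the DWT-preprocessed faces, but the matching rule is the same.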

Preprocessing

Preprocessing is a key stage in digital image processing. The importance of preprocessing has been underscored by several research works [11, 39,40,41,42].

The goal of image enhancement is to accentuate, via denoising mechanisms, the defining features of the image by improving the image quality. Image enhancement can be carried out in the spatial domain or in a transformed domain of the image. The latter, particularly, has evolved over the years to effectively deal with image denoising and enhance edge features [43]. In this study, we adopted mean centering and the Discrete Wavelet Transform as preprocessing mechanisms.

Discrete wavelet transform (DWT)

The discrete wavelet transform (DWT) is a transform-domain-based image denoising method with a multi-resolution property that allows the analysis of a signal (image) at different frequency resolutions [44]. This is particularly useful because some features of a face or signal have low frequency components while others have high frequency components [45]. According to [46], the use of wavelets gives superior performance in image denoising due to this multi-resolution property. DWT provides both spatial and frequency information about a given signal. As such, DWT-based image denoising is preferred to other transform-domain denoising mechanisms such as the Fourier transform, which gives only the frequency information of a signal [6].

Denoising an image based on DWT consists of decomposing the face image, noise filtering and image reconstruction. DWT decomposes an image into two sets of coefficients namely the approximation coefficients and detail coefficients. The decomposition is done by passing the image through a series of filters. First, the image is passed through a low-pass filter resulting in the approximation coefficients (LL-sub-band). The image is also decomposed simultaneously using a high-pass filter resulting in the detail coefficients (Horizontal coefficients (LH-sub-band), Vertical coefficients (HL-sub-band) and diagonal coefficients (HH-sub-band) [47]. These sub-bands provide different resolutions of the image, with the LL sub-band being the low resolution form of the image and the remaining sub-bands being the high-resolution forms of the image. The LL-sub-band contains global information of the image and is less prone to noise while the remaining sub-bands contain local information such as eyes, nose and mouth [6].

DWT is a stable, invertible transform for signals in diverse domains. Its efficiency in denoising signals stems from its multi-resolution property, which allows the analysis of a signal at different resolutions or scales, making it easier to identify patterns and anomalies in large datasets [48]. The wavelet transform involves the displacement of basic wavelet functions called mother wavelets [49]. Notable among them are the Haar, Daubechies, Coiflet, Symlet and Morlet wavelets. In this study we chose the Haar wavelet as the mother wavelet and performed a one-level decomposition of the face images, because it is simple and orthogonal (rigid in transformation, preserving distances in the original image) in nature. After the one-level decomposition, a Gaussian filter is applied to normalize illumination and the image is reconstructed via the inverse discrete wavelet transform. Figure 7 shows the DWT cycle using the Haar wavelet.

Fig. 7
figure 7

DWT preprocessing cycle for a MICE reconstructed image
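A one-level 2-D Haar decomposition and its inverse can be written directly in NumPy as averages and differences; this sketch is ours (libraries such as PyWavelets provide equivalent routines) and assumes even image dimensions.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar decomposition into the LL (approximation) and
    LH, HL, HH (detail) sub-bands; height and width assumed even."""
    a = img.astype(float)
    lo_r = (a[0::2, :] + a[1::2, :]) / 2      # row low-pass (averages)
    hi_r = (a[0::2, :] - a[1::2, :]) / 2      # row high-pass (differences)
    LL = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2
    LH = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2
    HL = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2
    HH = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Invert the decomposition exactly (Haar is orthogonal up to scaling)."""
    lo_r = np.empty((LL.shape[0], 2 * LL.shape[1]))
    hi_r = np.empty_like(lo_r)
    lo_r[:, 0::2], lo_r[:, 1::2] = LL + LH, LL - LH
    hi_r[:, 0::2], hi_r[:, 1::2] = HL + HH, HL - HH
    out = np.empty((2 * lo_r.shape[0], lo_r.shape[1]))
    out[0::2, :], out[1::2, :] = lo_r + hi_r, lo_r - hi_r
    return out
```

In the preprocessing cycle, filtering would be applied to the sub-bands between `haar_dwt2` and `haar_idwt2`; with no filtering the reconstruction is exact.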

Mean centering

Given a matrix of face images whose columns are the vectorized forms of the face images of subjects, its corresponding mean-centered matrix of face images is obtained by subtracting the mean intensity value of each column from each of their respective intensity values. The resultant mean-centered matrix is, thus, of zero mean.

Mean-centering is an integral part of eigenvalue analysis which ensures that the principal components are proportional to the variance of the input data matrix with the first principal component reflecting the maximum variance, which otherwise would reflect the mean instead of the greatest variance [50].
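The column-wise mean centering described above amounts to one NumPy operation; the toy matrix below is illustrative.

```python
import numpy as np

# Columns are vectorized face images; subtract each column's mean intensity
# so every column (face) has zero mean.
F = np.array([[10.0, 100.0],
              [30.0, 120.0],
              [50.0, 140.0]])
F_centered = F - F.mean(axis=0)   # column means removed
```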

Feature extraction

Dealing with high dimensional datasets such as the human face is computationally expensive. Besides, with the presence of a large number of features, a learning model tends to overfit and hence under-perform [51]. Therefore, feature extraction forms an integral part of every face recognition module. During feature extraction, the dimensionality of the otherwise high-dimensional face images is reduced. This is because only the relevant features of each face are selected for classification and redundancy (noise) is removed.

Principal Component Analysis (PCA) is one such effective dimensionality reduction technique widely used in signal and image processing [14]. According to [52], PCA reduces the dimensionality of datasets whilst maintaining as much variability as possible and gives the best possible representation of a p-dimensional dataset in q dimensions \((q < p)\) by maximizing variance (statistical information) in q dimensions.

In PCA-based face feature extraction, discriminative facial features are obtained by projecting the given face images onto a feature space spanned by the principal components, which are the eigenvectors of the variance-covariance matrix of the faces data matrix [53]. This approach to feature extraction in face recognition is efficient due to its ease of implementation and low processing steps. In addition, no knowledge of geometry or any specific feature of face is required [54].

Several studies [55] have shown that the use of PCA for dimensionality reduction and feature extraction competes favourably with (and may outperform) other dimensionality reduction techniques, including independent component analysis [56] and linear discriminant analysis [57]. Based on the above merits, we adopted PCA for feature extraction.
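PCA/SVD feature extraction can be sketched as below; the function name `pca_features` is ours, and for illustration we centre by subtracting the mean face across columns (a common eigenface convention) before projecting onto the leading left singular vectors.

```python
import numpy as np

def pca_features(F, q):
    """Project vectorized faces (columns of F) onto the top-q principal
    components obtained from the SVD of the mean-centered data."""
    Fc = F - F.mean(axis=1, keepdims=True)   # subtract the mean face
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    components = U[:, :q]                    # "eigenfaces"
    return components.T @ Fc, components     # q-dim feature per face
```

The q-dimensional features returned here are what the classifier compares under the city-block distance.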

Assessment of the quality of de-occluded images

An attempt to resolve the challenges posed by occlusions to face recognition using multiple imputation methods may induce other artifacts or further degradations in the resultant faces. Image quality assessment is, therefore, crucial in this regard.

Image quality metrics are used to quantify the quality of the de-occluded images, and hence to determine the best multiple imputation technique for de-occlusion. Here, we refer to the unoccluded images as clean images. The Discrete Shannon Entropy (E), Absolute Mean Brightness Error (AMBE), Peak Signal-to-Noise Ratio (PSNR) and Contrast (C) were used in this context.

Entropy

The entropy of a face image characterizes the average level of information inherent in the face image. A relatively higher entropy after de-occlusion signifies better image quality and a good source of information that could be leveraged to enhance the classification performance of face recognition modules, given the right choice of feature selection scheme. If the pixel intensity values in an image are seen as discretely sampled from the underlying image probability density P, then the discrete Shannon entropy with base 2 of the jth image is given by

$$E(I_j) = - \sum _{k=0}^{L-1} P_j(k)\text{ log}_2 (P_j(k)),$$
(5)

where \(P_j (k)\) is the probability of occurrence of the kth pixel intensity value and L is the number of grey levels [49].
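Equation (5) can be computed from the image's grey-level histogram; the function name `shannon_entropy` is ours.

```python
import numpy as np

def shannon_entropy(img, L=256):
    """Discrete Shannon entropy (base 2) of a grey-level image, Eq. (5);
    terms with zero probability contribute 0."""
    hist = np.bincount(img.ravel(), minlength=L)
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
```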

Absolute mean brightness error (AMBE)

The absolute mean brightness error quantifies the brightness preservation property of the multiple imputation schemes in carrying out de-occlusion. For the jth image \((I_j)\) the AMBE is evaluated as the absolute difference between the mean brightness of the clean image and its respective de-occluded image \(({\tilde{I}}_j)\) and is given by

$$AMBE(I_j, {\tilde{I}}_j) = |m(I_j)-m(\tilde{I}_j)|,$$
(6)

where \(m(I_j)\) and \(m(\tilde{I}_j)\) represent the mean brightness of the clean image and de-occluded images respectively. The multiple imputation method that gives the least (average) AMBE values has the highest brightness preservation and thus, conserves the brightness of the “clean” images.
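Equation (6) is a one-line computation; the function name `ambe` is ours.

```python
import numpy as np

def ambe(clean, deoccluded):
    """Absolute mean brightness error, Eq. (6): the absolute difference of
    mean intensities; lower values mean better brightness preservation."""
    return abs(clean.mean() - deoccluded.mean())
```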

Peak signal-to-noise ratio (PSNR)

The PSNR is computed as the ratio of the square of the highest possible pixel value of the image to the noise (mean square error) that affects the quality of the pixels, expressed on a logarithmic decibel scale. For the jth image, the PSNR is given by

$$PSNR = 10\text{ log}_{10} \dfrac{Max_j^{2}}{MSE},$$
(7)

where \(Max_j\) is the maximum possible pixel value and MSE is the mean square error. In the absence of noise, a de-occluded image and its respective “clean” image are identical, therefore the MSE is zero and the corresponding PSNR value is infinite. When noise is introduced as a result of the de-occlusion processes, the multiple imputation de-occlusion method achieving the highest average PSNR values is preferred since it results in the best quality de-occluded images.
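Equation (7), with the infinite-PSNR convention for identical images, can be sketched as follows; the function name `psnr` is ours and assumes 8-bit images (maximum pixel value 255).

```python
import numpy as np

def psnr(clean, deoccluded, max_val=255.0):
    """Peak signal-to-noise ratio in dB, Eq. (7); returns infinity when the
    de-occluded image is identical to the clean image (MSE = 0)."""
    mse = np.mean((clean.astype(float) - deoccluded.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(max_val ** 2 / mse)
```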

Contrast

The contrast of an image refers to the spread in the distribution of its pixel intensity values, which is measured by the range of pixel intensity levels. If the minimum and maximum intensity values are far apart, the image has good contrast, otherwise it has poor contrast. The standard deviation of intensity value is a natural characterization of an image contrast and this is used in this study.

For the jth image, the standard deviation of pixel intensity values is given by

$$S(I_j) = \sqrt{\sum _{k=0}^{L-1}\left( k-m(I_j)\right) ^2 P_j(k)},$$
(8)

where \(P_j (k)\) is the probability of occurrence of the kth pixel intensity value and L is the number of grey levels.
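Since Eq. (8) is exactly the standard deviation of the pixel intensities, it reduces to a single NumPy call; the function name `contrast` is ours.

```python
import numpy as np

def contrast(img):
    """Image contrast as the standard deviation of pixel intensities, Eq. (8)."""
    return img.astype(float).std()
```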

Results and discussion

Assessment of image quality after using the various de-occlusion mechanisms

Figures 8, 9, 10 and 11 show the entropy, AMBE, PSNR and Contrast of the occluded face images after reconstruction with MICE, MissForest and RegEM imputation algorithms respectively.

Fig. 8
figure 8

Entropy

From Fig. 8, the median entropy value is highest for the MICE de-occluded faces, followed by the RegEM, with the MissForest attaining the least median entropy value.

Fig. 9
figure 9

Absolute mean brightness error (AMBE)

From Fig. 9, the MissForest has the weakest brightness preservation, while the MICE and RegEM have relatively lower and similar AMBE values, with a few images having brightness significantly different from that of their respective clean images.

Fig. 10 Peak signal to noise ratio (PSNR)

It can be seen from Fig. 10 that de-occlusion with MissForest yields the noisiest images, since its PSNR values are relatively low compared with those of MICE and RegEM. The PSNR distributions for MICE and RegEM are similar on average, although that of MICE shows more variability.

Fig. 11 Contrast

Results from Fig. 11 show that MICE and RegEM produce images with relatively lower contrast than their respective clean images, whereas the images obtained from MissForest de-occlusion have relatively higher contrast.

Assessment of the performance of the study algorithm under the various de-occlusion mechanisms

Sample results of matching the MissForest, MICE, and RegEM de-occluded test face images (of some subjects with happy facial expressions) to the train image database using the study algorithm are presented in Figs. 12 and 13.
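The matching step that produces these decisions and recognition distances is a nearest-neighbour search under the city-block (L1) distance. The sketch below is illustrative, not the authors' code: it assumes DWT-PCA/SVD feature vectors have already been extracted, and `match` is a hypothetical helper name.

```python
import numpy as np

def match(test_feat: np.ndarray, train_feats: np.ndarray) -> tuple[int, float]:
    """Nearest-neighbour matching under the city-block (L1) distance.

    test_feat:   feature vector of one de-occluded test face
    train_feats: (n_subjects, n_features) matrix of training feature vectors
    Returns the index of the best-matching subject and its recognition distance.
    """
    dists = np.abs(train_feats - test_feat).sum(axis=1)  # L1 distance to each subject
    best = int(np.argmin(dists))
    return best, float(dists[best])
```

A test face is declared a correct match when the training image minimizing this distance belongs to the same subject.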

Fig. 12 Recognition results per reconstruction method for the JAFFE database (20% occlusions)

Figure 12 shows the decisions and recognition distances for six subjects from the JAFFE database when de-occlusion was carried out at 20% random missingness. All six subjects were correctly matched for the MICE and RegEM de-occluded test images, whereas there were two mismatches for the MissForest de-occluded test faces.

Fig. 13 Recognition results per reconstruction method for the CKFE database (10% occlusions)

Figure 13 shows the decisions and recognition distances for six subjects from the CKFE database when de-occlusion was carried out at 10% random missingness. There were three mismatches each when the MissForest and RegEM de-occluded test images were used for recognition, but only one mismatch when the MICE de-occluded test faces were used.

Table 1 shows the average recognition rates of the study algorithm when the occluded and de-occluded test faces at the 10% and 20% occlusion rates were used for recognition.

It can be seen from Table 1 that at the 10% occlusion rate, the average recognition rates of the study algorithm (DWT-PCA/SVD-L1) were 41.40%, 68.75%, 85.94% and 84.44% when the occluded, MissForest, MICE and RegEM de-occluded images, respectively, were used as test face images. At the 20% occlusion rate, the corresponding rates were 23.63%, 54.69%, 78.65% and 76.54%. The recognition algorithm thus performed poorly (41.40% and 23.63%) when the occluded images themselves were used as test images, and its performance declined further as the degree of occlusion increased from 10% to 20%.

Table 1 Average recognition rates of the study algorithm using the de-occlusion mechanisms at 10% and 20% degradation levels

It is also evident from Table 1 that the de-occlusion mechanisms (MICE, MissForest, RegEM) enhanced the performance of the recognition algorithm at each level of occlusion. Notably, the MICE de-occluded test face images gave the highest recognition rates (85.94% and 78.65% at the 10% and 20% degradation levels respectively), followed closely by the RegEM de-occluded images (84.44% and 76.54%), while the algorithm attained its lowest average recognition rates (68.75% and 54.69%) when the MissForest de-occluded test face images were used. There was, however, a moderate decline in performance with increasing level of occlusion and the corresponding de-occlusions. These results are consistent with the works of [6, 7].
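For readers who wish to reproduce a MICE-style de-occlusion, the occlude-then-impute pipeline can be sketched with scikit-learn's `IterativeImputer` standing in for the MICE implementation used in the study (it is a MICE-like chained-equations imputer, not the authors' exact tool). Occluded pixels are encoded as NaN, and each pixel column is treated as a variable regressed on the others.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(float)  # stand-in face image

# Simulate 20% random occlusion by marking pixels as missing (NaN).
mask = rng.random(img.shape) < 0.20
occluded = img.copy()
occluded[mask] = np.nan

# MICE-style de-occlusion: each pixel column is iteratively regressed on the rest,
# with posterior sampling as in chained-equations imputation.
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
deoccluded = imputer.fit_transform(occluded)
```

The de-occluded array can then be fed to the recognition module in place of the occluded test image.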

Conclusion and recommendation

In this study, we compared three multiple imputation methods (Multiple Imputation by Chained Equations (MICE), MissForest and regularized expectation-maximization (RegEM)) as de-occlusion mechanisms for moderately low levels of occlusion in test face images from two standard face image datasets (Japanese Female Facial Expressions (JAFFE) and Cohn-Kanade Facial Expression (CKFE)), on the basis of their effect on image quality and on the performance of a face recognition module (DWT-PCA/SVD-L1). In assessing image quality, both MICE and RegEM outperformed MissForest on the entropy, PSNR and AMBE criteria. Except for entropy, where MICE attained the highest average value, MICE and RegEM produced images of similar quality as measured by AMBE and PSNR. None of the methods produced images with contrast similar to that of their respective clean images: MissForest yielded over-enhanced contrast, while MICE and RegEM produced relatively lower-contrast images. This suggests that MICE and RegEM result in images with better detail and better brightness conservation. The use of the multiple imputation-based test images improved the performance of the study recognition module. Numerical evaluation showed that the study algorithm achieved its highest average recognition rates when de-occlusion was done with MICE (85.94% and 78.65% at the 10% and 20% occlusion levels respectively), closely followed by RegEM (84.44% and 76.54%), with the lowest average recognition rates (68.75% and 54.69%) obtained when the MissForest de-occluded test face images were used for recognition. These results were consistent across the 10% and 20% occlusion levels.
Similar findings were obtained by [4, 6, 7], except that those works adopted different enhancement mechanisms for preprocessing the face images, generated their occlusions at different degrees of missingness, and did not assess image quality after de-occlusion. The performance of the study recognition module, regardless of the multiple imputation method used for de-occlusion, appeared to depend on the level of occlusion; in particular, the multiple imputation methods do not appear robust to higher levels of occlusion. Despite this limitation, the study provides useful insight into the use of multiple imputation methods for dealing with occlusion in face recognition and related areas.

Future work will focus on enhancing the recognition rate of the study algorithm, when multiple imputation-based de-occluded test face images are used for recognition, as well as improving the robustness of the study algorithm to higher levels of occlusions.

Availability of data and materials

The image data used in this manuscript are from a previously published manuscript. The processed data are available upon request from the corresponding author.

References

  1. Ghuman AS, Brunet NM, Li Y, Konecky RO, Pyles JA, Walls SA, Destefino V, Wang W, Richardson RM. Dynamic encoding of face information in the human fusiform gyrus. Nat Commun. 2014;5(1):1–10.

  2. Kriegeskorte N, Formisano E, Sorger B, Goebel R. Individual faces elicit distinct response patterns in human anterior temporal cortex. Proc Natl Acad Sci. 2007;104(51):20600–5.

  3. Abate AF, Cimmino L, Mocanu B-C, Narducci F, Pop F. The limitations for expression recognition in computer vision introduced by facial masks. Multimedia Tools and Applications. 2023;82(8):11305–19.

  4. Mensah JA, Ocran E, Asiedu L. On multiple imputation-based reconstruction of degraded faces and recognition in multiple constrained environments. Sci Afr. 2023;22: e01964.

  5. Chen G, Peng J, Wang L, Yuan H, Huang Y. Feature constraint reinforcement based age estimation. Multimedia Tools Appl. 2023;82(11):17033–54.

  6. Mensah JA,  Asiedu L, Mettle FO,  Iddi S. Assessing the performance of dwt-pca/svd face recognition algorithm under multiple constraints. J Appl Math. 2021;2021:1–2.

  7. Ayiah-Mensah D, Asiedu L, Mettle FO, Minkah R. Recognition of augmented frontal face images using fft-pca/svd algorithm. Appl Comput Intell Soft Comput. 2021;2021:1–9.

  8. Liu X, Pedersen M, Charrier C, Bours P. Can image quality enhancement methods improve the performance of biometric systems for degraded face images? In: 2018 Colour and Visual Computing Symposium (CVCS). IEEE;2018:1–5.

  9. Asiedu L, Mensah JA, Ayiah-Mensah F, Mettle FO. Assessing the effect of data augmentation on occluded frontal faces using dwt-pca/svd recognition algorithm. Adv Multimedia. 2021;2021:1.

  10. Kamenetsky D, Yiu SY, Hole M. Image enhancement for face recognition in adverse environments. In: 2018 Digital Image Computing: Techniques and Applications (DICTA). IEEE;2018:1–6.

  11. Rana ME, Zadeh AA, Alqurneh AMM. Use of image enhancement techniques for improving real time face recognition efficiency on wearable gadgets. J Eng Sci Techno. 2017;12(1):155–67.

  12. Oh HJ, Lee KM, Lee SU. Occlusion invariant face recognition using selective local non-negative matrix factorization basis images. Image Vision Comput. 2008;26(11):1515–23.

  13. Priya GN, Banu RW. Occlusion invariant face recognition using mean based weight matrix and support vector machine. Sadhana. 2014;39(2):303–15.

  14. Asiedu L, Mettle FO, Mensah JA. Recognition of reconstructed frontal face images using fft-pca/svd algorithm. J Appl Math. 2020;2020:1–8.

  15. Zhang N, Ji H, Liu L, Wang G. Exemplar-based image inpainting using angle-aware patch matching. EURASIP J Image Video Process. 2019;2019(1):1–13.

  16. Chan TF, Shen J. Nontexture inpainting by curvature-driven diffusions. J visual Commun Image Represent. 2001;12(4):436–49.

  17. Criminisi A, Perez P, Toyama K. Object removal by exemplar-based inpainting. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2. IEEE; 2003.

  18. Zhang J, Zhao D, Gao W. Group-based sparse representation for image restoration. IEEE Trans Image Process. 2014;23(8):3336–51.

  19. Fan Q, Zhang L, Serikawa S. Improvement of patch selection in exemplar-based image inpainting. J Inst Ind Appl Eng. 2015;3(4):197–202.

  20. Ke L, Tai Y-W, Tang C-K. Occlusion-aware instance segmentation via bilayer network architectures. IEEE Trans Pattern Anal Mach Intell. 2023.

  21. Jia M, Sun Y, Zhai Y, Cheng X, Yang Y, Li Y. Semi-attention partition for occluded person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2023;37:998–1006.

  22. Xu C, Makihara Y, Li X, Yagi Y. Occlusion-aware human mesh model-based gait recognition. IEEE Trans Inf forensics Secur. 2023;18:1309–21.

  23. Penone C, Davidson AD, Shoemaker KT, Di Marco M, Rondinini C, Brooks TM, Young BE, Graham CH, Costa GC. Imputation of missing data in life-history trait datasets: which approach performs the best? Methods Ecol Evol. 2014;5(9):961–70.

  24. Alruhaymi AZ, Kim CJ. Why can multiple imputations and how (mice) algorithm work? Open J Stat. 2021;11(5):759–77.

  25. Kontopantelis E, White IR, Sperrin M, Buchan I. Outcome-sensitive multiple imputation: a simulation study. BMC Med Res Methodol. 2017;17(1):1–13.

  26. Van Buuren S, Oudshoorn K. Flexible multivariate imputation by MICE. Leiden: TNO; 1999.

  27. Stekhoven DJ, Bühlmann P. MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8.

  28. Li H, Zhang K, Jiang T. The regularized em algorithm. In: AAAI, 2005; 807–812.

  29. Van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res. 2007;16(3):219–42.

  30. Van Buuren S, Brand JP, Groothuis-Oudshoorn CG, Rubin DB. Fully conditional specification in multivariate imputation. J Stat Comput Simul. 2006;76(12):1049–64.

  31. Quintero FOL, Contreras-Reyes JE. Estimation for finite mixture of simplex models: applications to biomedical data. Stat Model. 2018;18(2):129–48.

  32. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

  33. Waljee AK, Mukherjee A, Singal AG, Zhang Y, Warren J, Balis U, Marrero J, Zhu J, Higgins PD. Comparison of imputation methods for missing laboratory data in medicine. BMJ Open. 2013;3(8): e002847.

  34. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977;39(1):1–22.

  35. Ke J, Zhang S, Yang H, Chen X. Pca-based missing information imputation for real-time crash likelihood prediction under imbalanced data. Transportmetrica A: Transp Sci. 2019;15(2):872–95.

  36. Hinton GE, Sabour S, Frosst N. Matrix capsules with em routing. In: International conference on learning representations, 2018.

  37. Oufdou H, Bellanger L, Bergam A, El Ghaziri A, Khomsi K, Qannari EM, et al. Comparison of different regularized and shrinkage regression methods to predict daily tropospheric ozone concentration in the grand casablanca area. Adv Pure Math. 2018;8(10):793.

  38. Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 1979;21(2):215–23.

  39. Li W, Peng M, Wang Q. Improved pca method for sensor fault detection and isolation in a nuclear power plant. Nuclear Eng Technol. 2019;51(1):146–54.

  40. Gross R, Brajovic V. An image preprocessing algorithm for illumination invariant face recognition. In: International Conference on Audio- and Video-Based Biometric Person Authentication. Springer; 2003:10–18.

  41. Shan S, Gao W, Cao B, Zhao D. Illumination normalization for robust face recognition against varying lighting conditions. In: 2003 IEEE International SOI Conference. Proceedings (Cat. No. 03CH37443). IEEE. 2003;157–64.

  42. Du S, Ward RK. Adaptive region-based image enhancement method for robust face recognition under variable illumination conditions. IEEE Trans Circuits Syst Video Technol. 2010;20(9):1165–75.

  43. Jung CR, Scharcanski J. Adaptive image denoising and edge enhancement in scale-space using the wavelet transform. Pattern Recognit Lett. 2003;24(7):965–71.

  44. Mallat SG. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell. 1989;11(7):674–93.

  45. Ergen B. Comparison of wavelet types and thresholding methods on wavelet based denoising of heart sounds. J Signal Inf Process. 2013;4(3B):164.

  46. Al Jumah A. Denoising of an image using discrete stationary wavelet transform and various thresholding techniques. J Signal Inf Process. 2013;4:33–41.

  47. El-Badawy A, Rashad R, et al. Ultrasonic rangefinder spikes rejection using discrete wavelet transform: application to uav. J Sensor Technol. 2015;5(02):45.

  48. Devi D,  Sophia S, Prabhu SB, Deep learning-based cognitive state prediction analysis using brain wave signal. In: Cognitive Computing for Human-Robot Interaction. Elsevier, 2021;69–84.

  49. Nicolis O, Mateu J, Contreras-Reyes JE. Wavelet-based entropy measures to characterize two-dimensional fractional brownian fields. Entropy. 2020;22(2):196.

  50. Alexandris N, Gupta S, Koutsias N. Remote sensing of burned areas via pca, part 1; centering, scaling and evd vs svd. Open Geospatial Data Softw Stand. 2017;2(1):1–11.

  51. Tang J, Alelyani S, Liu H. Feature selection for classification: a review. Data Classif Algorithms Appl. 2014. p. 37.

  52. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans Royal Soc A: Math Phys Eng Sci. 2016;374(2065):20150202.

  53. Turk M, Pentland A. Eigenfaces for recognition. J Cogn Neurosci. 1991;3(1):71–86.

  54. Shinde K K, Tharewal S S, Suryawanshi K S, Kayte C N, Python based face recognition for person identification using pca and 2dpca techniques. In: 2020 International Conference on Smart Innovations in Design, Environment, Management, Planning and Computing (ICSIDEMPC), IEEE, 2020; 171–175.

  55. Martinez AM, Kak AC. Pca versus lda. IEEE Trans Pattern Anal Mach Intell. 2001;23:228–33.

  56. Yuen PC, Lai J-H. Face representation using independent component analysis. Pattern Recognit. 2002;35(6):1247–57.

  57. Belhumeur PN, Hespanha JP, Kriegman DJ. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell. 1997;19(7):711–20.

Funding

The manuscript did not receive financial support from any funding institution.

Author information

Contributions

Joseph Agyapong Mensah wrote the original draft of the manuscript, Ezekiel N.N. Nortey and Eric Ocran reviewed and edited the manuscript, Louis Asiedu and Samuel Iddi supervised the methodology development, reviewed and edited the manuscript.

Corresponding author

Correspondence to Louis Asiedu.

Ethics declarations

Ethics approval and consent to participate

The images used in the manuscript are openly published data that may be used for academic research to benchmark face recognition algorithms for face verification. There are no ethical concerns in using these images, as they are openly accessible and were created for research purposes.

Consent for publication

All authors give their consent for the publication of the manuscript.

Competing interests

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Mensah, J.A., Nortey, E.N.N., Ocran, E. et al. De-occlusion and recognition of frontal face images: a comparative study of multiple imputation methods. J Big Data 11, 60 (2024). https://doi.org/10.1186/s40537-024-00925-6

