Binary image description using frequent itemsets

In this paper, a novel method for binary image comparison is presented. We treat the image as a set of transactions and items. The proposed method is applied along the rows and columns of an image, and the image is represented by all of its frequent itemsets. First, the rows of the image are considered as transactions and the columns as items. Second, the rows are considered as items and the columns as transactions. We also apply our technique to color images: we first segment the image, and each segmented region is then treated as a binary image. The proposed method is tested on the MPEG7 database and compared with the Hu moments method to show its efficiency.

for a large set of items in the initial database analysis. The result of this analysis is then used as a basis for discovering other large itemsets during subsequent passes. Itemsets whose support is above the minimum threshold are called large (frequent) itemsets [14]. The algorithm relies on the large itemset property, which states that every subset of a large itemset is large, and that if an itemset is not large, then none of its supersets is large [8].
In this paper, we propose a novel method for comparing binary images based on the Apriori algorithm. We transform the image into a set of items and transactions (see Fig. 1), and the Apriori algorithm is applied along the rows and columns of the image:
1. Apriori algorithm along rows: the rows of the image are considered as transactions and the columns as items.
2. Apriori algorithm along columns: the rows of the image are considered as items and the columns as transactions.
The rest of the paper is organized as follows. The "Related work" section discusses related work. In the "Methodology" section, we explain the proposed approach. The "Results and discussion" section presents the experimental results and their discussion. Finally, the conclusion is drawn and future work is described, followed by the references.

Related works
Different types of techniques are currently used in the domains of big data analytics and image processing [15][16][17][18][19]. The integration and interaction of these two broad fields need further study so that the benefits of both can be better explored and exploited. Vahini et al. [20] present a state-of-the-art survey of image processing and big data analytics; their study focuses on recent research progress in the two fields and highlights the importance of their interaction and integration. A novel workflow based on big data analytics for biomedical image classification is proposed in [21]. It is composed of two structures: the first is based on Spark and the second on the Hadoop framework. In [22], the authors present an efficient, fast-response content-based image retrieval (CBIR) framework built on Hadoop that consists of a set of modules working in two layers. Building a content-based image retrieval system that targets big data and operates efficiently on real-time data is regarded as a critical challenge.
There are many methods for image comparison. Recent papers review the available tools and techniques for image processing and comparison [23][24][25]. Combined with big data, these tools can provide user-friendly solutions to industrial problems [26]. Several approaches for comparing binary images have been proposed in the literature [27]. Some are based on the comparison of image descriptors and others on the direct comparison of images. In direct approaches, the images are compared directly from their pixels. Direct comparisons between binary images are difficult to implement because, unlike in a color image, the pixels of a binary image carry little information: each value is either 0 or 1. Direct comparisons can be used to assess noise and segmentation faults [28] and to compare images of face contours in order to evaluate their dissimilarity [29][30][31]. Most direct methods are based on an aggregate measure and provide a similarity measure in scalar form (a positive real number); some are also based on a local study (as for grayscale images) [32] and provide a two-dimensional similarity index.
Indirect methods are based on image processing that represents the image in a new space. Two-dimensional geometric moments, which provide descriptors for recognition, have many applications in image processing. In robotics, they are used for motion search computation and for guidance in registration; they are also used in image processing for problems such as scene matching [33] and character recognition [34]. Boundary-based methods, such as Fourier descriptors, space decomposition, contour curvature scale and principal component decomposition of the outline, have many applications, especially for the processing of shapes [35]. Boundary-based methods are robust shape descriptors, but they are not suitable for a complex image that contains many details.
Transforms of the entire image: global transforms (Fourier and wavelet transforms) are unsuitable for the processing of digital images, as these are composed of discontinuities. Non-linear filtering methods are suited to binary images [36]. In the case of linear filtering, however, the transformed images can be compared after selecting the most significant coefficients (by comparing their norms), which is much less discriminating in the case of binary coefficients.
In this section, we have presented direct and indirect methods of image comparison. Indirect methods are poorly suited to comparing binary images other than shapes. Direct methods, in contrast, are interesting because they measure a distance between sets of points and not only pixel to pixel. However, they generally do not give access to local dissimilarity information, or do so only when adapted to the type of dissimilarity to be assessed. In this work, we apply the Apriori algorithm, an indirect method, to find frequent itemsets.

Methodology
In this section, we first describe briefly Hu's moments method [37]. Then, we present our proposed method.

Moments method
Moment invariants were introduced by Hu. In [37], Hu derived six absolute orthogonal invariants and one skew orthogonal invariant based upon algebraic invariants, which are independent not only of position, size and orientation but also of parallel projection. Moment invariants have been shown to be adequate measures for tracking image patterns under translation, scaling and rotation, under the assumption that images are continuous functions and noise-free. Moment invariants have been extensively applied to image registration [38] and image pattern recognition [38][39][40][41]. Well-known moments include geometric moments [37], Zernike moments [42,43], rotational moments [44] and complex moments [39].
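In his work, Hu proved that each irradiance function f(x, y) (or image) has a one-to-one correspondence to a single set of moments and vice versa. The (p + q)th-order geometric moment of f(x, y) can be expressed by the following equation:

\[ m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^{p} y^{q} f(x, y)\, dx\, dy, \qquad p, q = 0, 1, 2, \ldots \]

The central moments, which are invariant to translation, and the normalized central moments, which are also invariant to scale, are

\[ \mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y)\, dx\, dy, \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}, \]

\[ \eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1. \]

So, the seven Hu moment invariants are shown in Eq. (4):

\[
\begin{aligned}
\varphi_1 &= \eta_{20} + \eta_{02} \\
\varphi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
\varphi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
\varphi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
\varphi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
          &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
\varphi_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
\varphi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
          &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
\tag{4}
\]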
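The seven invariants can also be computed numerically. The following is a minimal sketch, assuming the standard definition of Hu's invariants in terms of normalized central moments and replacing the integrals with sums over pixel coordinates. The function name `hu_moments` and the use of NumPy are assumptions made for illustration; they are not taken from the authors' Java implementation.

```python
import numpy as np

def hu_moments(image):
    """Return the seven Hu invariants of a 2-D (binary or grayscale) image."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]  # row (y) and column (x) coordinates
    m00 = img.sum()
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    xc = (xs * img).sum() / m00  # centroid, x
    yc = (ys * img).sum() / m00  # centroid, y

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00^((p + q) / 2 + 1), with mu_00 = m_00
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences

    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])
```

On a noise-free shape, the returned vector should be unchanged when the shape is translated or rotated by a multiple of 90 degrees, which gives a quick sanity check of an implementation.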

Proposed technique
In this section, we present our approach, which is illustrated in Fig. 2. As shown in this figure, the example image has size 8 × 10.
If we take min_sup = 2, then the triplet ((2,3), 2, 5) contains three frequent items of length 1, two frequent items of length 2 and one frequent item of length 3 (see Fig. 2b, c).
In general, a frequent pattern can be written as the tuple (Position_First_Pixel, Start_row, End_row, Width) or (Position_First_Pixel, Start_column, End_column, Width), where:
• Position_First_Pixel: indicates the position of the first pixel inside the frequent pattern,
• Start_row: denotes the number of the starting row,
• End_row: denotes the number of the ending row,
• Width: indicates the length of the frequent pattern.
The principle of the proposed approach is that the image is a set of transactions and items. We apply the Apriori algorithm along the rows and columns of an image, and the image is represented by all of its frequent items. Algorithm 1 presents the Apriori algorithm for a binary image. In the row case, the rows of the image are considered as transactions and the columns as items. In the column case, the rows of the image are considered as items and the columns as transactions.
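A minimal sketch of how the row and column passes could be implemented, assuming the classic level-wise Apriori algorithm with an absolute support count. The names `apriori` and `image_frequent_itemsets` are hypothetical. The sketch does not reproduce Algorithm 1 or the (Position_First_Pixel, Start_row, End_row, Width) encoding; it only shows the transaction/item view of a binary image.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset (frozenset): support count} for itemsets with support >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_sup}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_sup}
        frequent.update(current)
        k += 1
    return frequent

def image_frequent_itemsets(image, min_sup, axis="row"):
    """axis='row': rows are transactions, columns are items; axis='column': the reverse."""
    lines = [list(r) for r in image]
    if axis == "column":
        lines = [list(col) for col in zip(*lines)]  # transpose the image
    # A transaction holds the indices of the non-zero pixels on its line.
    transactions = [{j for j, pixel in enumerate(line) if pixel} for line in lines]
    return apriori(transactions, min_sup)
```

For example, calling `image_frequent_itemsets(image, 2, axis="row")` on a small 0/1 matrix returns every set of columns that is simultaneously set in at least two rows. Because candidate generation grows quickly with the number of items, this sketch is only practical for small images.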

Apriori algorithm for color image
The main idea is to segment the image and to treat each segmented region as a binary image. Segmentation [45] is the first step in analyzing an image. It separates the image elements into homogeneous regions that share the same property. These regions can be characterized by their borders; in that case, they are characterized by the pixels that compose those borders.
In general, image segmentation methods fall into one of the following four categories:
1. Segmentation based on regions,
2. Segmentation based on contours,
3. Segmentation based on the classification of pixels as a function of intensity,
4. Segmentation based on the cooperation of the first three methods.
For simplicity, in our study we apply the first category, which is based on dividing the image into small regions [46].

Results and discussions
We conducted an experiment to measure the performance of the proposed method.
We first present the tools used to evaluate our approach. Then we compare the proposed approach with the Hu moments approach and test its robustness to noise. Finally, we apply our approach to color images and present an application on the MPEG7 database. We developed a Java application that implements our approach, with a simple interface that lets the user import a binary image for the Apriori algorithm (see Fig. 3). After importing the binary image, the user can view the image and the result of the Apriori algorithm in the same interface. The development environment used in this study is the Eclipse IDE, an open-source integrated development environment with an open, plugin-based architecture. It is one of the IDEs most widely used by Java developers. Our application runs on the Windows platform, on a machine with a 2.20 GHz processor and 1024 MB of memory.
In this study, we extract the seven moment invariants from our test image and from the noisy image. In Fig. 4, image (a) is the original image and image (b) is the corresponding noisy image. For our approach, the technique consists of:
• Apriori algorithm along rows. In this case, the rows of the image are considered as transactions and the columns as items. We take minsup_row = 50%; the value of minsup is set arbitrarily.
• Apriori algorithm along columns. The same procedure is applied, except that the Apriori algorithm is applied to the columns: the rows of the image are considered as items and the columns as transactions.
Figure 5 gives a binary image of the English letter "F". Figure 5a denotes the original image, Fig. 5b the noisy image, and Fig. 5c the scaled image. Different kinds of imaging systems may produce different kinds of noise. In this study, we used Gaussian noise [47].
In the case of color images, the image is represented in RGB space and is first segmented. We apply our technique using the following steps:
1. Segment the image into regions using k-means,
2. Consider each segmented region as a binary image and apply the Apriori algorithm to it,
3. Extract the frequent items for each image.
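The segmentation step above can be sketched as follows. This is a minimal illustration, assuming a plain k-means clustering of the RGB pixel values with NumPy. The function name `kmeans_region_masks`, the number of clusters, the iteration count and the initialization from distinct colors are all assumptions; the text does not specify these details.

```python
import numpy as np

def kmeans_region_masks(rgb_image, k=3, n_iter=20, seed=0):
    """Cluster pixels by color and return one binary mask (0/1) per cluster."""
    img = np.asarray(rgb_image, dtype=float)
    h, w = img.shape[:2]
    pixels = img.reshape(-1, 3)

    # Initialize the centers from k distinct colors so no two centers coincide.
    unique = np.unique(pixels, axis=0)
    if len(unique) < k:
        raise ValueError("image has fewer distinct colors than clusters")
    rng = np.random.default_rng(seed)
    centers = unique[rng.choice(len(unique), size=k, replace=False)]

    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        # Assign each pixel to its nearest center (Euclidean distance in RGB).
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean color of its members.
        for c in range(k):
            members = pixels[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)

    labels = labels.reshape(h, w)
    return [(labels == c).astype(np.uint8) for c in range(k)]
```

Each returned mask is a binary image of the same height and width as the input, so it can be passed unchanged to a row-wise or column-wise frequent itemset extraction.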
We test our technique on a color image, as shown in Fig. 6. Table 1 shows the values of the seven moment invariants for images (a-d) of Fig. 4. We observe that the moment values differ between the original image and the corresponding noisy image. Tables 2 and 3 show the results obtained with our method. The results are the same for the three images, so the noise has no influence. Table 2 shows only the frequent items of maximum length, but in our experiment the program displays all levels of frequent items. Rows 8, 9, 10, 11, 12, 13 and 14 constitute a frequent itemset of length 7 for each image. In addition, Table 4 gives the experimental results for the binary image with scaling and with added noise. Scaling is handled by normalizing the image to a fixed size.
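Normalizing an image to a fixed size can be sketched with nearest-neighbour resampling, which keeps pixel values binary. The text does not state which resampling method or target size was used, so both are assumptions here, as is the function name `normalize_size`.

```python
import numpy as np

def normalize_size(image, size=(32, 32)):
    """Resample a 2-D image to `size` (rows, columns) by nearest-neighbour sampling."""
    img = np.asarray(image)
    h, w = img.shape
    # For each output index, pick the source index at the same relative position.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[np.ix_(rows, cols)]
```

With this normalization, an image and an integer-factor enlargement of it map to the same fixed-size array, so their frequent rows and columns coincide.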
Note that Hu moment invariants are sensitive to noise, whereas with the proposed approach the two images have the same frequent rows or columns despite the presence of noise.
From these results, it can be observed that the proposed approach is more robust to noise than Hu moment invariants.
The images in Fig. 7 are used to show that our approach can be used to compare binary images of different levels. The results found with the Apriori algorithm along rows for the images given in Fig. 7 are shown in Table 5. Image (a) contains four frequent items of different lengths, from the longest to the shortest. Image (b) has one frequent item of maximum length. If we take only the items of maximum length, images (a) and (b) will be similar. The results found with the Apriori algorithm along rows for the binary images of Fig. 6b-d are shown in Table 6.
Table 1 The value of Hu moment invariants

Application on MPEG7 database
To test our approach, we apply the Apriori algorithm to the MPEG7 database [48]. This database contains 70 classes of shapes, each with 20 members, as shown in Fig. 9. We extract the frequent items from an image by counting the frequency of non-black pixels. An MPEG7 image of size 256 × 256 is presented in Fig. 8.
We have found that it is easy to extract frequent items from an image. Figs. 10 and 11 show the frequency of non-black pixels in each row and each column of the image, respectively. As can be seen in Figs. 10 and 11, these graphs largely reproduce the shape of the 'Apple' image. The frequency of non-black pixels for each row and for each column is given in Tables 7 and 8, respectively. For example, since row 141 contains 222 white pixels, it can be considered frequent if min_sup is fixed to 10.
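The counting step can be sketched in a few lines. This is a minimal illustration, assuming the image is available as a 2-D array in which black pixels have value 0; the function names `nonblack_frequency` and `frequent_lines` are hypothetical.

```python
import numpy as np

def nonblack_frequency(image):
    """Return (per-row counts, per-column counts) of non-black pixels."""
    mask = np.asarray(image) != 0
    return mask.sum(axis=1), mask.sum(axis=0)

def frequent_lines(counts, min_sup):
    """Indices of the rows (or columns) whose count reaches min_sup."""
    return [i for i, c in enumerate(counts) if c >= min_sup]
```

Under this reading, a row containing 222 non-black pixels is reported by `frequent_lines` for any `min_sup` up to 222, which includes the value of 10 used above.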

Conclusion and future work
In this work, we have proposed a technique to compare binary images. We considered an image as a set of transactions and items. The proposed technique relies on the Apriori algorithm, which we apply along the rows or columns of an image. In addition, we apply our technique to color images. The results show that the proposed approach based on the Apriori algorithm performs well and, in particular, is robust to noise.
In the future, we will extend this technique to handle the rotation problem. We will then integrate methods such as FP-growth, FTWeightedHashT Apriori and MISFPgrowth to address the problem of large data (large binary images).