Cursor movement detection in brain-computer-interface systems using the K-means clustering method and LSVM

In this study, we present the detection of upward-downward and rightward-leftward cursor motion based on feature extraction. In the proposed algorithm, the K-means clustering method is used to recognize the hidden patterns in each of the four modes (up, down, left, and right). Identifying these patterns can raise classification accuracy. The membership degree of each feature vector in the newly found patterns is taken as a new feature vector corresponding to the original one, and the cursor motion is then detected with a linear SVM classifier. Applying the proposed method with hold-out cross-validation increases the classifier accuracy for the up-downward and left-rightward movements of each person by 2–10 %.

A BCI program specifies appropriate physical or mental tasks and selects the electrodes related to these tasks. It extracts features from the signals recorded by these electrodes and ultimately selects the classification algorithm with the highest accuracy. The classifier output is then translated and transferred to communication and control units. Developing a fast and precise classification method is a key requirement for communication and control via the EEG signal. Computer cursor control is one of the most useful applications of BCI. The first studies on EEG-based cursor control focused on one-dimensional control [13,14]. After satisfactory results were obtained, researchers turned to multidimensional cursor control with the aim of improving the interaction between user and machine. Multidimensional cursor control has been achieved with different brain signals, for example the P300 potential [15,16], synchronous and asynchronous signals [17], and evoked potentials [18,19]. These studies also showed that the input signals of an EEG-based BCI system are commonly weak, non-stationary, and contaminated by task-related noise and by various artifacts such as external electromagnetic interference, electromyogram (EMG), and electrooculogram (EOG) waves. Researchers have addressed all of these defects in order to improve key aspects of BCI performance, including classification speed and precision. Beyond these performance aspects, building practical, realistic EEG-based BCI systems has been another major challenge. Such systems rely heavily on a training process based on a small number of training signals, on channel and classification algorithms, and on features that represent the studied tasks well.
The design of proper inputs, feature extraction, and appropriate classification for BCI systems have been studied over the past 20 years. Various features of the EEG signal serve as BCI inputs, such as the mu band (8–12 Hz) and beta band (18–25 Hz) [20], event-related potentials such as the P300 [21,22], and steady-state visual evoked responses or surface slow cortical potentials (SCPs) [23,24]. Several feature extraction methods have been applied to motor imagery data, including estimation of AR parameters [25] in a specific frequency band [20,26], slow cortical potential amplitude [27], common spatial patterns, event-related desynchronization/synchronization (ERD/ERS), wavelet correlation coefficients [28], and the spatial pattern spectrum [29]. Many classifiers, such as k-nearest neighbors (KNN), linear discriminant analysis (LDA) [30], hidden Markov models [31], neural networks [32,33], the multilayer perceptron, and Mahalanobis distance classifiers (MDA) [34,35], have been tested in BCI applications, and combinations of feature extraction and classification algorithms have been investigated [36]. Since 2001, several BCI competitions have been organized to assess and compare methods on EEG data recorded during mental tasks and to address key issues in BCI research, and significant results have been reported on these data. Currently, BCI technology still lacks the desired speed and precision, two important variables in current efforts toward real-world BCI systems. To this end, the classifier must discriminate the EEG signals of the mental tasks used by the BCI system even when they are recorded in different sessions and on different days. It should also be borne in mind that BCI systems have the potential to help people of all ages with severe mobility impairments.
Some studies on BCI performance in older people did not yield good results because of poor information transfer rates. In the BCI competitions, the authors of [25] obtained a classification accuracy of 88.7 % with a linear classifier using gamma-band power (channel 4) and SCPs (channel 1) [36]; using a neural network classifier instead raised the accuracy to 91.47 %. In [21,30], features were extracted from 6 channels with the wavelet transform, and a neural network achieved an accuracy of 90.8 %. The authors of [30,37] applied principal component analysis (PCA) to 6 channels and their spectral properties and reported an accuracy of 90.44 %. Trejo et al. [38] showed that people of various ages can benefit from this technology. Recent studies have also observed that combining EEG and EMG signals in BCI systems can support several activities for people with mobility impairments, and their authors believed that users would be able to learn how to interact with such systems [39]. Although these methods achieve desirable classification accuracy, they rely on complete training data, multiple electrodes, and complex feature vectors. They are therefore complex, poorly portable, and require extensive training and computation time. Their results not only demonstrated diagnostic accuracy but also showed a significant reduction in training time owing to the reduced feature space. However, the drawback of these methods is that they used training accuracy as the criterion for evaluating different feature combinations: different feature subsets were trained repeatedly to find the subset with the best diagnostic accuracy, which is inefficient. The objective of this study is to classify the EEG signal in BCI systems using a data mining technique.
Data mining plays a significant role in extracting hidden information, patterns, and relationships from large amounts of data, and feature extraction and selection are central to it. Computational time also increases with the number of descriptive features. To deal with the large number of features, the present study investigates a method that recognizes patterns and extracts the information essential for detecting cursor movement. The aim is to detect the input motion effectively and efficiently with a data mining technique: a hybrid of K-means clustering and the linear support vector machine (LSVM) classifier [40] that detects upward, downward, leftward, and rightward motion.
This study proposes a simple and straightforward algorithmic method to improve the classification of BCI data. A smaller number of features reduces computation time, although the speed cannot be compared objectively with other studies because they did not report training or testing times. Recognizing cursor-movement patterns suggests a new way to determine features for the up-downward and left-rightward movements. The K-LSVM method reduces the dimension of the input feature space and reconstructs the features to optimize the classifier. As a result, training time is reduced and accuracy is improved, because noisy information has been eliminated.
The rest of this paper is organized as follows. Sect. "Materials and methods" describes the data used for classification. It also presents the K-means algorithm, an unsupervised algorithm used to extract motion features and to avoid repeating the training on different feature subsets. Because K-means clusters the original feature space by unsupervised learning, all feature information can be stored in a compact form for one-time training instead of multiple training runs on different feature subsets. A membership function is then used to turn the K-means output into optimal training features. Sect. "Results" summarizes the combination of K-means and the LSVM classifier proposed in this study for accurate motion detection. Finally, Sects. "Discussion" and "Conclusion" give critical insights into the interpretation and accuracy of the results and address some limitations and possibilities of the presented method.

Data acquisition
The EEG signals were collected online from three healthy subjects in two sessions one week apart in 2014 at Karadeniz Technical University, Turkey [41]. The sampling frequency was 256 Hz, and a notch filter was used to eliminate noise. The device had 18 electrodes placed according to the international 10-20 system, as shown in Fig. 1: the two electrodes FP1 and FP2 were related to EOG artifacts, the two electrodes O1 and O2 were associated with EMG artifacts, and the remaining 14 electrodes were used by the proposed method for feature extraction.
Fig. 1 Electrode placement based on the international 10-20 system [41]
Each subject sat on a chair facing a 19 × 19 screen and was asked to remain motionless during the experiment. Each trial lasted at least 10 s, with a delay of 2 s. During a trial, the target appeared on the display in one of the four possible modes, and the subject was asked to imagine the displayed mode for 8 s; each trial ended with a beep. The recording was repeated for each subject one week later. Each recording contained 4 runs, and each run contained 40 trials (10 trials per class or movement), so each recording contained 160 trials (40 × 4 = 160). Each trial contained 2048 samples (8 × 256 = 2048). The data were divided into training and test sets as shown in Table 1; the number of trials used varied from subject to subject.
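The trial arithmetic above can be made concrete with a short sketch that cuts a continuous recording into 8 s trials and drops the four artifact channels. It is illustrative only: apart from FP1, FP2, O1, and O2, the channel names are hypothetical placeholders, the 10 s spacing of the trial onsets is an assumption based on the stated minimum trial length, and the random array merely stands in for real EEG.

```python
import numpy as np

FS = 256                                # sampling frequency in Hz (from the text)
TRIAL_SECONDS = 8                       # imagery period per trial (from the text)
SAMPLES_PER_TRIAL = FS * TRIAL_SECONDS  # 8 x 256 = 2048

# Hypothetical 18-channel montage: only FP1, FP2, O1 and O2 are named in the
# text, so the other 14 labels are placeholders, not the real electrode names.
CHANNELS = ["FP1", "FP2", "O1", "O2"] + [f"CH{i}" for i in range(1, 15)]
ARTIFACT_CHANNELS = {"FP1", "FP2", "O1", "O2"}   # EOG (FP1, FP2) and EMG (O1, O2)


def epoch_trials(recording, onsets):
    """Cut a (channels x samples) recording into (trials, 14, 2048) epochs.

    `onsets` are hypothetical sample indices at which each 8 s imagery
    period starts; the dataset's actual event markers are not described here.
    """
    keep = [i for i, name in enumerate(CHANNELS) if name not in ARTIFACT_CHANNELS]
    return np.stack([recording[keep, s:s + SAMPLES_PER_TRIAL] for s in onsets])


# One run of 40 trials (10 per class), assuming one trial every 10 s.
rng = np.random.default_rng(0)
run = rng.standard_normal((len(CHANNELS), 40 * 10 * FS))   # stand-in for real EEG
onsets = [trial * 10 * FS for trial in range(40)]
epochs = epoch_trials(run, onsets)
print(epochs.shape)   # (40, 14, 2048)
print(4 * 40)         # trials per recording: 160
```

Four such runs give the 160 trials of one recording; the split into training and test trials would then follow Table 1.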

Feature extraction
The following four features [41] were used to represent the experimental signals, as listed in Table 2.

Continuous wavelet transform
The continuous wavelet transform (CWT) is a powerful signal processing tool used as a feature in many BCI data analyses [32,42]. The CWT convolves the signal X(t) with scaled and shifted versions $\psi_{\tau,s}(t)$ of a wavelet:

$$CWT_X(\tau, s) = \int_{-\infty}^{\infty} X(t)\,\psi^{*}_{\tau,s}(t)\,dt, \qquad \psi_{\tau,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right)$$

where ψ(t) is a continuous function in time and frequency, known as the mother wavelet, and s and τ are the scaling and shifting parameters, respectively.
The mother wavelet used in this method is the Morlet wavelet. An analysis of statistical features on the training dataset with the Morlet wavelet transform showed that the mean and standard deviation of the continuous wavelet coefficients can be used as features. The mean and standard deviation, computed over the $L_{CWTC}$ coefficients of the CWT ($L_{CWTC}$ being the length of the CWT), are calculated for all channels as the first feature, based on the following three parameters.
1. Based on channel selection; the proposed method includes 14 channels.
2. Based on recording duration, which has been set to 8 s in the proposed method.
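A minimal NumPy sketch of the first feature follows. It assumes a real-valued Morlet wavelet with w0 = 5, a hypothetical scale range of 1 to 32, and a support of ±4s samples per scale; the paper does not state these settings, nor whether the statistics are taken over raw coefficients or their magnitudes (raw coefficients are used here). PyWavelets' `pywt.cwt` with the `'morl'` wavelet is a library alternative to the hand-written transform.

```python
import numpy as np


def morlet(t, w0=5.0):
    """Real-valued Morlet mother wavelet psi(t) = exp(-t^2 / 2) * cos(w0 * t)."""
    return np.exp(-t ** 2 / 2) * np.cos(w0 * t)


def cwt_morlet(x, scales):
    """Discretized CWT of x: one row of coefficients per scale.

    The real Morlet wavelet is symmetric, so convolving with it equals
    correlating with it, which is what the CWT integral computes.
    """
    coefs = np.empty((len(scales), len(x)))
    for row, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1) / s       # samples of (t - tau) / s
        kernel = morlet(t) / np.sqrt(s)            # 1 / sqrt(s) normalization
        coefs[row] = np.convolve(x, kernel, mode="same")
    return coefs


def cwt_features(trial, scales):
    """Mean and (population) standard deviation of the CWT coefficients per channel.

    `trial` has shape (channels, samples); the result has 2 * channels values.
    """
    feats = []
    for channel in trial:
        c = cwt_morlet(channel, scales)
        feats.extend([c.mean(), c.std()])
    return np.array(feats)


rng = np.random.default_rng(1)
trial = rng.standard_normal((14, 2048))   # 14 channels x 8 s at 256 Hz (synthetic)
scales = np.arange(1, 33)                 # hypothetical scale range
features = cwt_features(trial, scales)
print(features.shape)   # (28,)
```

With 14 channels and two statistics each, every trial yields 28 CWT values.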

Autoregressive model
A raw signal x(n), sampled at discrete time points n, is expressed as an autoregressive (AR) model as follows:

$$x(n) = \sum_{i=1}^{p} A_i\, x(n-i) + e(n)$$

where $A_1, A_2, \ldots, A_p$ are the parameters of the AR model, p is the order of the model, n is an integer representing the discrete time points of the signal, and e(n) is zero-mean white noise with finite variance. The AR model is calculated as the second feature based on the following three parameters:
1. Based on channel selection; the proposed method includes 14 channels.
2. Based on recording duration, which has been set to 8 s in the proposed method.
3. The order of the model; the model order in the proposed method is p = 8.
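The AR parameters can be estimated in several ways (Yule-Walker, Burg, least squares), and the text does not say which was used. The sketch below assumes ordinary least squares on the lagged, zero-mean signal (no intercept term), checks itself on a simulated AR(2) process, and then computes the 14 × 8 = 112 AR values of one synthetic trial.

```python
import numpy as np


def ar_coefficients(x, p=8):
    """Least-squares estimate of A_1..A_p in x(n) = sum_i A_i x(n - i) + e(n)."""
    x = np.asarray(x, dtype=float)
    # Column i-1 holds the lag-i values x(n - i) for n = p, ..., len(x) - 1.
    lagged = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    coeffs, *_ = np.linalg.lstsq(lagged, x[p:], rcond=None)
    return coeffs


def ar_features(trial, p=8):
    """Concatenate the order-p AR parameters of every channel (14 x 8 = 112 values)."""
    return np.concatenate([ar_coefficients(channel, p) for channel in trial])


# Self-check on a simulated AR(2) process with known parameters.
rng = np.random.default_rng(2)
true_coeffs = np.array([0.6, -0.3])
noise = rng.standard_normal(5000)
sim = np.zeros(5000)
for n in range(2, 5000):
    sim[n] = true_coeffs[0] * sim[n - 1] + true_coeffs[1] * sim[n - 2] + noise[n]
est = ar_coefficients(sim, p=2)
print(np.round(est, 2))   # close to [0.6, -0.3]

trial = rng.standard_normal((14, 2048))   # synthetic 8 s trial
print(ar_features(trial).shape)   # (112,)
```

Least squares is used here only because it needs nothing beyond NumPy; a Burg or Yule-Walker estimator would slot into `ar_coefficients` without changing the rest.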

Skewness and average derivative
Skewness and average derivative are calculated as the third and fourth features based on the following parameters.
1. Based on channel selection; the proposed method includes 14 channels.
2. Based on recording duration, which has been set to 8 s in this method, and the order of the model, which has been set to p = 8.

Figure 2 shows the block diagram of the proposed method, which aims to detect cursor movement in the up-downward and left-rightward directions. The proposed method is a hybrid of the K-means clustering algorithm and the LSVM classifier. Feature extraction is used to identify patterns, and the K-means clustering algorithm clusters the movements by the similarity of the features of the up-downward and left-rightward movements, taking into account the membership degree of the data. To reduce the high dimensionality of the feature space, the algorithm extracts movement patterns before training for proper classification. The method treats each pattern found by K-means as a new feature of the movements. Unlike the original features, which describe only the movement itself, the new features express the similarity between each movement and the patterns, so the dimension of the feature space decreases. The classifier then estimates the boundary between the up and down movements, and between the left and right movements, from these new features. As the dimension of the feature space decreases, the dataset is reconstructed with the new features. The optimal channel for each subject has been calculated. The up-downward pair is classified first, and then the left-rightward pair.
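For the third and fourth features, the following sketch assumes the common definitions, since the relations themselves are not reproduced here: skewness as the third central moment divided by the 1.5 power of the second (population moments), and average derivative as the mean first difference per sample. Both the definitions and the synthetic trial are assumptions.

```python
import numpy as np


def skewness(x):
    """Skewness m3 / m2**1.5 with population (biased) central moments."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    m2 = np.mean(d ** 2)
    return float(np.mean(d ** 3) / m2 ** 1.5)


def average_derivative(x):
    """Mean of the first difference x(n+1) - x(n), in units per sample (assumed form)."""
    return float(np.mean(np.diff(np.asarray(x, dtype=float))))


def stat_features(trial):
    """Skewness and average derivative per channel: 2 values x 14 channels."""
    return np.array([[skewness(ch), average_derivative(ch)] for ch in trial]).ravel()


print(skewness([-2, -1, 0, 1, 2]))               # 0.0 (symmetric data)
print(average_derivative(np.arange(10) * 0.5))   # 0.5 (constant slope)

rng = np.random.default_rng(3)
trial = rng.standard_normal((14, 2048))   # synthetic 8 s trial
print(stat_features(trial).shape)         # (28,)
```

Note that the mean first difference telescopes to (x(N) − x(1)) / (N − 1), so it depends only on the end points of the trial; if the original used the mean absolute difference, `np.abs(np.diff(...))` would replace `np.diff(...)`.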

Description of the proposed method
The K-means algorithm solves an optimization problem that minimizes the distance between cluster members and their cluster centers, as given in relation (9):

$$\min_{\mu_1,\mu_2,\ldots,\mu_K} \sum_{k=1}^{K} \sum_{i\in S_k} \lVert X_i - \mu_k \rVert^2 \qquad (9)$$

Fig. 2 Block diagram of the proposed method

In relation (9), k is the cluster index, $S_k$ is the kth cluster of the dataset, $\mu_k$ is the center of that cluster, and K is the total number of clusters. The data points are normalized to remove the impact of the different scales used for different features. The K-means algorithm repeatedly adjusts the locations of the cluster centers to decrease the Euclidean distance between members and centers. Two methods are available for estimating the number of clusters:
1. Based on expert experience.
2. Based on a similarity criterion.
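Relation (9) is usually minimized with Lloyd's algorithm, which alternates between assigning points to their nearest center and moving each center to the mean of its members. The sketch below shows that loop together with per-feature z-score normalization. The k-means++ seeding, the iteration limit, and the two synthetic blobs with very different feature scales are assumptions for illustration; the text does not specify how the initial centers were chosen.

```python
import numpy as np


def zscore(X):
    """Standardize each feature (column) to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0          # leave constant features unscaled
    return (X - X.mean(axis=0)) / std


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: later centers are drawn with probability ~ squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for the objective in relation (9)."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (Euclidean).
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    objective = float(((X - centers[labels]) ** 2).sum())   # value of relation (9)
    return centers, labels, objective


rng = np.random.default_rng(3)
raw = np.vstack([rng.normal([0, 0], 0.3, (30, 2)), rng.normal([10, 100], 1.0, (30, 2))])
X = zscore(raw)          # without this, the second feature would dominate the distance
centers, labels, objective = kmeans(X, k=2)
print(labels)
```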
The second method was used here to assess the similarity of movement patterns. The similarity criteria are calculated according to relations (10) and (11).
where $d_{avg}$ is the average distance from each member i to the center $\mu_k$ of its cluster $S_k$, $d_{min}$ is the smallest distance between two cluster centers, $X_{ij}$ is the jth input element of member i, and $X_{\mu_k j}$ is the jth input element of center $\mu_k$. N and F are the total number of data points and the dimension of the input vector, respectively. The optimal number of clusters ($K^*$) is obtained from the validity function validity(θ), as indicated in relation (12).
where θ is the validity ratio used to assess different numbers of clusters, from which an acceptable number of clusters is found to recognize the hidden movement patterns. Choosing the number of clusters that minimizes the validity ratio decreases the average distance from each member to its cluster center ($d_{avg}$) while increasing the minimum distance between two cluster centers ($d_{min}$). In other words, the resulting clusters are compact and well separated from each other. When K approaches the number of data points, no pattern with many members emerges: the clusters computed by K-means are small and nearly reproduce the original data. Therefore, a restricted range of K (2 < K < 14) is searched in the proposed method to find the local minimum of the validity ratio that reveals the accumulated patterns of each type of motion. In this way, the size of the main dataset is reduced using the available labels for each subject.
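The cluster-number search can be sketched as follows. Because relations (10) to (12) are not reproduced here, the code assumes that $d_{avg}$ is the mean Euclidean distance of all N points to their own centers, $d_{min}$ is the smallest distance between two centers, validity(θ) = $d_{avg}$ / $d_{min}$, and $K^*$ is the K in 2 < K < 14 that minimizes θ. The k-means++ seeding, the five restarts per K, and the synthetic four-blob data are also illustrative choices.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; returns (centers, labels)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return centers, labels


def validity(X, centers, labels):
    """Assumed validity ratio theta = d_avg / d_min."""
    d_avg = np.linalg.norm(X - centers[labels], axis=1).mean()   # compactness
    gaps = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    d_min = gaps[~np.eye(len(centers), dtype=bool)].min()        # separation
    return float(d_avg / d_min)


def select_k(X, k_range=range(3, 14), n_init=5, seed=0):
    """Return K* = argmin_K theta(K) over 2 < K < 14, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = {}
    for k in k_range:
        runs = [kmeans(X, k, rng) for _ in range(n_init)]
        # Keep the restart with the lowest value of relation (9).
        centers, labels = min(runs, key=lambda r: ((X - r[0][r[1]]) ** 2).sum())
        scores[k] = validity(X, centers, labels)
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(7)
blob_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers])
best_k, scores = select_k(X)
print(best_k)   # 4
```

Merging two blobs inflates $d_{avg}$, while splitting a blob collapses $d_{min}$, which is why the ratio bottoms out at the true number of well-separated groups in this toy case.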
After the movement patterns are recognized, movements fall into clusters with symbolic labels, and the similarity between an untested movement and the symbolic movements can be used for recognition. A membership function is therefore used to assess the degree of similarity between the original data points and the provided motion labels; it expresses the fuzzy membership of each point in the recognized patterns. The membership function is given in relation (14), where c is the index of the new pattern, $X_{ij}$ is the jth feature of the original input i, and $X_{\mu_c j}$ is the jth feature of the center $\mu_c$ of cluster $S_c$, obtained from the preceding K-means step. $K_m$ and $K_b$ are the numbers of clusters related to the data of each class.
This membership function measures how closely each movement matches each recognized pattern, and these similarities are what the classifier uses for recognition.
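The exact form of relation (14) is not reproduced in this section, so the sketch below substitutes the membership used in fuzzy c-means with fuzzifier m = 2, which likewise gives every point a degree of membership in each recognized pattern and sums to 1 over the patterns. The centers stand for the $K_m$ and $K_b$ patterns of two classes; their values are hypothetical.

```python
import numpy as np


def memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy-c-means-style membership u_ic = 1 / sum_l (d_ic / d_il) ** (2 / (m - 1)).

    Stand-in for relation (14). Rows of the result sum to 1; eps avoids
    division by zero when a point coincides with a center.
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


# Hypothetical pattern centers: K_m = 2 for one class, K_b = 3 for the other.
class_m_centers = np.array([[0.0, 0.0], [1.0, 0.0]])
class_b_centers = np.array([[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
centers = np.vstack([class_m_centers, class_b_centers])

X = np.array([[0.1, 0.0], [5.2, 5.1]])   # two original feature vectors
U = memberships(X, centers)              # new features, shape (2, K_m + K_b) = (2, 5)
print(np.round(U, 3))
```

The new feature vector has $K_m$ + $K_b$ entries regardless of the original feature dimension, which is where the reduction of the feature space comes from.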

Results
The results were obtained from three subjects, using the optimal channel for each of them. In the proposed hybrid method, K-means clustering is used to recognize the hidden patterns, and the LSVM classifier separates the up-downward and left-rightward movements. The optimal channel for each subject was calculated as listed in Table 3. The up-downward pair is classified first, and then the left-rightward pair. The results show that the LSVM classifier performs effectively within the proposed method: the hybrid approach increased classification accuracy by 2–10 %.
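Putting the pieces together, a compact end-to-end sketch of the K-LSVM idea for one binary task (for example up versus down) clusters each class separately, replaces every feature vector by its memberships in all $K_m$ + $K_b$ centers, and trains a linear SVM on those memberships, evaluated on a held-out set. Several choices are assumptions rather than details from the paper: 3 clusters per class, the fuzzy-c-means membership stand-in for relation (14), random initial centers, and a Pegasos sub-gradient solver (which also regularizes the bias) in place of whatever LSVM implementation was used. The synthetic Gaussian data only demonstrate the workflow and say nothing about the reported 2–10 % gain.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm with randomly chosen data points as initial centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers


def memberships(X, centers, eps=1e-12):
    """Fuzzy-c-means membership with m = 2 (stand-in for relation (14))."""
    inv = 1.0 / (((X[:, None] - centers[None]) ** 2).sum(axis=2) + eps)
    return inv / inv.sum(axis=1, keepdims=True)


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient descent on the hinge loss; labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # constant column acts as the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1     # inside the margin or misclassified
            w *= 1.0 - eta * lam                  # shrinkage from the L2 penalty
            if violates:
                w += eta * y[i] * Xb[i]
    return w


def fit_k_lsvm(X, y, k_pos=3, k_neg=3, seed=0):
    """Cluster each class, map samples to memberships, train the linear SVM."""
    rng = np.random.default_rng(seed)
    centers = np.vstack([kmeans(X[y == 1], k_pos, rng), kmeans(X[y == -1], k_neg, rng)])
    return centers, train_linear_svm(memberships(X, centers), y, seed=seed)


def predict_k_lsvm(model, X):
    """Predict +1 / -1 from the sign of the linear score on the membership features."""
    centers, w = model
    U = memberships(X, centers)
    return np.where(np.hstack([U, np.ones((len(U), 1))]) @ w >= 0, 1, -1)


rng = np.random.default_rng(42)


def make_split(n):
    """Synthetic two-class data standing in for one binary task (e.g. up vs down)."""
    X = np.vstack([rng.normal(1.0, 1.0, (n, 10)), rng.normal(-1.0, 1.0, (n, 10))])
    return X, np.array([1] * n + [-1] * n)


X_train, y_train = make_split(100)
X_test, y_test = make_split(100)
model = fit_k_lsvm(X_train, y_train)
accuracy = float(np.mean(predict_k_lsvm(model, X_test) == y_test))
print(f"hold-out accuracy: {accuracy:.2f}")
```

Running the same pipeline twice, once for up versus down and once for left versus right, mirrors the two-stage classification described above.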
Sensitivity and specificity are two statistical measures for evaluating the outcome of a binary classification test. When data can be divided into positive and negative groups, the accuracy of a test that assigns the data to these two groups can be described by its sensitivity and specificity. Sensitivity (the true positive rate) is the proportion of positives that the test correctly identifies as positive, and specificity (the true negative rate) is the proportion of negatives that the test correctly identifies as negative. Mathematically, sensitivity is the ratio of true positives to the sum of true positives and false negatives, TP/(TP + FN), and specificity is the ratio of true negatives to the sum of true negatives and false positives, TN/(TN + FP). Sensitivity and specificity for the up-downward and left-rightward movements in the proposed method are calculated with relations (15) and (16) and are listed in Table 4.
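As a small illustration of the standard definitions, the helper below counts TP, FN, TN, and FP from binary labels and returns sensitivity and specificity; the label vectors are hypothetical and the positive class is assumed to be encoded as 1.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    actual = np.asarray(y_true) == positive
    predicted = np.asarray(y_pred) == positive
    tp = np.sum(actual & predicted)
    fn = np.sum(actual & ~predicted)
    tn = np.sum(~actual & ~predicted)
    fp = np.sum(~actual & predicted)
    return float(tp / (tp + fn)), float(tn / (tn + fp))


# Hypothetical labels for an up (1) versus down (0) test set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)   # 0.75 0.5
```

Three of the four positives are recovered (sensitivity 0.75), while two of the four negatives are wrongly labeled positive (specificity 0.5).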

Discussion
In this study, the K-LSVM method was proposed on the basis of known special patterns, which makes it comparable to conventional data mining methods for motion detection. In the feature extraction step, conventional data mining methods are not used; instead, clustering extracts symbolic motions that represent the clusters of motions, and the relationship between an input motion and the symbolic motions, expressed as pattern membership, is used to predict new motions. The proposed method is comparable to Aydemir et al. [41], who used the same dataset and features. They classified the EEG signal during cursor movement with a decision tree structure and separated the up-downward and left-rightward movements with accuracies of 55.92 %, 57.92 %, and 80.24 % for the first, second, and third subjects, respectively. They used the features in Table 2 and three classifiers (LDA, KNN, and SVM) to separate the movements, but their decision tree structure differs for each subject, which is one of the weak points of that approach. In contrast, the hybrid algorithm of the present study uses the same features and the same structure for all subjects, and it increased classification accuracy by 2–10 %. We found no research on hybrid clustering and classification in BCI systems; recent studies using this approach have concentrated on breast cancer diagnosis. Bennett et al. [43] tested this algorithm on a breast cancer dataset to cluster benign and malignant tumors and suggested the detection of data patterns as one application of clustering algorithms. Because their method did not address how to determine the number of clusters in a dataset, the test clusters were randomly divided into two categories, benign and malignant.
The number of patterns should be determined in order to evaluate the hidden patterns of breast cancer. There is also a need for a comprehensive exploration of the connection between supervised and unsupervised learning. Their clustering was based on tumor size. They applied the K-means algorithm to this dataset, using nine features. In the first phase of K-means clustering, the Manhattan distance was used to measure the distance between nodes, and the initial cluster centers were selected randomly; running the algorithm with k = 2 on 286 data points gave an average probability of correct classification of 66.99 %. In the second phase, the Euclidean distance was used, again with randomly selected initial centers; with k = 2 on 286 data points, the average probability of correct classification was 74.98 %. In the third phase, the cluster centers were graded to raise clustering accuracy, Pearson's correlation was used to measure the distance between nodes, and the initial cluster centers were selected through cointegration; with k = 2 on 286 data points, all clustering runs gave the same result, with an average probability of correct classification of 100 %. As these examples show, many methods have combined clustering and classification to diagnose breast cancer. The need for large-scale data is one of the main disadvantages of conventional methods.
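The three phases differ mainly in the distance used when assigning points to centers. The sketch below writes out the Manhattan, Euclidean, and Pearson-correlation distances, with the latter assumed to be 1 − r (the exact form used in that study is not given); the two vectors are made up to show that the correlation distance ignores scale.

```python
import numpy as np


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(a - b)))


def euclidean(a, b):
    """Square root of the summed squared differences (L2 distance)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def pearson_distance(a, b):
    """1 - Pearson correlation (assumed form): 0 for perfectly correlated profiles."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a                      # same shape, twice the scale
print(manhattan(a, b))           # 6.0
print(euclidean(a, b))           # sqrt(14), about 3.742
print(pearson_distance(a, b))    # approximately 0
```

Any of these functions can replace the Euclidean norm in the assignment step of K-means; only the Euclidean choice matches the squared-error objective of relation (9) exactly.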
Although heuristics in feature selection decrease the computation time of repeated training, clustering algorithms, especially K-means, require no exhaustive search for feature selection, and they give a better understanding of the training samples without separate feature extraction and selection. Fig. 3 compares the proposed method with that of Aydemir et al. [41], which used the same dataset and features. Deep neural networks (DNNs) have shown promising results in BCI research, but they still lag behind in performance because large training datasets are unavailable [46][47][48], and their main disadvantage is long training time.
We compare the proposed method with other studies in Fig. 4. Lee et al. [49] computed the common spatial pattern (CSP), a well-known feature extraction method in BCI research. Kayagil et al. [50] and Huang et al. [51] are the closest to this method in terms of experimental properties, and Aydemir et al. [41] used the same dataset and features.
In the present study, a combination of the K-means clustering algorithm and a classifier was implemented to compress the feature space and decrease computational cost. The proposed K-LSVM method emphasizes discovering a knowledge framework and seeking further knowledge for motion detection through data mining methods. More descriptive features and data will be collected in the future. Feature extraction and selection nevertheless remain challenging issues for researchers, and large datasets with missing values raise a further issue that must be considered with respect to computational time.

Conclusions
In the present study, K-means was used as a clustering algorithm to recognize and obtain motion patterns. These patterns are reconstructed as new features for the training phase. Pre-processing steps such as feature extraction and selection can create highly effective features for the machine learning algorithms used to train the classifier. The K-LSVM results show that computation time can be decreased significantly without reducing detection accuracy. However, the K-means algorithm did not decrease the number of training samples by filtering similar samples in the data, which could be a potential way to reduce the size of the training set. The results show that the LSVM classifier performs effectively within the proposed method: the hybrid approach increased classification accuracy by 2–10 %. The dataset used in this study is complete, without missing values; applying this method to a broad set of small datasets opens an avenue for future research. So far, no general rule has been adopted for evaluating the number of patterns for motion detection. In the future, a method for evaluating the number of symbolic motions could be developed, which may decrease the computation time for feature extraction and help experts understand the motions better through the patterns derived from data mining.