Learning users’ emotions is essential in many applications. Emotions can be inferred from text as well as from human activity recognition (HAR) signals such as facial expressions, hand gestures, body posture, and voice. In [9], social robots were used as communication assistants for education and entertainment, and the study reported an acceptable and natural interaction between social robots and children. The thermal facial reaction of the children, i.e., the nose-tip temperature signal, was recorded by the Mio Amico Robot and classified in real time during an experimental session. The classification was evaluated by comparing the emotional state derived from the thermal signal analysis with the emotional state recorded by FaceReader 7. An empathic robot in [6] recognized human emotions through facial expressions and automatically responded to these specific emotional states. It used a convolutional neural network (CNN) and a bank of Gabor filters in different experiments for feature representation, with support vector machines (SVMs) and multilayer perceptrons as classifiers, and produced a state-of-the-art accuracy of 95.58%. For customer feedback detection, [12] developed a multimodal affect recognition system that classifies whether a customer likes or dislikes a product examined at a counter by analyzing the customer’s facial expressions, hand gestures, body posture, and voice after testing the product. Hand gesture recognition is also widely used in scientific research, as it is crucial for interacting with deaf individuals. Constantine et al. [10] proposed a transfer-learning approach using AlexNet with hyperparameter tuning via ABC, GA, and PSO algorithms; the methodology produced effective outcomes with an average accuracy of 98.09%, outperforming earlier work in the medical domain. In [19], computational analysis techniques measured the emotional facial expressions of people with Parkinson’s disease (PD). Such pilot work on detecting masked faces in PD is important because PD patients often exhibit hypomimia, which reduces facial expressiveness. The experiment achieved an accuracy of 85% on the testing images using a deep learning-based model.
Multilabel emotion classification is an active topic in emotion analysis because it reflects real-life situations in which a person may express a mixture of emotions in the same text simultaneously. For example, a text may express happiness, love, optimism, sadness, or pessimism at once. Thus, building models that can assign more than one emotion to each input text is more useful. The following paragraphs present some recent efforts in Arabic multilabel emotion classification.
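To make the multilabel setup concrete, the following minimal Python sketch encodes each tweet’s emotion set as a binary vector over the 11 emotion classes used by the works below; the toy tweets and their labels are purely illustrative.

```python
# Minimal sketch of multilabel target encoding: each tweet maps to a
# binary vector over the 11 emotion classes. The toy label sets are
# hypothetical examples, not real dataset entries.
from sklearn.preprocessing import MultiLabelBinarizer

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]

labels = [["joy", "love", "optimism"],   # one tweet expressing three emotions
          ["sadness", "pessimism"]]      # another tweet expressing two

mlb = MultiLabelBinarizer(classes=EMOTIONS)
Y = mlb.fit_transform(labels)            # shape (2, 11) binary matrix
print(Y)
```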
Emotion mining in Arabic (EMA) [16] performed emotion and sentiment mining on Arabic tweets. First, preprocessing is performed by applying the normalization rules adopted in [20], including removal of diacritics (tashkeel), hamza normalization, elongation removal, and removal of non-Arabic letters. Then, the most frequent emojis are replaced with the corresponding Arabic words using a manually created lexicon. Finally, ARLSTEM [21] is used for stemming. In the feature selection stage, the authors tried different features separately; the word embeddings from AraVec proved to be the best feature. Each tweet is finally classified either as neutral or as one or more of the 11 emotions (anger, disgust, anticipation, joy, love, optimism, fear, pessimism, sadness, trust, and surprise). A linear support vector classifier (SVC) achieved the best performance among all classifiers tested, with a test accuracy of 0.489.
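The normalization steps described above can be illustrated with a short Python sketch; the regular expressions and the emoji entry below are our own illustrative approximations, not the exact rules of [20] or the lexicon of [16].

```python
import re

# Hypothetical emoji lexicon entry (the real lexicon in [16] is
# manually built and much larger): map a frequent emoji to an Arabic word.
EMOJI_LEXICON = {"😂": "ضحك"}  # "laughter"

def replace_emojis(text: str) -> str:
    for emoji, word in EMOJI_LEXICON.items():
        text = text.replace(emoji, " " + word + " ")
    return text

def normalize_arabic(text: str) -> str:
    """Illustrative normalization in the spirit of [20]."""
    text = re.sub(r"[\u064B-\u0652]", "", text)       # strip diacritics (tashkeel)
    text = re.sub(r"[أإآ]", "ا", text)                 # normalize hamza forms of alef
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)         # collapse character elongations
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)    # drop non-Arabic letters
    return re.sub(r"\s+", " ", text).strip()
```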
TW-Star [22] applied different preprocessing techniques: stemming (Stem), lemmatization (Lem), stop-word removal (Stop), and common emoji recognition (Emo). The preprocessed tweets are classified by a binary relevance (BR) multilabel classifier using SVM with term frequency-inverse document frequency (TF-IDF) features. Among several experiments with different preprocessing combinations, the best accuracy of 0.465 was achieved by combining Emo, Stem, and Stop (Emo + Stem + Stop).
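For concreteness, a binary relevance classifier with TF-IDF features and linear SVMs can be sketched with scikit-learn, where OneVsRestClassifier fits one independent binary SVM per emotion label; the toy corpus and label matrix are placeholders, not TW-Star’s data or settings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for preprocessed tweets and a binary (n_tweets, n_labels)
# emotion matrix, as produced by MultiLabelBinarizer above.
tweets = ["tweet one text", "tweet two text",
          "tweet three text", "tweet four text"]
Y = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])

# Binary relevance: one independent binary LinearSVC per emotion column.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LinearSVC()),
)
model.fit(tweets, Y)
print(model.predict(tweets))  # predicted binary emotion matrix
```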
TeamUNNC [23] performed tokenization and white-space removal and treated punctuation marks as individual words. In the second stage, word2vec embeddings from AraVec [18] were combined with features from the AffectiveTweets Weka package. Finally, classification is performed with a fully connected neural network with three dense hidden layers trained with a stochastic gradient descent (SGD) optimizer. The model achieved an accuracy of 0.446, exceeding the baseline model’s accuracy.
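A minimal Keras sketch of such a network follows; the input dimensionality, layer widths, and learning rate are illustrative assumptions, not TeamUNNC’s published configuration.

```python
import tensorflow as tf

N_FEATURES = 300   # assumed size of the combined AraVec + lexicon feature vector
N_LABELS = 11      # one output per emotion

# Fully connected multilabel classifier with three dense hidden layers
# and an SGD optimizer; sigmoid outputs allow multiple emotions per tweet.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_FEATURES,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_LABELS, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```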
In [24], feature vectors were built using the Doc2Vec model, and the random forest (RF) algorithm was used for classification. The Doc2Vec vector size was varied from 10 to 1000 in increments of 10, the number of decision trees in the forest from 10 to 150 in increments of 10, and the maximum tree depth from 2 to 20 in increments of 1. This model achieved an accuracy of 0.25.
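The search described above can be sketched as a nested grid over the three hyperparameters; the toy corpus, the placeholder scoring, and the gensim/scikit-learn choices below are our assumptions, not the exact setup of [24].

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for tokenized tweets and their binary label matrix.
tokenized_tweets = [["happy", "day"], ["sad", "news"],
                    ["great", "win"], ["bad", "loss"]]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

docs = [TaggedDocument(words=toks, tags=[i])
        for i, toks in enumerate(tokenized_tweets)]

# Full grid as described in [24]; expensive to run in full, so shrink
# the ranges to try it quickly.
best = None
for vec_size in range(10, 1001, 10):               # Doc2Vec size: 10..1000
    d2v = Doc2Vec(docs, vector_size=vec_size, min_count=1, epochs=20)
    X = np.array([d2v.dv[i] for i in range(len(docs))])
    for n_trees in range(10, 151, 10):             # trees: 10..150
        for depth in range(2, 21):                 # max depth: 2..20
            rf = RandomForestClassifier(n_estimators=n_trees,
                                        max_depth=depth).fit(X, Y)
            score = rf.score(X, Y)                 # placeholder metric
            if best is None or score > best[0]:
                best = (score, vec_size, n_trees, depth)
print(best)
```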
TeamCEN [25] used GloVe (Global Vectors) representations [26], which rely on word-word co-occurrence statistics, to map words to vectors. Each tweet is then represented by the aggregated sum of the GloVe vectors of its words, followed by dimensionality reduction. Classification is then performed using RF and SVM classifiers.
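A minimal sketch of this tweet representation follows, assuming a hypothetical `glove` word-to-vector lookup and using PCA as the dimensionality-reduction step (the exact method is not specified in [25]).

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for a pretrained GloVe lookup table ({word: vector});
# real systems would load published GloVe embeddings instead.
rng = np.random.default_rng(0)
glove = {w: rng.normal(size=50) for w in ["happy", "sad", "day", "news"]}

tweets = [["happy", "day"], ["sad", "news"],
          ["happy", "news"], ["sad", "day"]]

# Tweet vector = sum of its words' GloVe vectors, then reduced with PCA.
X = np.array([sum(glove[w] for w in toks if w in glove) for toks in tweets])
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # reduced representation fed to RF/SVM classifiers
```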
Therefore, we developed a novel BiLSTM deep learning model for multilabel emotion classification of Arabic tweets using the same dataset as the models described above. Our model’s performance surpassed these models owing to the ability of LSTM networks to handle sequential data, such as text, better than other machine learning and deep learning models.
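For illustration, the following is a minimal Keras sketch of a BiLSTM multilabel classifier; the vocabulary size, sequence length, and layer sizes are assumptions for the sketch and do not necessarily match the proposed model’s configuration, which is detailed later.

```python
import tensorflow as tf

VOCAB_SIZE = 50000  # assumed vocabulary size
EMBED_DIM = 300     # assumed embedding dimension (e.g., AraVec-sized)
MAX_LEN = 50        # assumed maximum tweet length in tokens
N_LABELS = 11       # one sigmoid output per emotion

# A bidirectional LSTM reads the token sequence in both directions;
# sigmoid outputs allow several emotions to be predicted per tweet.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(N_LABELS, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.BinaryAccuracy()])
model.summary()
```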