
Arabic aspect sentiment polarity classification using BERT


Aspect-based sentiment analysis (ABSA) is a textual analysis methodology that determines the polarity of opinions on particular aspects of specific targets. Most ABSA research is in English, with only a small amount of work available for Arabic. Most previous Arabic research has relied on deep learning models that depend primarily on context-independent word embeddings (e.g. word2vec), where each word has a fixed representation independent of its context. This article explores the modeling capabilities of contextual embeddings from pre-trained language models such as BERT, combined with sentence-pair input, on the Arabic aspect sentiment polarity classification task. In particular, we develop a simple but effective BERT-based neural baseline for this task. Experimental results on three Arabic datasets show that our BERT architecture with a simple linear classification layer surpasses state-of-the-art works, achieving an accuracy of 89.51% on the Arabic hotel reviews dataset, 73.23% on the human-annotated book reviews dataset, and 85.73% on the Arabic news dataset.

Introduction

ABSA is not conventional sentiment analysis but a more difficult task: it is concerned with identifying the aspect terms mentioned in a document, as well as the sentiment expressed toward each aspect.

As described by Hu and Liu [1], sentiment analysis (SA) can be studied at three levels. At the document level, the task is to identify the sentiment polarity (positive, neutral, or negative) expressed throughout the entire document. At the sentence level, the task is to classify the sentiment of a single sentence. However, a document contains many sentences, and each sentence may contain multiple aspects with different sentiments, so document- and sentence-level analysis may be inaccurate. A finer-grained, aspect-level analysis, called ABSA, is needed.

ABSA was first introduced as a shared task at SemEval-2014 [2], with datasets of annotated restaurant and laptop reviews. The task was repeated at SemEval over the following two years [3, 4] and extended to new domains, languages, and challenges. SemEval-2016 provided 39 datasets in 7 domains and 8 languages for the ABSA task. In addition, a Support Vector Machine (SVM) baseline was provided with the datasets for evaluation.

There are three primary tasks common to ABSA, as mentioned in Pontiki et al. [3]: aspect category identification (Task 1), aspect opinion target extraction (Task 2), and aspect polarity detection (Task 3). In this paper, we concentrate only on Task 3.

Neural network (NN) variants have been applied to many Arabic natural language processing (NLP) applications, such as sentiment analysis [5], machine transliteration [6], named entity recognition [7], and speech recognition [8], achieving state-of-the-art results.

Word embeddings, or distributed representations, improve the efficiency of neural networks and the performance of deep learning (DL) models. They have therefore been used as the first layer in various DL models. Two types of word embeddings are available: contextualized and non-contextualized. Most of the available research on Arabic ABSA is based on non-contextualized word embeddings such as word2vec and fastText. The main drawback of non-contextualized embeddings is that each word receives a single static vector, which ignores the various contexts in which the word may appear.

In contrast, transformer-based pre-trained language models such as BERT provide dynamic embedding vectors that change with the context in which a word appears. This makes BERT attractive for many tasks, where it is fine-tuned on a downstream dataset related to the task.

Although Arabic has a large number of speakers (estimated at around 422 million [9]), studies on Arabic aspect-based sentiment analysis remain limited. This is due to the complexity of the language, which is morphologically rich.

The key contributions of the current work are as follows:

  • To our knowledge, this is the first time a transfer learning-based model such as BERT has been used for the Arabic aspect sentiment polarity classification task.

  • Unlike most current Arabic ABSA approaches, which rely largely on hand-crafted features and external resources, we present an end-to-end model for the aspect sentiment polarity classification task. It requires neither external resources nor feature engineering.

  • We examine the modeling capability of contextual embeddings from pre-trained language models such as BERT, with sentence-pair input, on the Arabic aspect sentiment classification task.

  • We propose a simple BERT-based model with a linear classification layer for the aspect sentiment polarity classification task. Experiments on three Arabic datasets demonstrate that, despite its simplicity, the model surpasses state-of-the-art works.

The rest of the paper is organized as follows. “Related work” reviews related work; “BERT-based model” describes the proposed model; “Data and baseline research” explains the datasets and the baseline procedures; “Experiments” presents results and discussion; finally, “Conclusion and future work” concludes the paper.

Related work

ABSA is an area of SA whose research methodologies fall into two broad approaches: standard machine learning techniques and DL-based techniques.

The earliest ABSA efforts relied mainly on machine learning methods that use handcrafted features such as lexicons to train sentiment classifiers [11, 12]. These methods are effective, but they depend heavily on the quality of the handcrafted features.

Subsequently, several neural methods consisting of a word embedding layer followed by a neural architecture were developed for the ABSA task and achieved promising results [13, 14, 15].

Several attention-based models have been applied to the ABSA task because attention can focus on the parts of a sentence that relate to an aspect [16, 17]. However, a single attention network may not be sufficient to capture key dependencies between context and targets, especially in long sentences. Multiple-attention networks were therefore proposed to address this problem [18, 19].

To address question answering, Weston et al. [20] developed memory networks (MemNN), which were later adopted in several NLP tasks, including ABSA [21, 22, 23].

Pre-trained language models have recently played a vital role in many NLP applications because they can exploit vast amounts of unlabeled data to learn general language representations. ELMo [24], GPT [25], and BERT [26] are among the best-known examples. Li et al. [27] studied the use of BERT embeddings with several neural models, such as a linear layer, a GRU model, a self-attention network, and a conditional random field, for the ABSA task. Sun et al. [28] demonstrated that treating ABSA as a sentence-pair classification task significantly improved the results. In their approach, an auxiliary sentence is constructed and given to BERT as the second input.

Gao et al. [29] created a straightforward architecture to evaluate the representational effectiveness of BERT in the ABSA task. Surprisingly, combining BERT with sophisticated neural networks, which had previously performed well with embedding representations, does not consistently improve performance over the standard BERT-FC implementation. In contrast, adding target information yields consistent accuracy gains.

Song et al. [30] proposed the attentional encoder network (AEN) for the targeted sentiment classification task. AEN compares text representations from pre-trained GloVe and BERT models and uses multi-head attention (MHA) for encoding. Intra-MHA encodes the context, and inter-MHA encodes the target words. Experiments and analysis show that the proposed model is effective and lightweight. The strength of BERT embeddings in ABSA was further investigated by Hoang et al. [31], Bhoi and Joshi [32], Xu et al. [33], and Nurifan et al. [34].

In general, Arabic ABSA research has progressed more slowly than English ABSA research. Ruder et al. [35] combined an aspect embedding with each word embedding and fed the result to a CNN model for the aspect polarity and aspect category identification tasks. Subsequently, Ruder et al. [36] exploited the hierarchical structure of reviews to solve the ABSA task.

Al-Smadi et al. [37] proposed two supervised machine learning techniques: an RNN and an SVM supplemented with a set of handcrafted features. The SVM achieved excellent results, whereas the RNN was faster to train.

To encourage learning of the connections between context words and targets, the authors of [38] combined an aspect embedding with each word embedding and fed the result to an LSTM. An attention mechanism was then applied to focus on the context words related to specific aspects.

Abdelgwad et al. [39] proposed an interactive attention network (IAN) supported by a Bi-GRU to better extract target and context representations.

Al-Dabet et al. [40] proposed a deep memory network based on a stack of IndyLSTMs, supplemented with a recurrent attention network, for the aspect sentiment classification task.

Mohammad et al. [41] investigated the use of the Multilingual Universal Sentence Encoder (MUSE) with a GRU model to improve on previous results for the aspect extraction and aspect polarity classification tasks. Previous studies used primarily word- or character-level representations, whereas MUSE produces sentence-level embeddings.

Bensoltane and Zaki [42] investigated the modeling power of BERT for the aspect extraction and aspect category identification tasks. They also examined the effect of adding stronger layers on top of BERT for the aspect term extraction (ATE) task.

Fadel et al. [43] proposed the BF-BiLSTM-CRF model, based on BERT, for the target extraction task. Their embedding layer combines contextualized string embeddings with the BERT language model to improve word representations. On top of it, they stacked two BiLSTM layers and a CRF output layer.

In this paper, we propose a simple BERT-based model with sentence-pair input and a linear classification layer for the Arabic aspect sentiment classification task. The model achieves state-of-the-art results on three different Arabic datasets.

BERT-based model

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning technique for NLP. It learns language representations from unlabeled text using bidirectional models built on Transformers. A Transformer is a deep learning architecture in which every output element is connected to every input element, and the weights between them are computed dynamically from their relationship by the attention mechanism. BERT is pre-trained on two separate but related tasks that exploit this bidirectionality: Masked Language Modeling and Next Sentence Prediction. Because each word's representation depends on its context, BERT handles ambiguity effectively, and ambiguity is one of the most difficult aspects of natural language understanding. BERT also reaches high accuracy on many language understanding tasks. In our model, the BERT component, with L transformer layers, first computes contextualized representations for an input sequence of t tokens. The hidden state of the [CLS] token is then fed to the task-specific layer, which predicts the sentiment polarity labels. The overall architecture of the proposed model is depicted in Fig. 1.

Fig. 1

Model overall architecture

Auxiliary sentence

BERT accepts either a single sentence or a sentence pair as input, and it is effective on sentence-pair classification tasks. The ABSA task can therefore be transformed into a sentence-pair classification task with the pre-trained BERT model. The first sentence is the review text, which expresses the sentiments. The second sentence, the auxiliary sentence, contains the aspect information. In other words, the model receives the review sentence as the first sentence and the aspect terms as the auxiliary sentence, and it must determine the sentiment toward each aspect.
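
The sentence-pair construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. `whitespace_tokenize` is a hypothetical stand-in for BERT's real subword tokenizer, the function name `build_pair_input` is ours, and the review is the English translation of an example from the paper, used here for readability. The sketch shows how one review with two aspects becomes two classification inputs that differ only in the auxiliary sentence and its segment ids.

```python
from typing import List, Tuple


def whitespace_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for BERT's subword tokenizer: splits on whitespace only.
    return text.split()


def build_pair_input(review: str, aspect: str) -> Tuple[List[str], List[int]]:
    """Return [CLS] review [SEP] aspect [SEP] and matching segment ids.

    Segment id 0 marks the review (sentence A, including [CLS] and the first
    [SEP]); segment id 1 marks the auxiliary sentence (sentence B and its [SEP]).
    """
    tokens_a = whitespace_tokenize(review)
    tokens_b = whitespace_tokenize(aspect)
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    review = "The service is bad but the hotel is very nice"
    # One review with two aspects yields two separate classification examples.
    for aspect in ["service", "hotel"]:
        tokens, segments = build_pair_input(review, aspect)
        print(aspect, tokens[-3:], segments[-3:])
```

Each (review, aspect) pair is classified independently, which lets the same review receive different polarities for different aspects.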

BERT as embedding layer

A conventional word2vec embedding layer provides static, context-independent word vectors. BERT layers instead provide dynamic, context-dependent word embeddings: they take the entire sentence as input and compute the representation of each word from information in the whole sentence.

BERT processes its inputs in a specific way. Sentences are first tokenized, as in any model, and special tokens are inserted: [CLS] at the start and [SEP] at the end of each tokenized sentence. Self-attention lets BERT process all tokens in parallel, so token order is not implicit in the input. Additional embeddings must therefore be added to carry the position of each token and, for the next sentence prediction task, the sentence each token belongs to.

The tokenized sentence, including the [CLS] and [SEP] tokens, is first fed into the embedding layer, which produces token embeddings. Token embeddings carry no position information, so position is added by means of position embeddings. Finally, a segment embedding indicates whether each token belongs to sentence A or sentence B; it is a learned vector for sentence A and another for sentence B.

The token, segment, and position embeddings of each token are then summed, and the result is fed into L transformer layers that refine the token-level features. The output representation $H^{l-1}$ of layer $l-1$ is the input to layer $l$, where $l \in [1, L]$. The representation $H^{l} = \{h^{l}_{1}, \ldots, h^{l}_{t}\}$ at the $l$-th layer is calculated as follows:

$$H^{l} = Transformer(H^{l - 1} ),$$

where $t$ is the number of input tokens. The outputs of the last transformer layer, $H^{L}$, are taken as the full contextual representations of the input tokens. The final hidden state of BERT corresponding to the [CLS] token, $h_{[CLS]}$, is then used as input to the task-specific layer.
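
To make the input representation and the layer recursion concrete, the following pure-Python sketch does three things. It sums token, segment, and position embeddings to form $H^{0}$, applies L layers as $H^{l} = Transformer(H^{l-1})$, and returns the row at position 0, which is assumed to hold [CLS]. The embedding tables are toy values. `toy_layer` is a hypothetical stand-in that only mimics one property of a transformer layer, namely that every output depends on every input. It is not self-attention and not the Arabic BERT implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def add_vectors(*vectors: Vector) -> Vector:
    # Element-wise sum of equally sized vectors.
    return [sum(values) for values in zip(*vectors)]


def input_embeddings(
    token_ids: Sequence[int],
    segment_ids: Sequence[int],
    token_table: Sequence[Vector],
    segment_table: Sequence[Vector],
    position_table: Sequence[Vector],
) -> List[Vector]:
    # H^0: token + segment + position embedding for each position i.
    return [
        add_vectors(token_table[tok], segment_table[seg], position_table[i])
        for i, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]


def toy_layer(h: List[Vector]) -> List[Vector]:
    # Hypothetical stand-in for a transformer layer (NOT self-attention): each
    # output row adds the mean of all rows, so it depends on the whole sequence.
    mean = [sum(column) / len(h) for column in zip(*h)]
    return [add_vectors(row, mean) for row in h]


def encode_cls(h0: List[Vector], layers: Sequence[Layer]) -> Vector:
    # H^l = layer_l(H^{l-1}) for l = 1..L; position 0 is assumed to be [CLS].
    h = h0
    for layer in layers:
        h = layer(h)
    return h[0]


if __name__ == "__main__":
    token_table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    segment_table = [[0.0, 0.0], [0.5, 0.5]]
    position_table = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
    layers = [toy_layer] * 2  # L = 2 here; BERT-Base uses L = 12
    for last_token in (2, 3):
        h0 = input_embeddings([0, 1, last_token], [0, 0, 1],
                              token_table, segment_table, position_table)
        # Changing only the last token changes the [CLS] output: context dependence.
        print(encode_cls(h0, layers))
```

The two printed [CLS] vectors differ even though the [CLS] input embedding is identical. This is the context dependence that a static word2vec lookup lacks.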

Design of downstream model

To identify the sentiment polarity toward an aspect, the representation extracted from BERT, $h_{[CLS]}$, is fed into a task-specific layer, in our case a simple linear layer. This layer multiplies the input by a learnable weight matrix $W$ and adds a bias term $b$, linearly transforming the incoming features into output features. The softmax function is then applied to obtain the probability $P$ of each polarity class.

$$P = \mathrm{softmax}\left(W^{T} h_{[CLS]} + b\right).$$
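
The classification head can be written out directly. The sketch below uses a toy hidden size of 2 instead of 768, toy weights, and a hypothetical three-class label order. It computes $z = W^{T} h_{[CLS]} + b$ followed by a numerically stable softmax. It illustrates the equation only and does not reproduce the trained model.

```python
import math
from typing import List, Sequence


def linear(h: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    # z = W^T h + b, with W stored as (hidden_size x num_classes):
    # z[k] = sum_j W[j][k] * h[j] + b[k]
    return [sum(W[j][k] * h[j] for j in range(len(h))) + b[k] for k in range(len(b))]


def softmax(z: Sequence[float]) -> List[float]:
    # Subtracting max(z) avoids overflow in exp without changing the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["Positive", "Negative", "Neutral"]  # hypothetical label order

if __name__ == "__main__":
    h_cls = [0.5, -1.0]                        # toy h_[CLS]; BERT-Base gives 768 values
    W = [[0.2, -0.4, 0.1], [-0.3, 0.9, 0.0]]   # toy weights, shape (2, 3)
    b = [0.0, 0.0, 0.1]
    P = softmax(linear(h_cls, W, b))
    print(dict(zip(LABELS, (round(p, 3) for p in P))), LABELS[P.index(max(P))])
```

In a PyTorch implementation the same head would typically be a single linear layer followed by softmax (or a cross-entropy loss on the logits during training). The pure-Python version above only spells out the arithmetic.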

Data and baseline research

We conducted our experiments on three Arabic datasets: the Human Annotated Arabic Dataset of Book Reviews (HAAD), the Arabic news dataset, and the Arabic Hotel Reviews Dataset. The following subsections describe each dataset in detail.

HAAD dataset

The HAAD dataset [44] is regarded as the first available dataset for Arabic ABSA. It contains 1513 Arabic book reviews, each annotated with aspect terms (T1), aspect term polarity (T2), aspect category (T3), and aspect category polarity (T4). This study focuses only on T2. HAAD was annotated following the SemEval-2014 framework. It contains a total of 2838 aspect terms, and their distribution over the sentiment polarity classes (Positive, Negative, Conflict, Neutral) in the training and test sets is summarized in Table 1. The dataset is accompanied by a baseline method for each task, for comparison.

Table 1 Analysis of the HAAD dataset’s aspect terms and polarities

The aspect term polarity baseline (T2) assigns each aspect term in the test set the polarity label most frequently given to that term in the training set. If the aspect term does not appear in the training set, it is assigned the most common label (Positive, Negative, Conflict, or Neutral) in the whole training set.
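
The T2 baseline is simple enough to state as code. This sketch makes two assumptions. Aspect terms are matched by exact string equality, and ties between equally frequent labels are broken by first occurrence, which is the documented behavior of `collections.Counter.most_common`. The text does not specify how the original baseline handles ties or term normalization. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def fit_majority_baseline(train: Sequence[Tuple[str, str]]) -> Tuple[Dict[str, str], str]:
    """Learn the most frequent label per aspect term, plus a global fallback label."""
    per_term: Dict[str, Counter] = {}
    for term, label in train:
        per_term.setdefault(term, Counter())[label] += 1
    # most_common(1) breaks ties by first occurrence (an assumption, see text).
    term_to_label = {term: counts.most_common(1)[0][0] for term, counts in per_term.items()}
    global_label = Counter(label for _, label in train).most_common(1)[0][0]
    return term_to_label, global_label


def predict_majority(term_to_label: Dict[str, str], global_label: str,
                     test_terms: Sequence[str]) -> List[str]:
    # Seen terms get their training majority label; unseen terms get the global majority.
    return [term_to_label.get(term, global_label) for term in test_terms]


if __name__ == "__main__":
    train = [("service", "Negative"), ("service", "Negative"), ("service", "Positive"),
             ("room", "Positive"), ("location", "Positive")]
    term_to_label, global_label = fit_majority_baseline(train)
    print(predict_majority(term_to_label, global_label, ["service", "room", "pool"]))
```

Here "pool" is unseen in training, so it falls back to the global majority label.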

Fig. 2 depicts an XML snapshot that corresponds to an annotated HAAD sentence.

Fig. 2

Example of the HAAD dataset schema

The Arabic news dataset

The Arabic news dataset [10] comprises Facebook posts about the 2014 Gaza attacks, together with their comments, and was annotated following the SemEval-2014 framework. It contains 2265 news posts and 13628 comments, each classified into one of three sentiment classes: Positive, Negative, and Neutral. Each post was manually annotated with aspect terms (T1), aspect term polarity (T2), aspect category (T3), and aspect category polarity (T4). Each comment was annotated for comment category detection (T5) and comment category polarity estimation (T6). This study focuses only on T2. There are 9655 different aspects, each related to one of four categories: Plans, Results, Peace, and Parties.

The dataset is accompanied by a baseline method for each task. For aspect term polarity (T2), the baseline is the same procedure used for T2 on the Human Annotated Arabic Dataset of Book Reviews.

Figure 3 depicts an XML snapshot that corresponds to an annotated Arabic news sentence.

Fig. 3

Example of the Arabic news dataset schema

The Arabic hotel reviews dataset

The Arabic Hotel Reviews Dataset was introduced at SemEval-2016 as part of the multilingual ABSA task, which covered 8 languages and 7 domains [3]. It contains 19,226 training tuples and 4802 testing tuples and is annotated in XML. The dataset consists of reviews, each containing a number of sentences. Each sentence is annotated with tuples of three elements: aspect category, opinion target expression (OTE), and aspect polarity. Figure 4 depicts an XML snapshot that corresponds to an annotated Arabic hotel review.

Fig. 4

Example of the Arabic hotels dataset schema

The dataset provides both text-level annotations (2291 reviews) and sentence-level annotations (6029 sentences). This study focuses only on the sentence-level tasks. The scale and distribution of the dataset are given in Table 2. In addition, an SVM classifier with N-gram features was applied to the Arabic hotel reviews dataset for the various ABSA tasks and serves as the baseline for comparison.

Table 2 The scale and distribution of the Arabic hotels reviews dataset

Experiments

Our models were trained and tested separately on the three Arabic datasets. For each dataset, 70% of the data was used for training, 10% for validation, and 20% for testing. All neural networks were implemented with the PyTorch library, and all model computations were carried out on a GeForce GTX 1080 Ti GPU. The following subsections describe model training in detail.
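
A 70/10/20 split can be produced as shown below. The paper does not state whether or how the data were shuffled, so the seeded shuffle (`seed=42`) and the function name are assumptions. Integer arithmetic is used so that the three parts always add up to the full dataset, with any rounding remainder going to the test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_10_20(samples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle (assumed) and split into 70% train, 10% validation, 20% test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n = len(items)
    n_train = n * 70 // 100
    n_val = n * 10 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (about 20%) goes to test
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_70_10_20(range(1000))
    print(len(train), len(val), len(test))
```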

Evaluation method

To evaluate the effectiveness of the proposed model, we adopted the accuracy metric, defined as follows:

$$Accuracy = \frac{\text{number of correct predictions}}{\text{total number of samples}}.$$

Accuracy is the ratio of correctly classified samples to all samples; higher accuracy indicates better performance.
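
As a worked example of the accuracy formula: if 3 of 4 aspect polarities are predicted correctly, accuracy is 3/4 = 0.75. The helper below (a hypothetical name, not from the paper) computes the same ratio and rejects mismatched or empty inputs instead of silently returning a misleading value.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Number of correct predictions divided by the total number of samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for an empty sample set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    gold = ["Positive", "Negative", "Neutral", "Positive"]
    pred = ["Positive", "Negative", "Positive", "Positive"]
    print(accuracy(gold, pred))  # 3 correct out of 4 -> 0.75
```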


We evaluated our BERT-based models with various preprocessing steps, such as stemming, normalization, and stop-word removal, and found that these steps did not improve the results but rather made them worse. A likely explanation is that these steps damage the contextual meaning of the words as learned by BERT. Consequently, we chose not to apply any of these procedures to the datasets.

Hyperparameters setting

We used the pre-trained “Arabic BERT” [45], which was trained on about 8.2 billion words of MSA and dialectal Arabic. Specifically, we used the BERT-Base configuration, with 12 hidden layers, 12 attention heads, and a hidden size of 768. The model was fine-tuned on the downstream task with the Adam optimizer and a learning rate of 1e-5. The dropout rate was 0.1 for the Arabic hotel reviews dataset and 0.3 for the HAAD and Arabic news datasets, and the hidden dropout probability was 0.3. The batch size was 24 for the Arabic hotel reviews dataset, 16 for HAAD, and 64 for the Arabic news dataset, and all models were trained for 10 epochs.
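
The fine-tuning settings above can be collected into one configuration object per dataset. This is our own sketch: `FineTuneConfig` and the dataset keys are hypothetical names, and settings that do not vary across datasets are defaults. The text does not say exactly which layer the per-dataset dropout rate applies to, so that field is kept generic. A small check encodes the BERT-Base constraint that the hidden size divides evenly across the attention heads (768 / 12 = 64).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FineTuneConfig:
    # Shared BERT-Base settings reported for the pre-trained Arabic BERT.
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    hidden_size: int = 768
    optimizer: str = "Adam"
    learning_rate: float = 1e-5
    hidden_dropout_prob: float = 0.3
    num_epochs: int = 10
    # Settings that differ per dataset (defaults are placeholders, overridden below).
    dropout: float = 0.1
    batch_size: int = 16

    def __post_init__(self) -> None:
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_attention_heads


# Hypothetical dataset keys; values follow the hyperparameters reported above.
CONFIGS: Dict[str, FineTuneConfig] = {
    "arabic_hotel_reviews": FineTuneConfig(dropout=0.1, batch_size=24),
    "haad": FineTuneConfig(dropout=0.3, batch_size=16),
    "arabic_news": FineTuneConfig(dropout=0.3, batch_size=64),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.batch_size, cfg.dropout, cfg.head_dim)
```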

Comparison models

LSTM: a single LSTM is used for sentence modeling, and its last hidden state serves as the representation for final classification.

TD-LSTM: employs two LSTM networks to model the preceding and following contexts of the target, providing a target-dependent representation for sentiment prediction [14].

INSIGHT-1: combines an aspect embedding with each word embedding and feeds the result to a CNN for aspect sentiment analysis [35].

HB-LSTM: a hierarchical bidirectional LSTM for ABSA that exploits hierarchical information from the review to improve performance [36].

AB-LSTM-PC: combines an aspect embedding with each word embedding to encourage learning of the connections between context words and targets, then applies an attention mechanism to focus on the context words related to specific aspects [38].

IAN-BGRU: uses a Bi-GRU to extract hidden representations of targets and context, then applies two interacting attention networks to these representations to model targets and their context interactively [39].

MBRA: uses an external memory network containing a three-layer stack of bidirectional IndyLSTMs and a recurrent attention mechanism to handle complex sentence structures [40].

Results and discussion

Table 3 shows that simply adding a basic linear layer on top of BERT outperformed the baselines and many previous Arabic DL models. This can be attributed to BERT's superior ability to extract semantic representations compared with context-free word embedding models. In particular, BERT learns information from both directions during training and handles the out-of-vocabulary (OOV) problem better than context-free word embedding models. Moreover, the auxiliary sentence further improved the results: the BERT-pair model outperforms BERT-single and achieves state-of-the-art results. This is evidence of the effectiveness of BERT’s contextual representations at encoding associations between aspect terms and context words.

Table 3 Models accuracy results on the Arabic hotel reviews dataset, Arabic news dataset, and HAAD

Overfitting issue

Although we used BERT-Base, the smallest pre-trained version of BERT, its number of parameters (110M) seemed large for this task. This raised the question of whether our model overfits the downstream data. We therefore trained the BERT-linear model on the three Arabic datasets for 10 epochs and tracked the accuracy on the development sets after each epoch. As shown in Fig. 5, development accuracy remains relatively stable and does not decrease significantly as training progresses, which indicates that the BERT model is robust to overfitting.

Fig. 5

The accuracy results on the development set for three different Arabic datasets

Fine-tuning or not

We investigated the effect of fine-tuning on the final results by keeping the parameters of the BERT component fixed during training. Figure 6 compares the performance of the model with fine-tuning and with fixed BERT parameters. Without fine-tuning, the general-purpose BERT representations fall far short on the downstream task, so task-specific fine-tuning is necessary to exploit BERT's capabilities.

Fig. 6

Effect of fine-tuning BERT on different datasets

Case study

The assessment results demonstrated the superiority of our model over traditional deep learning models that employ context-free word embeddings and feature-based machine learning techniques.

Table 4 shows that BERT detects the true polarity labels when a sentence contains one or more targets with the same polarity. For example, the sentence “The furniture is very old and the service is poor and I would not recommend anyone to stay there” contains two aspects (furniture, service) with the same polarity (Neg, Neg). The BERT models correctly identify the negative polarity toward both aspects.

Table 4 Case study of different models on the Arabic hotel reviews dataset

Furthermore, by including target information as an auxiliary sentence, our BERT-pair model determined the true polarity label for each target, particularly when a sentence contains several targets with conflicting polarities. For example, the review sentence “The service is bad but the hotel is very nice and beautiful” contains two aspects (Service, Hotel) with different sentiments (Neg and Pos). Despite this challenge, our model correctly identifies the polarity expressed toward each aspect.

Determining sentiment polarity when a sentence contains shifting words (e.g. not) is one of the most common problems for SA algorithms, because shifting words can completely reverse the sentiment polarity of a review or aspect. Most well-known SA methods handle these words with dedicated dictionaries. Our model handles them without external resources such as lexicons. For example, in the sentence “The hotel location is not good for the elderly”, our model correctly classifies the sentiment toward Hotel#Location as Negative. However, negation does not always change the sentiment polarity. For instance, our model is not misled by the negation in the review sentence “The hotel location is not only suitable for elderly people, but also is very close to most of the city historical sights”, and it classifies Hotel#Location as Positive. This is evidence of the effectiveness of BERT’s contextual representations at encoding associations between targets and context words.

Conclusion and future work

In this paper, we explored the modeling capabilities of contextual embeddings from the pre-trained BERT model, combined with sentence-pair input, on the Arabic aspect sentiment polarity classification task. Specifically, we combined the BERT embedding component with a simple linear classification layer and performed extensive experiments on three Arabic datasets. The experimental results show that, despite its simplicity, our model surpasses state-of-the-art works and is robust to overfitting.

For future work, we intend to enhance the BERT-linear-pair model by replacing the linear layer on top of the BERT embedding layer with recurrent neural networks, self-attention networks, or other sophisticated networks such as AEN. We also plan to apply transformer-based models to other Arabic ABSA tasks, such as aspect extraction and aspect category detection.

Availability of data and materials

The datasets analyzed during the current study are available in: HAAD: Arabic hotel reviews dataset: Arabic news dataset:

References

  1. Hu M, Liu B. Mining and summarizing customer reviews. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004; p. 168–177.

  2. Pontiki M, Galanis D, Pavlopoulos J, Papageorgiou H, Androutsopoulos I, Manandhar S. SemEval-2014 task 4: aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014; p. 27–35, Dublin, Ireland. Association for Computational Linguistics.

  3. Pontiki M, Galanis D, Papageorgiou H, Androutsopoulos I, Manandhar S, Al-Smadi M, Al-Ayyoub M, Zhao Y, Qin B, De Clercq O, et al. SemEval-2016 task 5: Aspect based sentiment analysis. In: International workshop on semantic evaluation. 2016; p. 19–30.

  4. Pontiki M, Galanis D, Papageorgiou H, Manandhar S, Androutsopoulos I. SemEval-2015 task 12: Aspect based sentiment analysis. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), 2015; p. 486–495.

  5. Kwaik KA, Saad M, Chatzikyriakidis S, Dobnik S. LSTM-CNN deep learning model for sentiment analysis of dialectal Arabic. In: International Conference on Arabic Language Processing, 2019; p. 108–121. Springer.

  6. Ameur MSH, Meziane F, Guessoum A. Arabic machine transliteration using an attention-based encoder-decoder model. Procedia Computer Science. 2017;117:287–297.


  7. Khalifa M, Shaalan K. Character convolutions for Arabic named entity recognition with long short-term memory networks. Computer Speech and Language. 2019;58:335–346.


  8. Algihab W, Alawwad N, Aldawish A, AlHumoud S. Arabic speech recognition with deep learning: A review. In: International Conference on Human-Computer Interaction, 2019; p. 15–31. Springer.

  9. Alsharhan E, Ramsay A. Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition. Language Resources and Evaluation. 2020;54(4):975–998.


  10. Al-Sarhan H, Al-So’ud M, Al-Smadi M, Al-Ayyoub M, Jararweh Y. Framework for affective news analysis of Arabic news: 2014 Gaza attacks case study. In: 2016 7th International Conference on Information and Communication Systems (ICICS). 2016; p. 327–332. IEEE.

  11. Jiang L, Yu M, Zhou M, Liu X, Zhao T. Target-dependent Twitter sentiment classification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011; p. 151–160, Portland, Oregon, USA. Association for Computational Linguistics.

  12. Kiritchenko S, Zhu X, Cherry C, Mohammad S. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), 2014; p. 437–442.

  13. Poria S, Cambria E, Gelbukh A. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems. 2016;108:42–49.


  14. Tang D, Qin B, Feng X, Liu T. Effective lstms for target-dependent sentiment classification. 2015; arXiv preprint arXiv:1512.01100.

  15. Zhang M, Zhang Y, Vo D-T. Gated neural networks for targeted sentiment analysis. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2016; 30.

  16. Wang Y, Huang M, Zhu X, Zhao L. Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 conference on empirical methods in natural language processing, 2016; p 606–615.

  17. Yang M, Tu W, Wang J, Xu F, Chen X. Attention based lstm for target dependent sentiment classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2017; 31.

  18. Chen P, Sun Z, Bing L, Yang W. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 conference on empirical methods in natural language processing, 2017; p 452–461.

  19. Ma D, Li S, Zhang X, Wang H. Interactive attention networks for aspect-level sentiment classification. 2017; arXiv preprint arXiv:1709.00893.

  20. Weston J, Chopra S, Bordes A. Memory networks. 2014; arXiv preprint arXiv:1410.3916.

  21. Tang D, Qin B, Liu T. Aspect level sentiment classification with deep memory network. 2016; arXiv preprint arXiv:1605.08900.

  22. Tay Y, Tuan LA, Hui SC. Dyadic memory networks for aspect-based sentiment analysis. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017; p. 107–116.

  23. Tay Y, Tuan LA, Hui SC. Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2018; 32.

  24. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. 2018; arXiv preprint arXiv:1802.05365.

  25. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018.

  26. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. 2018; arXiv preprint arXiv:1810.04805.

  27. Li X, Bing L, Zhang W, Lam W. Exploiting BERT for end-to-end aspect-based sentiment analysis. 2019; arXiv preprint arXiv:1910.00883.

  28. Sun C, Huang L, Qiu X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. 2019; arXiv preprint arXiv:1903.09588.

  29. Gao Z, Feng A, Song X, Wu X. Target-dependent sentiment classification with BERT. IEEE Access. 2019;7:154290–154299.

  30. Song Y, Wang J, Jiang T, Liu Z, Rao Y. Attentional encoder network for targeted sentiment classification. 2019; arXiv preprint arXiv:1902.09314.

  31. Hoang M, Bihorac OA, Rouces J. Aspect-based sentiment analysis using BERT. In: Proceedings of the 22nd Nordic Conference on Computational Linguistics, 2019; p. 187–196.

  32. Bhoi A, Joshi S. Various approaches to aspect-based sentiment analysis. 2018; arXiv preprint arXiv:1805.01984.

  33. Xu H, Liu B, Shu L, Yu PS. BERT post-training for review reading comprehension and aspect-based sentiment analysis. 2019; arXiv preprint arXiv:1904.02232.

  34. Nurifan F, Sarno R, Sungkono KR. Aspect based sentiment analysis for restaurant reviews using hybrid ELMo-Wikipedia and hybrid expanded opinion lexicon-SentiCircle. International Journal of Intelligent Engineering and Systems. 2019;12(6):47–58.

  35. Ruder S, Ghaffari P, Breslin JG. Insight-1 at SemEval-2016 Task 5: Deep learning for multilingual aspect-based sentiment analysis. 2016; arXiv preprint arXiv:1609.02748.

  36. Ruder S, Ghaffari P, Breslin JG. A hierarchical model of reviews for aspect-based sentiment analysis. 2016; arXiv preprint arXiv:1609.02745.

  37. Al-Smadi M, Qawasmeh O, Al-Ayyoub M, Jararweh Y, Gupta B. Deep recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels’ reviews. Journal of Computational Science. 2018;27:386–393.

  38. Al-Smadi M, Talafha B, Al-Ayyoub M, Jararweh Y. Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. International Journal of Machine Learning and Cybernetics. 2019;10(8):2163–2175.

  39. Abdelgwad MM, Soliman THA, Taloba AI, Farghaly MF. Arabic aspect based sentiment analysis using bidirectional GRU based models. Journal of King Saud University-Computer and Information Sciences. 2022;34(9):6652–6662.

  40. Al-Dabet S, Tedmori S, Mohammad A-S. Enhancing Arabic aspect-based sentiment analysis using deep learning models. Computer Speech and Language. 2021;101224.

  41. Mohammad A-S, Hammad MM, Sa’ad A, Saja A-T, Cambria E. Gated recurrent unit with multilingual universal sentence encoder for Arabic aspect-based sentiment analysis. Knowledge-Based Systems. 2021;107540.

  42. Bensoltane R, Zaki T. Towards Arabic aspect-based sentiment analysis: a transfer learning-based approach. Social Network Analysis and Mining. 2022;12(1):1–16.

  43. Fadel AS, Saleh ME, Abulnaja OA. Arabic aspect extraction based on stacked contextualized embedding with deep learning. IEEE Access. 2022;10:30526–30535.

  44. Al-Smadi M, Qawasmeh O, Talafha B, Quwaider M. Human annotated Arabic dataset of book reviews for aspect based sentiment analysis. In: 2015 3rd International Conference on Future Internet of Things and Cloud, 2015; p. 726–730. IEEE.

  45. Safaya A, Abdullatif M, Yuret D. KUISAIL at SemEval-2020 task 12: BERT-CNN for offensive speech identification in social media. In: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020; p. 2054–2059, Barcelona (online). International Committee for Computational Linguistics.


Not applicable.


Funding

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

MMA did all the work in the paper. THAS: review and editing, supervision. AIT: review and editing. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Mohammed M. Abdelgwad.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that there are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Abdelgwad, M.M., Soliman, T.H.A. & Taloba, A.I. Arabic aspect sentiment polarity classification using BERT. J Big Data 9, 115 (2022).
