Part of speech tagging: a systematic review of deep learning and machine learning approaches

Chiche, Alebachew; Yitagesu, Betselot

doi:10.1186/s40537-022-00561-y

Journal of Big Data

Table 1 Strength and weakness of the proposed methodologies

Study	Strength	Weakness
Kumar et al. [72]	Proposes a deep learning approach for POS tagging and compares sequential deep learning models to find a suitable method for POS tagging at the word level and character level. The tagged corpus was evaluated with different models: bidirectional LSTM (BLSTM), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent units (GRU). To get a better result, experiments were conducted at both the character and word levels with different hidden states. The experimental result shows that BLSTM achieves the highest evaluation metrics: 0.8748 precision, 0.8757 recall, an F1-measure of 0.8739, and 0.8757 accuracy	The proposed model is tested on a small corpus. The model misclassifies words that have unwanted symbols appended to them, with both word-level and character-level embeddings. The performance of the proposed model is not compared with state-of-the-art works
Mohammed [51]	Proposes an efficient statistical POS tagger for the Somali language using HMM, CRF, and neural network methods. The prepared corpus consists of 14,369 tokens in 1234 sentences with 24 tagsets. The POS taggers score an average accuracy of 87.51% under tenfold cross-validation	The corpus used for the experiment is not a standard corpus, and the data is too small to train the algorithms. The accuracy of the tagger is also not good compared to previous works
Besharati et al. [54]	They proposed multi-layer perceptron and long short-term memory neural network approaches, which are efficient because of their high generalization capability, to assign the appropriate tags to both out-of-vocabulary and in-vocabulary words. This hybrid model improves the prediction accuracy to 97.29%	Since the dataset used is not enough for training a neural network, the proposed approach did not achieve high accuracy in extracting word vectors
Hirpssa and Lehal [39]	A machine learning approach has been proposed to develop an Amharic POS tagger. They compared HMM-based Trigrams'n'Tags (TnT), Conditional Random Field (CRF), and Naive Bayes (NB) based taggers. They used the existing ELRC corpus of 210 K tokens, incorporating a manually tagged corpus with 31 tags. The experimental result shows that the CRF-based Amharic POS tagger achieved an average accuracy of 94.08%, which is better than the others	Although the CRF-based tagger performed better than the others, its performance is not significantly improved compared with state-of-the-art CRF-based POS taggers. The amount and type of features in the feature set are not enough to improve the performance of the tagger
Anastasyev et al. [59]	A feedforward neural network method was proposed for character-level word representation to provide better results in terms of speed and performance. An additional loss was also deployed to force the model to learn the dependencies during the learning process. The proposed model shows an accuracy of 96.46%, 97.97%, and 95.64% on modern literature, news, and Vkontakte, respectively	The final results achieved by the proposed approach are not significantly better than previous works. The model also performs worse than the best model on the deployed data set
Mishra [66]	They proposed a machine learning and neural network model to implement a statistical POS tagger for Kannada. The strength of this work is that they developed a generic POS tagger, compared the performance of various modeling techniques, and explored both character and word embeddings for Kannada POS tagging. The proposed model outperforms the previous Kannada POS tagger by 6%	The results show many ambiguous predictions, such as confusion between finite verbs and common nouns, and between common nouns and adverbs. These problems are due to inconsistency in the labeling of the training data. Although the model outperforms the state-of-the-art POS tagger for Kannada, its accuracy of 92.31% is much lower than that of other works in the POS tagging field
Gashaw and Shashirekha [46]	They examined and obtained significant performance differences compared to previous works using morphological knowledge, a previously used dataset, similar feature extraction, and parameter tuning by deploying a grid search and tagging algorithms. They also used different corpora to experiment with the algorithms. The proposed approach scores an average accuracy of 86.44 for ELRC, 95.87 for ELEC-Extended, and 92.27 for ELRCQB tagsets. The experimental result shows that extending the tagset can increase the accuracy by 9.43, which is a significant improvement	The developed tagsets are not verified by a linguistic expert, so the performance of the tagger was affected. For instance, the tagger has a problem identifying the names of people and places
Khan et al. [33]	Developed an Urdu POS tagger using both machine learning and deep learning approaches under language-dependent feature sets with two datasets, and then compared the effectiveness of both approaches. Based on the experiments, the CRF-based model performs better than RNNs, SVM, and n-gram techniques on the CLE dataset, whereas the DRRN approach outperforms the others on the BJ dataset	The researchers experimented with the models on labeled datasets and used simple feature sets, which work easily with the simplest algorithms
Singh et al. [56]	They proposed deep learning approaches to develop a Hindi POS tagger. They experimented with a large corpus consisting of 50,000 tagged Hindi sentences. Based on the experiment, the proposed model achieved 97.05% average tagging accuracy	The study uses a manually annotated corpus for training and does not compare with previously proposed works
Baig et al. [70]	They proposed a statistical data-driven method to design and implement an Urdu POS tagging model using Urdu tweets. They combined the existing annotated tweets corpus with new tagsets constructed for POS tagging. They also addressed the shortage of corpus data using a supervised bootstrapping technique. The new POS tagger achieves 93.8% precision, 92.9% recall, and 93.3% F-measure	The corpus used in the experiments is not a standard corpus and is prepared from Twitter only. The other limitation is that the performance of the new model is not compared with the state of the art
Bonchanoski and Zdravkova [71]	Proposed an automatic POS tagger for the Macedonian language. One of the strengths of the proposed work is that they combined an available online lexicon with a self-created crowdsourced corpus. They implemented and compared a TnT tagger, averaged perceptron, cyclic dependency network, and guided learning framework for tagging. Although they did not achieve a better result in terms of tagging accuracy, the accuracy achieved is 96.37%, which is comparable to results for more widely studied languages	They compare only the proposed models with each other; it would be better to compare them with previously proposed works as well. Also, the corpus was created using crowdsourcing, so the dataset needs to be checked by experts
Sarbin et al. [61]	POS taggers for Nepali based on Long Short-Term Memory (LSTM), Bi-directional Long Short-Term Memory (BiLSTM), Simple Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) were implemented and compared. The algorithms are trained and tested on a Nepali tagset; Bi-directional LSTM performs better than the other three algorithms, with a testing accuracy of 97.27%	The researchers use small training and testing sets compared to previous works. The results are not compared with previous works
Kumar et al. [69]	Proposed DL-based POS tagging for Malayalam Twitter data using sequential deep learning methods such as Bidirectional LSTM (BLSTM), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU). They trained the model to tag tweets at both the word level and the character level. The models are also trained with varying hidden states, and they found that when the hidden states increase, the performance of the tagger increases. Bidirectional LSTM achieves the best tagging accuracy of 92.58%	The researchers use unrefined and rough tagsets taken from previous works. Besides, the corpus used is not enough to train deep learning algorithms, and the tagger is only developed on the Twitter corpus
Kabir et al. [73]	They built a Bengali POS tagger using a deep learning approach, specifically a deep belief network. They created a word dictionary for POS tagging from the corpus. The dictionary constructed for POS tagging can minimize the ambiguity of the tagging process. The deep learning-based Bengali POS tagger scores 93.33% accuracy on the corpus	The study uses a corpus prepared by Microsoft Research India as a part of the Indian Language Part-of-Speech Tagset (IL-POST) project. The corpus was prepared based on IL-POST for Indian Languages. The corpus used for experiments is not enough. Since there is a class imbalance in the corpus, the accuracy is zero for some classes. Also, the proposed model was not compared with previous works
Alharbi et al. [1]	They proposed POS tagging for the Arabic Gulf dialect using Bi-LSTM. A Support Vector Machine (SVM) classifier and bi-directional Long Short-Term Memory (Bi-LSTM) are applied for sequence modeling. POS tagging accuracy for the Gulf dialect was improved from 75% with the state-of-the-art POS tagger to over 91% using a Bi-LSTM. They also prepared a POS tagging dataset and multiple sets of features for testing the models	The models are tested on the existing dataset, which is not suitable for the experiment. Also, the dataset was not verified by language experts, and the feature sets are constructed without consulting language experts
Meftah et al. [65]	They proposed neural network-based POS tagging for social media content such as Facebook, tweets, and forums. They used transfer learning to alleviate the lack of sufficient annotated corpora created from social media content. The POS tagging model was developed for five languages, namely English, German, French, Italian, and Spanish. The proposed model also used both word-level and character-level representations, combining pre-trained embeddings like GloVe, Word2Vec, and FastText for the word-level representation. Cross-task transfer learning on those multiple social media languages was efficient. The proposed approach achieves 91.03%, 90.33%, and 89.66% for Spanish, German, and Italian, respectively	The use of rough texts taken directly from social media might affect the performance of the tagger model; thorough pre-processing of the texts would be better. Also, using one language's corpus with transfer learning to develop a POS tagger model for another language might not give the expected result
Argaw [55]	Develops POS tagging for the Amharic language using a deep learning approach. They experimented with three algorithms, bidirectional Long Short-Term Memory (Bi-LSTM), Long Short-Term Memory (LSTM), and recurrent neural networks (RNNs), to develop the model. An automatically generated neural word embedding is used as a feature to avoid the use of hand-crafted features for developing a POS tagging model. The empirical result shows 93.67% F-measure using the Bi-LSTM recurrent neural network	The study uses the existing corpus used in previous related works on the Amharic language, which is a medium-sized corpus that is not enough for deep learning approaches. The corpus used in the study is also of lower quality. The performance of the model was not compared with previous works that experimented with hand-crafted features
Deshmukh and Kiwelekar [14]	Proposes a bidirectional long short-term memory (Bi-LSTM) model and a deep learning model to develop POS tagging for Marathi language text. They developed the Bi-LSTM and deep learning-based POS tagging models using threefold validation. In the experiment, the Bi-LSTM and deep learning models achieved an accuracy of 97% and 85%, respectively. The proposed Bi-LSTM and deep learning models are also compared with machine learning techniques like naïve Bayes, Hidden Markov model, K nearest neighbor (KNN), random forest, conditional random fields, and neural network on the same dataset	The experiments are conducted with 1500 sentences consisting of 10,115 words, which is quite small for modeling deep learning and Bi-LSTM methods. The proposed models are also not compared with state-of-the-art works in the same field of study
Prabha et al. [60]	Develop a deep learning-based POS tagger for Nepali language using Long Short-Term Memory Networks (LSTM), Gated Recurrent Unit (GRU), Recurrent Neural Network (RNN), and their bidirectional variants. They have deployed the word-level representations. Bi-directional versions of the POS tagger model achieved the maximum performance scores, which shows significant improvement and performs better than the previous POS taggers with 99% tagging accuracy	The corpus used for this research is from the Center for Research in Urdu Language Processing (CRULP). The corpus used for developing a Nepali POS tagger is translated from English i.e., PENN Treebank corpus. The use of different language resources for building other languages POS tagger models may not be advisable because of the difference in nature of languages
Srivastava et al. [63]	Presented unsupervised DL-based POS tagging for the Sanskrit language. Instead of traditional Word2Vec implementations, a character-level n-gram implementation was used. They use a BiLSTM autoencoder, and a POS tagging accuracy of 93.2% is achieved	They used a small annotated Sanskrit corpus of 115,000 words prepared by JNU. The corpus used is not sufficient to experiment with unsupervised deep learning approaches
Attia et al. [62]	Develops an Awngi language part-of-speech tagger using a Hidden Markov Model (HMM). They created 23 hand-crafted tag sets and collected 94,000 sentences. Tenfold cross-validation was used to evaluate the performance of the Awngi HMM POS tagger. The empirical result shows that the uni-gram and bi-gram taggers achieve 93.64% and 94.77% tagging accuracy, respectively	The tagger is trained with only 23 hand-crafted tagsets. The corpus used was the first manually annotated corpus, which needs expert knowledge to produce better results. The POS tagger model is also not compared with previous related works that experimented with HMM
Patoary et al. [74]	A DL-based POS tagging model for the Bengali language is proposed, based mainly on the suffixes of the language. The experiment is conducted with a labeled corpus containing 2927 words. The proposed DL-based POS tagging model achieved an accuracy of 93.90%. The deep learning model also achieved better accuracy than previous models such as rule-based and global linear models. Moreover, the proposed model is implemented in Python for the open-source Bengali NLP toolkit	One of the shortcomings of this work is that the corpus used for the experiment is not enough for modeling deep learning. The performance of the proposed model is evaluated using accuracy only. Hence the performance of the model may vary when tested with other performance metrics such as f-measure, recall, and precision
Gopalakrishnan et al. [58]	Implements a deep neural network-based POS tagger for the biomedical domain. The experiment is conducted using LSTM, RNN, and GRU algorithms. The POS tagging is evaluated with the three algorithms to find a better-performing POS tagging model. Bi-directional LSTM, Bi-directional RNN, and Bi-directional GRU were also tested. The experiment reveals that Bi-directional LSTM, Bi-directional RNN, and Bi-directional GRU score better accuracy than the simple LSTM, RNN, and GRU deep learning models. Since these algorithms are able to access and understand more context information from the dataset, they achieved better performance. The proposed model achieved 94.80% detection accuracy	All experiments are conducted on the same dataset, which is publicly available for researchers. The proposed model is not compared with previous state-of-the-art works conducted in a similar domain. Also, it would be better to experiment with other algorithms, which may achieve a better result than the proposed model
Bahcevan et al. [57]	Proposed deep neural network language models for Turkish to address the POS tagging problem. The experiment is conducted using Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN). A performance comparison with state-of-the-art methods is conducted. The experimental results reveal that LSTM outperforms RNN with an 88.7% f-measure	Though Long Short-Term Memory (LSTM) outperforms Recurrent Neural Network (RNN) on the f-measure metric, the performance of the LSTM is not high enough. It would be better to experiment with other methods and compare them
Akhil et al. [75]	A POS tagger is proposed using deep learning approaches for Malayalam. The experiments are conducted on a real dataset using Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Bi-directional Long Short-Term Memory (BLSTM) to implement the POS tagger. The proposed model was compared with previous models and outperformed them. The model achieves 0.9878 precision, 0.9788 recall, and 0.9832 f-measure	The tagged corpus size is not enough for modeling the deep learning-based tagger. Also, the model is evaluated using precision, recall, and f-measure, but it would be better to evaluate it with accuracy as well
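
Several rows above report precision, recall, F-measure, and accuracy, and some weaknesses note that a model was evaluated with only a subset of these. As a minimal sketch of how the four token-level metrics relate, assuming support-weighted averaging over tags (the table does not say which averaging the reviewed studies used), the hypothetical helper `tagging_metrics` below computes them from gold and predicted tag sequences. The tag names and the toy sequences are illustrative only and are not taken from any of the reviewed corpora.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Token-level accuracy plus support-weighted precision, recall, and F1.

    Assumption: per-tag scores are averaged with weights proportional to the
    number of gold tokens carrying each tag (support-weighted averaging).
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    gold_counts = Counter(gold)   # how often each tag occurs in the gold data
    pred_counts = Counter(pred)   # how often each tag was predicted
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)

    total = len(gold)
    accuracy = sum(true_pos.values()) / total

    precision = recall = f1 = 0.0
    for tag, support in gold_counts.items():
        # Per-tag precision is 0 when the tag was never predicted.
        p = true_pos[tag] / pred_counts[tag] if pred_counts[tag] else 0.0
        r = true_pos[tag] / support
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: five tokens, two of which are mis-tagged.
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]
pred = ["DET", "NOUN", "NOUN", "DET", "VERB"]
metrics = tagging_metrics(gold, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under support-weighted averaging, recall is always equal to accuracy, so reporting both adds no new information. Macro averaging (an unweighted mean over tags) would instead give rare tags the same weight as frequent ones and make per-class failures caused by class imbalance visible.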
