From: Social media text analytics of Malayalam–English code-mixed using deep learning
Data set | Methodology | Limitations | Results |
---|---|---|---|
Tamil and Malayalam [33] | A sub-word level tokenizer, a text representation layer, and a transformer model for classification | Could not identify sarcasm used in negative comments | F1-score of 0.58 and 0.66 average-F1 for Tamil and Malayalam code-mixed datasets |
Hindi–English and Spanish–English data sets [34] | Ensemble of a self-attention-based Long Short-Term Memory (LSTM) network and a convolutional neural network (CNN) | Data imbalances are not handled | F1-scores of 0.707 and 0.725, respectively |
Hindi–English [35] | LSTM network with character-level and FastText embeddings | Struggles with short sentences that have an unclear semantic structure | F1-score of 0.679 |
English and Spanish [36] | Multilingual XLM-R | Computationally intensive and failed to see the patterns in the results | F1-score of 0.537 |
Hinglish [37] | One-dimensional (1-D) convolution and 1-D max-pooling, self-attention mechanisms, and finally a dense layer | Lack of good pretrained models and hyper-parameter optimization | F1-score of 0.684 |
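
Every row in the table reports an F1-score, so a minimal sketch of how that metric is computed may help when comparing the results. The table does not say which averaging scheme each cited work used (macro, weighted, or other). The macro average below is shown as one common choice, and the class names and counts are hypothetical illustrations rather than values from any of the cited papers.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: dict) -> float:
    """Unweighted mean of per-class F1; counts maps class -> (tp, fp, fn)."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical per-class counts for a three-class sentiment task.
example_counts = {
    "positive": (80, 20, 20),
    "negative": (30, 10, 30),
    "neutral": (10, 30, 10),
}
print(round(macro_f1(example_counts), 3))
```

Because the macro average weights every class equally, a poorly predicted minority class pulls the score down more than it would under a weighted average. This is relevant to the data-imbalance limitation noted for [34].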