Social media text analytics of Malayalam–English code-mixed using deep learning

Journal of Big Data

Table 1 Recent references in a nutshell

Data set	Methodology	Limitations	Results
Tamil and Malayalam [33]	A sub-word level tokenizer, a text representation layer, and a transformer model for classification	Could not identify sarcasm used in negative comments	F1-score of 0.58 and 0.66 average-F1 for Tamil and Malayalam code-mixed datasets
Hindi–English and Spanish–English data sets [34]	Ensemble of a self-attention-based Long Short-Term Memory (LSTM) network and a convolutional neural network (CNN)	Data imbalances are not handled	F1-scores of 0.707 and 0.725, respectively
Hindi–English [35]	LSTM network with character-level embedding and a FastText embedding	Struggles with short sentences that have unclear semantic structure	F1-score of 0.679
English and Spanish [36]	Multilingual XLM-R	Computationally intensive and failed to see the patterns in the results	F1-score of 0.537
Hinglish [37]	One-dimensional (1-D) convolution and 1-D max-pooling, self-attention mechanisms, and a final dense layer	Lack of good pretrained models and hyper-parameter optimization	F1-score of 0.684