No. | Techniques used | MNB | Logistic regression | LinearSVC | KNN |
---|---|---|---|---|---|
– | Baseline models | 86.0 | 84.0 | 84.0 | 77.6 |
1 | Remove non-Arabic letters | 85.4 − | 83.6 − | 82.8 − | 76.7 − |
2 | Remove numbers | 85.5 − | 82.9 − | 84.3 + | 77.4 − |
3 | Remove usernames, external links, and hashtags | 85.2 − | 83.2 − | 83.4 − | 78.1 + |
4 | Remove punctuation | 86.0 | 84.0 | 84.0 | 77.6 |
5 | Remove diacritics | 86.0 | 83.6 − | 83.8 − | 76.6 − |
6 | Remove repeated characters | 86.4 + | 84.3 + | 84.9 + | 79.2 + |
7 | Remove duplicate letters | 86.0 | 83.2 − | 84.1 + | 78.7 + |
8 | Remove Kashida | 86.3 + | 83.8 − | 84.6 + | 78.0 + |
9 | Replace أ, إ, and آ with ا | 85.8 − | 83.6 − | 84.1 + | 77.4 − |
10 | Replace ى with ي | 86.7 + | 84.0 | 84.6 + | 77.9 + |
11 | Replace ي and ئ with ى | 86.8 + | 84.0 | 84.2 + | 78.0 + |
12 | Replace ىء and ئ with ي | 86.0 | 83.0 − | 84.3 + | 77.8 + |
13 | Replace ؤ and ئ with ء | 85.8 − | 83.8 − | 83.9 − | 77.7 + |
14 | Replace ئ with ى | 86.0 | 84.0 | 84.3 + | 77.6 |
15 | Replace ة with ه | 86.7 + | 83.8 − | 84.8 + | 77.1 − |
16 | Replace چ with ج | 86.0 | 84.0 | 84.0 | 77.6 |
17 | Replace ڤ with ف | 86.0 | 82.8 + | 84.0 | 77.6 |
18 | Replace ءى and ءي with ئ | 86.0 | 84.0 | 84.0 | 77.6 |
19 | Replace ص with س | 85.7 − | 83.7 − | 83.6 − | 78.0 + |
20 | Replace ض with ظ | 86.0 | 83.6 − | 84.0 | 77.9 + |
21 | Replace ؤ with و | 85.8 − | 82.8 − | 84.2 + | 77.6 |
22 | Replace كـ with ك | 86.0 | 84.0 | 84.2 + | 77.6 |
23 | Remove stop words | 85.2 − | 84.4 + | 83.4 − | 76.6 − |
24 | Light stemming | 86.6 + | 85.3 + | 86.2 + | 79.1 + |
25 | Root stemming | 84.4 − | 85.2 + | 85.1 + | 77.8 + |
26 | Lemmatization | 86.7 + | 86.2 + | 86.5 + | 80.1 + |
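
To make a few of the character-level techniques in the table concrete, the sketch below chains five of them: removing diacritics (row 5), removing repeated characters (row 6), removing Kashida (row 8), unifying the alef variants (row 9), replacing ى with ي (row 10), and replacing ة with ه (row 15). It is a minimal illustration, not the implementation behind the reported scores. The function name `normalize_arabic`, the diacritic range U+064B to U+0652, and the rule "collapse runs of three or more identical characters to one" are all assumptions; the table does not specify how each step was defined. Rows 10 and 11 map in opposite directions, so only row 10 is applied here.

```python
import re

# Assumption: "diacritics" means the common tashkeel marks U+064B..U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Kashida (tatweel) is the elongation character U+0640.
KASHIDA = "\u0640"

# Assumption: a "repeated character" is a run of 3+ identical characters,
# collapsed to a single occurrence.
REPEATS = re.compile(r"(.)\1{2,}")

# Single-character replacements taken from rows 9, 10, and 15 of the table.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> ya
    "\u0629": "\u0647",  # ta marbuta            -> ha
})


def normalize_arabic(text: str) -> str:
    """Apply the selected normalization steps in a fixed order."""
    text = DIACRITICS.sub("", text)      # row 5
    text = text.replace(KASHIDA, "")     # row 8
    text = REPEATS.sub(r"\1", text)      # row 6
    return text.translate(LETTER_MAP)    # rows 9, 10, 15


if __name__ == "__main__":
    for sample in ["جمييييل", "جـــميل", "مدرسة"]:
        print(sample, "->", normalize_arabic(sample))
```

Because each row in the table evaluates one technique in isolation, a faithful replication would toggle these steps individually rather than chain them; the fixed pipeline here only shows what each transformation does to the text.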