From: IDC: quantitative evaluation benchmark of interpretation methods for deep text classification models
PGT | Performance metric | Interpretation methods | | | | | |
---|---|---|---|---|---|---|---|
 | | Saliency map | Grad-CAM | Integrated gradient | DeepLIFT | LRP | Hierarchical attention |
WGA (8 K) | κ | 0.33 | 0.58 | 0.58 | 0.58 | 0.59 | 0.16 |
 | Pinterp | 0.40 | 0.71 | 0.70 | 0.72 | 0.72 | 0.28 |
 | Rinterp | 0.53 | 0.61 | 0.61 | 0.62 | 0.63 | 0.29 |
MILP (2 K) | κ | 0.17 | 0.25 | 0.25 | 0.25 | 0.25 | 0.09 |
 | Pinterp | 0.21 | 0.29 | 0.29 | 0.30 | 0.29 | 0.13 |
 | Rinterp | 0.26 | 0.30 | 0.30 | 0.30 | 0.31 | 0.19 |
Hybrid (7 K) | κ | 0.27 | 0.43 | 0.42 | 0.43 | 0.44 | 0.15 |
 | Pinterp | 0.41 | 0.57 | 0.56 | 0.58 | 0.58 | 0.27 |
 | Rinterp | 0.36 | 0.52 | 0.50 | 0.51 | 0.53 | 0.28 |