From: Survey of transformers and towards ensemble learning using transformers for natural language processing
| Model | BERT  | XLNet | RoBERTa | ALBERT |
|-------|-------|-------|---------|--------|
| EM    | 0.794 | 0.854 | 0.886   | 0.830  |
| F1    | 0.873 | 0.921 | 0.951   | 0.901  |