Journal of Big Data

Table 13 GPU usage of different models

From: Survey of transformers and towards ensemble learning using transformers for natural language processing

Model	BERT	XLNet	GPT2	RoBERTa	ALBERT
GPU usage (GB)	4.47	4.97	3.99	4.30	3.27