Fig. 2
From: Cross-modality representation learning from transformer for hashtag prediction

Overview of LXMERT4 Hashtag. The [CLS] token at the top of the text embedding passes through the cross-attention layers and learns the joint representation. The corresponding feature vector of the [CLS] token is shown as the yellow square at the top of the language output vector. It is used for multilabel classification to recommend the top K hashtags.
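
The final step named in the caption, multilabel classification over the [CLS] feature vector followed by top-K selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single linear layer, the toy weights, and the four-hashtag vocabulary are all assumptions chosen for illustration, and the 3-dimensional vector merely stands in for the real [CLS] output.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hashtag_scores(
    cls_vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical multilabel head: one linear unit plus sigmoid per hashtag.

    Each hashtag gets an independent probability, so several hashtags
    can score highly for the same post (unlike a softmax).
    """
    scores = []
    for row, b in zip(weights, bias):
        logit = sum(w * x for w, x in zip(row, cls_vector)) + b
        scores.append(sigmoid(logit))
    return scores


def top_k_hashtags(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring hashtags, best first."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values (assumed): a 3-dimensional stand-in for the [CLS] feature vector
# and a 4-hashtag vocabulary with hand-picked weights.
cls_vector = [0.5, -1.0, 2.0]
weights = [
    [1.0, 0.0, 0.5],   # "#sunset"
    [0.0, 1.0, 0.0],   # "#food"
    [-1.0, 0.0, 0.0],  # "#travel"
    [0.2, 0.2, 0.2],   # "#cat"
]
bias = [0.0, 0.0, 0.0, 0.0]
vocab = ["#sunset", "#food", "#travel", "#cat"]

scores = hashtag_scores(cls_vector, weights, bias)
recommended = top_k_hashtags(scores, vocab, k=2)
print(recommended)
```

Using an independent sigmoid per hashtag, rather than a softmax over the vocabulary, is what makes the setup multilabel: the scores need not sum to one, and the K recommendations are simply the K largest of them.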