A BERTweet-based design for monitoring behaviour change based on five doors theory on coral bleaching campaign

Harywanto, Gabriela Nathania; Veron, Juan Sebastian; Suhartono, Derwin

doi:10.1186/s40537-022-00615-1

Research
Open access
Published: 31 May 2022

Journal of Big Data volume 9, Article number: 73 (2022)

Abstract

Coral reefs are a vital ecosystem and a foundation for life on Earth, but they are now under threat. Coral bleaching is occurring at an alarming rate, and the ultimate goal of conservation efforts on this issue is behaviour change. Monitoring is one of the most important parts of any conservation effort. However, monitoring how a coral bleaching campaign affects behaviour change requires extensive data collection, so traditional methods, which need resources that may not be available, are not effective. The goal of this study is to build fast, large-scale automation for analyzing the stages of behaviour change. Social media data, including data from Twitter, is a promising alternative because social media usage continues to grow every year. Therefore, an automatic classification model was designed to identify the stages of behaviour change on Twitter based on the Five Doors Theory, which defines five stages of behaviour change: Desirability, Enabling Context, Can Do, Buzz, and Invitation. The data were obtained from a trusted repository, Mendeley Data, under the title "An Annotated Dataset for Identifying Behaviour Change Based on Five Doors Theory Under Coral Bleaching Phenomenon on Twitter". The dataset contains 1,222 tweets with keywords related to coral bleaching, annotated according to the behaviour change stages. Two designs are proposed: embedding extraction, which uses the output of each encoder layer in BERTweet, and a stacking ensemble, which combines several BERTweet models with different hyperparameters using a logistic regression model. The best accuracy, 0.7796, with an F1-score of 0.7945, was obtained with the stacking ensemble design. The resulting classification model identifies each behaviour change stage well, even though the class distribution of the dataset is imbalanced.
The proposed designs outperform all baseline models and the standalone BERTweet model. In conclusion, the automatic classification model makes monitoring the stages of behaviour change effective and efficient, so that the success of the coral bleaching campaign can be monitored and achieved.

Introduction

Coral reefs support an extremely high level of biodiversity and provide an important ecosystem foundation for millions of people [1]. Economic activities that depend on marine resources are directly supported by coral reefs. Coral reefs face various challenges: long-term changes in ocean-atmosphere interactions, rising sea temperatures, increasing CO₂ levels, major storms, earthquakes, volcanic eruptions, and extreme weather [2]. These challenges lead to coral bleaching, which threatens the biodiversity of coral reefs worldwide. The global coral bleaching event of 2014–2017 was the third in the last 20 years and killed thousands of square kilometers of coral reefs and other coral organisms [3, 4].

Corals can naturally recover from bleaching over a certain period of time. However, because seawater temperatures keep rising, this recovery capacity cannot keep pace with bleaching. Awareness of this issue is therefore a crucial first step in conserving and maintaining coral reefs. Raising awareness of the value of biodiversity, of how to conserve it, and of how to use it sustainably is the key to achieving all of the Aichi Biodiversity Targets [5]. Increasing awareness is itself the subject of Aichi Target 1. However, progress toward this target is difficult to monitor and evaluate with traditional methods [6].

The use of social media has increased significantly over the last few years, and social media can be a promising source of data for monitoring and evaluating public awareness of environmental issues [7], including coral bleaching. The ultimate goal of efforts or campaigns on coral bleaching is not only to raise awareness but also to change community behaviour so that people become actively involved in conservation. According to Robinson, behaviour change passes through five stages, described by the Five Doors Theory: Desirability, Enabling Context, Can Do, Buzz, and Invitation [8].

Social media also provides information on the various studies and efforts related to coral bleaching that have taken place in different regions. In social science, two types of drivers are linked to coral reef conservation: proximate drivers and distal drivers (Fig. 1) [9]. However, efforts frequently focus only on proximate drivers, such as fishing restrictions [10], whereas the ultimate aim of coral reef conservation is to address distal social drivers such as human behaviour. Therefore, the analysis of behaviour change can be an important indicator for coral bleaching conservation efforts.

As shown in Fig. 1, distal drivers are components of social systems that indirectly affect how people interact with coral reefs. Proximate drivers directly affect coral reef ecosystems (center). Coral reefs provide various important ecosystem benefits for humans and thus affect aspects of human well-being. The one-way arrow shows the path from distal drivers to human well-being, and the two-way arrows show the complex interrelationships and reciprocity among the various components.

Automatic classification of behaviour stages has been carried out [11] on the topic of energy use, as part of campaign efforts on climate change. The machine learning models used were Naive Bayes, Support Vector Machine (SVM), and Decision Tree, trained for the five-stage behaviour change classification task. The data were obtained from Twitter on three topics: Earth Hour 2015 (EH15), Earth Hour 2016 (EH16), and the 21st Conference of the Parties (COP21). That study concluded that most users are in the Desirability stage, followed by the Can Do stage. This shows that in the climate change campaign, some people already have concerns and a desire to change their behaviour, and some have already taken action.

Transformer-based deep learning models trained specifically on certain text types and topics can perform better than models trained on general, conventional text. In a Tweet classification competition on COVID-19, the top three places were taken by teams using COVID-Twitter-BERT (CT-BERT). This model is based on BERT-Large but was further trained on 22.5 million COVID-19-related Tweets [12]. The first-place NutCracker team [13] combined CT-BERT with RoBERTa in a two-level ensemble. The second-place NLP North team [14] used a standalone CT-BERT model, and the third-place UIT-HSE team [15] ensembled several CT-BERT models with different hyperparameters using soft and hard voting.

BERTweet [16] is a transformer-based pre-trained deep learning model trained specifically on English Tweets. It was designed to address the differences between Tweets and conventional texts such as Wikipedia and news articles: Tweets tend to be shorter and use informal vocabulary and abbreviations. BERTweet was therefore trained on 850 million English Tweets. It has outperformed its competitors, RoBERTa [17] and XLM-R [18], on tasks such as POS tagging, NER, and text classification across various datasets.

This study uses BERTweet, a deep learning model trained specifically on Tweets that has been shown to outperform other strong baseline models [16], to build a classification system for the five stages of behaviour change on the topic of coral bleaching. The data were obtained from Twitter during a period in 2021. Two designs are proposed: embedding extraction, which uses the output of each encoder layer in BERTweet, and a stacking ensemble, which combines several BERTweet models with different hyperparameters using a logistic regression model. A previous study reported that the embedding extraction approach yields a good task-specific model on top of a transformer encoder and needs only one run of the transformer to create various extraction scenarios, making it cost-effective in computing resources [19]. Ensembling has been shown to improve performance because deep learning ensembles combine the advantages of deep learning models with those of ensemble learning [20]. The main benefit of this model is fast, large-scale automation for analyzing the stages of behaviour change in response to coral bleaching campaigns, a topic on which such analysis is very limited compared with other environmental issues.

The main novelties of this study are a deep-learning-based model for classifying the five stages of behaviour change on the coral bleaching topic, and a new exploration of hyperparameter configurations and logistic regression model selection in the stacking ensemble design. In the experiments, all proposed designs outperformed all baseline models and the original BERTweet model.

Related works

The value of coral reefs and its threats

Coral reefs provide food and habitat for marine species, such as small fish, and form structural barriers along coastlines that protect against bioerosion and physical erosion [21]. Coral reefs also offer many benefits to humans, including fisheries, coastal protection, medicine, and tourism [22]. They are a source of protein for many organisms and a source of local income, so they are inseparable from the coastal ecosystem. Coral reefs also underpin reef tourism through their economic value and on-reef activities, such as diving, snorkeling, and glass-bottom boating, as well as attractions such as seafood and scenery [23]. The oceans produce about half of Earth's oxygen and absorb about 30 percent of carbon dioxide. Coral reefs are a foundation of ocean health, and without them, marine life would not exist [24].

Despite their crucial importance to many organisms, coral reefs face various threats. Mass bleaching events have been reported around the world in 1998, 2002, 2010, and 2016, and local bleaching events are occurring more often [25]. During the 2016 mass bleaching event on the Great Barrier Reef (GBR), only 8.9% of reefs escaped bleaching, compared with 42.4% in 2002 and 44.7% in 1998, the two previous mass bleaching events [1]. Coral bleaching also struck the Maldives in 2016, leaving less than 6% of the total coral population surviving [26].

Computer science and coral reef conservation

Several attempts have been made to apply computer science to coral reef conservation. A coral species detector based on an artificial neural network [27] was built using images collected from the West Atlantic Ocean, Eastern Australia, the Central Indian Ocean, Southeast Asia, and the Central Pacific Ocean as training and testing data. Another attempt to save corals used a convolutional neural network to classify coral scenery images from the Gulf of Eilat as urchins, healthy corals, or dead corals [28].

Social media data in conservation

Conventional large-scale data collection is costly and time-consuming, and sufficient resources may not be available; social media, whose use has been increasing over the years, can be an alternative. However, the use of social media data in conservation science is still very limited and available in only a few sectors. In conservation, social media data can be used not only to raise awareness but also to assess the attention that particular species or ecosystems receive on social media platforms. Social media data could provide a direct behavioural basis for assessing public participation in biodiversity conservation. Temporal analyses of social media data could also be used to better understand changes in biodiversity preferences over time [7].

Researchers have used social media data for conservation science by gathering information from users' profiles on particular platforms [29]. Flickr posts and tweets have been used to assess the global popularity of, and threats to, Important Bird and Biodiversity Areas (IBAs) by calculating the density of social media posts by geographic location worldwide between February 2016 and June 2017 [30]. Instagram posts have served as a data source for Hawaiian monk seal conservation by filtering posts with the hashtag '#monkseal' and checking whether each photo shows human disturbance according to human-wildlife interaction rules [31]. Sogou and WeChat posts have been used to strengthen public awareness of wildlife conservation in China by classifying them into six categories and analyzing differences among data groups with the Kruskal–Wallis test [32]. Tweets can also be used to monitor the five stages of behaviour change according to the Five Doors Theory, which helps in understanding targeted strategies and interventions for driving intended change associated with climate change [33].

The power of BERT modification

Google's BERT in particular has been used extensively in the development of deep learning for text classification. In a COVID-19 tweet classification task, which classifies whether a tweet is informative or not, various modified and specially trained BERT models were used, such as BERT + [34], CT-BERT [14], and BERTweet [16]. The top result in this task was achieved by the CT-BERT model and its modifications, because CT-BERT is a BERT model specially trained on Tweets about COVID-19.

Five doors theory of behavior change

Robinson introduced a theory called the Five Doors Theory, which focuses on enabling the relationship between human behaviour and the modification of technological and social contexts [8]. The Five Doors Theory consists of five stages:

Desirability: People in this stage are motivated to reduce their frustration, which may be a daily discomfort, a deeper personal frustration or sadness, or a wish for something to change for the better.
Enabling context: People in this stage are planning changes to their environment that would allow new behaviours. The environment includes infrastructure, services, social norms, governance, and knowledge, literally anything that can positively or negatively influence certain behaviours. However, these people are only planning what they can do to change their environment and have not yet acted.
Can do: People in this stage are already acting to change their environment. They also give suggestions for actions that contribute to their environment.
Buzz: People in this stage share their happy experiences and success stories.
Invitation: People in this stage invite and involve others for a specific purpose.

Each stage in the Five Doors Theory has its own linguistic pattern. According to [11], tweets in the Desirability stage usually express negative sentiments and emotions such as frustration, anger, and personal sadness. They often include a URL pointing to a relevant fact and a question asking for help in solving the problem or frustration the user faces. The Enabling Context stage is usually expressed with neutral sentiments and emotions. Tweets in this stage generally present facts on how to solve a problem, accompanied by a URL and a conditional statement showing that taking a particular action can bring benefits. The Can Do stage is usually expressed with neutral sentiments and generally contains suggestions and commands aimed at oneself and others. The Buzz stage usually has positive sentiments and emotions of happiness and joy, as the tweets generally describe users' success stories and the actions they have taken in their engagement with climate change and sustainability. The Invitation stage usually has positive sentiments and happy emotions, as it focuses on engaging others in a positive way; the text generally contains vocative forms that call on others to join the movement. Example tweets for each of the five stages are shown in Table 1.
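The linguistic cues described above (sentiment words, URLs, questions, conditionals, invitations) can be approximated with simple surface features. The Python sketch below is only an illustration under stated assumptions: the function `extract_cues`, the tiny keyword lists, and the invitation phrases are hypothetical and are not the GATE-based extractor used in [11].

```python
import re

# Hypothetical, tiny cue lexicons; a real system would use sentiment/emotion resources.
NEGATIVE_WORDS = {"sad", "angry", "frustrated", "dying", "destroyed", "worried"}
POSITIVE_WORDS = {"happy", "proud", "great", "love", "glad", "success"}
CONDITIONAL_WORDS = {"if", "when", "unless"}
INVITATION_PHRASES = ("join us", "join me", "let's", "come and")

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[a-z']+")


def extract_cues(tweet: str) -> dict:
    """Return simple surface cues loosely tied to the Five Doors linguistic patterns."""
    text = tweet.lower()
    # Words are taken from the text with URLs removed, so URL fragments are not counted.
    words = set(WORD_PATTERN.findall(URL_PATTERN.sub(" ", text)))
    return {
        "has_url": bool(URL_PATTERN.search(text)),          # Desirability / Enabling Context
        "has_question": "?" in text,                        # requests for help (Desirability)
        "has_conditional": bool(words & CONDITIONAL_WORDS), # "if you do X, Y" (Enabling Context)
        "negative_hits": len(words & NEGATIVE_WORDS),       # negative sentiment (Desirability)
        "positive_hits": len(words & POSITIVE_WORDS),       # positive sentiment (Buzz, Invitation)
        "has_invitation": any(p in text for p in INVITATION_PHRASES),  # vocatives (Invitation)
    }


if __name__ == "__main__":
    print(extract_cues("So sad to see our reefs dying. How can I help? https://example.org/reef"))
```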

Table 1 Tweet example according to Five Doors Theory


The Five Doors Theory has been used in many projects, such as a conceptual design that raises collective awareness and leverages energy savings by adapting the theory to the platform design [35]. It has also been used as the basis for the Climate Change Multitask Game [36], whose features include the number of pledges answered by the user, the ratio of pledges the user is already doing, the ratio of pledges accepted, the ratio of pledges refused, the number of points per visit, and social logging. The Five Doors Theory can also be used to reflect behavioural stages in Tweets: tweets are categorized into each class by their linguistic patterns using GATE, with extracted features such as polarity, emotions, directives, and whether the tweet contains a URL. Three models, Naive Bayes, Support Vector Machine, and J48, were tested, and J48 performed best, with the highest accuracy, precision, recall, and F1 score [11].

Methodology

From the 1960s until the 2010s, text classification was dominated by statistical machine learning models such as Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). These models require feature engineering, which is costly and time-consuming. Furthermore, they usually ignore the sequential structure and contextual information of text, which makes it difficult to capture its semantic information. Nowadays, text classification has shifted toward deep learning, such as transformer-based models, which avoid hand-designed rules and features and automatically provide semantically meaningful representations [37].

The use of transformer-based models [38] has become a trend in NLP tasks, including text classification. Models trained specifically on certain text types can outperform models trained on conventional text. Because the text in this study consists of tweets, the main model explored is BERTweet, which has the same architecture as BERT-base, is trained with the RoBERTa pre-training procedure, and is pre-trained on 850M English Tweets. A text-specific model was chosen because Tweets differ from conventional text: they tend to be shorter and use informal vocabulary and abbreviations.

The experimental flow of this study is outlined in Fig. 2. Several machine learning models are used as baselines, and several deep learning models serve both as baselines and as the basis of the proposed designs. There are two proposed designs: BERTweet embedding extraction and BERTweet stacking ensemble.

Two metrics are used to evaluate the classification performance of the designed models: accuracy and F1 score. Accuracy is the proportion of all predictions that are correct, while the F1 score is computed from precision and recall. Precision is the proportion of positive predictions that are correct, and recall is the proportion of actually positive data that are correctly predicted.

To calculate these metrics, for any classifier $f: D \to C = \{1, \ldots, n\}$ and finite set $S \subseteq D \times C$, let $m^{f,S} \in \mathbb{N}_{0}^{n\times n}$ be the confusion matrix, where ${m}_{ij}^{f,S} = |\{s \in S \mid f(s_1) = i \wedge s_2 = j\}|$; that is, rows index the predicted class and columns index the true class. For any such matrix, let $P_i$, $R_i$ and $F1_i$ denote precision, recall and F1-score with respect to class $i$:

$${P}_{i}= \frac{{m}_{ii}}{\sum_{x=1}^{n}{m}_{ix}} ; {R}_{i}= \frac{{m}_{ii}}{\sum_{x=1}^{n}{m}_{xi}} ;F{1}_{i}=H\left({P}_{i}, {R}_{i}\right)= \frac{2{P}_{i}{R}_{i}}{{P}_{i}+{R}_{i}}$$

(1)

where $P_i$, $R_i$ and $F1_i$ are set to 0 when the corresponding denominator is zero. Precision and recall are also known as positive predictive value and sensitivity, respectively.
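The following Python sketch builds the confusion matrix with the same orientation as $m^{f,S}$ (rows are predicted classes, columns are true classes) and evaluates Eq. (1) for one class. Classes are indexed from 0 rather than 1, and the helper names `confusion_matrix` and `per_class_prf` are illustrative, not taken from the study's code. Note that scikit-learn's `confusion_matrix` uses the transposed convention (rows are true labels).

```python
from typing import List, Sequence, Tuple


def confusion_matrix(predicted: Sequence[int], actual: Sequence[int], n: int) -> List[List[int]]:
    """m[i][j] counts samples predicted as class i whose true class is j (0-based)."""
    m = [[0] * n for _ in range(n)]
    for p, a in zip(predicted, actual):
        m[p][a] += 1
    return m


def per_class_prf(m: List[List[int]], i: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for class i as in Eq. (1); 0 when a denominator is zero."""
    n = len(m)
    predicted_i = sum(m[i][x] for x in range(n))  # row sum: all predictions of class i
    actual_i = sum(m[x][i] for x in range(n))     # column sum: all samples truly in class i
    p = m[i][i] / predicted_i if predicted_i else 0.0
    r = m[i][i] / actual_i if actual_i else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0  # harmonic mean H(P_i, R_i)
    return p, r, f1


if __name__ == "__main__":
    m = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n=3)
    for c in range(3):
        print(c, per_class_prf(m, c))
```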

For every scenario, the F1 score is computed as the macro F1: the F1 score is first computed for each class and then averaged with the arithmetic mean, as follows:

$$F1= \frac{1}{n}\sum_{x}F{1}_{x}=\frac{1}{n}\sum_{x}\frac{2{P}_{x}{R}_{x}}{{P}_{x}+{R}_{x}}$$

(2)
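Accuracy and the macro F1 of Eq. (2) can be computed directly from predicted and true labels, as in the minimal sketch below. The function name `accuracy_and_macro_f1` is hypothetical, and labels are assumed to be integers from 0 to n-1.

```python
from typing import Sequence, Tuple


def accuracy_and_macro_f1(predicted: Sequence[int], actual: Sequence[int], n: int) -> Tuple[float, float]:
    """Accuracy and macro F1 (Eq. 2): the unweighted mean of the per-class F1 scores."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    f1_scores = []
    for c in range(n):
        tp = sum(p == c and a == c for p, a in zip(predicted, actual))
        predicted_c = sum(p == c for p in predicted)  # denominator of P_c
        actual_c = sum(a == c for a in actual)        # denominator of R_c
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return correct / len(actual), sum(f1_scores) / n


if __name__ == "__main__":
    print(accuracy_and_macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n=3))
```

Because every class contributes equally to the average, a poorly predicted minority class such as Can Do or Invitation lowers the macro F1 as much as a poorly predicted majority class would, which makes the metric informative for the imbalanced dataset used here.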

Baseline

Classifying behaviour change in the context of coral bleaching is a new and rarely studied task, so no previous research was found that could serve as a performance reference. Simple yet reasonable models were therefore chosen as baselines. For the deep learning approach, BERT-large, which was not trained specifically on Tweets, was chosen; the machine learning approach used four classifiers, Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Random Forest (RF), with features obtained from the pre-trained GloVe Twitter-100 word embeddings.

For the SVM, LR, KNN, and RF models, each word in a tweet is represented by its pre-trained Global Vectors for Word Representation (GloVe) Twitter-100 vector, and the word vectors of the tweet are averaged into a single tweet vector. The BERT model receives as input tweets that have been cleaned and tokenized with the BERT tokenizer.
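The averaged GloVe features for the machine learning baselines can be sketched as follows. The loader assumes the standard GloVe text format (a token followed by its vector components on each line, as in the 100-dimensional Twitter file); the lowercase whitespace tokenizer and the skipping of out-of-vocabulary tokens are assumptions, since the study's exact tokenization is not described here.

```python
import numpy as np


def load_glove(path: str) -> dict:
    """Read a GloVe text file: one token per line followed by its vector components."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings


def tweet_vector(tweet: str, embeddings: dict, dim: int = 100) -> np.ndarray:
    """Average the vectors of known tokens; unknown tokens are skipped (an assumption)."""
    tokens = tweet.lower().split()  # hypothetical tokenizer; the paper's pre-processing may differ
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim, dtype=np.float32)  # no known words: fall back to a zero vector
    return np.mean(vectors, axis=0)


if __name__ == "__main__":
    toy = {"coral": np.array([1.0, 0.0]), "reef": np.array([0.0, 1.0])}
    print(tweet_vector("Coral reef bleaching", toy, dim=2))  # mean of the two known vectors
```

The resulting vectors can be stacked into a feature matrix and passed to scikit-learn classifiers such as `SVC`, `LogisticRegression`, `KNeighborsClassifier`, or `RandomForestClassifier`.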

Dataset

The dataset was taken from a trusted repository, Mendeley Data, under the title "An Annotated Dataset for Identifying Behaviour Change Based on Five Doors Theory Under Coral Bleaching Phenomenon on Twitter" [39]. It contains 1,222 tweets with keywords related to coral bleaching, annotated according to the behaviour change stages. The class distribution is shown in Fig. 3: the Can Do and Invitation classes contain far fewer tweets than the other classes.

The dataset was split into training and testing sets with a stratified split that preserves the class proportions, using 80% of the data for training and 20% for testing. The class distribution in the training and testing sets is shown in Table 2.
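A stratified 80/20 split can be sketched in plain Python by splitting each class separately. The function name `stratified_split`, the per-class rounding, and the fixed seed are assumptions; scikit-learn's `train_test_split(..., test_size=0.2, stratify=labels)` provides the same idea as a library call.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each class's share roughly equal in both sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                           # random order within the class
        n_test = round(len(indices) * test_fraction)   # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    labels = ["Desirability"] * 50 + ["Can Do"] * 30 + ["Invitation"] * 20
    train, test = stratified_split(labels)
    print(len(train), len(test))
```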

Table 2 Distribution of each class on training and testing set


Pre-processing

To accommodate the two types of models used in this research, deep learning models and machine learning models, two main pre-processing streams are applied to the tweets, as shown in Fig. 4. The output of each stream is then used for feature extraction for the machine learning models and for tokenization for the deep learning models.
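Because the exact steps of the two pre-processing streams appear only in Fig. 4, the sketch below is a hypothetical illustration rather than the study's pipeline. The machine learning stream lowercases the tweet and removes URLs, mentions, hashtag symbols, and non-letter characters before word-vector lookup; the deep learning stream keeps the original text but maps user mentions and URLs to the `@USER` and `HTTPURL` placeholders used in BERTweet's Tweet normalization [16]. Function names and regular expressions are assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def preprocess_for_ml(tweet: str) -> str:
    """Hypothetical ML stream: lowercase, drop URLs/mentions/'#', keep letters only."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")            # keep the hashtag word, drop the symbol
    text = re.sub(r"[^a-z\s]", " ", text)    # remove digits, punctuation and emoji
    return " ".join(text.split())


def preprocess_for_dl(tweet: str) -> str:
    """Hypothetical DL stream: keep casing and punctuation, but replace URLs and
    user mentions with BERTweet-style placeholders (HTTPURL, @USER)."""
    text = URL_RE.sub("HTTPURL", tweet)
    text = MENTION_RE.sub("@USER", text)
    return " ".join(text.split())


if __name__ == "__main__":
    t = "@NOAA Coral bleaching is spreading!! #SaveTheReef https://t.co/abc"
    print(preprocess_for_ml(t))
    print(preprocess_for_dl(t))
```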

The tokenization procedure for the deep learning experiments is detailed in Fig. 5. Three types of deep learning experiments were conducted: BERT as the deep learning baseline, and BERTweet Embedding Extraction and BERTweet Stacking Ensemble as the proposed methods for this classification task.

BERTweet embedding extraction

Each Transformer layer within the BERTweet model learns different and unique information. Many experiments with BERT use the fine-tuning approach, for example with BERT-base and BERT-large. However, the embedding extraction approach, in which the embedding produced by each encoder is extracted as a feature, has been reported to have certain advantages [19]. It yields a good task-specific model on top of a Transformer encoder, because not all tasks can be easily represented by the default Transformer encoder architecture, and it produces results for many scenarios from a single run of the Transformer encoder, with cheaper models built on top. In that experiment, embedding extraction that concatenated the last four layers performed comparably to fine-tuning BERT-base and BERT-large [19].

The tokenized input is fed into the Transformer block, and there are 8 different extraction scenarios (Table 3). The Transformer block consists of 12 encoder blocks, and which encoder outputs are extracted depends on the scenario. The overall flow of this experiment is shown in Fig. 6. Each encoder block produces a CLS token embedding (e₁, e₂, e₃, …, e₁₂), and these embeddings are concatenated in the combinations defined by each scenario to form the input h of the classification block:

$$h = \mathrm{concat}\left(e_{c_1}, e_{c_2}, e_{c_3}, \ldots, e_{c_n}\right), \quad c_i \in \{\text{chosen encoders}\}$$
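The concatenation above can be illustrated with NumPy on simulated encoder outputs. The sketch assumes the hidden-state layout returned by Hugging Face `transformers` models called with `output_hidden_states=True` (the embedding output followed by one tensor per encoder layer, each shaped batch × sequence × hidden) and takes position 0 as the CLS token. The function name `concat_cls` is hypothetical, and the layer choice `[9, 10, 11, 12]` (the last four encoders) is only an example; the actual scenarios are listed in Table 3.

```python
import numpy as np

NUM_LAYERS = 12   # encoder blocks in a BERT-base-sized model such as BERTweet
HIDDEN = 768      # hidden size of BERT-base-sized models


def concat_cls(hidden_states, chosen_layers):
    """Build h by concatenating the CLS vectors e_c of the chosen encoder layers.

    hidden_states: sequence of NUM_LAYERS + 1 arrays of shape (batch, seq_len, HIDDEN),
    index 0 being the embedding-layer output and index k the output of encoder k.
    chosen_layers: 1-based encoder indices, e.g. [9, 10, 11, 12] for the last four.
    """
    cls_vectors = [hidden_states[c][:, 0, :] for c in chosen_layers]  # each e_c: (batch, HIDDEN)
    return np.concatenate(cls_vectors, axis=-1)  # h: (batch, len(chosen_layers) * HIDDEN)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the 13 hidden-state tensors of a 2-tweet batch of 16 tokens.
    fake_states = [rng.standard_normal((2, 16, HIDDEN)) for _ in range(NUM_LAYERS + 1)]
    h = concat_cls(fake_states, [9, 10, 11, 12])
    print(h.shape)  # (2, 3072)
```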

Table 3 BERTweet Embedding Extraction scenarios description


The classification block consists of dense and dropout layers that process the concatenated pooled token embeddings and output five values, one for each of the five classes in this task. The initial hyperparameters were a learning rate of 1e-4 and an epsilon of 1e-4, with training for 7 epochs.
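A forward pass of such a classification block can be sketched in NumPy. The class name `ClassificationHead`, the hidden size of 256, the ReLU activation, the dropout rate of 0.1, the weight initialization, and the final softmax that turns the five outputs into confidence scores are all hypothetical choices; training is omitted.

```python
import numpy as np


class ClassificationHead:
    """Sketch of a dense -> ReLU -> dropout -> dense head mapping h to 5 class scores.
    Layer sizes, ReLU and the dropout rate are hypothetical choices."""

    def __init__(self, in_dim, hidden_dim=256, num_classes=5, dropout=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.02, (hidden_dim, num_classes))
        self.b2 = np.zeros(num_classes)
        self.dropout = dropout
        self.rng = rng

    def forward(self, h, training=False):
        z = np.maximum(h @ self.w1 + self.b1, 0.0)  # dense layer + ReLU
        if training and self.dropout > 0:
            keep = self.rng.random(z.shape) >= self.dropout
            z = z * keep / (1.0 - self.dropout)  # inverted dropout keeps the expected value
        logits = z @ self.w2 + self.b2  # one value per behaviour-change stage
        logits -= logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum(axis=-1, keepdims=True)  # softmax confidence scores


if __name__ == "__main__":
    head = ClassificationHead(in_dim=4 * 768)  # e.g. h built from four CLS vectors
    print(head.forward(np.ones((2, 4 * 768))).round(3))
```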

BERTweet stacking ensemble

Minor differences in hyperparameter configurations can produce different performance for each model. Calibrating hyperparameters is key to improving model performance in deep learning and NLP. When hyperparameters are tuned appropriately for each method, performance improves significantly in every task, and in many cases changing the setting of a single hyperparameter yields a larger gain than switching to a better algorithm or training on a larger corpus [40].

In this design, the experiments varied the learning rate and epsilon. Reasonable learning rates lie between 1e-6 and 1.0, but these are not strict bounds and depend heavily on the parametrization of the model [41]. One study [42] found that a lower learning rate, such as 2e−5, is necessary for BERT to overcome catastrophic forgetting, whereas with an aggressive learning rate of 4e−4 training fails to converge. Epsilon is a small constant added to the denominator of the weight update in adaptive optimizers such as Adam to avoid division by zero when the gradient is almost zero. Ideally, epsilon should therefore be small, but with a very small value the normalized weight update approaches a magnitude of 1. The trade-off is that the larger epsilon is, the smaller the weight updates are, and thus the slower training progresses.
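The role of epsilon can be made concrete with an Adam-style update, assuming the optimizer is Adam (the section does not name it). At the first step, bias correction makes the update equal to lr · g / (|g| + ε): with a tiny ε the step is about lr regardless of the gradient's size, while a larger ε shrinks the step, most strongly for small gradients. The helper `adam_first_step` is hypothetical.

```python
import numpy as np


def adam_first_step(grad, lr=1e-4, eps=1e-8, beta1=0.9, beta2=0.999):
    """Size of Adam's very first parameter update for a given gradient (t = 1)."""
    m = (1 - beta1) * grad            # first moment estimate
    v = (1 - beta2) * grad ** 2       # second moment estimate
    m_hat = m / (1 - beta1)           # bias correction at t = 1 gives back grad
    v_hat = v / (1 - beta2)           # ... and grad ** 2
    return lr * m_hat / (np.sqrt(v_hat) + eps)


if __name__ == "__main__":
    g = np.array([1e-6, 1e-3, 1.0])   # very small, small and large gradients
    for eps in (1e-8, 1e-4):
        # Update expressed as a fraction of the learning rate.
        print(eps, adam_first_step(g, lr=1e-4, eps=eps) / 1e-4)
```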

After a few initial trials, two learning rate values and two epsilon values were selected, giving four hyperparameter combinations (Table 4). All models (modelSE#1 to modelSE#4) are standard BERTweet models that differ only in learning rate and epsilon. These standalone models are then combined in the stacking ensemble scenarios. The batch size was set to 4 in these experiments. Smaller batch sizes can work better than larger ones because of the trade-off between the number of samples per update and the number of updates [43]. Because the dataset is small, a small batch size was feasible with the available computing resources; the weights are therefore updated more frequently, which improves training stability.
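The 2 × 2 grid and the effect of batch size on the number of weight updates can be sketched as follows. The learning rate and epsilon values below are string placeholders (the real values are in Table 4), the mapping of combinations to modelSE#1 to modelSE#4 is assumed, and 978 is only an approximation of 80% of the 1,222 tweets (the exact training-set size is in Table 2). Function names are hypothetical.

```python
import math
from itertools import product


def config_grid(learning_rates, epsilons):
    """All (learning_rate, epsilon) pairs, named modelSE#1, modelSE#2, ..."""
    return {f"modelSE#{k}": {"learning_rate": lr, "epsilon": eps}
            for k, (lr, eps) in enumerate(product(learning_rates, epsilons), start=1)}


def updates_per_run(n_train, batch_size, epochs):
    """Number of weight updates when the last, smaller batch is kept."""
    return math.ceil(n_train / batch_size) * epochs


if __name__ == "__main__":
    # Placeholder values only; the actual settings are listed in Table 4.
    grid = config_grid(learning_rates=("LR_A", "LR_B"), epsilons=("EPS_A", "EPS_B"))
    for name, cfg in grid.items():
        print(name, cfg)
    # About 80% of 1,222 tweets; the exact training-set size is given in Table 2.
    print(updates_per_run(978, batch_size=4, epochs=9))   # 2205
    print(updates_per_run(978, batch_size=32, epochs=9))  # 279
```

With about 978 training tweets and 9 epochs, a batch size of 4 gives 245 updates per epoch (2,205 in total), compared with 31 per epoch (279 in total) for a batch size of 32, which is the more frequent updating described above.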

Table 4 The model configuration of the four individual models with batch size 4 and 9 epochs


In Fig. 7, n BERTweet models with different hyperparameter settings are ensembled using a stacking technique: the confidence scores from each BERTweet model (b₁, b₂, b₃, …, bₙ) are concatenated to form the input h of a machine learning model:

$$h = \mathrm{concat}\left(b_{c_1}, b_{c_2}, b_{c_3}, \ldots, b_{c_n}\right), \quad c_i \in \{\text{chosen models}\}$$

Because previous studies have shown that the ensemble approach is effective [15], the experiments combined every pair of the four standalone models and also all four models together. This yields six two-model scenarios (n = 2) and one four-model scenario (n = 4). The models are combined by stacking: each model is trained in parallel, the confidence scores that each model outputs for every class are concatenated, and a machine learning model, logistic regression trained with SGD, produces the final prediction. Each model produces five confidence scores, so the two-model scenarios feed the logistic regression model 10 features and the four-model scenario feeds it 20 features. Logistic regression was selected because it is simple, fast, and computationally inexpensive [44]; since the deep learning models are already computationally expensive, a low-cost machine learning model was chosen.
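The scenario enumeration and feature construction for the stacking ensemble can be sketched with `itertools.combinations` and NumPy. The confidence scores here are random stand-ins drawn from a Dirichlet distribution; in the study they would be the five-class outputs of the fine-tuned BERTweet models. The function names `stacking_scenarios` and `stacked_features` are hypothetical.

```python
from itertools import combinations

import numpy as np

MODELS = ["modelSE#1", "modelSE#2", "modelSE#3", "modelSE#4"]
NUM_CLASSES = 5


def stacking_scenarios(models):
    """Every pair of standalone models plus one scenario with all of them."""
    return [list(pair) for pair in combinations(models, 2)] + [list(models)]


def stacked_features(confidences, scenario):
    """Concatenate per-model confidence scores b_c (n_samples x 5) into h."""
    return np.concatenate([confidences[name] for name in scenario], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in softmax outputs for 3 tweets; real scores come from the fine-tuned models.
    conf = {m: rng.dirichlet(np.ones(NUM_CLASSES), size=3) for m in MODELS}
    for scenario in stacking_scenarios(MODELS):
        print(scenario, stacked_features(conf, scenario).shape)  # (3, 10) or (3, 20)
```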

Each combination scenario has its own logistic regression model, which is partially trained at every epoch. The learning process of each logistic regression model therefore continues from epoch to epoch, which allows its performance to improve as training progresses.
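Incremental training of the meta-learner can be sketched as a multinomial logistic regression updated by SGD, where each `partial_fit` call continues from the current weights. This is an illustrative NumPy implementation with an assumed learning rate of 0.1 and per-sample updates; the class name is hypothetical. scikit-learn's `SGDClassifier` with a logistic loss (`loss="log_loss"` in recent versions) and `partial_fit` offers a similar incremental setup, although it handles multiple classes one-versus-rest rather than with a softmax.

```python
import numpy as np


class IncrementalLogisticRegression:
    """Multinomial logistic regression trained by SGD; each partial_fit call is one
    pass over the given data, so learning continues across calls (epochs)."""

    def __init__(self, n_features, n_classes=5, lr=0.1, seed=0):
        self.w = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def predict_proba(self, x):
        z = x @ self.w + self.b
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def partial_fit(self, x, y):
        for i in self.rng.permutation(len(x)):  # one SGD step per sample, shuffled
            p = self.predict_proba(x[i:i + 1])[0]
            p[y[i]] -= 1.0  # gradient of cross-entropy w.r.t. the logits
            self.w -= self.lr * np.outer(x[i], p)
            self.b -= self.lr * p
        return self

    def predict(self, x):
        return self.predict_proba(x).argmax(axis=1)


if __name__ == "__main__":
    y = np.array([0, 1, 2, 3, 4] * 20)
    x = np.hstack([np.eye(5)[y], np.eye(5)[y]])  # 10 features, like two agreeing models
    meta = IncrementalLogisticRegression(n_features=10)
    for epoch in range(3):  # one partial fit per training epoch
        meta.partial_fit(x, y)
        print(epoch, (meta.predict(x) == y).mean())
```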