Arabia Felix 2.0: a cross-linguistic Twitter analysis of happiness patterns in the United Arab Emirates

Al Shehhi, Aamna; Thomas, Justin; Welsch, Roy; Grey, Ian; Aung, Zeyar

doi:10.1186/s40537-019-0195-2

Research
Open access
Published: 15 April 2019

Aamna Al Shehhi ORCID: orcid.org/0000-0003-1868-1003^1,2,
Justin Thomas³,
Roy Welsch⁴,
Ian Grey⁵ &
Zeyar Aung⁶

Journal of Big Data volume 6, Article number: 33 (2019)

Abstract

The global popularity of social media platforms has given rise to unprecedented amounts of data, much of which reflects the thoughts, opinions and affective states of individual users. Systematic explorations of these large datasets can yield valuable information about a variety of psychological and sociocultural variables. The global nature of these platforms makes it important to extend this type of exploration across cultures and languages as each situation is likely to present unique methodological challenges and yield findings particular to the specific sociocultural context. To date, very few studies exploring large social media datasets have focused on the Arab world. This study examined social media use in Arabic and English across the United Arab Emirates (UAE), looking specifically at indicators of subjective wellbeing (happiness) across both languages. A large social media dataset, spanning 2013 to 2017, was extracted from Twitter. More than 17 million Twitter messages (tweets), written in Arabic and English and posted by users based in the UAE, were analyzed. Numerous differences were observed between individuals posting messages (tweeting) in English compared with those posting in Arabic. These differences included significant variations in the mean number of tweets posted and the mean size of users' networks (e.g. the number of followers). Additionally, using lexicon-based sentiment analytic tools (Hedonometer and Valence Shift Word Graphs), temporal patterns of happiness (expressions of positive sentiment) were explored in both languages across all seven regions (Emirates) of the UAE. Findings indicate that 7:00 am was the happiest hour, and Friday was the happiest day for both languages (the least happy day varied by language). The happiest months differed based on language, and there were also significant variations in sentiment patterns, peaks and troughs in happiness, associated with events of sociopolitical and religio-cultural significance for the UAE.

Introduction

Twitter is a social media platform which allows users to post text messages of up to 140 characters in length. These messages, known as tweets, may also be associated with images or video clips. Tweets are time-stamped and some are even tagged with GPS co-ordinates. Twitter users can also follow one another, viewing and responding to each other's tweets. Additional features include the ability to cluster tweets using hashtags (#) or to create conversations by mentioning (@) other users or retweeting (RT) their tweets. In 2015, Twitter reportedly had upwards of 307 million active users worldwide [1,2,3,4,5,6].

Typically, Twitter users share information about their daily activities, attitudes and moods. Some users may even form shared communities based on mutual interests [3]. Twitter also occasionally becomes a key source of information about specific events, for example, presidential elections [7], major incidents or the outbreak of infectious diseases. In this last case, Twitter has been found to surpass traditional methods of syndromic surveillance and was successfully used to monitor the diffusion of infectious illnesses such as cholera in Haiti [8], and influenza in the United States of America [9]. In short, Twitter data can provide a macro-level perspective on various aspects of human behavior and related psychological variables.

In this study, we quantitatively explore the expression of positive and negative affective states, performing a cross-linguistic (Arabic/English) sentiment analysis of the UAE's Twitter dataset over a five-year span (January 1, 2013 to August 31, 2017) comprising over 17 million tweets. This study is the first to specifically investigate public sentiment in an Arabian Gulf nation. The following research questions are addressed:

1. Is language use (Arabic vs. English vs. bilingual) associated with differences in Twitter use (e.g., frequency of tweets, number of followers, etc.)?
2. Based on Hedonometer [10, 11], is language use (Arabic/English) associated with differences in the expression of positive sentiment (happiness)?
3. What are the cross-linguistic and temporal patterns of happiness in the UAE Twitter data?
4. Are there geographic differences in happiness patterns across the UAE's seven emirates: Abu Dhabi, Dubai, Ajman, Umm Al Quwain (UAQ), Fujairah, Sharjah and Ras Al Khaimah (RAK)?

We name our study “Arabia Felix 2.0”. It is a Twitter-based extension of the previous sociopsychological study on the United Arab Emirates called the “New Arabia Felix” [12].

The present work is divided into the following sections. The "Related work" section reviews the Twitter sentiment analysis literature and briefly describes the study area, the United Arab Emirates (UAE). The "Methodologies" section details the methods used in the present study to analyze sentiment (happiness). The "Results and discussion" section presents and discusses the results of the analysis. Finally, the "Conclusion" section provides conclusions and outlines our future directions for research.

Related work

Watson and Tellegen [13] present a two-dimensional model of human emotion, where one dimension represents the valence of affective states, positive and negative, and a second dimension reflects the level of physiological arousal, high and low. These states are highly dynamic and influenced by internal and external stimuli, such as the weather, social interactions and our subjective interpretations of such external events [14]. These affective states (positive or negative, high or low arousal) can be expressed in written or spoken language [15], reflected in English by words such as happy, sad, excited, bored. Increasingly, people are using social media platforms such as Twitter to express their current status, including direct or indirect references to their affective states [3, 16, 17]. This phenomenon provides researchers with an opportunity to explore affective states as a function of time and place, and with reference to a specified attitude object (e.g. religion, presidential candidates, smoking). The large quantities of data involved in this type of social media exploration have led to the development of sophisticated analytic techniques based on natural language processing. Such affect-focused techniques have generally been referred to as sentiment analysis [18].

Sentiment analysis aims to extract and quantify subjective opinion or feelings from the written word, including transcribed speech [6, 19]. Sentiment analytic techniques have been successfully applied across a wide range of topics, such as public sentiment across time, sentiment in response to major events and how people feel towards specific products or services. This type of information can be particularly valuable to commercial entities and those interested in monitoring and promoting community or even national wellbeing [10]. Subjective wellbeing (happiness) at the national and community level has, in recent years, increasingly been viewed as a useful indicator of governmental performance, in some ways a psychological parallel to fiscal indicators such as gross domestic product [20].

This interest has led to the development and refinement of big data sentiment analytic techniques and their application to a broad array of topics. For instance, two techniques known as Hedonometer and Valence Shift Word Graphs were used to study the affective linguistic trends of song lyrics and blog posts over time [10]. The authors concluded that the happiness (greater expressed positive affect, relative to negative affect) of song lyrics declined from the 1960s to the 1990s and then remained stable after 1995. However, the happiness of blogs appeared to increase from 2005 until 2009. Numerous other sentiment analytic studies have attempted to explore temporal patterns in affective states over hours, days, months and years. For instance, the authors in [14] studied affective rhythms across 84 countries. They used a tool known as Linguistic Inquiry and Word Count (LIWC), and developed a lexicon for measuring positive and negative affect from Twitter data. They found fairly universal diurnal patterns. Specifically, the early part of the day was associated with heightened positive affect, which reached a peak around 6:00 am and then decreased over the duration of the day. This study also found that the weekday morning positivity pattern shifts by two hours during the weekend. Other Twitter studies exploring the temporal dynamics of happiness, using the previously mentioned Hedonometer [11], have also observed fairly pronounced weekly patterns; weekends tend to be more positive than early weekdays. Such sentiment analytic temporal explorations are generally consistent with intuitive expectations.

Beyond transient affective states such as happiness, the authors in [21] focused on self-referential tweets using a template-driven retrieval strategy to explore life satisfaction, a more trait-like component of subjective wellbeing. As predicted, Twitter-derived estimates of life satisfaction were relatively stable across time (no temporal patterning) and appeared to be uninfluenced by seasonal transitions or by events such as celebrity deaths or political crises. Additionally, those Twitter users categorized as satisfied, as opposed to dissatisfied, demonstrated patterns in their tweets that were consistent with previous findings in the subjective wellbeing literature. For example, the satisfied users expressed significantly more positive and fewer negative affective words. They also used less profanity, and were significantly more positive about religion than their dissatisfied counterparts.

Several studies have also focused on Twitter-derived sentiment as it pertains to specific events or social occasions. Twitter data were used to explore the affective states of soccer fans in response to the US team during the 2014 FIFA World Cup [18]. This study used a Word-Emotion Association lexicon and emoticons to detect sentiment. The findings suggest that the expressed emotions of football supporters followed anticipated patterns: fear and anger when the opposing team scored, and anticipation and joy when the US team scored.

In a service-industry context, Twitter data and sentiment analysis have been employed to measure customer satisfaction within the hospitality industry [6]. A study undertaken in the metropolitan area of Las Vegas used a sentence-level form of sentiment analysis [22] and found that the Twitter-derived sentiment correlated well with official, third-party hotel rankings. Similarly, within the world of finance, researchers report correlations between public mood and the state of the stock market [5]. In this last case, the authors used OpinionFinder and Google-Profile of Mood States (GPOMS) to calculate Twitter sentiment, while the Dow Jones Industrial Average (DJIA) was used as an indicator for the stock market. The study concluded that variations in people's affective states (as deduced from the Twitter data) were predictive of stock market values.

Exploring the relationship between weather and mood, another Twitter study [16] used the OpinionFinder sentiment lexicon and the Profile of Mood States (POMS) to extract sentiment. These data were then correlated with data from the U.S. Climate Reference Network (USCRN) and the Global Historical Climatology Network (GHCN), which provide a breakdown of temperature, precipitation, snow depth, wind speed and solar energy received. The study found that high temperatures were associated with fatigue, anger and reduced depression, while snow was associated with increased depression, and precipitation was associated with decreased tiredness.

Overall, sentiment analytic explorations of social media data have produced findings that are generally consistent with intuition and often highly convergent with findings obtained via other research methods and data sources. To date, however, the majority of this work has focused on the English language. There is a need to further extend and explore the validity and reliability of sentiment analytic techniques across sociocultural and linguistic contexts.

Study context and target population

The UAE was formed in 1971. It is a federation of seven states or emirates: Abu Dhabi, Dubai, Ajman, Fujairah, Ras al-Khaimah, Sharjah, and Umm al-Quwain. The last five emirates on this list are sometimes referred to as the Northern Emirates. These emirates north of Dubai are smaller in landmass and population than Dubai and Abu Dhabi and have pursued less aggressive programs of urbanization and modernization [12]. Despite minor inter-emirate differences in demography, the nation’s official language is Arabic and the state religion is Islam. Abu Dhabi is the largest emirate and home to the nation’s capital Abu Dhabi City. It is also the site of the nation’s largest oil reserves.

Since the commercial exploitation of oil and gas began in the late 1960s, few nations on earth have witnessed such rapid social and economic development [23]. The World Bank ranked the UAE as the fifth richest country in the world in terms of GDP (purchasing power parity) per capita in 2016 [24]. This development has attracted a large expatriate workforce and the indigenous citizens (Emiratis) have become a minority in some areas. End-of-year population estimates by the UAE National Bureau of Statistics for 2009 suggested that UAE nationals comprise 11.38% of the UAE’s population [25]. This rapid development has also included education, and the language of instruction across the nation's tertiary educational institutions is predominantly English [26]. This means that alongside Arabic, English is also widely used in the UAE.

Methodologies

In this section, we first describe the data collection. We then explain the methods we applied to analyze the data and to address the aforementioned research questions.

Data collection and preparation

The UAE historical tweets were purchased directly from Twitter, using the Historical PowerTrack enterprise product. The parameters for querying the tweets were location (UAE, Abu Dhabi, Dubai, etc.) and date. The extracted dataset consists of more than 22 million tweets spanning five years (January 1, 2013, to August 31, 2017). These data comprise a cross-section of tweets from all seven emirates.

Statistical analysis

The first research question investigates the relationship between language (Arabic and English) and Twitter use. We classified Twitter users into two groups: monolingual and bilingual. The monolingual users are defined as those users who tweet exclusively in either Arabic or English. The bilingual users tweet in both languages (data for participants tweeting in other languages were excluded). For this analysis, we considered statistical methods for comparing two samples. For the monolingual users, we applied the Mann-Whitney U test and the unpaired t-test. The Mann-Whitney U test is a nonparametric test for comparing two independent samples when the normality assumption is not satisfied. The test has the hypotheses that:

$$\begin{aligned}& {\text {Null Hypothesis}}, H_{0}{:}\,{\text{the distributions are the same}} \\ & {\text {Alternative Hypothesis}}, H_{1}{:}\, {\text{the distributions are not the same}} \end{aligned}$$

The unpaired t-test is a parametric test for testing two independent samples when the normality assumption is satisfied. The test has the hypotheses that:

$$\begin{aligned}&{ \text{Null Hypothesis}, H_{0}{:}\,{\mu _{1}-\mu _{2}=0}} \\ & {\text{Alternative Hypothesis}}, H_{1}{:}\,{\mu _{1}-\mu _{2}\ne 0} \end{aligned}$$
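As an illustration of the two independent-sample tests, the following is a minimal sketch in plain Python; it is not the code used in the study. The function names are hypothetical, the Mann-Whitney z-score uses the normal approximation without a tie correction, and the t statistic assumes the pooled-variance (Student) form, since the text does not say which variant was applied.

```python
import math
from statistics import mean, variance


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of sample x and its z-score (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mu) / sigma


def unpaired_t(x, y):
    """Student's t statistic for two independent samples (pooled variance assumed)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
```

For samples as large as those in this dataset, the z-score and the t statistic can be compared against the standard normal distribution to obtain a p-value; an established statistics library would normally be preferred over hand-written code.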

For the bilingual users, we applied the Wilcoxon signed-rank test and the paired t-test. The Wilcoxon signed-rank test is a nonparametric test for two dependent samples when the normality assumption is not satisfied. The test has the hypotheses that:

$$\begin{aligned}&{\text{Null Hypothesis}}, H_{0} {:}\,{\text{the distributions are the same}}\\&{\text{Alternative Hypothesis}}, H_{1}{:}\, {\text {the distributions are not the same}} \end{aligned}$$

The paired t-test is a parametric test for testing two dependent samples when the normality assumption is satisfied. The test has the hypotheses that:

$$\begin{aligned} & \hbox {Null Hypothesis}, H_{0}{: \mu _{d}=0}\\ & \hbox {Alternative Hypothesis}, H_{1}{: \mu _{d}\ne 0} \end{aligned}$$
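The two dependent-sample tests can be sketched in the same way. This is an illustrative sketch only: the function names are hypothetical, and dropping zero differences in the signed-rank test is one common convention that the text does not confirm.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """t statistic for paired samples: mean difference divided by its standard error."""
    d = [ai - bi for ai, bi in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences.

    Zero differences are dropped (an assumed convention); tied absolute
    differences share the average of their ranks.
    """
    d = [ai - bi for ai, bi in zip(a, b) if ai != bi]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(d[order[end + 1]]) == abs(d[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus
```

Here each pair would hold one bilingual user's value in Arabic and the same user's value in English, for example the number of tweets in each language.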

Happiness measures

For the other three research questions, investigating happiness/sentiment and geographic differences, we used the Hedonometer algorithm and Valence Shift Word Graphs. Hedonometer is based on the Language Assessment by Mechanical Turk 1.0 (LabMT-1.0) sentiment lexicon. This technique is used to perform a quantitative content analysis of the valence (positive vs. negative affect) of each tweet across the Twitter dataset. The second method, the Valence Shift Word Graph, facilitates analysis of how individual words in the dataset contribute to changes or shifts in valence patterns.

Hedonometer

The Hedonometer technique was introduced by Dodds and Danforth [10] and further explained in Dodds et al. [11]. The technique is based on using 10,222 English words to calculate the overall emotional content of a given text. The 10,222 words in the base-lexicon have previously been ranked for happiness from 1 (least happy) to 9 (most happy) [12, 23]. The mean happiness of the wordlist in the original study was 5.37. Eq. (1) explains how the total happiness of a given text is computed [11]:

$$\begin{aligned} h_{avg}(T)=\frac{\sum _{i=1}^{N}{h_{avg}(w_i)f_i}}{\sum _{i=1}^{N}f_i}=\sum _{i=1}^{N}{h_{avg}(w_i)p_i}, \end{aligned} \tag{1}$$

where T is a given text; N is the number of unique words in T; $w_i$ ($1\le i\le N$) is a given word in the text; $f_i$ is the frequency of the ith word $w_i$; $h_{avg}(w_i)$ is the average happiness of the word $w_i$; and $p_i=f_i/\sum _{i=1}^{N}f_i$ is the normalized frequency. The parameter $\varDelta h_{avg}$ sets the range of words to exclude: words with $5-\varDelta h_{avg}<h_{avg}(w_i)<5+\varDelta h_{avg}$ are left out of the sums. It is a natural tuning parameter, and choosing $\varDelta h_{avg}=1$ balances sensitivity against robustness.

For this study, we select ${\varDelta }h_{avg}=1$; therefore, following the work of Cody et al. [27], we exclude words with scores between 4 and 6 from the analysis. Wordlists exist for several languages, including Spanish, Portuguese, Arabic, Indonesian, Russian, Korean and Chinese [28]. In this study, we focus only on the Arabic and the English wordlists. The Arabic wordlist consists of 10,000 words with a mean happiness score of 5.35. There are differences in the mean happiness ratings for similar words across the Arabic and the English wordlists. For example, the word Allah (God) rates 8.38 in Arabic, while the word God scores 7.28 in English.
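The computation in Eq. (1), together with the exclusion band around 5, can be sketched as follows. This is a minimal illustration rather than the reference Hedonometer implementation; the function name and the three-word lexicon are hypothetical, and the scores in the usage note are made-up round numbers, not LabMT values.

```python
def text_happiness(word_counts, lexicon, delta=1.0, center=5.0):
    """Frequency-weighted mean happiness of a text, following Eq. (1).

    word_counts: dict mapping word -> frequency f_i in the text
    lexicon:     dict mapping word -> h_avg(w_i) on the 1-9 scale
    Words absent from the lexicon, or scored strictly inside
    (center - delta, center + delta), are ignored.
    Returns None when no scored word remains.
    """
    total = 0.0
    weight = 0
    for word, freq in word_counts.items():
        score = lexicon.get(word)
        if score is None or abs(score - center) < delta:
            continue
        total += score * freq
        weight += freq
    return total / weight if weight else None
```

With the illustrative lexicon `{"love": 8.0, "hate": 2.0, "the": 5.0}` and counts `{"love": 3, "hate": 1, "the": 10}`, the neutral word "the" is excluded and the remaining four tokens average to (3 × 8.0 + 1 × 2.0) / 4 = 6.5.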

Valence Shift Word Graph

The Valence Shift Word Graphs [10] can help analyze the impact that specific events have on sentiment/happiness. This technique allows the identification of changes in valence and of the specific word use that might have contributed to the shift. The analysis is presented as a graph which ranks the words in decreasing order of their absolute contribution to the shift in valence between the target and the reference text, where each contribution depends on the relative frequency and the valence of the word [10, 27]. The percentage contribution of the word i to the difference between the text and the reference is computed as shown in Eq. 2:

$$\begin{aligned} \varDelta _i(b,a)=100 \times \frac{(p_{i,b}-p_{i,a})(v_i-v_a)}{\delta (b,a)}, \end{aligned} \tag{2}$$

where $\delta (b,a)=v_b-v_a$ is the valence difference, which can take a positive or negative value; $p_{i,b}$ and $p_{i,a}$ are the fractional abundances of the word i in the texts b and a, respectively; $v_i$ is the valence of the word i; and $v_b$ and $v_a$ are the valences of the texts b and a, respectively.
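Eq. (2) can be illustrated with a short sketch. The function names are hypothetical, and restricting both texts to words that have a valence score is an assumption made here so that the contributions add up to exactly 100%.

```python
def normalize(counts):
    """Convert raw word counts into fractional abundances."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_shift(counts_a, counts_b, valence):
    """Percentage contribution of each word to v_b - v_a, following Eq. (2).

    counts_a: word frequencies of the reference text a
    counts_b: word frequencies of the comparison text b
    valence:  dict mapping word -> v_i; unscored words are dropped (assumption)
    Returns (word, contribution) pairs sorted by decreasing absolute contribution.
    """
    p_a = normalize({w: c for w, c in counts_a.items() if w in valence})
    p_b = normalize({w: c for w, c in counts_b.items() if w in valence})
    v_a = sum(p * valence[w] for w, p in p_a.items())
    v_b = sum(p * valence[w] for w, p in p_b.items())
    delta = v_b - v_a
    if delta == 0:
        raise ValueError("texts have equal valence; Eq. (2) is undefined")
    shifts = {
        w: 100 * (p_b.get(w, 0.0) - p_a.get(w, 0.0)) * (valence[w] - v_a) / delta
        for w in set(p_a) | set(p_b)
    }
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)
```

Because the fractional abundances of each text sum to one, the contributions over all words sum to 100, which gives a quick sanity check on any implementation.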

Figure 1 presents an example of a valence shift word graph (note that it is just a generic example). The graph is divided into two sections according to the sign of $\varDelta _i(b,a)$: the right section (positive values) and the left section (negative values). The right section shows the words that contribute to an increase in happiness, and the left section shows the words that contribute to a decrease. Happiness increases under two conditions: positive words are used more often, or negative words are used less often; it decreases under the opposite conditions. The orange bars in the graph represent the positive (happy) words, while the green bars represent the negative (less happy) words. For instance, the word “love” is represented by an orange bar, which indicates that it is a happy word, and it appears on the left side of the graph, which indicates that its use has decreased and has therefore contributed to a decrease in happiness.
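The four cases described above can be made explicit in a small helper. This is a sketch under one assumption: a word is treated as positive when its valence exceeds that of the reference text, which mirrors the factor $(v_i-v_a)$ in Eq. (2). The function name and labels are hypothetical.

```python
def shift_kind(p_a, p_b, v_i, v_ref):
    """Describe how one word moves happiness from reference text a to text b.

    p_a, p_b: fractional abundance of the word in texts a and b
    v_i:      valence of the word
    v_ref:    valence of the reference text a
    """
    if p_b == p_a or v_i == v_ref:
        return "no contribution"
    positive_word = v_i > v_ref
    used_more = p_b > p_a
    if positive_word and used_more:
        return "raises happiness: positive word used more"
    if not positive_word and not used_more:
        return "raises happiness: negative word used less"
    if positive_word:
        return "lowers happiness: positive word used less"
    return "lowers happiness: negative word used more"
```

For the “love” example, a positive word whose abundance falls from text a to text b is classified as lowering happiness, which corresponds to an orange bar on the left side of the graph.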

Results and discussion

The UAE Twitter dataset for five years (January 1, 2013 to August 31, 2017) was used in the present analysis. Each tweet's metadata can be classified into two main categories: attributes related to the user, and attributes related to the post or tweet itself. User attributes include: number of friends, device display name, number of followers, verification status, summary, user language, preferred user name, and user display name. The tweet attributes include: tweet text, favorite count, text language, geolocation, location name, and posted time.
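For readers who want to work with records of this shape, the attributes listed above could be held in a structure such as the following. The class and field names are hypothetical and chosen for readability; they are not the field names of the delivered data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UserInfo:
    """User-related attributes, as listed in the text (names are illustrative)."""
    preferred_username: str
    display_name: str
    summary: str
    language: str
    friends_count: int
    followers_count: int
    verified: bool
    device_display_name: str


@dataclass
class Tweet:
    """Tweet-related attributes, as listed in the text (names are illustrative)."""
    text: str
    text_language: str
    posted_time: str  # timestamp string, e.g. ISO 8601
    favorite_count: int
    user: UserInfo
    location_name: Optional[str] = None
    geolocation: Optional[Tuple[float, float]] = None  # (latitude, longitude) if present
```

Location fields are optional in this sketch because, as noted in the Introduction, only some tweets carry GPS co-ordinates.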

The UAE Twitter dataset allows us to explore the demography and tweet patterns of the UAE's Twitter users. Table 1 summarizes the general information about the dataset. There are 70 different languages identified; 45.7% of all the tweets are written in Arabic, with English accounting for 31.5% of the tweets. These are the two most commonly used languages in the dataset. The diversity in the languages reflects the UAE's diverse expatriate population, many of whom do not speak Arabic [12].

Table 1 General information of the dataset

We divide our cross-linguistic tweet analysis into four main sections:

1. Statistical differences between the two major languages of the tweets (Arabic and English),
2. Geographical comparisons, exploring the differences in happiness between the whole UAE and the seven UAE emirates: Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah and RAK,
3. Temporal comparisons, studying the hourly, daily, and monthly happiness patterns, and
4. Analysis of happiness associated with specific sociopolitical and religio-cultural events.

Language comparisons: Arabic vs. English

In this section, we analyze the Arabic and the English Twitter users based on the following criteria: the number of tweets, the number of followers, the number of accounts followed, and the mean happiness scores (computed using Eq. 1).

Before we start our main analysis, we classify the users into two groups: monolingual and bilingual. The monolingual users are defined as those users who tweet exclusively in either Arabic or English. The bilingual users tweet in both English and Arabic. For this analysis, we use time-independent (cross-sectional) tests and compare the users in both groups.
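The grouping described above can be sketched as follows. The function name, the group labels and the language codes `"ar"` and `"en"` are assumptions, as is the choice to drop any user who has at least one tweet in a third language, which is one reading of "exclusively".

```python
from collections import defaultdict


def classify_users(tweets, target_languages=("ar", "en")):
    """Group users by the languages they tweet in.

    tweets: iterable of (user_id, language_code) pairs, one per tweet
    Returns a dict mapping user_id -> "monolingual_ar", "monolingual_en" or "bilingual".
    """
    languages_by_user = defaultdict(set)
    for user_id, lang in tweets:
        languages_by_user[user_id].add(lang)

    targets = set(target_languages)
    groups = {}
    for user_id, langs in languages_by_user.items():
        if not langs <= targets:
            continue  # assumption: exclude users with any tweet in another language
        if len(langs) == 2:
            groups[user_id] = "bilingual"
        else:
            groups[user_id] = "monolingual_" + next(iter(langs))
    return groups
```

Monolingual groups can then be compared with independent-sample tests, while each bilingual user supplies a matched pair of values, one per language.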

Monolingual users

For the monolingual users, there are 1,829,404 Arabic tweets posted by 85,476 unique users and 3,852,677 English tweets posted by 218,277 unique users. For this analysis, we use the Mann-Whitney U test for the different categories. It is a nonparametric test for comparing two independent samples when the normality assumption is not satisfied. We have made the following observations.

On average, the monolingual Arabic users tweeted more (mean ${\bar{x}}=21.40$, standard deviation $\sigma =147.72$) than the monolingual English users (${\bar{x}}=17.65$, $\sigma =127.45$). This difference is statistically significant at the 95% confidence level ($p<0.05$) with a small effect size ($d=0.065$).
In terms of followers, the monolingual Arabic users have fewer followers on average (${\bar{x}}=2970.30$, $\sigma =54768.23$) than their monolingual English counterparts (${\bar{x}}=3449.24$, $\sigma =78999.25$). This difference is also significant at the 95% confidence level ($p<0.05$) with a small effect size ($d=0.061$).
The number of accounts followed also differs between the groups. The monolingual Arabic users followed more users (${\bar{x}}= 700.80$, $\sigma =8241.16$) than the monolingual English users (${\bar{x}}=507.05$, $\sigma =4037.36$), again with a significant difference at the 95% confidence level ($p<0.05$) and a small effect size ($d=0.018$).

To explore differences in happiness between the monolingual Arabic users and their English counterparts, we employ an unpaired t-test. This is a parametric test for two independent samples when the normality assumption is satisfied. The test reveals that the monolingual Arabic users are happier/express more positive sentiment (${\bar{x}}=6.81$, $\sigma =0.63$) than their English counterparts (${\bar{x}}=6.46$, $\sigma =0.51$). This difference is statistically significant at the 95% confidence level ($t=127.5$, $p<0.05$) with a large effect size ($d=0.63$). We hypothesize that this apparent difference in sentiment is a reflection of cultural and linguistic differences. For instance, in Arabic, positive adjectives are very frequently used as first names, such as “Saeed” (happy), “Amal” (hope), and “Jameel” (beautiful). Similarly, there is an Islamic prohibition against using negative words as names, for example, “Harb” (war) and “Murrah” (bitter) [29]. Furthermore, Arabic is commonly written without the diacritic marks (tashkeel) used to indicate vowels, so there can be a great deal of orthographic polysemy. For example, the popular boy's name “Obaid” and the word “Abeed” (slave, often used as an insult) are indistinguishable when written without diacritic marks.
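The effect sizes above are reported as d values, but the text does not state how they were computed. The sketch below assumes Cohen's d with a pooled standard deviation and, for the usage example, assumes that the group sizes equal the monolingual user counts given earlier; both are assumptions, so the result is only indicative.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from summary statistics, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Happiness of monolingual Arabic vs. English users, from the summary values in the text.
d_happiness = cohens_d(6.81, 0.63, 85476, 6.46, 0.51, 218277)
```

Under these assumptions the value comes out in the same range as the reported $d=0.63$; small discrepancies are expected because the means and standard deviations in the text are rounded.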

Bilingual users

The number of bilingual users, who tweet in both Arabic and English, is 60,643. We only analyze the number of tweets the bilingual users post in each language. We use the Wilcoxon signed-rank test, a nonparametric test for two dependent samples when the normality assumption is not satisfied. The Wilcoxon test indicates a significant difference at the 95% confidence level ($p<0.05$) with a small effect size ($d=0.02$): among the bilingual users, the Arabic tweets are more abundant (${\bar{x}}=141.03$, $\sigma =641.15$) than the English tweets (${\bar{x}}=54.64$, $\sigma =340.98$).

For analysis of the mean happiness scores of each user’s posts, we use a paired t-test. The Arabic tweets were again associated with a higher happiness score (${\bar{x}}=6.67$, $\sigma =0.40$) than the English tweets (${\bar{x}}=6.45$, $\sigma =0.50$). The difference is significant at the 95% confidence level ($t=74.2$, $p<0.05$) with a large effect size ($d=0.49$). Again, this difference can be attributed to the linguistic features of Arabic discussed above for the monolingual users. The mean happiness scores in both languages are above the midway point, which accords with the idea of the linguistic positivity bias [28].

Geographical comparisons: Whole UAE vs. Seven Emirates

In this section, we first investigate the patterns of happiness across the UAE as a whole, and then we focus our analysis on the seven emirates: Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK. We use the word shift graphs to show how particular words contribute to the differences in sentiment between either time points or emirates. For the Arabic words, we use their English translations throughout this section of the paper; however, a glossary of the original Arabic words can be found in Additional file 1: Table S1.

Before our primary study, we first conducted a brief validity check for some common-sense words for both the Arabic and the English tweets in line with the previous studies [11, 21]. The details are given in section Validity Check in Additional file 1.

The primary analysis began by investigating the daily frequency of tweets in Fig. 2. Figure 2a presents Arabic tweet frequency; the order of the lines is as expected because Abu Dhabi, Dubai, and Sharjah have the largest populations, as reported by World Population Review [30]. We can also see that there is a relative dearth of tweets until mid-2014, which indicates the time at which the Twitter platform gained popularity in the area. For Arabic tweets across the UAE as a whole, peaks in tweet volume correspond to important events such as Eidul Fiter, Eidul Adha, Ramadan, and UAE National Day. The elevated frequency of tweets on these days can largely be attributed to trending hashtags: for example, on July 06, 2016 the hashtag #eaydakum_mubarak accounted for 201 tweets, and on September 12, 2016 the hashtag #Eid_Adha accounted for 56 tweets. Additional file 1: Table S2 presents the highest peak days for the Arabic tweets and the most popular hashtags for the whole UAE as well as for each emirate.

Figure 2b shows the frequency of the English tweets. The order of the lines is as expected because Dubai has by far the most expatriate-dense population, as indicated by the larger number of English monolinguals in our dataset and confirmed by the demographic statistics from the Dubai Statistics Center (DSC) [31]. The UAE also has a large Filipino population, and the popularity of a Filipino TV show seems to have had a large impact on English tweet frequency. The elevated frequency of tweets on peak days can largely be attributed to trending hashtags: for example, on January 1, 2017 the hashtag #happynewyear accounted for 255 tweets, and on October 24, 2015 the hashtag #aldubebtamangpanahon accounted for 6884 tweets. Additional file 1: Table S5 presents the highest peak days for the English tweets and the most popular hashtags for the UAE and each of its seven constituent emirates.
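A daily frequency count with the leading hashtag per day, of the kind summarized above, might be computed as in this sketch. The function name and the input format (timestamp string plus tweet text) are hypothetical; hashtags are lower-cased and counted once per tweet, which is an assumption about how the hashtag counts were obtained.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#\w+")  # \w matches Unicode letters, so Arabic hashtags are covered


def daily_hashtag_summary(tweets):
    """Summarize tweet volume and the most frequent hashtag for each day.

    tweets: iterable of (iso_timestamp, text) pairs
    Returns a dict mapping date -> (tweet_count, top_hashtag, tweets_with_top_hashtag).
    """
    totals = Counter()
    tag_counts = defaultdict(Counter)
    for timestamp, text in tweets:
        day = datetime.fromisoformat(timestamp).date()
        totals[day] += 1
        # Count each hashtag at most once per tweet.
        tag_counts[day].update({tag.lower() for tag in HASHTAG.findall(text)})

    summary = {}
    for day, total in totals.items():
        top = tag_counts[day].most_common(1)
        tag, count = top[0] if top else (None, 0)
        summary[day] = (total, tag, count)
    return summary
```

Sorting the resulting days by tweet count would then surface the peak days and the hashtag that dominates each of them.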

Now, we analyze the wellbeing of the UAE after averaging the users’ daily happiness scores to remove biases. Figure 3 shows the mean happiness, computed using Eq. 1, for both the Arabic and the English tweets on each day of the five-year period. Each row represents the happiness for the whole UAE or for one emirate. The pattern is similar across all emirates. The average happiness fluctuates more strongly where the number of tweets is lower, as indicated in Fig. 2.
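One way to read "averaging the users' daily happiness scores" is a two-stage mean: first average each user's tweets within a day, then average across users, so that very prolific accounts do not dominate the daily value. The sketch below implements that reading; the interpretation and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean


def daily_user_averaged_happiness(scores):
    """Two-stage daily mean: per user within a day, then across users.

    scores: iterable of (day, user_id, tweet_happiness) triples
    Returns a dict mapping day -> mean of the users' daily mean happiness.
    """
    per_user_day = defaultdict(list)
    for day, user_id, happiness in scores:
        per_user_day[(day, user_id)].append(happiness)

    per_day = defaultdict(list)
    for (day, _user_id), values in per_user_day.items():
        per_day[day].append(mean(values))

    return {day: mean(user_means) for day, user_means in per_day.items()}
```

For example, if one user posts three tweets scored 8.0 and another posts a single tweet scored 4.0 on the same day, the two-stage mean is 6.0, whereas a plain tweet-level mean would be 7.0.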

Figure 3a presents the daily mean happiness scores for the Arabic tweets. The overall average score for the whole UAE is 6.611, and fairly similar for Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK with scores of 6.614, 6.607, 6.621, 6.644, 6.665, 6.624, and 6.643, respectively.

Figure 3b shows the daily mean happiness scores for the English tweets. The overall average score for the whole UAE is 6.24, and the scores for Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK are 6.261, 6.312, 6.24, 6.344, 6.38, 6.22, and 6.329, respectively. These slight differences are not statistically significant.

The Arabic tweets are associated with higher mean happiness scores than the English tweets, and this is most likely due to the same linguistic reasons mentioned above in the analysis of the monolingual (Arabic/English) users in the "Results and discussion" section. Even though there are no statistically significant differences between the seven emirates or the UAE as a whole, it is still worth exploring their word shift graphs to identify the words that provide the greatest contributions to the respective happiness scores. The word shift graphs in Figs. 4 and 5 compare the happiness of the tweets of the seven emirates against the whole UAE's tweets as a reference. These graphs show the words that contribute to the shift in the mean happiness of each emirate with reference to the UAE as a whole.

Figures 4 and 5 depict the contributions of the words to the mean happiness score of Abu Dhabi, based on Arabic (Fig. 4) and English (Fig. 5) tweets, in comparison with the UAE as a whole.

Focusing on the Arabic tweets of Abu Dhabi, it is observed that Abu Dhabi’s happiness is, on average, similar to that of the other emirates. The positive (right) side of the graph in Fig. 4 shows that Abu Dhabi’s increased happiness is largely attributable to increased use of positive words such as “Mohammed”, “Happiness”, “Paradise”, and “Good”. The UAE is a Muslim nation; in Islam, the word “Good” is frequently used in the context of religious devotion and also well-wishing (e.g., God reward you with good). The increased happiness is also attributable to decreased use of negative words such as “against”, “die”, and “disadvantage”.

Regarding the English tweets of Abu Dhabi, Fig. 5 lists the top 20 words contributing to the change in the mean happiness, relative to the UAE as a whole. The positive (right) side of the graph shows that Abu Dhabi’s increased happiness is attributable to less frequent use of negative words such as “don’t”, “hate”, “miss”, and “shit”, as well as to increased use of positive words such as “great” and “amazing”. Reduced use of profanity and expletives, such as “shit”, may be related to the introduction of a new law (as a part of the cyber-crimes law) enacted in June 2015. The new law threatens fines and custodial sentences for the use of indecent, inappropriate, or abusive language online [32]. The pattern is very similar across all emirates.

Temporal comparisons: diurnal, day of the week, and monthly patterns

In this subsection, tweets are analyzed from a temporal perspective focusing on diurnal, day of the week, and monthly patterns.
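
The three temporal groupings used in this subsection can be sketched with one helper. This is a simplified illustration rather than the authors' pipeline. It assumes that each tweet already has a happiness score, that timestamps arrive in UTC, and that they are shifted to UAE local time (UTC+4, no daylight saving) before bucketing. The names `mean_happiness_by` and `BUCKETS` and the sample scores are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are timezone-aware UTC; the UAE is at UTC+4 all year.
UAE_TZ = timezone(timedelta(hours=4))
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

BUCKETS = {
    "hour": lambda ts: ts.hour,                    # diurnal pattern
    "weekday": lambda ts: WEEKDAYS[ts.weekday()],  # day-of-the-week pattern
    "month": lambda ts: ts.month,                  # monthly pattern
}

def mean_happiness_by(scored_tweets, unit):
    """scored_tweets: iterable of (utc_datetime, happiness_score) pairs."""
    bucket_of = BUCKETS[unit]
    groups = defaultdict(list)
    for ts_utc, score in scored_tweets:
        local = ts_utc.astimezone(UAE_TZ)
        groups[bucket_of(local)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Made-up scores for illustration only.
sample = [
    (datetime(2016, 1, 1, 3, 0, tzinfo=timezone.utc), 6.8),   # Friday 07:00 local
    (datetime(2016, 1, 1, 3, 30, tzinfo=timezone.utc), 6.6),  # Friday 07:30 local
    (datetime(2016, 1, 3, 20, 0, tzinfo=timezone.utc), 6.0),  # Monday 00:00 local
]
hourly = mean_happiness_by(sample, "hour")
by_day = mean_happiness_by(sample, "weekday")
happiest_hour = max(hourly, key=hourly.get)
```

The time-zone shift matters for the last sample tweet: it is posted on Sunday evening in UTC but belongs to Monday midnight in local time, so skipping the conversion would move it to a different hour and a different day of the week.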

Diurnal analysis

Exploring the shifting sentiment (happiness) patterns across a 24-hour period reveals a picture consistent with the previous research [11, 14]. Figure 6 presents the hourly average happiness scores for both Arabic and English across the whole UAE, and in the different cities.

In the context of Arabic, Fig. 6a shows a decrease in happiness for all locations at 12:00 am (midnight). The night-time decrease in happiness fits with much of the previous research exploring circadian rhythms, circadian rhythm disruption, and mood disorders [33, 34]. Conversely, peak happiness occurs at around 7:00 am. Examples of words accounting for this positive shift include “Allah”, “The good”, and “Happiness” (Additional file 1: Fig. S2). Circadian rhythms may also explain this positive pattern. Furthermore, words such as “faith” and “paradise” may be related to devotional religious morning practices common amongst Muslims (e.g., Fajr prayer). Previous research has linked religiosity with positive emotional states [35].

Diurnal differences between the two languages are also apparent. For the Arabic tweets (Fig. 6a), there appears to be a rapid decrease in happiness after 7:00 am, whereas the English sentiment patterns (Fig. 6b) remain relatively stable throughout the day, with no major peaks or troughs until around midnight. These patterns are difficult to interpret; however, they might reflect demographic variables (e.g., age, employment status) differentially associated with Arabic vs. English use (citizen vs. expatriate status). Additional file 1: Fig. S3 presents the word shift of the happiest hour for the UAE and each emirate for the English tweets.

Day of the week analysis

Our day of the week analysis also accords with previous studies [11, 14], whilst simultaneously reflecting the religio-cultural context of the UAE. Figure 7 shows the mean happiness over the 7 days of the week. We can see similar patterns for almost all of the seven emirates and the UAE as a whole, with the exception of Tuesday in Arabic. The most intuitively consistent finding here is that Friday is the day associated with the highest levels of happiness in both English and Arabic. Friday is the first day of the weekend in the UAE and much of the Muslim world. It is also the holiest day of the week and the time of the major congregational prayer. This can be seen in the Arabic word shift analysis (Additional file 1: Fig. S4), which shows increased use of words such as “Allah”, “the paradise” and “Mohammed”, which are positive and carry religious connotations. Furthermore, many people have Friday as a day off (schools, etc. close). This is evident from the English word shift analysis (Additional file 1: Fig. S5), which shows increased use of words such as “photo”, “weekend” and “beach”, which are positive and carry holiday connotations. Also in accord with previous sentiment analytic Twitter research, the early part of the week appears to be associated with a relatively low level of happiness [14]. In this study, Sunday is generally the low sentiment point of the week, and this accords with Sunday being the first day of the UAE’s working week.

Monthly analysis

Figure 8 presents the monthly mean happiness for both the Arabic and the English tweets. The graph shows a clear difference in the happiness trends of the two languages. The summertime (June) spike within the Arabic tweets is almost certainly associated with the religious holiday of Eid. The word shift analysis confirms this, as the keywords associated with June’s elevated positive sentiment are “Allah” (God) and “Eid” (Additional file 1: Fig. S6); the word “Allah” is ranked second in positivity in the Arabic sentiment lexicon [28]. Through July and August, happiness appears to decrease. This is in line with the English tweets’ pattern, which is low for the whole of the summer. The September spike in Arabic is arguably attributable to Eid al Adha, a widely celebrated feast at the end of the Haj season. The idea of decreased happiness occurring during the summer contradicts several previous findings, such as [36]. However, the UAE climate is such that summer is unbearably hot and humid, with temperatures reaching 48 °C and humidity reaching 90%. Previous studies have also found the UAE summer to be associated with seasonal increases in depressive symptoms and reduced levels of vitamin D [12].

Analysis of specific events

Hedonometer was also used to identify days associated with particularly positive or negative sentiment scores. We then attempted to link these days with events of sociocultural significance. The identified days and associated events, for English and Arabic, are detailed in Table 2. We excluded the tweets from 2013 and 2014 since there are not enough tweets from those years. In Arabic, all of the happiest days (those with the highest happiness scores computed using Eq. 1) coincide with religious occasions such as Eidul Fitr, Eidul Adha, and the day before Ramadan. Conversely, the saddest days (those with the lowest happiness scores computed using Eq. 1) were associated with major incidents and unpleasant news stories. For the English tweets, the happiest days are associated with religious festivals, National Day, and New Year, and there are no clear events associated with the saddest days.

Table 2 Dates and the main events associated with Arabic and English tweets
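
The screening for unusually happy or sad days described in this subsection can be sketched as a simple ranking step. This is an illustration under stated assumptions: the study excluded 2013 and 2014 entirely, whereas the sketch uses a hypothetical per-day threshold `min_tweets` to drop sparsely covered days. The function name and all values below are made up.

```python
def extreme_days(daily_mean, daily_count, k=3, min_tweets=1000):
    """Return (happiest, saddest) lists of (day, score), each of length <= k.

    daily_mean:  {day: mean happiness score for that day}
    daily_count: {day: number of tweets posted that day}
    Days with fewer than min_tweets tweets are ignored because their
    averages are too noisy to interpret.
    """
    eligible = [
        (day, score)
        for day, score in daily_mean.items()
        if daily_count.get(day, 0) >= min_tweets
    ]
    ranked = sorted(eligible, key=lambda item: item[1])  # ascending by score
    saddest = ranked[:k]
    happiest = ranked[::-1][:k]
    return happiest, saddest

# Made-up values for illustration only.
daily_mean = {"2014-07-28": 7.2, "2016-07-06": 6.9,
              "2016-03-10": 6.5, "2016-11-02": 6.1}
daily_count = {"2014-07-28": 40, "2016-07-06": 5000,
               "2016-03-10": 4800, "2016-11-02": 5200}
happiest, saddest = extreme_days(daily_mean, daily_count, k=1)
```

The 2014 day has the highest score in the toy data but rests on only 40 tweets, so it is dropped before ranking. Linking the remaining dates to events is a manual, interpretive step that code of this kind does not replace.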

Conclusion

This study presents the first comprehensive analysis of the UAE’s happiness using both Arabic and English Twitter data. We observe differences between the Arabic and the English users in various areas. For example, monolingual Arabic users produce relatively fewer tweets and have more followers than their monolingual English counterparts. Also, Arabic users demonstrate different seasonal and diurnal sentiment patterns. We observed that there is a one-hour difference in the morning peak between the Arabic (7:00 am) and the English (6:00 am) tweets. However, 7:00 am is the happiest hour during the day for both languages. Moreover, Friday is found to be the happiest day in both languages, likely due to Friday’s public holiday status in the UAE. June and September are the happiest months based on Arabic tweets, attributable to the Eid holidays and the summer holiday. Based on English tweets, October and December are the happiest months. The October peak is probably attributable to the more tolerable weather (drop in temperature and humidity), and the happiness in December is arguably attributable to Christmas Eve and the New Year celebration. These explorations can potentially have fairly widespread societal applications, such as:

1. Exploring geographic variables and social media sentiment to inform urban planning and social policy.
2. Exploring reactions and sentiment to particular events to help inform government policy and decision-making.

One limitation of the present study is that the UAE has a highly international population, with expatriates hailing from many nations. Being unable to determine user nationality, gender and age limits our ability to fully interpret the data. Future studies may be able to apply heuristics or machine-learning algorithms to deduce these essential demographic variables. However, despite these limitations, this study demonstrates that cross-linguistic sentiment analysis in the context of the UAE produces findings that are consistent with expectations based on sociocultural norms and with those of previous research in other nations [11, 14]. Future sentiment analytic techniques applied to Arabic need to consider the language’s orthographic polysemy (written words, especially common names, having many meanings). One way to address this would be to remove all polysemous first names from the lexicon. For example, Saeed is widely used as a male name in the UAE and also means happy, and Amal means hope and is also a very common female name. Furthermore, written in Arabic, common male names such as Saifullah and Abdullah contain the word Allah written separately from the qualifying prefix, e.g. Abd Allah, Saif Allah, Dhikhr Allah. Such religio-culturally rooted naming conventions potentially inflate the positivity score. For an obvious comparison, English would not commonly use the words hope, happy and God as proper names. Incorporating this cultural and linguistic awareness into future word-level algorithms will make Arabic sentiment analytic tools far more sensitive going forward.
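
The lexicon adjustment suggested above can be sketched in two small steps. This is a hypothetical illustration, not something implemented in the study. The first function removes polysemous first names from the lexicon, as proposed in the text. The second, which merges two-token names so that their second part is not scored, is one possible extension and is an assumption of this sketch. Transliterated tokens and made-up scores are used for readability; a real pipeline would operate on Arabic script.

```python
def remove_name_entries(word_happiness, polysemous_names):
    """Drop lexicon entries that are also common first names."""
    names = {name.lower() for name in polysemous_names}
    return {w: h for w, h in word_happiness.items() if w.lower() not in names}

def merge_compound_names(tokens, prefixes, second_parts):
    """Join pairs such as 'abd' + 'allah' into one token ('abd_allah').

    The merged token is absent from the lexicon, so the second part
    no longer contributes to the happiness score.
    """
    merged = []
    i = 0
    while i < len(tokens):
        is_pair = (
            i + 1 < len(tokens)
            and tokens[i] in prefixes
            and tokens[i + 1] in second_parts
        )
        if is_pair:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Made-up scores and transliterations for illustration only.
lexicon = {"saeed": 8.0, "amal": 7.5, "allah": 8.2, "khair": 7.8}
filtered = remove_name_entries(lexicon, ["Saeed", "Amal"])
tokens = merge_compound_names(
    ["abd", "allah", "khair"], prefixes={"abd", "saif"}, second_parts={"allah"}
)
scored_tokens = [t for t in tokens if t in filtered]
```

The trade-off is that genuine uses of a removed word, such as "saeed" meaning happy, are no longer scored either, so a filter of this kind lowers inflation from names at the cost of some coverage.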

References

Bian J, Yoshigoe K, Hicks A, Yuan J, He Z, Xie M, Guo Y, Prosperi M, Salloum R, Modave F. Mining Twitter to assess the public perception of the “Internet of Things”. PLoS ONE. 2016;11(7):e0158450.
Kwak H, Lee C, Park H, Moon S. What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web. p. 591–600, Raleigh, North Carolina, USA, 2010; New York: ACM.
Java A, Song X, Finin T, Tseng B. Why we Twitter: Understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on web mining and social network analysis. p. 56–65, San Jose, California, USA, 2007; New York: ACM.
Mehrotra R, Sanner S, Buntine W, Xie L. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval. p. 889–892, Dublin, Ireland, 2013; New York: ACM.
Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. J Comput Sci. 2011;2(1):1–8.
Philander K, Zhong YY. Twitter sentiment analysis: capturing sentiment from integrated resort tweets. Int J Hosp Manag. 2016;55:16–24.
Maeve D, Aaron A. Political content on social media: what Americans see, discuss or post. Pew Research Center, http://www.pewinternet.org/2016/10/25/political-content-on-social-media/. Accessed 13 Nov 2016.
Hirschfeld D. Twitter data accurately tracked Haiti cholera outbreak. Nature. 2012. http://www.nature.com/news/twitter-data-accurately-tracked-haiti-cholera-outbreak-1.9770. Accessed 13 Nov 2016.
Butler D. When Google got flu wrong. Nature. 2013;494(7436):155.
Dodds PS, Danforth CM. Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. J Happiness Stud. 2010;11(4):441–56.
Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM. Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter. PLoS ONE. 2011;6(12):e26752.
Thomas J. Psychological well-being in the Gulf states: The New Arabia Felix. London, UK: Palgrave Macmillan; 2014.
Watson D, Tellegen A. Toward a consensual structure of mood. Psychol Bull. 1985;98(2):219–35.
Golder SA, Macy MW. Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science. 2011;333:1878–81.
Mohammad SM. From once upon a time to happily ever after: tracking emotions in mail and books. Decis Support Syst. 2012;53(4):730–41.
Li J, Wang X, Hovy E. What a nasty day: exploring mood-weather relationship from Twitter. In: Proceedings of the 23rd ACM international conference on information and knowledge management. p. 1309–1318, Shanghai, China, 2014; New York: ACM.
Kouloumpis E, Wilson T, Moore JD. Twitter sentiment analysis: The Good the Bad and the OMG! In: Proceedings of the 5th international conference on weblogs and social media. p. 538–541, Barcelona, Catalonia, Spain, 2011; Menlo Park: AAAI Press.
Yu Y, Wang X. World Cup 2014 in the Twitter world: a big data analysis of sentiments in US sports fans’ tweets. Comput Hum Behav. 2015;48:392–400.
Rao Y, Li Q, Wenyin L, Qingyuan W, Quan X. Affective topic model for social emotion detection. Neural Netw. 2014;58:29–37.
Quercia D, Ellis J, Capra L, Crowcroft J. Tracking “gross community happiness” from tweets. In: Proceedings of the ACM 2012 conference on computer supported cooperative work. p. 965–968, Seattle, Washington, USA, 2012; New York: ACM.
Yang C, Srinivasan P. Life satisfaction and the pursuit of happiness on Twitter. PLoS ONE. 2016;11(3):e0150881.
Hu M, Liu B. Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining. p. 168–177, Seattle, Washington, USA, 2004; New York: ACM.
Country cooperation strategy for WHO and the United Arab Emirates 2005–2009. Technical report, World Health Organization, Cairo, Egypt, 2006; 1
PPP (current international \$). World Development Indicators database, World Bank. Database updated on 1 July 2017. Accessed 02 July 2017.
Methodology of estimating the population in UAE. Technical report, National Bureau of Statistics, UAE, 2010. http://www.uaestatistics.gov.ae/ReportPDF/Population%20Estimates%202006%20-%202010.pdf. Accessed 16 Jan 2017.
Christopher D. Higher education in the Gulf: a historical background. In: Higher education in the Gulf states: shaping economies, politics and culture. p. 23–40. 2008.
Cody EM, Reagan AJ, Mitchell L, Dodds PS, Danforth CM. Climate change sentiment on Twitter: an unsolicited public opinion poll. PLoS ONE. 2015;10(8):e0136092.
Dodds PS, Clark EM, Desu S, Frank MR, Reagan AJ, Williams JR, Mitchell L, Harris KD, Kloumann IM, Bagrow JP, Megerdoomian K, McMahon MT, Tivnan BF, Danforth CM. Human language reveals a universal positivity bias. Proc Natl Acad Sci. 2015;112(8):2389–94.
Islamic rulings for new born babies>> 3. Naming the newborn, 2017. IslamicLessons.com, http://islamiclessons.com/blog/islamic-rulings-for-new-born-babies-3-naming-the-newborn/#refs. Accessed 16 Jan 2017.
United Arab Emirates Population, 2019. http://worldpopulationreview.com/countries/united-arab-emirates-population/ . Accessed 09 Mar 2019.
Population and vital statistics, 2016. Dubai Statistics Center, Government of Dubai, https://www.dsc.gov.ae/en-us/Themes/Pages/Population-and-Vital-Statistics.aspx?Theme=42. Accessed 16 Jan 2017.
New UAE online law: Dh250,000 fine for swearing on WhatsApp, 2015. Emirates24|7, http://www.emirates247.com/news/emirates/new-uae-online-law-dh250-000-fine-for-swearing-on-whatsapp-2015-06-16-1.593945. Accessed 08 Dec 2016.
Jones SH. Circadian rhythms, multilevel models of emotion and bipolar disorder—an initial step towards integration? Psychosis. 2001;21(8):1193–209.
Wehr TA, Wirz-Justice A. Circadian rhythm mechanisms in affective illness and in antidepressant drug action. Pharmacopsychiatry. 2008;15(01):31–9.
Abdel-Khalek AM. Religiosity, health, and well-being among Kuwaiti personnel. Psychol Rep. 2008;102(1):181–4.
Thomas J, Anouti FA, Hasani SA, Abdel-Wareth L, Haq A. Sunshine, sadness and seasonality: 25-hydroxyvitamin D, and depressive symptoms in the United Arab Emirates (UAE). Int J Ment Health Promot. 2011;13(1):23–6.

Authors' contributions

AA, JT conceived of the presented idea. AA developed the theory and performed the computations. JT aided in interpreting the results. AA, JT wrote the manuscript with support from RW, IG, ZA. JT, IG provided the dataset. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.

Acknowledgements

We would like to thank Pegasus FZ LLC for assistance with the extraction of Twitter data.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Data cannot be made publicly available, in order to comply with the Twitter terms of service.

Funding

This research work was funded by Khalifa University of Science and Technology, Masdar Institute, Abu Dhabi, United Arab Emirates, and Zayed University.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Electrical and Computer Engineering, Khalifa University, Abu Dhabi, UAE
Aamna Al Shehhi
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA
Aamna Al Shehhi
Department of Psychology, Zayed University, Abu Dhabi, UAE
Justin Thomas
Sloan School of Management and IDSS Center for Statistics and Data Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Roy Welsch
School of Arts and Sciences, Lebanese American University, Beirut, Lebanon
Ian Grey
Department of Computer Science, Khalifa University, Abu Dhabi, UAE
Zeyar Aung

Corresponding author

Correspondence to Aamna Al Shehhi.

Additional file

Additional file 1: Table S1.

Arabic words and their translations. Fig. S1. Distributions of Arabic and English tweets containing morning and evening keywords over the hours of the day. Table S2. Peak days for Arabic tweets and most popular hashtags. Table S3. Peak days for English tweets and most popular hashtags. Fig. S2. Arabic word shift graph for the happiest hour for each city. Fig. S3. English word shift graph for the happiest hour for each city. Fig. S4. Arabic word shift graph for the happiest day of the week for each city. Fig. S5. English word shift graph for the happiest day of the week for each city. Fig. S6. Arabic word shift graph for the happiest month for each city.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Al Shehhi, A., Thomas, J., Welsch, R. et al. Arabia Felix 2.0: a cross-linguistic Twitter analysis of happiness patterns in the United Arab Emirates. J Big Data 6, 33 (2019). https://doi.org/10.1186/s40537-019-0195-2

Received: 14 October 2018
Accepted: 03 April 2019
Published: 15 April 2019
DOI: https://doi.org/10.1186/s40537-019-0195-2
