Arabia Felix 2.0: a cross-linguistic Twitter analysis of happiness patterns in the United Arab Emirates

The global popularity of social media platforms has given rise to unprecedented amounts of data, much of which reflects the thoughts, opinions and affective states of individual users. Systematic explorations of these large datasets can yield valuable information about a variety of psychological and sociocultural variables. The global nature of these platforms makes it important to extend this type of exploration across cultures and languages, as each situation is likely to present unique methodological challenges and yield findings particular to the specific sociocultural context. To date, very few studies exploring large social media datasets have focused on the Arab world. This study examined social media use in Arabic and English across the United Arab Emirates (UAE), looking specifically at indicators of subjective wellbeing (happiness) in both languages. A large social media dataset, spanning 2013 to 2017, was extracted from Twitter. More than 17 million Twitter messages (tweets), written in Arabic and English and posted by users based in the UAE, were analyzed. Numerous differences were observed between individuals posting messages (tweeting) in English and those posting in Arabic. These differences included significant variations in the mean number of tweets posted and the mean size of users' networks (e.g., the number of followers). Additionally, using lexicon-based sentiment analytic tools (Hedonometer and Valence Shift Word Graphs), temporal patterns of happiness (expressions of positive sentiment) were explored in both languages across all seven regions (Emirates) of the UAE. Findings indicate that 7:00 am was the happiest hour, and Friday was the happiest day for both languages (the least happy day varied by language). 
The happiest months differed based on language, and there were also significant variations in sentiment patterns, peaks and troughs in happiness, associated with events of sociopolitical and religio-cultural significance for the UAE.

l events [14]. These affective states (positive or negative, high or low arousal) can be expressed in written or spoken language [15], reflected in English by words such as happy, sad, excited, and bored. Increasingly, people are using social media platforms such as Twitter to express their current status, including direct or indirect references to their affective states [3,16,17]. This phenomenon provides researchers with an opportunity to explore affective states as a function of time and place, and with reference to a specified attitude object (e.g., religion, presidential candidates, smoking). The large quantities of data involved in this type of social media exploration have led to the development of sophisticated analytic techniques based on natural language processing. Such affect-focused techniques have generally been referred to as sentiment analysis [18].

Sentiment analysis aims to extract and quantify subjective opinions or feelings from the written word, including transcribed speech [6,19]. Sentiment analytic techniques have been successfully applied across a wide range of topics, such as public sentiment over time, sentiment in response to major events, and how people feel about specific products or services. This type of information can be particularly valuable to commercial entities and to those interested in monitoring and promoting community or even national wellbeing [10]. Subjective wellbeing (happiness) at the national and community level has, in recent years, increasingly been viewed as a useful indicator of governmental performance; in some ways, it is a psychological parallel to fiscal indicators such as gross domestic product [20].

This interest has led to the development and refinement of big data sentiment analytic techniques and their application to a broad array of topics. For instance, the techniques known as the Hedonometer and Valence Shift Word Graphs were used to study the affective linguistic trends of song lyrics and blog posts over time [10]. The authors concluded that the happiness (greater expressed positive affect, relative to negative affect) of song lyrics declined from the 1960s to the 1990s and then remained stable after 1995. However, the happiness of blogs appeared to increase from 2005 until 2009. Numerous other sentiment analytic studies have explored temporal patterns in affective states over hours, days, months, and years. For instance, the authors in [14] studied affective rhythms across 84 countries. They used a tool known as Linguistic Inquiry and Word Count (LIWC) and developed a lexicon for measuring positive and negative affect from Twitter data. They found fairly universal diurnal patterns. Specifically, the early part of the day was associated with heightened positive affect, which reached a peak around 6:00 am and then decreased over the course of the day. This study also found that the weekday morning positivity pattern shifts by two hours during the weekend. Other Twitter studies exploring the temporal dynamics of happiness, using the previously mentioned Hedonometer [11], have also observed fairly pronounced weekly patterns; weekends tend to be more positive than early weekdays. Such sentiment analytic temporal explorations are generally consistent with intuitive expectations.

Beyond transient affective states such as happiness, the authors in [21] focused on self-referential tweets, using a template-driven retrieval strategy to explore life satisfaction, a more trait-like component of subjective wellbeing. As predicted, Twitter-derived estimates of life satisfaction were relatively stable across time (no temporal patterning) and appeared to be uninfluenced by seasonal transitions or by events such as celebrity deaths or political crises. Additionally, Twitter users categorized as satisfied, as opposed to dissatisfied, showed tweeting patterns consistent with previous findings in the subjective wellbeing literature. For example, the satisfied users expressed significantly more positive and fewer negative affective words. They also used less profanity and were significantly more positive about religion than their dissatisfied counterparts.

Several studies have also focused on Twitter-derived sentiment as it pertains to specific events or social occasions. Twitter data were used to explore the affective states of soccer fans in response to the US team's performance during the 2014 FIFA World Cup [18]. This study used a Word-Emotion Association lexicon and emoticons to detect sentiment. The findings suggest that the expressed emotions of the fans followed anticipated patterns: fear and anger when the opposing team scored, and anticipation and joy when the US team scored.

In a service-industry context, Twitter data and sentiment analysis have been employed to measure customer satisfaction within the hospitality industry [6]. A study undertaken in the Las Vegas metropolitan area used a sentence-level form of sentiment analysis [22] and found that Twitter-derived sentiment correlated well with official, third-party hotel rankings. Similarly, in finance, researchers report correlations between public mood and the state of the stock market [5]. In this last case, the authors used OpinionFinder and the Google-Profile of Mood States (GPOMS) to calculate Twitter sentiment, while the Dow Jones Industrial Average (DJIA) was used as an indicator of the stock market. The study concluded that variations in people's affective states (as deduced from the Twitter data) were predictive of stock market values.

Exploring the relationship between weather and mood, another Twitter study [16] used the OpinionFinder sentiment lexicon and the Profile of Mood States (POMS) to extract sentiment. These data were then correlated with records from the U.S. Climate Reference Network (USCRN) and the Global Historical Climatology Network (GHCN), which provide a breakdown of temperature, precipitation, snow depth, wind speed, and solar energy received. The study found that high temperatures were associated with fatigue, anger, and reduced depression, while snow was associated with increased depression, and precipitation was associated with decreased tiredness.

Overall, sentiment analytic explorations of social media data have produced findings that are generally consistent with intuition and often highly convergent with findings obtained via other research methods and data sources. To date, however, the majority of this work has focused on the English language. There is a need to further extend and explore the validity and reliability of sentiment analytic techniques across sociocultural and linguistic contexts.


Study context and target population

The UAE was formed in 1971. It is a federation of seven states, or emirates: Abu Dhabi, Dubai, Ajman, Fujairah, Ras al-Khaimah, Sharjah, and Umm al-Quwain. The last five emirates on this list are sometimes referred to as the Northern Emirates. These emirates north of Dubai are smaller in landmass and population than Dubai and Abu Dhabi and have pursued less aggressive programs of urbanization and modernization [12]. Despite minor inter-emirate differences in demography, the nation's official language is Arabic and the state religion is Islam. Abu Dhabi is the largest emirate and home to the nation's capital, Abu Dhabi City. It is also the site of the nation's largest oil reserves.

Since the commercial exploitation of oil and gas began in the late 1960s, few nations on earth have witnessed such rapid social and economic development [23]. The World Bank ranked the UAE as the fifth richest country in the world in terms of GDP (purchasing power parity) per capita in 2016 [24]. This development has attracted a large expatriate workforce, and the indigenous citizens (Emiratis) have become a minority in some areas. End-of-year population estimates by the UAE National Bureau of Statistics for 2009 suggested that UAE nationals comprised 11.38% of the UAE's population [25]. This rapid development has also extended to education, and the language of instruction across the nation's tertiary educational institutions is predominantly English [26]. This means that, alongside Arabic, English is also widely used in the UAE.


Methodologies

In this section, we first describe the data collection. We then explain the methods we applied to analyze the data and to address the aforementioned research questions.


Data collection and preparation

The UAE historical tweets were purchased directly from Twitter using the Historical PowerTrack enterprise product. The query parameters were location (UAE, Abu Dhabi, Dubai, etc.) and date. The extracted dataset consists of more than 22 million tweets spanning nearly five years (January 1, 2013, to August 31, 2017). These data comprised a cross-section of tweets from all seven emirates.


Statistical analysis

The first research question investigates the relationship between language (Arabic and English) and Twitter use. We classified Twitter users into two groups: monolingual and bilingual. Monolingual users are defined as those who tweet exclusively in either Arabic or English; bilingual users tweet in both languages (data for participants tweeting in other languages were excluded). For this analysis, we used statistical methods for comparing two samples.

For monolingual users, we applied the Mann-Whitney U test and the unpaired t-test. The Mann-Whitney U test is a nonparametric test for comparing two independent samples when the normality assumption is not satisfied. Its hypotheses are:

Null hypothesis, $H_0$: the distributions are the same
Alternative hypothesis, $H_1$: the distributions are not the same

The unpaired t-test is a parametric test for comparing two independent samples when the normality assumption is satisfied. Its hypotheses are:

Null hypothesis, $H_0$: $\mu_1 - \mu_2 = 0$
Alternative hypothesis, $H_1$: $\mu_1 - \mu_2 \neq 0$

For bilingual users, we applied the Wilcoxon signed-rank test and the paired t-test. The Wilcoxon signed-rank test is a nonparametric test for two dependent samples when the normality assumption is not satisfied. Its hypotheses are:

Null hypothesis, $H_0$: the distributions are the same
Alternative hypothesis, $H_1$: the distributions are not the same

The paired t-test is a parametric test for comparing two dependent samples when the normality assumption is satisfied. Its hypotheses, where $\mu_d$ is the mean of the paired differences, are:

Null hypothesis, $H_0$: $\mu_d = 0$
Alternative hypothesis, $H_1$: $\mu_d \neq 0$
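The text does not say which software implemented these tests. As an illustration only, the following dependency-free Python sketch implements the first of them, the Mann-Whitney U test, using the large-sample normal approximation, which is reasonable for groups of tens of thousands of users. Tied values receive average ranks, but no tie correction or continuity correction is applied to the variance, so the p-value is approximate. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for sample x, approximate two-sided p-value).
    No tie correction or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For real analyses, SciPy provides `scipy.stats.mannwhitneyu`, `scipy.stats.ttest_ind`, `scipy.stats.wilcoxon`, and `scipy.stats.ttest_rel` for the four tests listed above, including exact and tie-corrected variants.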


Happiness measures

For the other three research questions, investigating happiness (sentiment) and geographic differences, we used the Hedonometer algorithm and Valence Shift Word Graphs. The Hedonometer is based on the Language Assessment by Mechanical Turk 1.0 (LabMT 1.0) sentiment lexicon. This technique is used to perform a quantitative content analysis of the valence (positive vs. negative affect) of each tweet across the Twitter dataset. The second method, the Valence Shift Word Graph, facilitates analysis of how individual words in the dataset contribute to changes or shifts in valence patterns.


Hedonometer

The Hedonometer technique was introduced by Dodds and Danforth [10] and further explained in Dodds et al. [11]. The technique uses 10,222 English words to calculate the overall emotional content of a given text. The 10,222 words in the base lexicon have previously been rated for happiness on a scale from 1 (least happy) to 9 (most happy) [12,23]. The mean happiness of the wordlist in the original study was 5.37. Eq. (1) shows how the total happiness of a given text is computed [11]:

$$h_{\text{avg}}(T) = \frac{\sum_{i=1}^{N} h_{\text{avg}}(w_i)\, f_i}{\sum_{i=1}^{N} f_i} = \sum_{i=1}^{N} h_{\text{avg}}(w_i)\, p_i \qquad (1)$$

where $T$ is a given text; $N$ is the number of unique words in $T$; $w_i$ ($1 \le i \le N$) is a given word in the text; $f_i$ is the frequency of the $i$th word $w_i$; $h_{\text{avg}}(w_i)$ is the average happiness of the word $w_i$; and $p_i = f_i / \sum_{j=1}^{N} f_j$ is the normalized frequency. Words whose scores fall inside the neutral window $5 - \Delta h_{\text{avg}} < h_{\text{avg}}(w_i) < 5 + \Delta h_{\text{avg}}$ are excluded from the sums. $\Delta h_{\text{avg}}$ is a natural tuning parameter, and choosing $\Delta h_{\text{avg}} = 1$ balances sensitivity against robustness. For this study, we select $\Delta h_{\text{avg}} = 1$; therefore, following the work of Cody et al. [27], we exclude words with scores between 4 and 6 from the analysis. Wordlists exist for several languages, including Spanish, Portuguese, Arabic, Indonesian, Russian, Korean and Chinese [28]. In this study, we focus only on the Arabic and the English wordlists. The Arabic wordlist consists of 10,000 words with a mean happiness score of 5.35. There are differences in the mean happiness ratings for similar words across the Arabic and the English wordlists. For example, the word Allah (God) rates 8.38 in Arabic, while the word God scores 7.28 in English.
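A minimal Python sketch of Eq. (1) with the $\Delta h_{\text{avg}}$ exclusion window follows. It assumes whitespace tokenization of lowercased text; the paper does not describe its tokenizer, and Arabic text would need its own normalization. The dictionary passed in stands in for a LabMT-style wordlist, and the scores in the example are made up for illustration.

```python
from collections import Counter


def hedonometer_score(text, lexicon, delta_h=1.0, neutral=5.0):
    """Frequency-weighted mean happiness of `text`, following Eq. (1).

    Words missing from `lexicon`, or whose score lies strictly inside
    (neutral - delta_h, neutral + delta_h), are ignored. Returns None
    when no scorable word remains.
    """
    counts = Counter(text.lower().split())  # naive tokenizer (assumption)
    kept = {
        w: f for w, f in counts.items()
        # |h - 5| >= delta_h keeps words at or beyond the window edges
        if w in lexicon and abs(lexicon[w] - neutral) >= delta_h
    }
    total = sum(kept.values())
    if total == 0:
        return None
    # sum_i h_avg(w_i) * p_i, with p_i = f_i / sum_j f_j
    return sum(lexicon[w] * f / total for w, f in kept.items())


toy_lexicon = {"happy": 8.0, "sad": 2.0, "the": 5.0, "love": 9.0}  # made-up scores
print(hedonometer_score("Love the happy day", toy_lexicon))  # → 8.5
```

Here "the" falls inside the neutral window and "day" is not in the toy lexicon, so only "love" and "happy" contribute.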


Valence Shift Word Graph

The Valence Shift Word Graphs [10] can help analyze the impact that specific events have on sentiment (happiness). This technique allows the identification of changes in valence and of the specific word use that might have contributed to the shift. This analysis is presented as a graph that ranks words in decreasing order of the absolute value of their contribution, based on the relative frequency and valence of the words in the text, detailing their contribution to the shift in valence between the target and reference texts [10,27]. The percentage contribution of word $i$ to the difference between the target text $b$ and the reference text $a$ is computed as shown in Eq. (2):

$$\Delta_i(b, a) = \frac{100\,(p_{i,b} - p_{i,a})(v_i - v_a)}{\delta(b, a)} \qquad (2)$$

where $\delta(b, a) = v_b - v_a$ is the valence difference, which can take a positive or negative value; $p_{i,b}$ and $p_{i,a}$ are the fractional abundances of word $i$ in texts $b$ and $a$, respectively; $v_i$ is the valence of word $i$; and $v_b$ and $v_a$ are the valences of texts $b$ and $a$. Figure 1 presents an example of a valence shift word graph (a generic example). The graph is divided into two sections according to the value of $\Delta_i(b, a)$: a right section (positive values) and a left section (negative values). Words in the right section correspond to an increase in happiness, and those in the left section to a decrease. An increase in happiness can arise in two ways, through increased use of positive words or decreased use of negative words, and vice versa for a decrease. The orange bars in the graph represent positive (happy) words, while the green bars represent negative (less happy) words. For instance, the word "love" is represented by an orange bar, which indicates that it is a happy word, and it appears on the left side of the graph because its use has decreased, lowering happiness.
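To make Eq. (2) concrete, here is a small Python sketch, assuming texts that are already tokenized and a lexicon that has already been filtered with the $\Delta h_{\text{avg}}$ window. Because the fractional abundances of each text sum to 1, the contributions always sum to 100%, which serves as a useful sanity check. The function names and toy scores are hypothetical.

```python
from collections import Counter


def fractional_abundances(tokens, lexicon):
    """p_i: the share of each scored word among all scored tokens of a text."""
    counts = Counter(t for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no scored words")
    return {w: c / total for w, c in counts.items()}


def word_shift(ref_tokens, comp_tokens, lexicon):
    """Percentage contributions Delta_i(b, a) from Eq. (2).

    a = reference text, b = comparison (target) text. Returns the valence
    difference delta(b, a) and (word, contribution) pairs sorted by
    absolute contribution, largest first.
    """
    p_a = fractional_abundances(ref_tokens, lexicon)
    p_b = fractional_abundances(comp_tokens, lexicon)
    v_a = sum(lexicon[w] * p for w, p in p_a.items())  # valence of text a
    v_b = sum(lexicon[w] * p for w, p in p_b.items())  # valence of text b
    delta = v_b - v_a
    if delta == 0:
        raise ValueError("equal valences: Eq. (2) is undefined")
    contrib = {
        w: 100 * (p_b.get(w, 0.0) - p_a.get(w, 0.0)) * (lexicon[w] - v_a) / delta
        for w in set(p_a) | set(p_b)
    }
    ranked = sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return delta, ranked


toy = {"love": 8.0, "hate": 2.0, "good": 7.0}  # made-up scores
delta, ranked = word_shift(["love", "love", "hate", "good"],
                           ["love", "hate", "hate", "good"], toy)
print(delta, ranked[0][0])  # → -1.5 hate
```

In this toy case the target text is less happy than the reference, and the largest contribution comes from the increased use of the negative word "hate".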

Results and discussion

The UAE Twitter dataset covering nearly five years (January 1, 2013 to August 31, 2017) was used in the present analysis. Each tweet's metadata can be classified into two main categories: attributes related to the user and attributes related to the post (tweet) itself. User attributes include the number of friends, device display name, number of followers, verification status, summary, user language, preferred user name, and user display name.

The tweet attributes include: tweet text, favorite count, text language, geolocation, location name, and posted time.

The UAE Twitter dataset allows us to explore the demography and tweeting patterns of the UAE's Twitter users. Table 1 summarizes general information about the dataset. There are 70 different languages identified; 45.7% of all the tweets are written in Arabic, with English accounting for 31.5% of the tweets. These are the two most commonly used languages in the dataset. The diversity in the languages reflects the UAE's diverse expatriate population, many of whom do not speak Arabic [12]. Our analysis covers four areas: 1. Language comparisons (Arabic and English), 2. Geographical comparisons, exploring the differences in happiness between the whole UAE and the seven UAE emirates: Abu Dhabi, Dubai, Ajman, Umm al-Quwain (UAQ), Fujairah, Sharjah and Ras al-Khaimah (RAK), 3. Temporal comparisons, studying the hourly, daily, and monthly happiness patterns, and 4. Analysis of happiness associated with specific sociopolitical and religio-cultural events.

Language comparisons: Arabic vs. English

In this section, we analyze the Arabic and the English Twitter users based on the following criteria: number of tweets, number of followers, number of accounts followed (following), and the mean happiness scores (computed using Eq. 1). Before we start our main analysis, we classify the users into two groups: monolingual and bilingual. The monolingual users are defined as those users who tweet exclusively in either Arabic or English. The bilingual users tweet in both English and Arabic. For this analysis, we use time-independent (cross-sectional) tests and compare the users in both groups.
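A minimal sketch of this user grouping follows, assuming each tweet has already been reduced to a (user ID, language code) pair and that the codes `ar` and `en` identify Arabic and English. The function name is hypothetical. Dropping other-language tweets before grouping is only one reading of the exclusion rule; excluding any user who ever tweets in another language would be a different, stricter choice.

```python
from collections import defaultdict


def classify_users(tweets):
    """Split users into monolingual Arabic, monolingual English, and bilingual.

    `tweets` is an iterable of (user_id, lang) pairs. Tweets in languages
    other than 'ar' and 'en' are dropped first (an assumption).
    """
    langs = defaultdict(set)
    for user, lang in tweets:
        if lang in ("ar", "en"):
            langs[user].add(lang)
    groups = {"mono_ar": set(), "mono_en": set(), "bilingual": set()}
    for user, seen in langs.items():
        if seen == {"ar", "en"}:
            groups["bilingual"].add(user)
        elif seen == {"ar"}:
            groups["mono_ar"].add(user)
        else:  # seen == {"en"}
            groups["mono_en"].add(user)
    return groups
```

A user who posts only in, say, French never enters `langs` and therefore appears in none of the three groups.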


Monolingual users

For the monolingual users, there are 1,829,404 Arabic tweets posted by 85,476 unique users and 3,852,677 English tweets posted by 218,277 unique users. For this analysis, we use the Mann-Whitney U test for the different categories. It is a nonparametric test for comparing two independent samples when the normality assumption is not satisfied.

- In terms of the number of tweets, the monolingual Arabic users tweeted more (mean x̄ = 21.40, standard deviation σ = 147.72) than the monolingual English users (x̄ = 17.65, σ = 127.45). This difference is statistically significant at the 0.05 level (p < 0.05), with a small effect size (d = 0.065).
- In terms of followers, the monolingual Arabic users have fewer followers on average (x̄ = 2970.30, σ = 54768.23) than their monolingual English counterparts (x̄ = 3449.24, σ = 78999.25). This difference is also significant at the 0.05 level (p < 0.05), with a small effect size (d = 0.061).
- The number of accounts followed (followings) also differs significantly. The monolingual Arabic users followed more other users (x̄ = 700.80, σ = 8241.16) than the monolingual English users (x̄ = 507.05, σ = 4037.36), again with a significant difference at the 0.05 level (p < 0.05) and a small effect size (d = 0.018).

To explore differences in happiness between the monolingual Arabic users and their English counterparts, we employ an unpaired t-test, a parametric test for comparing two independent samples when the normality assumption is satisfied. The test reveals that the monolingual Arabic users are happier, that is, they express more positive sentiment (x̄ = 6.81, σ = 0.63), than their English counterparts (x̄ = 6.46, σ = 0.51). This difference is statistically significant (t = 127.5, p < 0.05), with a medium effect size (d = 0.63). We hypothesize that this apparent difference in sentiment reflects cultural and linguistic differences. For instance, in Arabic, positive adjectives are very frequently used as first names, such as "Saeed" (happy), "Amal" (hope), and "Jameel" (beautiful); because a lexicon-based tool scores these words whether or not they are used as names, they can raise the happiness scores of Arabic text. Similarly, there is an Islamic prohibition against using negative words as names, for example, "Harb" (war) and "Murrah" (bitter) [29]. Furthermore, Arabic is commonly written without the diacritic marks (tashkeel) used to indicate vowels, so there can be a great deal of orthographic polysemy that a lexicon cannot resolve. For example, the popular boys' name "Obaid" and the word "Abeed" (slave, often used as an insult) are indistinguishable when written without diacritic marks (a common practice).
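The effect sizes above are reported as d without a formula. Assuming they are Cohen's d with a pooled standard deviation (an assumption, since the text does not specify), the sketch below recomputes d for the happiness comparison from the reported means and standard deviations. It uses the monolingual user counts given earlier as group sizes, which is also an assumption, because the number of users with a scorable happiness score is not stated. By Cohen's conventional benchmarks, d of about 0.2, 0.5, and 0.8 corresponds to small, medium, and large effects.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Reported happiness summaries: Arabic (6.81, 0.63) vs. English (6.46, 0.51).
# Group sizes are assumed to equal the monolingual user counts.
d = cohens_d(6.81, 0.63, 85_476, 6.46, 0.51, 218_277)
print(round(d, 2))  # → 0.64
```

The result is close to the reported d = 0.63; the small gap may stem from rounding of the published summaries or from different group sizes.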


Bilingual users

The number of bilingual users, who tweet in both Arabic and English, is 60,643. We first analyze the number of tweets the bilingual users post in each language, using the Wilcoxon signed-rank test, a nonparametric test for two dependent samples when the normality assumption is not satisfied. The Wilcoxon test indicates a significant difference at the 0.05 level (p < 0.05), with a small effect size (d = 0.02), with Arabic tweets being more abundant (x̄ = 141.03, σ = 641.15) than English tweets (x̄ = 54.64, σ = 340.98) among the bilingual users.

To analyze the mean happiness scores of each user's posts, we use a paired t-test. The Arabic tweets were again associated with a higher happiness score (x̄ = 6.67, σ = 0.40) than the English tweets (x̄ = 6.45, σ = 0.50). The difference is significant (t = 74.2, p < 0.05), with a medium effect size (d = 0.49). Again, this difference may be attributable to the linguistic features of Arabic discussed above for the monolingual users. The mean happiness scores in both languages are above the scale midpoint of 5, which accords with the idea of a linguistic positivity bias [28].
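For the bilingual users, each user contributes one Arabic and one English mean happiness score, so the comparison is paired. The sketch below computes the paired t statistic together with the effect size $d_z$ (the mean difference divided by the standard deviation of the differences); whether the reported d is $d_z$ is an assumption. The function name and the toy data in the test are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and effect size d_z for matched samples x and y.

    d_z = mean(diff) / sd(diff), and t = d_z * sqrt(n). A p-value would need
    the t distribution with n - 1 degrees of freedom (e.g. scipy.stats.t.sf).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    d_z = mean_d / sd_d
    return d_z * math.sqrt(len(diffs)), d_z
```

The identity $t = d_z\sqrt{n}$ explains why, with tens of thousands of users, even moderate effect sizes produce very large t values.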


Geographical comparisons: Whole UAE vs. Seven Emirates

In this section, we first investigate the patterns of happiness across the UAE as a whole, and then we focus our analysis on the seven emirates: Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK. We use word shift graphs to show how particular words contribute to the differences in sentiment between either time points or emirates. For the Arabic words, we use their English translations throughout this section of the paper; however, a glossary of the original Arabic words can be found in Additional file 1: Table S1.
Before the primary analysis, we conducted a brief validity check on some commonsense words in both the Arabic and the English tweets, in line with previous studies [11,21]. The details are given in the section Validity Check in Additional file 1.

The primary analysis began by investigating the daily frequency of tweets (Fig. 2). Figure 2a presents the Arabic tweet frequency; the order of the lines is expected because Abu Dhabi, Dubai, and Sharjah have the largest populations, as reported by World Population Review [30]. There is also a relative dearth of tweets until mid-2014, which suggests that Twitter gained popularity in the area around that time. For Arabic tweets across the UAE as a whole, peaks in tweet volume correspond to important events such as Eidul Fitr, Eidul Adha, Ramadan, and UAE National Day. A large contributor to the elevated frequency of tweets on these days is trending hashtags: for example, on July 6, 2016 the hashtag #eaydakum_mubarak accounted for 201 tweets, and on September 12, 2016 the hashtag #Eid_Adha accounted for 56 tweets. Additional file 1: Table S2 presents the highest peak days for the Arabic tweets and the most popular hashtags for the whole UAE as well as for each emirate.

Figure 2b shows the frequency of the English tweets. The order of the lines is as expected because Dubai has by far the most expatriate-dense population, as indicated by the larger number of English monolinguals in our dataset and confirmed by the demographic statistics from the Dubai Statistics Center (DSC) [31]. The UAE also has a large Filipino population, and the popularity of a Filipino TV show seems to have had a large impact on English tweet frequency. A large contributor to the elevated frequency of tweets on peak days is trending hashtags: for example, on January 1, 2017 the hashtag #happynewyear accounted for 255 tweets, and on October 24, 2015 the hashtag #aldubebtamangpanahon accounted for 6,884 tweets. Additional file 1: Table S5 presents the highest peak days for the English tweets and the most popular hashtags for the UAE and each of the constituent seven emirates.

Next, we analyze the wellbeing of the UAE after averaging the users' daily happiness scores, so that highly active users do not dominate the daily values. Figure 3 shows the mean happiness computed using Eq. 1 for both the Arabic and the English tweets for each day of the study period. Each row represents the happiness for the whole UAE or for one emirate. The pattern is similar across all emirates. The daily average happiness fluctuates more strongly on days with fewer tweets, as indicated in Fig. 2. Figure 3a presents the daily mean happiness scores for the Arabic tweets. The overall average score for the whole UAE is 6.611, and the scores are fairly similar for Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK, at 6.614, 6.607, 6.621, 6.644, 6.665, 6.624, and 6.643, respectively.

Figure 3b shows the daily mean happiness scores for the English tweets. The overall average score for the whole UAE is 6.24, and the scores for Abu Dhabi, Dubai, Ajman, UAQ, Fujairah, Sharjah, and RAK are 6.261, 6.312, 6.24, 6.344, 6.38, 6.22, and 6.329, respectively. These slight differences are not statistically significant.
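One way to read "averaging the users' daily happiness scores" is as a two-stage mean: each user's tweets on a given day are averaged first, and those per-user values are then averaged across users, so a user who posts many tweets counts once per day. The sketch below implements that reading; it is an interpretation rather than a confirmed description of the authors' pipeline, and the input format and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_mean_happiness(scored_tweets):
    """Two-stage daily mean: per-user daily means, then the mean across users.

    `scored_tweets`: iterable of (day, user_id, score); score may be None for
    tweets with no scorable words, and those tweets are skipped.
    """
    per_user_day = defaultdict(list)
    for day, user, score in scored_tweets:
        if score is not None:
            per_user_day[(day, user)].append(score)
    per_day = defaultdict(list)
    for (day, _user), scores in per_user_day.items():
        per_day[day].append(mean(scores))
    return {day: mean(user_means) for day, user_means in per_day.items()}


# One prolific user (four tweets at 8.0) and one occasional user (4.0):
rows = [("2016-07-06", "a", 8.0)] * 4 + [("2016-07-06", "b", 4.0)]
print(daily_mean_happiness(rows))  # → {'2016-07-06': 6.0}
```

A plain tweet-level mean of the same rows would give 7.2, pulled upward by the prolific user.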

The Arabic tweets are associated with higher mean happiness scores than the English tweets, most likely for the same linguistic reasons given above in the analysis of the monolingual (Arabic/English) users in the "Results and discussion" section. Even though there are no statistically significant differences between the seven emirates, or between them and the UAE as a whole, it is still worth exploring their word shift graphs to identify the words that contribute most to the respective happiness scores. The word shift graphs in Figs. 4 and 5 compare the happiness of the tweets of each of the seven emirates against the whole UAE's tweets as a reference. These graphs show the words that contribute to the shift in the mean happiness of each emirate relative to the UAE as a whole.

Figures 4 and 5 depict the contributions of words to the mean happiness score of Abu Dhabi, based on Arabic (Fig. 4) and English (Fig. 5) tweets, in comparison with the UAE as a whole.

Focusing on the Arabic tweets of Abu Dhabi, we observe that Abu Dhabi's average happiness is similar to that of the other emirates. The positive (right) side of the graph in Fig. 4 shows that Abu Dhabi's slightly higher happiness is largely attributable to increased use of the positive words "Mohammed", "Happiness", "Paradise", and "Good". The UAE is a Muslim nation; in Islam, the word "Good" is frequently used in the context of religious devotion and also in well-wishing (e.g., "God reward you with good"). The increased happiness is also attributable to decreased use of negative words such as "against", "die", and "disadvantage".

Fig. 4 Arabic word shift graph (with reference to the whole UAE) for the seven emirates

Regarding the English tweets of Abu Dhabi, Fig. 5 lists the top 20 words contributing to the change in mean happiness relative to the UAE as a whole. The positive (right) side of the graph shows that Abu Dhabi's increased happiness is attributable to less frequent use of negative words such as "don't", "hate", "miss", and "shit", as well as to increased use of positive words such as "great" and "amazing". Reduced use of profanity and expletives, such as "shit", may be related to the introduction of a new law (part of the cyber-crimes law) enacted in June 2015. The new law threatens fines and custodial sentences for the use of indecent, inappropriate, or abusive language online [32]. The pattern is very similar across all emirates.


Temporal comparisons: diurnal, day of the week, and monthly patterns

In this subsection, tweets are analyzed from a temporal perspective focusing on diurnal, day of the week, and monthly patterns.


Diurnal analysis

Exploring the shifting sentiment (happiness) patterns across a 24-hour period reveals a picture consistent with previous research [11,14]. Figure 6 presents the hourly average happiness scores for both Arabic and English across the whole UAE and in the individual emirates.

Fig. 5 English word shift graph (with reference to the whole UAE) for the seven emirates

In the context of Arabic, Fig. 6a shows a decrease in happiness for all locations at 12:00 am (midnight). This night-time decrease in happiness fits with much of the previous research exploring circadian rhythms, circadian rhythm disruption, and mood disorders [33,34]. Conversely, peak happiness occurs at around 7:00 am. Examples of words accounting for this positive shift include "Allah", "The good", and "Happiness" (Additional file 1: Fig. S2). Circadian rhythms may also explain this positive pattern. Furthermore, words such as "faith" and "paradise" may be related to devotional religious morning practices common among Muslims (e.g., the Fajr prayer). Previous research has linked religiosity with positive emotional states [35].

There are also diurnal differences between the two languages. For the Arabic tweets (Fig. 6a), there appears to be a rapid decrease in happiness after 7:00 am, whereas the English sentiment patterns (Fig. 6b) remain relatively stable throughout the day, with no major peaks or troughs until around midnight. These patterns are difficult to interpret; however, they might reflect demographic variables (e.g., age, employment status) differentially associated with Arabic vs. English use (citizen vs. expatriate status). Additional file 1: Fig. S3 presents the word shift graphs for the happiest hour in the UAE and in each emirate for the English tweets.
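Hour-of-day patterns depend on converting timestamps to local time. The sketch below assumes that tweet timestamps are ISO 8601 strings in UTC with an explicit offset (for example `2016-07-06T03:15:00+00:00`) and converts them to UAE local time, which is UTC+4 with no daylight saving, before grouping scores by hour. The paper does not state how time zones were handled, so this step and the function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

# UAE local time is Gulf Standard Time: UTC+4, with no daylight saving.
UAE_TZ = timezone(timedelta(hours=4))


def hourly_mean_happiness(scored_tweets):
    """Mean happiness per UAE local hour (0-23).

    `scored_tweets`: iterable of (timestamp, score), where timestamp is an
    ISO 8601 string with an explicit UTC offset (assumed input format).
    """
    by_hour = defaultdict(list)
    for stamp, score in scored_tweets:
        local = datetime.fromisoformat(stamp).astimezone(UAE_TZ)
        by_hour[local.hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(by_hour.items())}


rows = [
    ("2016-07-06T03:15:00+00:00", 7.0),  # 07:15 in the UAE
    ("2016-07-06T03:45:00+00:00", 6.0),  # 07:45 in the UAE
    ("2016-07-06T20:10:00+00:00", 5.5),  # 00:10 the next day in the UAE
]
print(hourly_mean_happiness(rows))  # → {0: 5.5, 7: 6.5}
```

Skipping the conversion would shift every hourly bin by four hours, moving a 7:00 am local peak to 3:00 am.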


Day of the week analysis

Our day of the week analysis also accords with previous studies [11,14], while also reflecting the religio-cultural context of the UAE. Figure 7 shows the mean happiness over the seven days of the week. We see similar patterns for almost all seven emirates and the UAE as a whole, with the exception of Tuesday in Arabic. The most intuitively consistent finding is that Friday is the day associated with the highest levels of happiness in both English and Arabic. Friday is the first day of the weekend in the UAE and much of the Muslim world. It is also the holiest day of the week and the time of the major congregational prayer. This can be seen in the Arabic word shift analysis (Additional file 1: Fig. S4), with increased use of words such as "Allah", "the paradise", and "Mohammed", which are positive words with religious connotations. Furthermore, many people have Friday as a day off (schools, etc., close). This is evident in the English word shift analysis (Additional file 1: Fig. S5), with increased use of words such as "photo", "weekend", and "beach", which are positive words with holiday connotations. Also in accord with previous sentiment analytic Twitter research, the early part of the week appears to be associated with a relatively low level of happiness [14]. In this study, Sunday is generally the low point of the week in sentiment, which accords with Sunday also being the first day of the UAE's working week.

Fig. 6 Tweets hourly average happiness
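A sketch of the day-of-week aggregation follows, ordering the days as the UAE working week ran during the study period (Sunday to Thursday, with a Friday and Saturday weekend, as described above). It takes a mapping from `datetime.date` to a daily mean happiness score; the function name and example values are hypothetical. Weekday names come from an explicit list rather than `strftime("%A")`, which depends on the system locale.

```python
from datetime import date

# Python's date.weekday(): Monday == 0 ... Sunday == 6.
PY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
# UAE working week during 2013-2017: Sunday to Thursday; weekend Friday-Saturday.
UAE_WEEK = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def weekday_means(daily_scores):
    """Mean daily happiness per weekday, listed in UAE working-week order."""
    buckets = {name: [] for name in UAE_WEEK}
    for day, score in daily_scores.items():
        buckets[PY_NAMES[day.weekday()]].append(score)
    return {name: sum(v) / len(v) for name, v in buckets.items() if v}


# June 2 and 9, 2017 were Fridays; June 4, 2017 was a Sunday.
scores = {date(2017, 6, 2): 7.0, date(2017, 6, 9): 6.0, date(2017, 6, 4): 5.5}
means = weekday_means(scores)
print(means)  # → {'Sunday': 5.5, 'Friday': 6.5}
```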


Monthly analysis

Figure 8 presents the monthly mean happiness for both the Arabic and the English tweets. From the graph, there is a clear difference in the happiness trends in the two languages. The summertime (June) spike within the Arabic tweets is almost certainly associated with the religious festival of Eid. The word shift analysis confirms this, as the keywords associated with June's elevated positive sentiment are "Allah" (God) and "Eid" (Additional file 1: Fig. S6); the word "Allah" is ranked second in positivity in the Arabic sentiment lexicon [28]. Through July and August, happiness appears to decrease. This is in line with the English tweets' pattern, which is low for the whole of the summer. The September spike in Arabic is arguably attributable to Eid al-Adha, a widely celebrated feast at the end of the Haj season. Decreased happiness during the summer contradicts several previous findings, e.g. [36]. However, the UAE climate is such that summer is unbearably hot and humid, with temperatures reaching 48 °C and humidity reaching 90%. Previous studies have also found the UAE summer to be associated with seasonal increases in depressive symptoms and reduced levels of vitamin D [12].
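Word shift graphs rank the words responsible for a change in average happiness between a reference text and a comparison text. A minimal sketch of the basic per-word contribution used in Hedonometer-style word shifts, `delta_h_i = (h_i - h_ref) * (p_comp_i - p_ref_i)`, is given below for a June-versus-rest-of-year comparison. Because the normalized frequencies in each text sum to one, the contributions add up exactly to `h_comp - h_ref`. The transliterated toy tokens, their scores and the function names are hypothetical, and the published graphs additionally split contributions by direction, which is not shown here.

```python
from collections import Counter

# Hypothetical toy lexicon with transliterated tokens (1-9 scale).
TOY_LEXICON = {"eid": 8.0, "allah": 8.4, "hot": 4.0,
               "work": 5.2, "traffic": 3.2}


def scored_distribution(words, lexicon, delta_h=1.0, neutral=5.0):
    """Normalized frequencies of lexicon words outside the stop band."""
    counts = Counter(w for w in words
                     if w in lexicon and abs(lexicon[w] - neutral) >= delta_h)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored words in text")
    return {w: f / total for w, f in counts.items()}


def word_shift(ref_words, comp_words, lexicon):
    """Per-word contributions to h_comp - h_ref, largest magnitude first.

    contribution_i = (h_i - h_ref) * (p_comp_i - p_ref_i). A positive
    value means the word pushes the comparison text towards happier.
    """
    p_ref = scored_distribution(ref_words, lexicon)
    p_comp = scored_distribution(comp_words, lexicon)
    h_ref = sum(lexicon[w] * p for w, p in p_ref.items())
    shifts = {w: (lexicon[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0))
              for w in set(p_ref) | set(p_comp)}
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)


rest_of_year = "work traffic hot allah".split()
june = "eid allah allah work".split()
for word, contribution in word_shift(rest_of_year, june, TOY_LEXICON):
    print(f"{word:8s} {contribution:+.3f}")
```

In this toy example "allah" and "eid" raise June's score because they are happy words used more often, while "traffic" and "hot" raise it because they are unhappy words used less often.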


Analysis of specific events

Hedonometer was also used to identify days associated with particularly positive or negative sentiment scores. We then attempted to link these days with events of sociocultural significance. The identified days and associated events, for English and Arabic, are detailed in Table 2. We excluded tweets from 2013 and 2014 because too few tweets were available for those years. In Arabic, all of the happiest days (those with the highest happiness scores computed using Eq. 1) coincide with religious occasions such as Eidul Fitr, Eidul Adha, and the day before Ramadan. Conversely, the saddest days (those with the lowest happiness scores computed using Eq. 1) were associated with major incidents and unpleasant news stories. For the English tweets, the happiest days coincide with religious occasions, National Day, and New Year, whereas no clear events are associated with the saddest ones.
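Selecting the happiest and saddest days reduces to ranking daily scores. The sketch below assumes a Hedonometer-style weighted average standing in for Eq. 1, dates already converted to UAE local time, and a hypothetical toy lexicon; it drops the 2013 and 2014 data before ranking, as described above. The hypothetical `min_words` parameter (a minimum number of scored words per day) is one possible way to stop low-volume days from producing extreme scores by chance; it is an assumption, not part of the reported method.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy lexicon (word -> happiness on a 1-9 scale).
TOY_LEXICON = {"eid": 8.0, "happy": 8.3, "traffic": 3.2, "accident": 2.1}


def scored_counts(words, lexicon, delta_h=1.0, neutral=5.0):
    """Counts of lexicon words outside the neutral stop band."""
    return Counter(w for w in words
                   if w in lexicon and abs(lexicon[w] - neutral) >= delta_h)


def extreme_days(dated_texts, lexicon, k=1, excluded_years=(2013, 2014),
                 min_words=1):
    """Return (happiest days, saddest days), each list most extreme first.

    `dated_texts` is an iterable of (local date, text). Days from
    `excluded_years`, or with fewer than `min_words` scored words, are
    skipped.
    """
    counts_by_day = defaultdict(Counter)
    for day, text in dated_texts:
        if day.year not in excluded_years:
            counts_by_day[day].update(scored_counts(text.lower().split(), lexicon))
    daily = {}
    for day, counts in counts_by_day.items():
        total = sum(counts.values())
        if total and total >= min_words:
            daily[day] = sum(lexicon[w] * f for w, f in counts.items()) / total
    ranked = sorted(daily, key=daily.get)  # saddest first
    return ranked[::-1][:k], ranked[:k]


texts = [
    (date(2014, 5, 1), "happy happy"),  # dropped: excluded year
    (date(2016, 5, 2), "eid happy eid"),
    (date(2016, 5, 3), "happy traffic"),
    (date(2017, 5, 4), "traffic accident"),
]
happiest, saddest = extreme_days(texts, TOY_LEXICON)
print(happiest, saddest)
```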


Conclusion

This study presents the first comprehensive analysis of happiness in the UAE using both Arabic and English Twitter data. We observe differences between Arabic and English users in various areas. For example, monolingual Arabic users produce relatively fewer tweets and have more followers than their monolingual English counterparts. Arabic users also demonstrate different seasonal and diurnal sentiment patterns. We observed a one-hour difference in the morning peak between the Arabic and the English (6:00 am) tweets. However, 7:00 am is the happiest hour of the day for both languages. Moreover, Friday is found to be the happiest day in both languages, likely due to Friday's public holiday status in the UAE. Based on Arabic tweets, June and September are the happiest months, attributable to the Eid festivals and the summer holiday.

Based on English tweets, October and December are the happiest months. This is probably attributable to the more tolerable weather (a drop in temperature and humidity), while the happiness in December is arguably also attributable to Christmas Eve and the New Year celebrations. These explorations could have fairly widespread societal applications, such as:

1. Exploring geographic variables and social media sentiment to inform urban planning and social policy.

2. Exploring reactions and sentiment to particular events to help inform government policy and decision-making.

One limitation of the present study is that the UAE has a highly international population, with expatriates hailing from many nations. Being unable to determine user nationality, gender and age limits our ability to fully interpret the data. Future studies may be able to apply heuristics or machine-learning algorithms to infer these essential demographic variables. However, despite these limitations, this study demonstrates that cross-linguistic sentiment analysis in the context of the UAE produces findings that are consistent with expectations based on sociocultural norms and with those of previous research in other nations [11,14]. Future sentiment analytic techniques applied to Arabic need to consider the language's orthographic polysemy (written words, especially common names, having many meanings). One way to address this would be to remove all polysemous first names from the lexicon; for example, Saeed is widely used as a male name in the UAE and also means happy, and Amal means hope and is also a very common female name. Furthermore, when written in Arabic, common male names such as Saifullah and Abdullah contain the word Allah written separately from the qualifying prefix, e.g. Abd Allah, Saif Allah, Dhikhr Allah. Such religio-culturally rooted naming