Social media analysis of car parking behavior using similarity based clustering

This paper investigates car parking users’ behaviors from social media perspective using social network based analysis of online communities revealed by mining the associated hashtags in Twitter. We propose a new interpretable community detection approach for mapping user’s car parking behavior by combining Clique, K-core and Girvan–Newman community detection algorithms together with a content-based analysis that exploits polarity, relative frequency and dominant topics. Twitter API was used to collect relevant data by tracking popular car-parking hashtags. A social network graph is constructed using a similarity-based analysis. Finally, interpretable communities are inferred by monitoring the outcomes of clique, K-core and Girvan–Newman community detection algorithms. This interpretability is linked to the aggregation of keywords, hashtags and/or location attributes of the tweet messages as well as a visualization module that enables interaction with users. In parallel, a global trend analysis investigates parking types and Twitter influence with respect to both sentiment polarity and dominant trends (extracted using KeyBERT based approach) is performed. The implementation of this social media analytics has uncovered several aspects associated to car-parking behaviors. A comparison with some state-of-the-art community detection methods has also been carried out and revealed some similarities with our developed approach.

Loosely speaking, the presence and/or absence of car parking infrastructures in dense areas can affect city traffic, transportation ecosystem and emissions [4], which increases pollution [5], and can cause driver's frustration. Moreover, location and availability of parking lots can have significant impact on the surrounded businesses ecosystem of the city [6]. For instance, the location of car parking infrastructure and its scale are found to affect the urban life in the vicinity area [7]. With the emergence of online platforms that enable user's generated content with a single click, users' encountered parking problems can be reported through online platforms and social media services such as online reviews, tweets, or posts. Analyzing the content of these posts can unfold various aspects of user's car parking behavior and preferences such as parking time, length of stay, payment preference, car sharing potential, business incentives, opinion about public transportation system, among others. Similarly, reading consumer's reviews about parking can influence user's future choices, company's planning and reputation as well as city planning [8].
In this context, social media grants a new class of communication models that allow people to express their thoughts freely about any subject/topic, to create and build communities or groups in an interactive and participator manner [9], which provides useful insights for community, policy-makers and researchers [10,11]. The development of hashtag [constituted of a keyword or a phrase following the symbol (#)] based community construction, initiated by micro-bloggers to create a flow of information around a particular topic or trend, seeking contributions from other users, offers an appealing framework to discuss car-parking issues and users' behaviors. This partly motivates our work in this paper, which aims to investigate the car parking ecosystem by analyzing the structure of online communities induced by appropriate hashtags in Twitter. The choice of Twitter is justified by its ease access data using various Twitter APIs as well as the fact that many professional organizations maintain active presence in Twitter together with the maturity of related data analytical tools [12,13]. The collected dataset includes attributes like tweet messages, user ID, screen names and hashtags, which are then processed and adapted for applying social network and graph theory techniques in order to detect and identify relevant communities using an innovative interpretable social mining based strategy. The outcomes enable us to uncover hidden latent variables and parking issues that cannot be known straightforwardly to policy-makers and urban planners. Specifically, the motivation grounds for this work are at least threefold. First, as pointed out in [14], empirical knowledge about habitual behavior in the transportation literature is limited and mostly restricted to mode choice behavior and repetitive behavior in comprehensive activity-travel patterns, which calls on further research on the issue. Second, there are a variety of stakeholder groups that will benefit from this research. Indeed, any new knowledge in terms of drivers' parking behavior would enable (i) policy planners to better monitor and refine policy accordingly; (ii) law enforcement officers to better identify likely scenarios of parking violation occurrence; (iii) city planners to better optimize existing resources; among others. For instance, it matters to know how the users are willing to park their car far from their destination, what factors that motivate users to accept multi-modal transportation, car-sharing or incentives, how user's demographic attributes impact their parking choices and decisions. Third, the use of hashtag-based analysis enables us to further scrutinize the penetration of such social media data into transportation research. This can serve as a useful tool to generating traffic-relevant cues that help understand the root causes that prevent the public from and explore strategies for achieving the Target Zero goal [15]. The main contributions of the paper are fourfold.
• First, an approach for constructing a social network graph [16] from the hashtag dataset is constructed and analyzed in terms of characteristics of the underlined communities. The construction makes use of similarity score among tweet messages in the sense of Jaccard measure at token level and a threshold value inferred from the analysis of the giant component of the corresponding network graph. • Second, an approach for revealing interpretable communities that makes use of distribution of common keywords, hashtags and location of users is revealed and implemented. • Third, a global trend analysis that investigates both different parking types and engaging social media data is proposed. The analysis makes us of both polarity trend and discussion topics as revealed by KeyBERT-based method [17]. • Fourth, the application of the developed approach enables us to identify factors that affect user's decision in terms of parking and elicit driver's preferences in a way to help policy-makers design appropriate urban planning policies.
"Background" section of this paper describes the background of this research and some state-of-the-art approaches. "Data and method" section emphasizes data collection approach and the data analytics method. In "Results and discussion" section, the results are highlighted and discussed with respect to some related work in literature. Finally, a conclusion and perspective work are reported in "Conclusion" section.

Background
Work in parking behavior has been promoted after the pioneering and seminar work of Polak and Axhausen [18], who, based on interviews with drivers, formulated eight tactical heuristics of parking search that drivers choose depending on available parking facilities, occupation rate, prices and expected dwelling time. The multiplication of transportation policies and urban development with the associated high demand for parking supply has opened the door wide for further research in the field. The sensitivity of parking behavior to pricing has been subject to several studies to comprehend the association between parking cost and user's behavior [19,20] who considered the aggregate consequences of changes in parking prices. Lehner and Peer [21] summarized the results of more than 50 studies regarding the price elasticity of three factors-parking occupancy, parking dwell time and parking volume. The authors showed that drivers are more sensitive to parking prices when alternative transportation modes are available, whereas daily commuters are the least sensitive among driver groups to parking prices. On the other hand, with the availability of large scale dataset, gathered through questionnaire, mobile apps, social media or smart parking systems, 1 research on applications of data analytics, machine learning or statistical inferences to parking behavior analysis has seen a renewal interest. Some of these interesting and related studies are summarized below.
Mondschein et al. [22] evaluated users' sentiment that characterizes existing car parking supply as reflected by the online reviews related to parking collected from Yelp restaurants reviews in Phoenix, Arizona regions. In their study, the sentiment analysis revealed that a negative emotion is often associated with parking reviews. In addition, this has shown to affect many businesses in the region where low rating scores have been recorded in the vicinity of car parking locations. Zhang et al. [23] adapted the Bayesian network approach to analyze the individuals' parking behavior by standing on the multi-information. They focused on investigating the impact of some parking factors that influence the parking search decision, such as parking fees, discounts, and drivers' preferences when choosing a parking space. They found that younger vehicle owners and women are more likely to select parking lots with a parking fee discount. Spiliopoulou et al. [24] analyzed the parking behavior from the legacy perspective using a staged dataset, which allowed them to perform multiple timely comparisons in order to identify the factors that cause and increase/decrease the illegal parking phenomena in Greece. The study revealed the tendency of people to park as near as possible to destination regardless of legal or illegal parking spaces, encouraged by inadequate lot capacity and low enforcement level. Likewise, Aljoufie [25] investigated illegal parking topic and its behavior in the Jeddah region to identify sites and periods of the days where illegal parking cases occur. In the same spirit, Meng et al. [26] considered Wuhu region in China as a case study to investigate the parking behavior and its characteristics through a set of field observations about parking spots utilization. Their findings revealed issues of high cost in space renting and a lousy parking management system. The study has also proposed some protocols and solutions to overcome the already detected problems, such as optimizing parking layouts and smart car parking management systems. Zong et al. [27] investigated drivers' preferences in choosing specific parking lots and the impact of fee discounts in Beijing area using a Bayesian network based approach. Especially, they applied structural equation modeling method to reveal the impact of some parking attributes such as family influence and parking fees on the parking decision. Their results showed the importance of family ties and preferences on parking choices. They also showed a direct correlation between the parking cost and parking duration. Feng et al. [28] studied the possibility of predicting parking behavior in Ningbo, Zhejiang city of China using 396-day parking data from shopping mall. They showed that random forest classifier achieves best parking behavior prediction accuracy of 89%. In [29], the authors focused on the on-street parking in Rajkot city in India aiming at identifying parking rates between various land-use patterns using some empirical analysis based approach. The data were collected using license plate inventory at different time intervals. In [30], the authors investigated parking occupancy with respect to user's choice and preference using a questionnaire like analysis that involves a number of social and demographic patterns (e.g., parking price, trip purpose, on-street versus off-street). Using a linear regression-based method, the authors provided an estimate of the parking demands and related parking characteristics that impacts the drivers decisions such as the distance from parking place to the destination, parking lot availability, among others. Chen et al. [31] analysed the choice behaviour of people for surface parking lot using fuzzy multiple attribute decision making process for optimal parking space choice. Ni and Sun [32] advocated agent-based modelling approach to assess the impact of parking reservation system (PRS) on parking behaviour. Gaming theory has also been applied to uncover some insights regarding parking behavior. In this context, Bonsall and Palmer [33] developed traffic simulator to estimate drivers' reactions to parking prices and off-street parking facilities. While, Ben-Elia and Avineri [34] proposed the PARKGAME serious game platform to gain in-depth insight into driver parking behavior. The preceding demonstrates the usefulness of data mining technologies, including machine learning, social media analytics and gaming to understand public opinion associated to parking behavior and transportation research. Table 1 summarizes some of existing work in the field of big-data car parking analytic are summarized in Table 1.
Apart from car-parking domain, the application of social network analysis to uncover user behavior and patterns. For instance, Kanavos et al. [35] explored the relationship between user behavior and their emotions using Twitter data and social network analysis. Their method evaluate the influence of user actions and behaviors by modeling and identifying communities based on the level of influence. Similarly, Li et al. [36] applied social network principles and hierarchical clustering to identify various communities associated with distinct facets of user behaviors. Their approach uses the followers and following relationships to create social network graph and then track personnel tags posted by the users. Opinion community and opinion leader detection are explored in [37]. In the opinion community leader model, a social network is constructed to map users' thoughts and interactions with opinion community. Various competing models were tested in a cloud environment where the results demonstrate the performance of opinion detection communities.

Data collection and preparation
The dataset used in this study is collected using Twitter Streaming API. The GetOldT-weets3 2 python library is used for data scraping. Three leading car parking related hashtags were used in the queries made to Twitter API: #parking, #parkinggarage, and #parkingspot. The choice of these hashtags is motivated by their high exposure rate (as quantified by https:// best-hasht ags. com/ hasht ag/ expos ure/) and relevance in terms of their car parking content. Next, multiple attributes were collected for approximately four months, starting from 1st January 2020 until 11th April 2020. The dataset includes the user's Identifier (ID), the screen name, the tweet text, the hashtags, the location (if available), and the time of the tweet. In overall, the dataset contains 10551 tweets related to parking. It should be noted that although specific hashtags were used for Twitter data, They revealed that individuals tend to use their cars in business areas, and in a regular way Only some parking issues and aspects are treated. Limited to analyse the parking behavior in business area. The size of the data used in the study is small and does nor reflect official national statistics it often occurs that the collected Twitter data include mentioning to other hashtags as well, which explains the large-scale dataset of hashtags collected as well.
In order to utilize the collected Twitter dataset and explore its content, an initial preprocessing stage is necessary in order to filter out noisy terms and normalize the content in a way to maximize the outcomes of standard NLP modules. This task follows the standard text mining approach, which starts by converting to lowercase characters all tweet text, screen names and hashtags, then a tokenization task was used to distinguish various tokens in tweet text message. Next, noisy terms including stop-words 3 were removed, together with punctuation and non-desired characters. This process excludes some important characters (like @) and User-IDs as this is required to distinguish retweets and Tweet identities.

Parking global trend analysis
In this part of the approach, the aim is to explore the parking trends and preferences of the users and deliver a global view of the users' demands, likes, and dislikes regarding parking search decisions. For this purpose, two techniques have been utilized. The first one makes use of sentiment analysis using SentimentVader [43] capitalizing on the valuable insights that can be inferred from tweet message content in terms of positive and negative polarity. The second method explores the content of tweet messages in terms of generic trends and topical description. For this purpose, we used the deep-learning architecture provided by KeyBERT [17], which uses the pretrained model of BERT for a keyword extraction from textual sources. In essence, KeyBERT creates N-gram elements, then uses cosine similarity to measure the similarities between each candidate answer and the tweets document, so that only highly scored candidates are preserved.
Furthermore, we considered two-level of analysis: 1. Parking type-based analysis. In this case, we shall gather all data associated with an individual parking type and perform both Sentiment-based analysis and KeyBERTbased analysis. The former allows us to extract user's feelings and opinion about the given parking type. While, the application of KeyBERT expects to shed light on vital sentences and keywords that could point out some users' demands, likes, or dislikes. The considered parking types are On-street, Off-street, Underground, and Airport parking. This choice is motivated by the dominance of these parking types in literature as well as by an initial exploration stage of our dataset. 2. Engagement-based analysis. In this case, we shall consider only those tweets that convey high engagement from users, and then apply the sentiment and KeyBERT to unfold the polarity and topical content of a such data. This aims to identify important factors that influence parking decisions from discussions that convey high level of users' interactions and engagements. For this purpose, in the same spirit as [44], we shall will assume that a given tweet conveys high engagement if it is either retweeted or liked by at least one user. Figure 1 provides a high level diagram description of this global trend analysis.

Construction of the similarity network
The essence of the method pursued in this paper consists in building a social network graph from the collected Twitter dataset. In this respect, the nodes of the network correspond to user's IDs while a link between nodes, say, ID 1 and ID 2 , is established whenever there exists at least one tweet message (excluding retweet) generated by ID 1 that is found sufficiently similar to that of ID 2 . This textual similarity is quantified using standard Jaccard similarity [45] score, 4 which computes the amount of overlapping among the two texts. In other words, an edge between ID 1 and ID 2 is established if and only if, there exist (non-retweet) Twitter messages T 1 from ID 1 and T 2 from ID 2 such that: where γ stands for the Jaccard similarity threshold beyond which the assertion "the two Twitter IDs ( ID 1 and ID 2 ) are deemed to share sufficient textual content" is valid. See also Algorithm 1 for a detailed algorithmic description. Strictly speaking, as part of the dataset domain structure, since each tweet message, after the preprocessing stage, is represented by a list of tokens/words, the calculus of Jaccard similarity score turns out to a simple count on the total number of common words/tokens among the two texts T 1 and T 2 over the total number of distinct tokens among T 1 and T 2 . This yields a similarity score ranging in the unit interval where zero would indicate no overlapping token, while a value one corresponds to a fully matching content in terms of tokens. Although this does not necessarily entail similar tweet messages due to potential impact of preprocessing stage and the negligence of token ordering information. On the other hand, the choice of the threshold γ should be very much dependent on the prior knowledge about the frequency and the nature of textual communications held by ID 1 and ID 2 . Therefore, a cautious and a contextual analysis should be followed to select appropriate threshold to ensure a rational graph construction. This selection process is performed by monitoring the size of the giant component (see next subsection) to ensure a critical network size is reached. We shall mention that alternatives to Jaccard similarity measure are also studied elsewhere. Gali et al. [46] provided an extensive comparison of potential similarity measures at character, token, n-gram and semantic level together with their associated implementation toolkits. Their findings highlights the importance of nature of dataset as the key factors that guides the selection of appropriate measure.

Community detection algorithms
In order to analyse the constructed social graph and provide interesting interpretations of the results, common community detection algorithms whose implementations are available in popular machine learning libraries were explored, and then we restricted to three algorithms who exhibited good satisfaction at the exploration phase: Clique, K-core and Girvan-Newman, briefly described below.
Clique is a type of community, which corresponds to a sub-graph where each node is connected to every other node of this sub-graph. Namely, a clique of size m is such that each of its nodes has a degree equal to m − 1 [47]. This corresponds to a high constraint in the community construction.
K-core is a less restriction than clique-like community and corresponds to a maximal connected subgraph in which each vertex has a degree at least equal to k. The higher the value of k, the higher the tendency of the underlined community towards a clique. The construction method for k-core identification is based on repeating deletion process of all nodes with less than k vertices connected to them [48].
Girvan Newman: is one of the most popular construction algorithm for online community detection. It is based on measuring edge betweenness values in the graph and involves several runs. The first step determines the edge betweenness value for each edge of the graph. Second, one selects the highest edge betweenness value, and deletes all edges (and nodes) that are associated with it. Third, one calculates the edge betweenness scores again for the remaining edges in the graph. This process is repeated for phases 2 and 3 until no edge remains [49].

Interpretation of communities generated by the similarity graph
The communities induced by the application of clique, k-core, and Girvan Newman algorithms were analyzed, visualized and interpreted. The core idea in this matter is to restrict the community detection to only those that can easily be interpretable according to three specific aspects: frequent keywords/topics, list of hashtags and location information as inferred by user profiles. This process follows a semi-automated reasoning where the communities generated by (k-core, for different choice of k), Clique and various levels of Girvan-Newman algorithm are scrutinized by monitoring the most common keywords, hashtags and locations and see whether they can be assigned to a common umbrella, and therefore validate the underlined community. For this purpose, the tweet messages pertaining to the same identified community are compiled together, and histograms of the ten most frequent keywords, hashtags and locations are constructed. Then, a human annotator scrutinizes these histograms to find out whether a common characteristic in terms of either a prominent (sub) topic (from either frequent keywords or hashtag list) can be recognized; or whether the users of the same community belong to the same location. The identified communities using the above process are visualized using appropriate visualization tools provided in Python library NetworkX. 5 Especially, this analysis was performed for multiple subgraphs generated by Clique, k-core and Girvan-Newman algorithms. This semi-automated process for generating interpretable (sub) communities is illustrated in Fig. 2.

Results and discussion
In this section, the experimental results of the above analysis are presented. We distinguish global trend based analysis results and community-based analysis results.

Parking type results
In this part, we present statistics and findings related to users' preferences and concerns regarding each parking type. Table 2 presents, for each parking type, the statics in number of supporting tweets, sentiment polarity and the main issues raised by the users as identified by KeyBERT-based analysis. Besides, we separated the KeyBERT results for positive polarity and negative polarity cases to comprehend the concerns of each user category. As shown in Table 2, airport parking is by far the most popular parking type (81.6%), which was identified using string approach like method. Besides, looking at the opinion of the users who discussed airport parking, revealed that the majority 63% have rather positive opinion, compared to 11% negative opinion. In terms of issues raised by the users, it turns out the discussions focused on the need of airport parking, its ease access with multiple modalities and availability. Besides, the negative polarity users were mainly concerned by the limited services offered at parking facility and the possibility of ticket refunds and sudden price raise. This finding overlaps to a large extent with the  [50] about airport parking where parking providers needed to build more parking lots even at small airports. This research also highlighted the parking operating revenues that can cover to some extent the increasing safety and maintenance cost of such infrastructures.
The second parking type in terms of popularity is the on-street parking (8.6%), shows the majority of users (50%) have positive opinion compared to 20% negative opinions. While the discussed topics turn around enforcement, regulations, increase in parking supply and closeness with respect to city centre. Besides, there were demands and critics concerning to the applied fines. The popularity of this parking type is explained by its efficiency as it often enables users to reach their destinations more quickly than indoor or off-street parking type, despite the critics on fine regulation and sudden price raise. This finding is in agreement with work reported in [51], which, after reviewing the stateof-the-art of on-street parking, concluded that a such parking category should only be provided in minor/secondary roads and avoid main roads to enhance pedestrian security. In parallel, the report also highlights the need to provide alternative parking supply to offset the pressure on limited on-street parking availability. The third parking type is underground parking (7.4%). Similarly to airport and on-street parking, the majority of users have positive opinion (47%). Their discussions revolved around the pricing, ease of access location, underground available facilities and the quality of the environment in the vicinity area. Moreover, in this type of parking, multiple tweets were more businesscentred by offering special fairs, reduced parking prices, and advertisement to new infrastructures. Finally, the less discussed parking type-off-street parking-is characterized

Negative KeyBERT elements
On-street 40  by the overwhelming dominance of positive sentiment reflecting the users' overall contentment and satisfaction. The discussed topics are centred around free parking availability, especially on weekends as well closeness to city centre and the availability of activities in the vicinity. Some of these outcomes have been discussed in other related work. For instance, in [52], the authors showed that the parking cost has the dominant influence on users' parking decisions. Table 3 shows the results of the most engaging tweets according to like/retweet assessment as well as the corresponding users' reflections, parking demands, critics, likes, or dislikes. In total 3486 tweets were found to belong to this category, where each tweet has at least one interaction, either through like or retweet. Looking to Table 3 reveals that the positive sentiment is pretty dominant (55% of the tweets), and only 15% were negative. The KeyBERT analysis exposed some of the key sentences that characterize the parking from users' point of view. We distinguished the views of users with positive opinion and those with negative opinion as well as the view of overall users regardless their polarity. For instance, the overall population analysis shows an interest to availability of parking in commercial areas, residential areas, on-street and nearby their destinations. Mapping positive polarity tweets revealed the users' interest to share good locations and free parking availability. Interestingly, in negative polarity case, we notice the tendency of users to propose solutions in areas such as residential parking and referred to it as a crisis. Other push towards industry-based solutions and review of regulatory framework and new management schemes. These user's based reflections found from analyzing these tweets can be categorized into three main categories as follow:

Engagement-based analysis
• Need for more parking infrastructure, particularly in businesses or downtown areas.
• Lack of parking supply in both residential and business areas, which caused stress and frustrations. • Poor management and regularization of land and parking lots.

Similarity graph construction
Following the methodology highlighted in "Parking global trend analysis" section, using a fine-grained tuning of Jaccard similarity threshold by varying the parameter γ from 0.1 till 0.9 at incremental step 0.1 and monitoring the size of the giant-component of the induced graph, it turns out that setting γ = 0.4 yields the largest giant component before its starts to shrink drastically (smaller values of the threshold yield unattractive scenarios where most of nodes were connected). We therefore adopt this choice in subsequent analysis. Figure 3 displays this generated network-based similarity, while Table 4 summarizes the main attributes of this graph, which include network's size, average path length, average degree centrality, clustering coefficient, path length. One notices that the graph has a modest number of connections between different nodes and sub-graphs, which show that the graph is somehow stretched and not tied. The giant component's size represents 31% of the graph's size, which will be further decomposed into various subcommunities. In overall, the average degree centrality and betweenness centrality values are rather low, which reflects the low connectivity between the graph nodes. In contrast, the relatively high average clustering coefficient value suggests some potential for extra (sub) communities and clusters.

Computational complexity
Regarding the time complexity of the method and construction of the network, it should be noted that the data processing time was pretty reasonable for graph construction methods. Indeed, tested on a laptop HP with Processor Intel I5 5300U CPU 2.30 GHz, it takes 230 s to iterate through the different steps and complete the data processing. However, the graph construction takes exactly 16 min and 15 s to choose the nodes and build the connections. This time complexity could be significantly improved by adopting a multi-thread architecture and using GPU like machine ( Table 5).

Interpretable community based analysis
Following the reasoning highlighted in "Construction of the similarity network" section, we apply the three community detection algorithms (clique, k-core for various values of k, Girvan-Newman), and monitor the interpretability in terms of most common keywords/hashtags and location of the Twitter IDs. For instance, using a simple frequency based analysis of the keywords constituting the tweet messages involved in one of the identified (sub) community generated by Girvan-Newman algorithm revealed the two most frequent keywords be "parking", "free". These can be cast into the generic topic of "free parking". Similarly, in another community, the most two frequent words are "bad" and "parking", which can be cast into a "bad parking" like community. This process enables us to discover communities related to airport parking, public parking, parking spot repainting, street parking, city parking, parking cost and fine, parking maintenance, traffic and corona-virus, among others. The choice of our topics is also motivated by the desire to elicit users' parking preferences and concerns. The first central aspect is related to the factors that influence parking search decision, such as free parking, cost of parking, and parking spaces, which reflect an interest in knowing the availability of spaces. The traffic and transportation is another significant aspect, reflected by airport parking, public parking, street parking, city parking. Surprisingly, the impact of artistic flavor is also noticed through topics like "street arts", "painting" and "street photography". Regarding location scrutinizing, we noticed for instance that common locations of the users as revealed by the users profiles were: India, The United States, Canada and The United Kingdom. Nevertheless, the analysis did not show up a clear path towards location-based (sub) community. Below are described distinguished communities identified Average betweenness centrality 0.0009 using the process described in "Construction of the similarity network" section where four distinct communities were distinguished. Especially, the analysis of the identified communities showed that the communities acquired with clique and k core algorithms were associated with almost the same topics. Also, many of the (sub) communities were so small (i.e. just from 2 to 4 users) that analyzing them alone did not seem meaningful.
That is why we restricted to the most representative ones only that we present next.

k-Core community
This community is illustrated in Fig. 4. The community is a 21 core visualization. It shows a group of people participating in a competition hosted by a parking provider company that provides automated parking solutions with an app that helps vehicles move in and out of the buildings. It also provides free parking spots. Table 6 summarizes the most frequent words and the hashtags retrieved from the tweets which form this community. Based on this, the topic of discussion was mainly about using the mobile app for parking reservations, and inviting friends to participate in the competition hosted by the company. The participant' locations are all from India, but multiple cities have participated, 16 different locations were mentioned. The intensive positive social interaction around the community in terms of the topic of discussion, the number nodes or distinct participants and community size is extremely high, which symbolize the critical interest of the individuals in using intelligent solutions such as mobile applications either to quickly find parking lots in congested areas or to make reservations and communicate efficiently. Also, the community size and their interest overlap with the outcomes of the research work conducted by Siuhi et al. [53] and their results about the necessity of a mobile application for parking and the significant potential and impacts either on individuals parking search decisions or the environmental impact by reducing the car emissions and traffic congestion.  Table 7 presents the most frequent words and the hashtags within this community. These results reflect the negative impact of the pavement parking and the coronavirus pandemic on England's traffic and parking infrastructures. It indicates the pavement parking has provoked and caused residents' frustration. The users tweeted about potential solutions to this effect. Usually, pavement parking creates problems for pedestrians and vulnerable groups such as people with limited mobility, disabled people, individuals with limitations in visibility. Moreover, it affects the pavement length by reducing the space for pedestrians. This joins the increasing research findings about parking violation [48] and illegal parking [24], which are found to be among significant issues and factors affecting the parking search decision of individuals and traffic congestion in cities. For instance, Wang et al. [48] built a datatset by collecting the daily police department reports in one of the China cities, and concluded that 35% of the registered claims in the city were about the parking problems including pavement parking. The authors also proposed a solution using a mobile application and an online platform for reporting illegal parking in the city.

Marketing community
The community presented in Fig. 6 was identified using Girvan-Newman algorithm. The social interactions and the type of influencers in this community expose the marketing and commercial aspect of parking in social media networking. According to the most frequent words, and the hashtags occurred within the community shown in Table 8, the topic of discussion sounds associated to parking near hospital premises during corona time, where the pandemic situation has caused problems with parking pot availability. Some people discussed the need to free parking spaces near hospitals. Others were trying to report the illegal parking caused by some individuals utilizing the parking reserved for disabled people. The community's central node is a company and a big influencer called PSRltd, 6 a marketing specialist and a parking provider in the United  Kingdom. The company played a role in the pandemic by providing free parking for hospital workers. They were motivating and calling people to contribute by facilitating the parking for the care workers to park easily. Moreover, they have provided a website specialized in helping the care workers to reserve parking easily.

Event community
The last community to be presented from the similarity network is shown in Fig. 7, along with its most frequent words and hashtags tabulated in Table 9. This community is related to social events and traveling. People in this community talked about the booking and pre-booking of parking spaces before going to the event. This interest indicated the individuals' behavior regarding parking reservations in an event or travel. The pre-booking is the tendency by utilizing a mobile app for parking reservations and proceeding with an online payment rather than traditional parking approaches and on-site  payments. This community shows an interest in using the new IoT and web application features such as E-payments, E-parking, automated parking systems, and parking reservation systems. All these factors and characteristics were identified by Revathi et al. [54], and Lin et al. [55] as essential factors that influence the individual's parking reservations and decisions.

Alternative community detection algorithms
In this section, we have tested alternative state-of-the-art algorithms for community detection to compare the results with our approach that used only Clique, k-core and Girvan-Newman. We restricted our implementation to those algorithms whose software is publicly available. About ten different algorithms were therefore considered, although some returned void communities or failed to deliver desired patterns. Among  Table 9 Hashtags, and most frequent words within the event community

Most frequent words Hashtags
Saturday, prebook, attending, concert, june, game, heading, forward, seeing, disappointment #concert, #greenday, #mobility #hobby, #filmphotography, #light, #parking algorithms tested positively one shall mention principled_clustering [56], belief [57], Fluid Communities [58], leiden [59] and chinesewhispers [60]. Other tested algorithms that failed to deliver interesting outcomes include Diffusion Entropy Reducer graph clustering algorithm (Der) [61], gemsec [62], walktrap [63], cpm [64], kcut [65], edmot [66] and lswl_plus [67,68]. The outcomes of each algorithm from the above list is as fellow: • Der, returned only two communities with very diverse discussions and locations. which render the interpretation of the results rather difficult, if not impossible. • Gemsec, returned an error. • Walktrap, returned 13 communities where three were fond similar to our findings (marketing, event, and pavement communities), while the remainder were very redundant and difficult to ascertain any interpretation. However, these results were not added to the discussion table. • Cpm, returned an error. • Kcut returned 6 communities; 5 of them contain a maximum of 2 nodes only, and one giant community. The results were almost ad hoc from interpretation perspective. • Edmot returned error. • Lswl_plus, returned error.
We shall mention the prospects of recent work by Sieranoja et al. [69] using k-means algorithm for graph clustering where high performance results were reported. This provides insights to our future in the field. Table 10 presents the results of those algorithms that delivered non-void communities. We presented also in the table the overlapping with our community-detection algorithm in terms of detected communities as well as the characteristic in terms of size, main hashtags and top frequent words in the associated tweet documents, of any additional community detected. It is easy to see that all the algorithms bear some similarity with our (k-core, clique, and Girvan Newman) method and detect some extra community as well. For instance, Chinesewhispers and Leiden performed well on the network dataset by identifying three communities that agree with our algorithm and multiple other communities. For the other communities, Principled clustering presented one interesting community corresponding to some parking providers helping the NHS (National Health Service) staff to get free parking during corona pandemic. With Belief and Leiden, the additional communities were about artisans and artists that paint the parking spaces and correct the lining of the spaces. Fluid and Chinesewhispers showed communities of people who lease and sell parking spaces for some price. This comparison confirmed that our choice of utilizing three detection algorithms was pretty rational and its result agree to a large extent with some state-of-the-art algorithms when applied to the same dataset where essential and critical communities have been identified.
Discussions Table 11 provides a summary of the four communities discussed in this work, highlighting their size, most frequent words, hashtags, and sentiment analysis that accompany the tweets forming each community. This sentiment analysis is added to provide more valuable insights about the polarity direction that dominates each community and to better comprehend the impact of the parking behavior.
In terms of community size, the 21 core community found using the k-core method is the largest one and it is characterized by overwhelmingly positive polarity opinion reflecting the users' positive attitude that accompanied the parking experience. The second community in terms of size is the pavement parking, characterized by a rather negative sentiment. This is not surprising as the users showed dissatisfaction about the pavement usage for parking. Third community in terms of size, is the event community, which is dominated by positive opinion where users are interested in sharing, attending or engaging with concerts, exhibitions and game events. Finally Marketing community also exhibits positive polarity in overall with a focus on smart apps, health issues and transportation at wide as well as the associated costs and regulation. The findings pointed out in this paper should not be hide some inherent limitations associated to the nature of data employed and the methodology. This is summarized into the following.
• From the data collection perspective, there is a boundary limitation in terms of number of tweets that can be collected by a single API call. Although, we systematically repeat the process several times to maximize the number of posts collected, there is still a limit in terms of how far in the past the search operation can be performed. In fact, the Rest API search in Twitter can only include a list of tweets that have been shared in the last seven days approximately. Despite this limitation, a such Twitter API search is commonly employed by researchers [70,71]. • The use of textual similarity using Jaccard index has inherent structural limitation in the sense that it ignores other linguistic constructs that may be conveyed by the text message, which includes semantic similarity, dialogue act, entailment, negation, among others. Strictly speaking the use of Jaccard index is only motivated by the tendency to share key events by the users where the associated keywords or tokens are explicitly replicated in their tweets. This also bears some similarity with the popularity of string-based metric in sentence-to-sentence similarity measures [72]. Besides, the introduction of thresholding on Jaccard index based on network attribute (giant component) can be seen as a relaxation on the full "post" similarity according to Jaccard value. • The use of Jaccard similarity, although popular in natural language processing applications, can be questioned. Alternatives metrics like Dice measures, cosine measures can also be valuable. Although it is difficult to identify relevant theoretical premises for choosing one specific measure over the other one, see, for instance the review paper of Gali et al. [46], we believe the impact of a given similarity measure would rather impact the choice of the threshold value but not the community result findings. A more in-depth analysis would be required to assess this observation. • This research joins other researches in transportation, which stress on the importance of social media data to leverage the various travel user's experiences. In this respect, Welch and Widita [73], for instance, suggest that public transport research can significantly benefit from pairing of transport big data sources with social media to infer customer satisfaction and validate hypothesizes about travel behavior. • It should also be noted that the use of token based similarity is well-motivated in the context of our study. This is because the use of short text messages in tweets make the use of advanced semantic analysis somehow less relevant since most of the users do not elaborate on their opinions and thoughts so that stressing on common terms, like name of events, parking infrastructures and organisations seems a more cautious attitude to grasp the similarity among tweet messages. • It should be noted that in the era of social media, many key players in the car parking industry have already active Twitter account with several followers. This includes operators working on car parking mobile apps, digital parking installation operators, construction operators and many other associated services. Therefore, it is not excluded that many of the populated hashtags are also created and populated by these operators in order to reach wider audience. On the other hand, one shall also mention the growing importance of bots in the data collection process as many Twitter IDs were mainly interested to create a buzz around the defined topic to increase the number of followers for business perspective. Although the full detail of the impact of the bots on the collected data is beyond the scope of this paper, extrapolating from previous research findings (see, for instance, European Commission report [74]), it is estimated a reasonable percentage of the Twitter ID are bots originated. • The use of engagement assessment in our trend analysis presents some opportunity to handle the above bots or echo-chamber effect because it is highly believed that bots messages will be less subject to retweets or Likes. Therefore, one can rationally claim the findings of this global trend analysis are less obstructed by echo-chamber effects. • The availability of the twitter dataset at various levels of mobile car parking apps can provide a rough indication to tackle the problem of technology adoption from social media perspective in line with research carried out in [75]. Although this research is part of our future agenda, there are sufficient ingredients to believe that a such approach is tenable and can be benefit both the service suppliers and policy-makers. Various other works were done at our research group concerning parking behavior analysis. A recent work [76] investigates the parking behavior in Finland using news articles mining approach. Some key differences between these two works concerns both the nature of input and the methodology. In Arhab et al. [76], the inputs were collected from News API, where long text documents were collected. This provides opportunity to apply more in-depth natural language processing techniques exploiting the semantic aspect and discourse to derive insights regarding user's parking concerns and experience.

Conclusion
In this paper, the parking behavior was examined based on social network analysis, utilizing a parking dataset gathered from twitter when tracking popular car parking related hashtags. A graph-based on similarity was constructed using Twitter user's ID, taking into account the similarity of their tweet messages according to Jaccard similarity score. Several community detection algorithms; namely, Clique, k-core, and Girvan-Newman were combined with rational interpretation based approach that makes use of frequency of common keywords, hashtags as well as location of users in order to generate interpretale (sub) communities. This expects us to provide insights into identifying individuals' parking behavior and factors influencing their parking search decisions. In parallel, a global trend analysis that investigates different parking types and the most engaging discussion in terms of presence of retweets or Like has been carried out. This analysis makes use of sentiment polarity and dominant discussion trends according to Key-BERT model. Some of the findings confirmed some already established results in terms of influence of discount, free parking availability on the parking search decisions. Factors related to events occurring in city have also found to influence the online pre-booking with mobile apps rather than traditional systems. Furthermore, surprisingly, individuals' parking skills were found to constitute a big topic as well as malicious behavior such as pavement parking. In addition, marketing behavior about parking revealed to have an important impact as well. Another community revealed that the big influencers in the parking domain are most likely to be parking providers and marketing specialists. Finally, the corona-virus pandemic has affected the traffic and the functioning of parking systems, especially near hospitals, where there was much solidarity between people to help care workers access parking lots. Besides, the developed approach for mining hastags through semi-automated process of community detection can be extrapolated to several other domain applications, with a potential high societal impact. It also joins the recently promoted concept of Explainable AI, where explanability and interpretability are seen as critical for further AI application development. Therefore, areas of future research include the hybridization of some known explainable AI approaches in model approximations, visualization and community detection in social networks.
in the graphs) that passes through the target node u. The average in-betweenness centrality provides the average across all nodes of the graph [77]. Formally, this boils down to: are the counts of the shortest paths of s and t passing through v i . σ st , and n total number of nodes.
Clustering coefficient provides an estimate of how much a node tends to form triangles with other nodes in the graph. This yields a measure of the density of the connection; that is, higher the clustering coefficient, high the connection density [79]. More formally, it can also be expressed in terms of local clustering coefficient for individual nodes: . T (v i ) indicates the number of triangles connected to node v i . d i is the degree of node v i Diameter to get the diameter, first shortest paths between node pairs is calculated, after that, the average is taken, then the diameter is defined [80] by where V be the set of nodes in the social network, and d(s, t) be the shortest path between nodes s and t.