Using social media for sub-event detection during disasters

Social media platforms have become fundamental tools for sharing information during natural disasters or catastrophic events. This paper presents SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters), a new method that analyzes user posts to discover sub-events that occurred after a disaster (e.g., collapsed buildings, broken gas pipes, floods). SEDOM-DD has been evaluated with datasets of different sizes that contain real posts from social media related to different natural disasters (e.g., earthquakes, floods and hurricanes). Starting from such data, we generated synthetic datasets with different features, such as different percentages of relevant posts and/or geotagged posts. Experiments performed on both real and synthetic datasets showed that SEDOM-DD is able to identify sub-events with high accuracy. For example, with a percentage of relevant posts of 80% and geotagged posts of 15%, our method detects the sub-events and their areas with an accuracy of 85%, revealing the high accuracy and effectiveness of the proposed approach.


Introduction
Social media platforms have become an important source of information that can be exploited to understand human dynamics and behaviors. Social media posts can be geotagged, meaning that they are marked with geographic coordinates that allow a program to identify the location where the post was created. In some cases, such information can be combined with the textual content of the post to understand what was happening in that location. This information is extremely useful in many application contexts, such as understanding the movement of tourists within cities [1] or the behaviors of fans following important sporting events [2], discovering the best areas to open new businesses [3], and analyzing the purchasing trends of users in a specific area [4].
Data elements contained in social media posts are often unstructured and require advanced analysis in order to extract useful knowledge. For example, the textual content of a post may contain information about the discussion topic [5], the sentiment of the user who wrote the post [6], the place where the post was written [7], the user's opinion on a certain subject [8] and risk prevention [9]. To obtain this information, advanced machine learning techniques, such as Natural Language Processing (NLP), neural networks and deep learning techniques, must be exploited [10,11].
In the context of natural disasters, the widespread use of social media platforms has enabled eyewitnesses and other disaster-affected people to share information about damage, risks and emergencies in real time. As an example, during Hurricane Harvey in 2017, when 911, the emergency telephone number in the US, was overwhelmed by thousands of calls from those in need of immediate aid, people turned to social media to ask for help [12]. Research studies show the importance and usefulness of the information shared during disasters, both through traditional infrastructures [13] and social media [14,15].
Despite these advantages, the use of social media posts to help rescue and intervention activities remains an open challenge, as users often publish posts containing inaccurate information, slang or abbreviated words, or without using geolocalization. While extensive research work has been done on the classification of posts to understand their high-level informational categories [14], little attention has been given to understanding and extracting small-scale events that affect small communities. In fact, every disaster creates a series of small-scale emergencies (sub-events), such as family members stranded, power outage, damage to buildings, school closure, or damage to bridges. Normally, these sub-events affect only a small portion of the population in the disaster area and thus receive less attention and a delayed response. Among other causes, the lack of information about these events causes a slow response from the authorities, especially during an ongoing disaster.
In this work, we aim at identifying small-scale events that occurred after a natural disaster or catastrophic event. For this purpose, we present a new method, namely SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters), for detecting sub-events during disasters from social media data. Specifically, the proposed method addresses two important issues: understanding whether a post is relevant to a disaster and discovering the sub-events that occurred in the disaster area. SEDOM-DD performs these tasks in four main steps: (i) collection of posts that are potentially related to the disaster; (ii) filtering of posts to keep only the relevant ones; (iii) data enrichment by using information contained in posts to increase the number of posts for which it is possible to estimate their geolocalization; and (iv) use of clustering techniques on geotagged relevant posts for detecting sub-events.
SEDOM-DD has been evaluated with datasets of different sizes that contain real social media posts related to different natural disasters (e.g., earthquakes, floods and hurricanes). Furthermore, starting from such datasets containing real posts, we generated synthetic datasets with different features, such as different percentages of relevant posts and/or geotagged posts. Several experiments performed on both real and synthetic datasets showed that SEDOM-DD is able to identify sub-events with high accuracy, both in detecting the area where they took place and in understanding the type of problem (e.g., collapsed buildings, broken gas pipes, flooding). Specifically, with a percentage of relevant posts of 80% and geotagged posts of 15%, the method correctly detects the sub-events and their location areas with an accuracy of about 85%. In all the other configurations as well, our method was able to detect sub-events with high accuracy, revealing its effectiveness even when dealing with noisy data.
Differently from other existing techniques, SEDOM-DD focuses on discovering sub-events that can occur as secondary effects of a disaster. For this reason, it can be integrated with existing systems for coordinating and enhancing emergency response. The detected sub-events, together with the posts and photos that made it possible to detect such events, can be analyzed and validated by a group of experts to establish the type and the priority of interventions to be carried out.
The remainder of the paper is organized as follows. Section "Related work" discusses related work. Section "Proposed method" describes the proposed method. Section "Experimental evaluation" presents the experimental evaluation of different case studies, and Section "Conclusions" concludes the paper.

Related work
A recent study carried out a comprehensive literature survey on the use of social media as a tool for improving damage estimation and better organizing relief operations during disasters [14]. The study also discussed the main issues in the use of social media data in disaster scenarios, such as the difficulty of processing huge amounts of data in a timely manner, the presence of unwanted or fake information, and the difficulty of collecting data describing the different stages of a disaster. Other surveys have addressed the issue of processing social media posts during mass emergencies [14,16-18] by focusing on different aspects, such as coordinating evacuation operations [19], combining data from different sources like satellite imagery [16], and understanding how information spreads during such events [20].
Some researchers have analyzed social media traffic data for detecting earthquakes and estimating their impact area [21,22]. For example, Avvenuti et al. [23] developed a system, namely EARS, which analyzes streaming data from Twitter for detecting seismic events. Such a system exploits a burst detection algorithm to identify earthquakes from tweets, and processes the content of each message for determining the impact of the seismic events on people and infrastructure. Other works focused on collecting and providing information about ongoing earthquakes. LastQuake [24] is a system developed in collaboration with the European Mediterranean Seismological Centre (EMSC) that provides eyewitnesses with visual information on felt earthquakes and, at the same time, allows the collection of user feedback on the main seismic shock and its subsequent aftershocks. Sangameswar et al. [25] proposed a sentiment analysis approach for identifying the places of natural disasters (e.g., earthquakes), which could be a region, country, or continent.
An important aspect of disaster management is identifying sub-events that can take place at different locations during or after a disaster (e.g., collapsed buildings, broken gas pipes, flooding). Different studies have tried to discover sub-events from social media data using different approaches, based on both supervised and unsupervised techniques. Some supervised techniques have been proposed for discovering sub-events after disasters. Most of them exploit weighted graph-based structures [26] or TF-IDF (Term Frequency-Inverse Document Frequency) vectors [27], while others exploit neural networks for discovering, classifying and summarizing sub-events from social media data [28-30]. Supervised techniques require a manual definition of the features and parameters used by the discovery algorithms. For some events, such techniques can achieve good results, but in many other cases the effort required to configure and optimize the algorithms can be very high and the obtained results may not be effective. For these reasons, many studies have focused on event detection techniques based on unsupervised approaches.
In fact, most unsupervised techniques that have been proposed for discovering sub-events in natural disasters are usually based on clustering algorithms coupled with similarity metrics. With regard to social media data, each textual feature (e.g., text or hashtags) is modeled as a weight vector by using TF-IDF, in which the cosine similarity is used as distance metric among features [31,32]. Other unsupervised techniques are based on topic model approaches, such as LDA (Latent Dirichlet Allocation) and HDP (Hierarchical Dirichlet Processes), which extract sub-events by analyzing the semantic representations of documents [33,34]. Nolasco and Oliveira [35] used LDA for event mining from raw text and topic labeling methods to assign representative labels to them. Instead, Rudra et al. [36] proposed a technique based on ILP (Integer Linear Programming) and exploited a natural language processing approach for identifying and summarizing sub-events from Twitter data.
Differently from existing techniques, SEDOM-DD focuses on discovering sub-events that can occur as secondary effects of a disaster. Specifically, our method is specialized in searching and displaying sub-events on a map from social media data, even in the presence of noise. The proposed method tries to use as many posts as possible by including posts that are not geotagged but that contain textual information from which the geographical position can be deduced. Compared with other work that finds sub-events from social media, such as Rudra et al. [36], our method exploits a spatial clustering algorithm to identify the geographical areas where the sub-events occurred. Then, by analyzing the texts and keywords of posts in each cluster, it identifies the types of sub-events that occurred. Several experiments on different datasets related to different types of natural disasters (i.e., earthquakes, floods and hurricanes) demonstrated that SEDOM-DD is able to detect sub-events with high accuracy, revealing the effectiveness of the proposed approach.

Proposed method
To identify sub-events during a disaster, the proposed method relies on four main steps. Figure 1 shows these steps together with their inputs and outputs:
1. Data collection: given a disaster event and its impact areas, all the posts generated in the event's area are collected. These posts can be collected from social media platforms (e.g., Twitter) through queries based on keywords or locations.
2. Filtering of posts: we use supervised machine learning techniques to identify relevant posts. Posts that refer to the disaster and that come from users who live in the affected area are relevant for analysis, and thus are retained.
3. Enrichment of posts: since many posts are relevant for analysis but are not geotagged, the information contained in the text is used to estimate the coordinates of the location where such posts were created. For example, if a post refers to a specific location (e.g., by reporting in the text the name of a road or a monument), it is possible to use a geocoding service for estimating its coordinates.
4. Finding sub-events: geotagged posts are analyzed and aggregated for finding clusters of posts mentioning a common problem (i.e., a specific sub-event that occurred in a certain area). This step involves the use of a spatial clustering algorithm to identify the areas where the sub-events occurred. Then, by analyzing the texts and keywords of posts generated inside each cluster, it is possible to understand the type of sub-event that occurred.
Figure 2 shows an example of how our method works. Specifically, it was built starting from a real earthquake. On May 21, 2019, the province of Barletta-Andria-Trani in Apulia (Italy) was affected by an earthquake of magnitude 3.9, with an epicenter 4 km from the city of Barletta at a depth of 34 km. After the shock warning in several municipalities across the province, many public institutions had to be evacuated, including schools, judicial offices and other facilities. The disruption also extended to public transport: many railway lines were interrupted for a few hours in order to guarantee passenger safety. The old town of the city of Trani turned out to be one of the most affected areas.
During those hours of panic, we collected posts from social media focusing on the catastrophic event that occurred in the area (see Fig. 2a). Starting from such posts, those that do not explicitly refer to any sub-event have been filtered out (Fig. 2b). Posts regarding sub-events have been clustered and their text analyzed to understand the type of sub-event that occurred in each cluster's area (Fig. 2c). In particular, Fig. 2c highlights four significant sub-events that happened in the old town of Trani: the fall of material from the church of San Domenico, a structural problem with the Liceo De Santis, a water outage in the St. Mary district, and people evacuated from the judicial offices. More details on the algorithms used in the different steps of our method are provided below.

Filtering of posts
During this step, posts collected from social media are processed and filtered for keeping only the ones that are relevant for the analysis. A post is relevant when it contains text concerning a catastrophic event (e.g., earthquake, flooding) that happened in the area under analysis. Relevant posts can be further divided into two categories: (i) generic, which generically refer to a catastrophic event, without mentioning any specific sub-event (e.g., "yesterday there was an earthquake, we were very scared!"); (ii) non-generic, which explicitly refer to problems/sub-events that occurred as secondary effects of a catastrophic event (e.g., "we have been without electricity since yesterday"). We are mainly interested in relevant posts and, in particular, non-generic ones that mention some sub-events that have occurred.
The classification of posts is a crucial step for obtaining accurate sub-event detection results. Section "Experimental evaluation" describes the data we collected on Twitter and the results of some classification algorithms for separating relevant tweets from not-relevant ones. The results show that classification algorithms are able to correctly detect relevant tweets with high precision.

Enrichment of posts
The proposed method uses geotagged posts to identify the areas where the sub-events occurred. The main problem with posts from social media is that they are not always geotagged, which makes them not always useful for the analysis. The data enrichment step aims at estimating the coordinates of relevant but not geotagged posts through the analysis of the text. In this way, it is possible to increase the volume of geotagged data to be analyzed, which should lead to better accuracy in the identification of sub-events. Posts that are not geotagged can include textual information that allows their geographical coordinates to be estimated. For instance, users often report in the text the name of the street or the district where the event occurred (e.g., "Washington Street in Cork closed to traffic following the partial collapse of a building"). Several studies have proposed techniques for geotagging posts exploiting the textual information they contain [37,38]. In addition, different geocoding services, such as Google Maps or Nominatim, can be exploited for converting an address, even a partial one, into coordinates. In some cases, natural language processing techniques, e.g., based on CoreNLP, can also be used for identifying the locations mentioned in the text of a post.
Our method uses the following approach for estimating the coordinates of a post. Given a geographical area to be analyzed, we exploit geocoding web services for retrieving Points-of-Interest (PoIs) in the area and the most common names used to refer to them. Then, we extract street and district names from a text through textual patterns.
Once we have identified this information in the text, we translate it into coordinates with four levels of accuracy: PoI, street, district, or city. For example, a post that refers to a PoI is geotagged with the coordinates of that PoI, while a post that refers to a street (without a house number) or a district is assigned to a point randomly chosen inside the street/district where the sub-event occurred. A post about a catastrophic event that cannot be associated with a specific point or area is placed at the city level.
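The four accuracy levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer entries, the coordinates, the use of bounding boxes for streets and districts, and the function names `random_point` and `estimate_coordinates` are all assumptions made for illustration, and plain substring matching stands in for the textual patterns mentioned in the text.

```python
import random

# Hypothetical gazetteer; in practice it would be filled from a geocoding service.
# All names and coordinates below are illustrative only.
POIS = {"san domenico church": (41.2790, 16.4180)}
# Streets and districts are approximated here by bounding boxes:
# (min_lat, min_lon, max_lat, max_lon).
STREETS = {"via mario pagano": (41.2760, 16.4150, 41.2780, 16.4200)}
DISTRICTS = {"old town": (41.2770, 16.4130, 41.2830, 16.4230)}
CITY_CENTER = (41.2770, 16.4170)


def random_point(bbox, rng):
    """Pick a uniformly random point inside a bounding box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (rng.uniform(min_lat, max_lat), rng.uniform(min_lon, max_lon))


def estimate_coordinates(text, rng=random):
    """Return (level, (lat, lon)) using the most precise match found in text."""
    lowered = text.lower()
    # Level 1: a PoI gives exact coordinates.
    for name, point in POIS.items():
        if name in lowered:
            return "poi", point
    # Levels 2 and 3: a street or district gives a random point inside it.
    for level, table in (("street", STREETS), ("district", DISTRICTS)):
        for name, bbox in table.items():
            if name in lowered:
                return level, random_point(bbox, rng)
    # Level 4: nothing more specific was found, so fall back to the city.
    return "city", CITY_CENTER
```

Checking the most precise level first means that a post mentioning both a PoI and its district is placed at the PoI.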

Finding sub-events
Following the same approach proposed in [39], we used a clustering algorithm to aggregate the posts that refer to the same sub-event and discover the area where it occurred. In particular, DBSCAN [40] has been chosen for its ability to detect clusters with different sizes and shapes, tolerate noise, and be applicable to small or large datasets. Moreover, in the context of the extraction of areas or regions-of-interest, it is one of the most used algorithms in the literature [41].
For each cluster identified by DBSCAN, a procedure is carried out for identifying the sub-event that occurred in the cluster's area. In particular, we extract the keywords (and their frequency) contained in the posts from such a cluster. The keywords are then sorted by frequency. A high frequency does not necessarily denote a highly representative keyword, but it is a useful starting point. As an example, the keyword "earthquake" may have a higher frequency than "building collapse", although "building collapse" is evidently more representative as a sub-event that occurred in an area. The most representative keywords are then compared with a manually built dictionary, which contains a list of terms that are commonly used to report specific sub-events that occurred. The dictionary associates a term, representing a type of sub-event, with some synonyms. As an example, for the sub-event "collapsed house" we also consider a list of similar terms, such as "destroyed house", "house collapse", and "unsafe house". As stated in [36], the terms used to report a sub-event in the text are usually composed of a pair (subject entity, action happened), such as "bridge collapsed" and "power outage".
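A minimal sketch of the dictionary lookup described above is shown below. The dictionary content and the function name `label_cluster` are hypothetical, and simple term counting stands in for the keyword ranking used in the paper.

```python
from collections import Counter

# Hypothetical dictionary: sub-event type -> terms commonly used to report it.
SUB_EVENT_TERMS = {
    "collapsed house": ["collapsed house", "destroyed house",
                        "house collapse", "unsafe house"],
    "power outage": ["power outage", "without electricity", "blackout"],
}


def label_cluster(texts):
    """Return the sub-event type whose terms occur most often, or None."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for sub_event, terms in SUB_EVENT_TERMS.items():
            counts[sub_event] += sum(lowered.count(term) for term in terms)
    # No dictionary term appears in the cluster: leave it unlabeled.
    if not counts or max(counts.values()) == 0:
        return None
    return counts.most_common(1)[0][0]
```

Because only dictionary terms are counted, a frequent but generic word such as "earthquake" does not influence the label.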

Experimental evaluation
Several experiments were carried out to evaluate the performance of SEDOM-DD, using datasets related to different types of natural disasters (i.e., earthquakes, floods and hurricanes) that occurred in the period 2009-2019. Moreover, for evaluating the performance of SEDOM-DD using data with different characteristics and levels of precision, we started from such real data and generated a few synthetic datasets.
This section is organized as follows. Section "Collected data and classification of relevant one" describes the data collection process and the algorithms used to classify posts as relevant or not relevant. Section "Detection of sub-events on synthetic data" discusses the synthetic data generator, the algorithm for detecting sub-events, and the results obtained in our tests. Finally, Section "Detection of sub-events on real data" presents the results obtained by SEDOM-DD on a large collection of posts about Hurricane Harvey, a Category 4 storm that hit Texas in 2017.

Collected data and classification of relevant one
In this paper, we used social media messages posted on Twitter during catastrophic events. Although our system is able to use data from other social media (e.g., Facebook or Flickr), Twitter has been chosen because it is widely used in this application context, as it allows large amounts of data to be downloaded through public APIs. Other social media platforms that are more widely used than Twitter (e.g., Facebook and Instagram) do not allow researchers to download users' posts on a certain topic and are therefore not usable for this purpose.
We used Twitter APIs for searching and collecting tweets matching keywords related to earthquakes, including those that occurred in Barletta (May 21, 2019) and Peru (May 26, 2019). From the analysis of the collected data, we noticed that some tweets report the earthquake and the problems/sub-events it generated (relevant), while others do not refer to the catastrophic event (not relevant).
Starting from the collected data, we created a manually classified dataset (D1) composed of 5000 tweets, half relevant and half not relevant. Such data have been used to train different machine learning algorithms, namely Naïve Bayes (NB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), XGBoost (XGB), and Neural Networks (NN). In particular, we used the implementations included in the scikit-learn library (https://scikit-learn.org/stable/), together with Keras (https://keras.io/), TensorFlow (https://www.tensorflow.org/) and Word2Vec [42] for creating neural networks.
The obtained classification models take into account different features of tweets, such as length and presence of keywords, hashtags or bi-grams that are typically used to refer to disasters. Let P = {p_1, p_2, ..., p_n} be a set of social media posts, where a generic post p_i is a social media content (e.g., a tweet) posted by a user after a catastrophic event E. Specifically, a generic post p_i includes:
• user_id, containing the identifier of the user who posted p_i;
• timestamp, indicating when (date and time) p_i was posted;
• text, containing a textual description of p_i;
• tags, containing the tags associated to p_i;
• coordinates, which consists of latitude and longitude of the place from where p_i was created (often this field is undefined);
• profile_geo, containing public location information provided by the user in their profile;
• length, indicating the length of the text of p_i;
• numKeywords, indicating the number of relevant keywords (e.g., earthquake, flooding, magnitude, lack of water, electrical problems) contained in the text of p_i.
We performed several experiments for tuning the input hyperparameters used to control the training process. Table 1 reports the values of the main hyperparameters used for the different algorithms.
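As an illustration of the post fields listed above, the following sketch models a post as a small data structure in which length and numKeywords are derived from the text. The class name `Post`, the `KEYWORDS` list (taken from the examples in the text, not from the authors' actual list), and the substring-based keyword count are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative keyword list built from the examples given in the text;
# the complete list used by the authors is not reported.
KEYWORDS = ["earthquake", "flooding", "magnitude",
            "lack of water", "electrical problems"]


@dataclass
class Post:
    user_id: str
    timestamp: str
    text: str
    tags: List[str] = field(default_factory=list)
    coordinates: Optional[Tuple[float, float]] = None  # often undefined
    profile_geo: Optional[str] = None

    @property
    def length(self) -> int:
        """Length of the post text, in characters."""
        return len(self.text)

    @property
    def num_keywords(self) -> int:
        """Number of distinct relevant keywords found in the text."""
        lowered = self.text.lower()
        return sum(1 for keyword in KEYWORDS if keyword in lowered)
```

Deriving the two numeric features from the text keeps them consistent with the stored content; a real feature extractor would likely also tokenize the text and handle hashtags and bi-grams.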
For the different algorithms, the classification models have been trained using dataset D1. Then, such models have been tested on five datasets [43], different from D1, which are related to different natural disasters (i.e., floods and earthquakes) that occurred in the period 2009-2019 (see Table 2). In such a way, the training and testing datasets are completely decoupled, which makes it possible to evaluate how well the models generalize to new, unseen data. It is worth noting that some datasets are unbalanced because the two classes, relevant and not relevant, are not equally represented. In order to correctly evaluate the classification models, the training datasets have been balanced before building the models [44].
With all the datasets, the classification algorithms were able to separate relevant tweets from non-relevant ones with high accuracy. As an example, Table 3 shows the results obtained by the different algorithms on the D2 dataset (similar behavior was observed with the other datasets). The algorithm based on neural networks was the most accurate with an accuracy of 83%, followed by XGBoost (81%) and Random Forest (80%). Figure 3 reports the classification results obtained with the other four datasets (D3, D4, D5, D6), which confirm the high accuracy obtained by neural networks in all four tests. For this reason, such a model has been used for classifying posts into relevant and not relevant.

Detection of sub-events on synthetic data
To evaluate the performance of SEDOM-DD, we generated several synthetic datasets, each with different characteristics and levels of precision [45]. In particular, such datasets were generated starting from real social media posts published during or immediately after catastrophic events. Some of these synthetic posts are marked with precise geographic coordinates, others are not geotagged but contain information that can be used to estimate their coordinates with a varying degree of precision, and the remaining ones generically refer to the main disaster but not to any sub-events.
In the next sections we describe the algorithms used for generating synthetic data and detecting sub-events.

Synthetic post generator
Algorithm 1 shows the pseudo-code of the procedure used to generate synthetic posts. The input parameters are reported in Table 4 along with the values that were used for the experimental evaluation described in Section "Results". The output is composed of a set of sub-events S and a set of posts P. At the beginning, the two sets, S and P, are initialized (line 1). A given number of sub-events (numSubEvents) are generated and added to S (lines 2-19). In particular, a generic sub-event s is created (line 3) and its type (s.type) is randomly chosen from subEventTypes, a list of predefined sub-event types (line 4). Such a list contains different types of problems/sub-events that occur after catastrophic events, such as "damaged building", "sewerage breakage", "wall collapse", "power outage", and so on.
A random point (i.e., a pair of coordinates) in the area under analysis is chosen as the center of the sub-event (line 5). Since the effects of a sub-event propagate in the surrounding area, the propagation has been modeled with four levels of precision: Point-of-Interest (PoI), street, district, city. The level PoI specifies the area where the sub-event occurred (i.e., the exact area of a collapsed building) and it is represented as a circle with center in s.coordinates and a radius equal to subEventRadius (line 6). The other levels have been introduced to take into account that the effects of a sub-event propagate in the surrounding area. The area of a level contains the areas of the lower levels, that is: PoI ⊂ Street ⊂ District ⊂ City. For the sake of simplicity, starting from a sub-event at the PoI level, the street and district levels are automatically generated and represented as circles with a greater radius. The area outside the district represents the city level. The generator establishes the number of posts to be created for a sub-event (line 7). The sub-event s is then added to S.
After that, through an iterative process, the posts associated with the event s are generated so that they contain information with different degrees of precision (lines 9-19):
• First, it is established at which level (PoI, street, district, or city) the post p must be geolocated (line 12). Based on this choice, and on the propagation levels defined for the sub-event, appropriate coordinates for the post are chosen (line 13). It should be noted that these coordinates are saved as hiddenCoordinates, because they are only used to validate the accuracy of the results, which means they are not visible to the analysis algorithm if the post is not marked as geotagged.
• Since only a certain percentage of posts is geotagged (percGeotagged), we randomly determine if a generated post is geotagged or not (line 14). If the post is geotagged, the hiddenCoordinates are saved in the coordinates field (line 15), which can be read by the analysis algorithm. Otherwise, the coordinates remain undefined (lines 16-17).
• We generate a text for each post (line 18). Specifically, such a text can include terms related to the type of sub-event, which are taken from a pre-built dictionary that contains a certain number of terms for each type of sub-event. Moreover, the text can contain information on where the sub-event happened with varying levels of accuracy (depending on the distrGeoInfoText parameter). The post p is then added to P (line 19).
Finally, a set of generic posts is generated and added to P, according to the parameter percGeneric. In such a way, it is possible to add some noise to the data to be analyzed (line 20).
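The generation procedure described above can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: it works on planar coordinates in meters, models every level (including the city) as a bounded ring whose radius is an assumed multiple of subEventRadius, uses a fixed number of posts per sub-event, and interprets percGeneric as the share of generic posts in the final dataset (assumed to be below 1). All function and field names are hypothetical.

```python
import math
import random

LEVELS = ["poi", "street", "district", "city"]


def random_point_in_ring(center, r_min, r_max, rng):
    """Uniform random point at distance [r_min, r_max] from center (meters)."""
    angle = rng.uniform(0, 2 * math.pi)
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))


def generate(num_sub_events, sub_event_types, sub_event_radius, posts_per_event,
             perc_geotagged, perc_generic, level_weights, area_size, seed=0):
    rng = random.Random(seed)
    sub_events, posts = [], []
    # Outer radii of the PoI, street, district and city rings (assumed multiples).
    radii = [0, sub_event_radius, 3 * sub_event_radius,
             9 * sub_event_radius, 27 * sub_event_radius]
    for event_id in range(num_sub_events):
        center = (rng.uniform(0, area_size), rng.uniform(0, area_size))
        event = {"id": event_id, "type": rng.choice(sub_event_types),
                 "coordinates": center}
        sub_events.append(event)
        for _ in range(posts_per_event):
            # Choose the precision level, then a point inside that level's ring.
            level = rng.choices(LEVELS, weights=level_weights)[0]
            i = LEVELS.index(level)
            hidden = random_point_in_ring(center, radii[i], radii[i + 1], rng)
            geotagged = rng.random() < perc_geotagged
            posts.append({
                "sub_event": event_id,
                "level": level,
                "hidden_coordinates": hidden,  # used only for validation
                "coordinates": hidden if geotagged else None,
                "text": f"{event['type']} reported ({level} level)",
                "generic": False,
            })
    # Add generic posts as noise so that they form perc_generic of the dataset.
    num_generic = round(len(posts) * perc_generic / (1 - perc_generic))
    for _ in range(num_generic):
        posts.append({"sub_event": None, "level": "city",
                      "hidden_coordinates": None, "coordinates": None,
                      "text": "there was an earthquake", "generic": True})
    return sub_events, posts
```

Fixing the seed makes the generated datasets reproducible, which is convenient when comparing different detection settings on the same data.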

Sub-event detection
Algorithm 2 shows the pseudo-code of the procedure used to discover sub-events from social media posts. The input is a list of posts P and the parameters of a clustering algorithm. In particular, DBSCAN was chosen as a clustering algorithm since it is robust to noise and it can find clusters of different sizes and shapes. DBSCAN requires the following parameters as input: eps, the radius of the neighborhood of a point; and minPts, the minimum number of points that are required to form a cluster. The output is a list of sub-events S_found that have been discovered in the area under analysis. Regarding the resources required to run DBSCAN instances, its computational complexity is O(n^2), where n is the number of points, which drops to O(n log n) if a spatial index is used [46]. We point out that, in order to reproduce a realistic scenario, not all generated posts are geotagged: only a small part of them include a geographic position or contain textual information that allows the location of the sub-event to be estimated with a certain precision. Therefore, due to the way the synthetic datasets have been generated, it is reasonable to expect that the sub-event detection algorithm will never reach 100% accuracy, as some data is missing and cannot be reconstructed.
The algorithm analyzes the posts P by performing some preprocessing and data enrichment operations (lines 1-11). First, both not relevant and generic posts are filtered out (lines 2-3). This means they are not considered during the clustering phase. Then, all posts that are not geotagged are processed in an attempt to estimate their coordinates based on the textual information they contain (lines 4-11). According to a certain distribution, the geolocation can be estimated at the PoI level, which allows the estimation of the post coordinates with the highest precision, or at the street/district levels. Otherwise, the posts that cannot be geolocated are discarded (lines 10-11). At the end of this process, the remaining posts are thus relevant and geotagged.
In the second part of the algorithm, geotagged posts are transformed into coordinates and analyzed by DBSCAN so as to generate a set of clusters CP (lines 12-13). For each cluster cp ∈ CP, the following operations are carried out (lines 14-19). The most frequent words in the texts of posts belonging to cp are extracted (line 15). From such words, the most representative ones are selected by using the TF-IDF algorithm [47] (line 16) and compared with a dictionary containing information that allows the type of event that occurred to be identified (line 17). The points included in cp can be converted into a convex polygon, which represents the area where the sub-event occurred (line 18). The detected sub-event s is added to S_found (line 19).
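The detection procedure described above can be sketched in a few lines. The sketch is a simplification under stated assumptions, not Algorithm 2 itself: it uses a minimal pure-Python DBSCAN on planar coordinates, assumes that the enrichment step has already filled the coordinates field where possible, counts dictionary terms instead of ranking words with TF-IDF, and returns the member points of each cluster rather than a convex polygon. The names `dbscan` and `detect_sub_events` and the post fields are hypothetical.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def detect_sub_events(posts, eps, min_pts, dictionary):
    """Cluster relevant, non-generic, geotagged posts and label each cluster."""
    usable = [p for p in posts
              if p.get("relevant", True) and not p.get("generic", False)
              and p.get("coordinates")]
    labels = dbscan([p["coordinates"] for p in usable], eps, min_pts)
    found = []
    for cluster in sorted(set(labels) - {-1}):
        members = [p for p, label in zip(usable, labels) if label == cluster]
        counts = Counter()
        for p in members:
            lowered = p["text"].lower()
            for sub_event, terms in dictionary.items():
                counts[sub_event] += sum(term in lowered for term in terms)
        best = counts.most_common(1)
        found.append({
            "type": best[0][0] if best and best[0][1] > 0 else None,
            "points": [p["coordinates"] for p in members],
        })
    return found
```

For large datasets, a library implementation backed by a spatial index would be preferable to this quadratic version.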
To evaluate the accuracy of the sub-event detection algorithm, we compare the sub-events found by Algorithm 2 (S_found) with those generated by Algorithm 1 (S). In particular, each sub-event found is compared with the one in the initial dataset that provides the largest match. Then, some performance metrics (i.e., precision, recall, and F1-score) are measured by counting the posts that have been correctly classified as part of the sub-event. Figure 4 shows an example of synthetic data generated in the city of Trani (Apulia, Italy). Figure 4a shows some sub-events that have been represented using the four-level model described above. Specifically, the PoI level is depicted in green, the street level in yellow, the district level in pink, and the area outside the pink circle corresponds to the city level. Figure 4b illustrates synthetic posts (green dots) that have been generated for the different sub-events in the area. Five examples of posts are reported in Table 5: one is geotagged (tweet ID 1), two contain text that allows geotagging information to be estimated (tweet IDs 2-3), while the others are generic and do not allow the geographical coordinates to be deduced.
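The matching-based evaluation described above can be sketched as follows. The text does not specify how per-sub-event counts are aggregated, so this sketch assumes micro-averaging over all found sub-events and measures the "largest match" as the number of shared posts; the function name `evaluate` and the sample post ids are hypothetical.

```python
def evaluate(found, truth):
    """Compute micro-averaged precision, recall and F1-score.

    found, truth: dicts mapping a sub-event id to the set of its post ids.
    Assumes truth contains at least one sub-event.
    """
    tp = fp = fn = 0
    for posts in found.values():
        # Ground-truth sub-event sharing the most posts with this cluster.
        best = max(truth.values(), key=lambda t: len(posts & t))
        tp += len(posts & best)   # posts correctly assigned to the sub-event
        fp += len(posts - best)   # posts in the cluster but not in the sub-event
        fn += len(best - posts)   # posts of the sub-event missed by the cluster
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Sample data (assumed): two generated sub-events, two detected clusters.
truth = {"s1": {1, 2, 3, 4}, "s2": {5, 6, 7, 8}}
found = {"c1": {1, 2, 3, 9}, "c2": {5, 6}}
precision, recall, f1 = evaluate(found, truth)
```

In this example the first cluster contains one spurious post and misses one, while the second cluster is pure but incomplete, so precision is higher than recall.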

Example of generated/processed data
By applying DBSCAN to the collected posts, it is possible to discover clusters that represent the geographical areas where the sub-events occurred. Then, a textual analysis of the posts in each cluster makes it possible to find the main keywords used in that area so as to understand the sub-event that has occurred. As shown in Fig. 4c, each cluster found is represented as a purple polygon. A label describing the sub-event that occurred is also assigned to each cluster.
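The conversion of a cluster of points into a convex polygon can be sketched with Andrew's monotone chain algorithm. The paper does not state which convex hull procedure is used, so this is only one possible choice; the function name `convex_hull` and the sample points are assumptions, and coordinates are treated as planar for simplicity.

```python
def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# Sample cluster (assumed): four corners, one interior and one edge point.
cluster_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
polygon = convex_hull(cluster_points)
```

Interior and collinear points are discarded, so the resulting polygon contains only the vertices needed to delimit the area of the sub-event.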

Results
The evaluation was carried out on synthetic datasets by using different configuration values for the parameters reported in Table 4, some of which were extracted from real Twitter data as described in [48].
Since such datasets are characterized by significant variability in the density of points (number of posts per area unit), we made several preliminary tests to determine the optimal input parameters of DBSCAN. In particular, the maximum distance between points (eps) has been set to 7 meters and the minimum number of cluster points (minPts) to 150.
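Because eps is expressed in meters while posts carry latitude/longitude pairs, a metric distance between coordinates is needed. The text does not say how this distance is computed; the haversine formula below is one common option, shown as a sketch. The function name `haversine_m`, the spherical Earth model, and the sample coordinates are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical model)

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

EPS_METERS = 7  # eps value reported in the text

# Two sample points (assumed) that differ by 0.00005 degrees of latitude.
a = (41.27700, 16.41700)
b = (41.27705, 16.41700)
are_neighbors = haversine_m(a, b) <= EPS_METERS
```

A function of this kind can be used as the distance in the neighborhood search of DBSCAN, so that eps keeps its meaning in meters regardless of latitude.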
During our experiments, we used a reference configuration C_ref made up of the parameter values shown in bold in Table 4 (e.g., percGeotagged = 15%, percGeneric = 20%). Subsequently, some parameters of this configuration have been varied to understand the behavior of our method with more or less precise data. For each parameter configuration, we performed ten tests using different seeds. Figure 5a shows the F1-score obtained by varying the number of posts generated for each sub-event. As shown, the F1-score grows as the number of posts for each event increases. Considering the configuration with 10 events, we obtained an F1-score of 0.58 with the minimum and maximum number of posts per sub-event set to postPerSubEvent = [10, 50] (blue bar), 0.87 for [20, 100] (orange bar), and 0.89 for [30, 150] (green bar). The improvement is due to the fact that there are more posts for each sub-event and therefore the clusters are more precise.
Figure 5b shows the F1-score obtained by varying the value of the parameter poiRadius. By increasing the mean radius of an event, the F1-score tends to decrease because the recall tends to be smaller (i.e., the clusters tend to be smaller). As the radius increases, the points are distributed over a larger area. As a result, there is a reduction in the density of points that could reduce the ability of DBSCAN to find larger clusters.
Other experiments have been performed to study how the results vary with the percentage of generic (percGeneric) and geotagged (percGeotagged) posts.
In particular, as the percentage of generic posts increases, the F1-score decreases since there are fewer points that can be processed by DBSCAN (Fig. 6a). Instead, as the percentage of geotagged posts increases, the overall F1-score improves since DBSCAN can exploit a higher number of points to find more accurate clusters (Fig. 6b). It should also be noted that the performance decreases as the number of sub-events in the area increases. This behavior is mainly due to the fact that the presence of multiple events in the area produces overlapping clusters that are, for this reason, less representative of any single sub-event.
Figure 7a shows the results obtained by changing the distribution of the geotagged posts among the four levels (PoI/district/street/city). As shown, increasing the concentration of points at the PoI level leads to a better F1-score since the clustering algorithm is able to work on more defined clusters (see green bar). Instead, Fig. 7b presents the results obtained by varying the percentage distribution of textual information, which allows the coordinates of a post to be estimated within one of the four levels (PoI/district/street/city). Also in this case, the greater the accuracy of the information in the text (i.e., more points can be estimated at the PoI level), the greater the ability of the algorithm to discover accurate clusters for sub-events. However, increasing the number of sub-events in the area could result in a reduction of the F1-score due to the overlap of different clusters (see the case with 20 events).
The good results obtained by SEDOM-DD are most likely due to the data enrichment process, which allows DBSCAN to identify clusters with greater accuracy. In fact, as shown in Fig. 8, without the data enrichment procedure DBSCAN uses only the few natively geotagged posts, obtaining a very low F1-score (e.g., 0.21 with 10 sub-events).

Detection of sub-events on real data
To assess the usability of our method in real case studies, we carried out an analysis of a large dataset containing tweets about Hurricane Harvey, a Category 4 storm that hit Texas in 2017, causing about USD 200 billion in damage and at least 82 deaths according to the Texas Department of Public Safety. This dataset contains about 6.7 million tweets, which were collected from August 25, 2017 to September 5, 2017, using specific keywords (i.e., "Hurricane Harvey", "Harvey", "HurricaneHarvey") as described in [49].
The classification model discussed in Sect. "Collected data and classification of relevant one" has been used for filtering posts so as to separate the relevant tweets from the non-relevant ones. In particular, we classified 1,905,585 tweets as relevant (i.e., 29% of total data), but just a small part of them are geolocated (less than 1%). Through the post enrichment phase, we were able to deduce the location where a post was created by analyzing the text of the tweet (15% of total data). Specifically, we used a named entity recognition algorithm based on CoreNLP for identifying the locations mentioned in the text of the tweets. A clustering algorithm is then used for detecting interesting clusters on such geotagged posts. Finally, a procedure is carried out for identifying sub-events by analyzing the keywords (and their frequency) contained in the posts that fall into each cluster.
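The filtering and enrichment steps described above can be sketched as follows. Note that this sketch replaces the CoreNLP-based named entity recognition with a simple gazetteer lookup, purely for illustration; the gazetteer entries (with approximate city coordinates), the post fields, and the function name `enrich` are all assumptions.

```python
# Hypothetical gazetteer: place name -> approximate (lat, lon) in degrees.
GAZETTEER = {
    "houston": (29.76, -95.37),
    "rockport": (28.02, -97.05),
}

def enrich(posts):
    """Keep relevant posts and attach coordinates, estimated from text if needed."""
    enriched = []
    for post in posts:
        if not post["relevant"]:
            continue  # discard posts classified as not relevant
        coords = post.get("coords")
        if coords is None:
            # Stand-in for NER: look for a known place name in the text.
            text = post["text"].lower()
            coords = next(
                (c for name, c in GAZETTEER.items() if name in text), None
            )
        if coords is not None:
            enriched.append({**post, "coords": coords})
        # Posts that cannot be geolocated are discarded.
    return enriched

# Sample posts (assumed).
posts = [
    {"text": "Power lines down in Rockport", "relevant": True, "coords": None},
    {"text": "Flooded street", "relevant": True, "coords": (29.75, -95.36)},
    {"text": "Praying for everyone", "relevant": True, "coords": None},
    {"text": "Buy Houston tickets now", "relevant": False, "coords": None},
]
enriched = enrich(posts)
```

Only the first two posts survive: one is geolocated from its text and one is natively geotagged, while the generic and the non-relevant posts are dropped before clustering.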
Since the area under analysis is very large (several American cities were hit by the hurricane) and the posts do not provide detailed geolocation information, the clusters obtained coincide with the main cities of Texas. Considering all the clusters, the posts that report sub-events are about 113,346 (approximately 2% of the total data), mainly reporting damage to infrastructure (e.g., roads or houses) or to utility services (e.g., power outages or water pipes).
Figure 9 shows some significant sub-events that were discovered. In particular, two large areas with a high density of sub-events (red areas) have been discovered in the cities of Houston and Rockport. Other areas, smaller and with a lower density of sub-events (blue-green areas), have been identified elsewhere. Table 6 reports the sub-events that have been identified in the main cities involved in the disaster. Notably, Houston was found to be the city with the highest number of sub-events that occurred after the passing of Harvey, including flooded houses and damage to toxic waste sites. Rockport also reported a high number of sub-events, such as collapsed houses, downed power lines, and damaged boats. The obtained results confirm that SEDOM-DD is able to discover a high number of sub-events that occurred after a large-scale natural disaster.

Conclusions
The widespread use of social media allows people who are victims of disasters (e.g., earthquakes, fires) to share real-time information about damage, problems, and sub-events that can take place at different locations after a disaster (e.g., collapsed buildings, broken gas pipes). This valuable information is known only to people located where the events occurred and can be shared with rescue teams and authorities that are far away from the area. In this paper we presented SEDOM-DD, a new method that combines text mining and clustering analysis for discovering critical sub-events from social media data during natural disasters. Several experiments have been carried out on both real and synthetic datasets for evaluating the performance of SEDOM-DD. In particular, an analysis of a large dataset containing real tweets about Hurricane Harvey showed that SEDOM-DD was able to discover a large number of sub-events that occurred after the disaster. Moreover, other experiments on synthetic datasets demonstrated that SEDOM-DD is able to identify sub-events with a very good F1-score (greater than 85%), which confirms the high accuracy and effectiveness of the proposed approach.
For this reason, SEDOM-DD can be integrated with existing systems for coordinating and enhancing emergency response.The detected sub-events, together with the posts and photos that made it possible to detect such events, can be analyzed and validated by a group of experts to establish the type and the priority of interventions to be carried out.

Fig. 2
An example of using SEDOM-DD on posts collected during the earthquake in the old town of Trani (May 21, 2019)
Footnote 2: https://nominatim.openstreetmap.org

Fig. 3
Comparative analysis among several machine learning algorithms, evaluating the F1-score obtained by our approach for each dataset used in this work

Fig. 4
An example of post generation and analysis through SEDOM-DD

Fig. 5
F1-score for different values of the number of posts per sub-event (postPerSubEvent) and radius in meters (poiRadius) of a sub-event

Fig. 8
Comparison of the F1-score obtained by SEDOM-DD with and without the enrichment of posts

Table 2
Datasets specifications

Table 3
Evaluation of the classification models on the D2 test set

Table 4
Simulation parameters

Table 5
Example of generated posts. Tweet #1 is geotagged, tweets #2 and #3 contain information in the text that can be used to estimate coordinates (highlighted in bold), the others cannot be geolocated

Table 6
Main sub-events detected in tweets about Harvey
City      Types of sub-events
Houston   Flooded houses, airport runways and highways, damaged toxic waste sites and electrical station, destroyed cars