Fig. 1 | Journal of Big Data

From: Search-engine-based surveillance using artificial intelligence for early detection of coronavirus disease outbreak

The framework of keyword selection, comprising data preprocessing, dataset splitting, model construction, and feature analysis. Candidate search keywords are filtered and standardized by the population and gross domestic product of each province. The standardized data are then represented as graphs, with edges connecting nodes whose cosine similarity is > 0.9. The graphs are split into a training set (used to train the feature learning model) and a validation set (used only to validate the model's performance), corresponding to search engine data up to January 29, 2020. A graph convolutional network (GCN) is used as the feature learning model to learn the relationship between the search data and the epidemic situation. After the GCN model is validated, the importance of the search keywords is assessed by segmenting each node of the graph in turn and evaluating its effect on the result
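The caption compresses several steps; as a rough illustration of the graph-construction and per-keyword importance ideas it describes, the sketch below assumes hypothetical inputs (`search_counts`, `population`, `gdp`), interprets the per-node segmentation as a node-ablation test, and uses a placeholder `score_fn` standing in for evaluation of the trained GCN. It is not the authors' implementation.

```python
# Minimal sketch of the keyword-graph construction and node-ablation
# importance steps described in Fig. 1. All input names are hypothetical.
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity


def standardize(search_counts, population, gdp):
    """Scale raw per-province keyword counts by population and GDP.

    search_counts: array of shape (provinces, keywords)
    population, gdp: arrays of shape (provinces,)
    """
    return search_counts / (population[:, None] * gdp[:, None])


def build_keyword_graph(std_counts, keywords, threshold=0.9):
    """Connect keyword nodes whose standardized profiles have cosine similarity > threshold."""
    sim = cosine_similarity(std_counts.T)  # keyword-by-keyword similarity
    g = nx.Graph()
    g.add_nodes_from(keywords)
    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            if sim[i, j] > threshold:
                g.add_edge(keywords[i], keywords[j], weight=sim[i, j])
    return g


def keyword_importance(graph, score_fn):
    """Estimate each keyword's importance by removing its node and re-scoring.

    `score_fn(graph)` is a placeholder for re-evaluating the trained GCN on
    the (possibly reduced) graph; its definition is assumed, not given here.
    """
    baseline = score_fn(graph)
    importance = {}
    for node in list(graph.nodes):
        reduced = graph.copy()
        reduced.remove_node(node)
        importance[node] = baseline - score_fn(reduced)
    return importance
```

The node-ablation loop reflects one common way to attribute importance to graph nodes (the drop in a validation score when the node is excluded); the paper's exact segmentation procedure may differ.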