 Brief Report
 Open access
GCT-TTE: graph convolutional transformer for travel time estimation
Journal of Big Data volume 11, Article number: 15 (2024)
Abstract
This paper introduces a new transformer-based model for the problem of travel time estimation. The key feature of the proposed GCT-TTE architecture is the use of several data modalities, each capturing different properties of an input path. Along with an extensive study of the model configuration, we implemented and evaluated a representative set of recent baselines for the path-aware and path-blind settings. The computational experiments confirmed the viability of our pipeline, which outperformed state-of-the-art models on both considered datasets. Additionally, GCT-TTE was deployed as a web service accessible for further experiments with user-defined routes.
Introduction
Travel time estimation (TTE) is an actively developing branch of computational logistics that considers the prediction of potential time expenditures for specific types of trips [1, 2]. With the recent growth of urban environment complexity, such algorithms have become highly demanded both in commercial services and general traffic management [3]. Following this line, better TTE decreases logistic costs for different kinds of delivery [4], improves end-user experience for taxi services [5], and ensures the quality of adaptive traffic control [6].
Despite the applied significance of travel time estimation, it still remains a challenging task in the case of ground vehicles. This situation arises from the influence of different patterns of road network topology, nonlinear traffic dynamics, changing weather conditions, and other types of unexpected temporal events. The majority of the currently established algorithms [7, 8] tend to utilize specific data modalities in order to capture complex spatiotemporal dependencies influencing the traffic flow. With the recent success of multimodal approaches in adjacent areas of travel demand prediction [9] and journey planning [10], fusing the features from different sources is expected to be the next step towards better performance in TTE.
In this paper, we explored the predictive capabilities of TTE algorithms with different temporal encoders and proposed a new transformer-based model, GCT-TTE. The main contributions of this study are the following:

1. In order to perform the experiments with the image modality, we extended the graph-based datasets for Abakan and Omsk [11] with map patches (image modality) in accordance with the provided trajectories. Currently, the extended datasets are the only publicly available option for experiments with multimodal TTE algorithms.

2. In order to boost further research in the TTE area, we reimplemented and published the considered baselines in a unified format, together with the corresponding weights and data preprocessing code. This contribution will enable the community to enhance evaluation quality in the future, as most of the TTE methods lack official implementations.

3. We proposed the GCT-TTE neural network for travel time estimation and extensively studied its generalization ability under various conditions. The obtained results show that our pipeline outperforms the baselines on several metrics. The experiments also indicate that the performance of transformer-based models degrades less (with respect to the considered metrics) as the road network grows. This property is crucial from an industrial perspective, as classic recurrent models suffer considerably larger performance drops.

4. For demonstration purposes, we deployed inference of the GCT-TTE model as a web application accessible for manual experiments.
The web application is available at http://gctte.online and the code is published in the project's GitHub repository: https://github.com/Eighonet/GCTTTE.
Related work
Travel time estimation methods can be divided into two main types of approaches, corresponding to path-blind and path-aware estimation (Table 1). Path-blind estimation refers to algorithms relying only on data about the start and end points of a route [12]. Path-aware models use intermediate positions of a moving object represented in the form of GPS sequences [13], map patches [14], or a road subgraph [7]. Despite the increase in computational complexity, such approaches provide significantly better results, which justifies the attention paid to them in recent studies [8, 15, 16].
One of the earliest path-aware models was the wide-deep-recurrent (WDR) architecture [17], which mostly inherited the concept of joint learning from recommender systems [18]. In further studies, this approach was extended regarding the usage of different data modalities. In particular, the DeepIST [14] model utilizes rectangular fragments of a general reference map corresponding to elements of a route GPS sequence. Extracted images are fed into a convolutional neural network (CNN) that captures spatial patterns of depicted infrastructure. These feature representations are further concatenated into the matrix processed by the long short-term memory (LSTM) layer [19].
In contrast with the other approaches, DeepTTE [20] is designed to operate directly on GPS coordinates via geospatial convolutions paired with a recurrent neural network. The first part of this pipeline transforms raw GPS sequences into a series of feature maps capturing the local spatial correlation between consecutive coordinates. The final block learns the temporal relations of obtained feature maps and produces predictions for the entire route along with its separate segments.
The concept of modality fusion was first introduced in TTE as a part of the DeepI2T [21] model. This architecture uses large-scale information network embedding [22] to produce grid representations and a 3-layer CNN with pooling for image processing. Like DeepTTE, DeepI2T includes a segment-based prediction component implemented in the form of residual blocks on top of the BiLSTM encoder.
In addition to extensively studied recurrent TTE methods, it is also important to mention recently emerged transformer models [23, 24]. Despite the limited comparison with classic LSTM-based methods, they have already demonstrated promising prediction quality, preserving the potential for further major improvements [25, 26]. As most of the transformer models lack a comprehensive evaluation, we intend to explore GCT-TTE performance against a representative set of state-of-the-art solutions to reveal its capabilities explicitly.
Preliminaries
In this section, we introduce the main concepts required to operate with the proposed model, Fig. 1.
Route. A route r is defined as the set \(\{c^r, a^r, t^r\}\), where \(c^r\) is the sequence of GPS coordinates of a moving object, \(a^r\) is the vector of temporal and weather data, and \(t^r\) is the travel time.
As the image modality \(p^r\) of a route r, we use geographical map patches corresponding to each coordinate \(c^r_i \in c^r\). Each image has a fixed size \(256 \times 256 \times 3\) across all of the GPS sequences in a specific dataset.
Road network. The road network is represented in the form of a graph \(G = \{V, E, X\}\), where \(V = \{v_1,\;...\;,v_n\}\) is the set of nodes corresponding to the segments of city roads, \(E = \{(v_i, v_j) \; | \; v_i \rightarrow v_j\}\) is the set of edges between connected nodes \(v_i, v_j \in V\), and \(X \in {\textbf{R}}^{n \times m}\) is the feature matrix of nodes describing properties of the road segments (additional information regarding available graph features is provided in Additional file 1: S1).
The description of a route r can be further extended by the graph modality \(g^{r} = \{v_k \,|\, k = \mathop{\mathrm{argmin}}_{j} \, \rho (c^r_i, v_j)\}^{|c^r|}_{i=1}\), where \(\rho (c^r_i, v_j)\) is the minimum Euclidean distance between the coordinates associated with \(v_j\) and \(c_i^r\). Following the same concept as in the case of \(p^r\), the graph modality represents a sequence of nodes and their features aggregated with respect to the initial GPS coordinates \(c^r\).
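The mapping from GPS points to graph nodes can be illustrated with a minimal sketch. It assumes, for simplicity, that every node is described by a single representative coordinate (the text allows several coordinates per road segment); the function names and the toy data are hypothetical.

```python
import math

def nearest_node(coord, node_coords):
    """Return the index j that minimises the Euclidean distance rho(coord, v_j)."""
    return min(range(len(node_coords)),
               key=lambda j: math.dist(coord, node_coords[j]))

def graph_modality(gps_sequence, node_coords):
    """Map every GPS point c_i of a route to the index of its closest node."""
    return [nearest_node(c, node_coords) for c in gps_sequence]

# Hypothetical toy data: three road-segment nodes and a four-point route.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
route = [(0.1, 0.05), (0.9, -0.1), (1.4, 0.0), (2.2, 0.1)]
g_r = graph_modality(route, nodes)
```

A linear scan is enough for a toy example; for city-scale graphs a spatial index (for instance a k-d tree) would be the usual choice.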
Travel time estimation. For each entry r, it is required to estimate the travel time \(t^r\) using the elements of the feature description \(\{c^r, p^r, g^r, a^r\}\).
Data
We explored the predictive performance of the algorithm on two real-world datasets collected during the period from December 1, 2020 to December 31, 2020 in Abakan (112.4 square kilometers) and Omsk (577.9 square kilometers). Each dataset consists of a road graph and associated routes (Table 2). In the preprocessing stage, we excluded trips that lasted less than 30 s (273, 0.22% for Abakan; 1194, 0.15% for Omsk) along with the ones that took more than 50 min (223, 0.18% for Abakan; 3681, 0.47% for Omsk). The distributional statistics of both datasets are depicted in Fig. 2.
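The duration filter described above amounts to a simple range check. The sketch below assumes each trip is a record with a `travel_time` field in seconds; the field name and the sample trips are hypothetical.

```python
MIN_SECONDS = 30        # trips shorter than 30 s are excluded
MAX_SECONDS = 50 * 60   # trips longer than 50 min are excluded

def filter_trips(trips):
    """Keep trips whose travel time lies within [MIN_SECONDS, MAX_SECONDS]."""
    return [t for t in trips if MIN_SECONDS <= t["travel_time"] <= MAX_SECONDS]

# Hypothetical sample: one too-short trip, one too-long trip, two valid trips.
trips = [
    {"id": 1, "travel_time": 12},
    {"id": 2, "travel_time": 640},
    {"id": 3, "travel_time": 3200},
    {"id": 4, "travel_time": 30},
]
kept = filter_trips(trips)
```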
Since the initial versions of the Abakan and Omsk datasets did not have any relevant input data for image-based models, we extended their road graphs with map patches parsed from OpenStreetMap (OSM, https://www.openstreetmap.org). The parsing algorithm extracted patches from the OSM tile server URLs in accordance with the following request template: http://a.tile.openstreetmap.org/{zoom}/{longitude}/{latitude}.png. In order to use the obtained data together with the initial graphs, the datasets were extended with mapping tables containing the closest node id for each patch. The applied proximity measure was based on the Euclidean distance between the locations of image centroids and the geographical coordinates of graph vertices. Due to the limitations of the API throughput, the procedure of image extraction was distributed between several machines, with a total execution time exceeding 1 week.
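As a hedged illustration of the request template: public OSM tile servers address tiles by integer column and row indices rather than by raw coordinates, so the sketch below assumes that the `{longitude}` and `{latitude}` placeholders stand for tile indices obtained with the standard Web Mercator ("slippy map") conversion. The function names are hypothetical and this is not the authors' parsing code.

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Convert WGS84 coordinates to slippy-map tile indices (x, y)."""
    n = 2 ** zoom                                   # tiles per axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_url(lat_deg, lon_deg, zoom):
    """Fill the request template with tile indices for the given point."""
    x, y = latlon_to_tile(lat_deg, lon_deg, zoom)
    return f"http://a.tile.openstreetmap.org/{zoom}/{x}/{y}.png"

example = tile_url(0.0, 0.0, 1)
```

The conversion is valid for latitudes inside the Web Mercator range (roughly ±85.05°), which covers both cities considered here.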
The provided extension consists of images dated July 2022: due to the absence of significant changes in the road network topology since 2020, the image modality for Abakan and Omsk remains up to date with respect to the original graph-based data. The content of the patches includes a full range of geographic objects useful for travel time estimation (e.g., road networks, landscape groups, buildings and associated infrastructural objects) and covers all of the routes provided in the initial datasets.
Depending on the requirements of the considered learning model, image datasets had to be organized either by fixed grid partitions or centered around the elements of GPS sequences. In the first case, the geographical map of a city was divided into equal disjoint patches, which were then matched with the GPS data according to the presence of coordinates in a specific partition. The trajectory-based approach to dataset construction does not require the images to be disjoint and relies on the extraction of patches centered at the specified coordinate, Algorithm 1 (collect and split functions can be accessed in Additional file 1: S2, S3). The obtained grid-based image dataset consists of 96,101 instances for Abakan and 838,865 for Omsk, while the trajectory-based dataset has 544,502 and 3,376,294 images, respectively.
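The grid-based organisation can be sketched as an assignment of each GPS point to the cell of a regular partition. This is an illustrative sketch only, not Algorithm 1: the bounding box layout, the grid resolution, and the function name are assumptions.

```python
def grid_cell(lat, lon, bbox, rows, cols):
    """Return the (row, col) of the disjoint grid cell that contains a point.

    bbox is (lat_min, lat_max, lon_min, lon_max); points on the upper
    boundary are assigned to the last row or column.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    row = min(int((lat - lat_min) / (lat_max - lat_min) * rows), rows - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
    return row, col

# Hypothetical bounding box split into a 5 x 4 grid.
bbox = (0.0, 10.0, 0.0, 20.0)
cell = grid_cell(3.0, 19.0, bbox, rows=5, cols=4)
```

In the trajectory-based variant no such partition is needed, because every patch is simply cropped around its own coordinate.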
One of the crucial features of the considered datasets is the absence of traffic flow properties. The availability of such data is directly related to specialized tracking systems (based on loop detectors or observation cameras), which are not present in the majority of cities. In order to make GCT-TTE suitable for the greatest number of urban environments, we decided not to limit the study to rarely accessible data.
Method
In this section, we provide an extensive description of the main components of GCT-TTE: the pointwise and sequence representation blocks (Fig. 3).
Patches encoder
In order to extract features from the image modality, we utilized the RegNetY [27] architecture from the SEER model family. The key component of this architecture is the convolutional recurrent neural network (ConvRNN) which controls the spatiotemporal information flow between building blocks of the neural network.
Each RegNetY block consists of three operators. The initial convolution layer of the t-th block processes the input tensor \(X^t_1\) and returns the feature map \(X^t_2\). Next, the obtained representation \(X^t_2\) is fed to ConvRNN:
where \(H^{t-1}\) is the hidden state of the previous RegNetY block, \(b_h\) is a bias tensor, and \(\mathrm {C_x}\) and \(\mathrm {C_h}\) correspond to convolutional layers. In the following stage, \(X^t_2\) and \(H^t\) are fed as input to the last convolution layer, which is further extended by a residual connection.
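For reference, a ConvRNN update consistent with the symbols defined above can be written as follows. The hyperbolic tangent nonlinearity is an assumption borrowed from the usual recurrent formulation and is not stated in the text.

```latex
H^{t} = \tanh\left( \mathrm{C_x}\left(X^{t}_{2}\right) + \mathrm{C_h}\left(H^{t-1}\right) + b_h \right)
```

Under this form, the hidden state carries information from earlier blocks forward, which is what allows the ConvRNN to regulate the flow between building blocks.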
As the SEER models are capable of producing robust features that are well-suited for out-of-distribution generalization [28], we pretrained RegNetY with the following autoencoder loss:
where \({\mathcal {L}}\) is the binary cross-entropy function, f is an image flattening operator, and W is the projection matrix of learning parameters that maps the model output to the flattened image.
Auxiliary encoder
Along with the map patches and graph elements, we apply additional features \(a^r\) corresponding to the temporal and weather data (e.g., trip hour, type of day, precipitation). The GCT-TTE model processes this part of the input with a single linear layer:
where W is a matrix of learning parameters.
Graph encoder
The graph data is handled with the help of the graph convolutional layers defined as follows:
where \(h_u^{(k)}\) is a k-hop embedding [29] of \(u \in V\), \(h_u^{(0)} = x_u\), \(W^{(k)}\) is a matrix of learning parameters of the k-th convolutional layer, \({\mathcal {N}}(u)\) is a set of neighbour nodes of u, \({{\,\textrm{AGG}\,}}_{v \in {\mathcal {N}}(u)}\) is a sum aggregation function, and \({{\mathcal {N}}_{uv}} = \sqrt{|{\mathcal {N}}(u)||{\mathcal {N}}(v)|}\).
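A single graph-convolution step built from these definitions can be sketched in plain Python: neighbour embeddings are summed, each scaled by the symmetric degree normalisation, and then projected by the layer weights. The ReLU activation and the absence of self-loops are assumptions, and all names and toy values are hypothetical.

```python
import math

def gcn_layer(features, neighbours, weight):
    """One graph-convolution step with sum aggregation.

    features:   {node: [float, ...]}  embeddings h^(k-1)
    neighbours: {node: [node, ...]}   adjacency lists N(u)
    weight:     d_in x d_out matrix W^(k), given as a list of rows
    """
    d_in, d_out = len(weight), len(weight[0])
    out = {}
    for u, nbrs in neighbours.items():
        agg = [0.0] * d_in
        for v in nbrs:
            # N_uv = sqrt(|N(u)| * |N(v)|)
            norm = math.sqrt(len(neighbours[u]) * len(neighbours[v]))
            for i in range(d_in):
                agg[i] += features[v][i] / norm
        projected = [sum(agg[i] * weight[i][j] for i in range(d_in))
                     for j in range(d_out)]
        out[u] = [max(0.0, x) for x in projected]   # ReLU (assumed)
    return out

# Hypothetical path graph a - b - c with one-dimensional features.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [2.0], "c": [3.0]}
h1 = gcn_layer(features, neighbours, weight=[[1.0]])
```

Stacking two such calls yields 2-hop embeddings, matching the number of graph convolutional layers reported for the final model.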
To accelerate the convergence of the GCT-TTE model, we pretrained the weights of the graph convolutions by the Deep Graph InfoMax algorithm [30]. This approach optimizes the loss function that allows learning the difference between initial and corrupted embeddings of nodes:
where \(h_u\) is an embedding of node u based on the initial graph \({\mathcal {G}}\), \({\tilde{h}}_u\) is an embedding of a node u from the corrupted version \(\tilde{{\mathcal {G}}}\) of the graph \({\mathcal {G}}\), D corresponds to the discriminator function.
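For orientation, the Deep Graph InfoMax objective [30] is commonly written as a binary cross-entropy between original and corrupted node embeddings. The form below is a sketch under assumptions: the graph-level summary vector \(s\) and the equal weighting of positive and negative samples come from the usual presentation of the method, not from this text.

```latex
\mathcal{L}_{\mathrm{DGI}} = -\frac{1}{2|V|} \sum_{u \in V}
  \Big( \log D\left(h_u, s\right) + \log\big(1 - D(\tilde{h}_u, s)\big) \Big)
```

Minimising this quantity trains the discriminator D to score embeddings from \({\mathcal {G}}\) higher than those from \(\tilde{{\mathcal {G}}}\).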
The final output of the pointwise block constitutes a concatenation of the weighted representations and auxiliary data for each route r with k segments:
where \(H^r\) is the matrix of size \(k \times e_g\) of graph-based segment embeddings, \(I^r\) is the matrix of size \(k \times e_i\) obtained from a flattened RegNet output, and \(\alpha\), \((1 - \alpha )\), and \(\beta\) correspond to the weight coefficients of specific modalities.
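The weighted concatenation can be sketched as a row-wise operation over the k segments. Repeating the auxiliary vector for every segment is an assumption made for illustration, as are the function name and the default value of `beta`.

```python
def fuse_modalities(H, I, a, alpha=0.9, beta=1.0):
    """Build one row [alpha * H_i, (1 - alpha) * I_i, beta * a] per segment.

    H: k x e_g graph embeddings, I: k x e_i image embeddings,
    a: auxiliary feature vector shared by all segments (assumption).
    """
    if len(H) != len(I):
        raise ValueError("H and I must contain one row per segment")
    return [
        [alpha * x for x in h_row]
        + [(1.0 - alpha) * x for x in i_row]
        + [beta * x for x in a]
        for h_row, i_row in zip(H, I)
    ]

# Hypothetical route with k = 2 segments, e_g = 2, e_i = 1, one auxiliary feature.
P = fuse_modalities([[1.0, 2.0], [3.0, 4.0]], [[10.0], [20.0]], [0.5],
                    alpha=0.5, beta=2.0)
```

With a large \(\alpha\), the graph embeddings dominate the fused representation and the image features contribute only with weight \((1 - \alpha)\).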
Sequence representation block
To extract sequential features from the output of the pointwise representation block, it is fed to a transformer encoder [31]. The encoder consists of two attention layers with a residual connection followed by a normalization operator. The multi-head attention coefficients are defined as follows:
where \(x_i, x_j \in P_r\), h is an attention head, \(d_k\) is a scale coefficient, \(W^T_{h, q}\) and \(W^T_{h, k}\) are query and key weight matrices, \(w_j\) is a vector of softmax learning parameters. The output of the attention layer will be:
where \(W^T_{h, v}\) is value weight matrix, H is a number of attention heads.
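A single attention head can be sketched in plain Python. The code implements standard scaled dot-product attention [31]; the additional vector of softmax parameters mentioned above is omitted, so this is a simplified stand-in rather than the exact layer of the model. All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_head(X, Wq, Wk, Wv):
    """Scaled dot-product attention over the rows of X for one head."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)           # attention coefficients for this query
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
# With a zero query projection all scores are equal, so each output row
# is the plain average of the value vectors.
averaged = attention_head([[1.0, 0.0], [0.0, 1.0]], zeros, identity, identity)
```

A multi-head layer runs H such heads with separate weights and concatenates their outputs.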
The final part of the sequence representation block corresponds to the flattening operator and several linear layers with the ReLU activation, which predict the travel time of a route.
Results
In this section, we reveal the parameter dependencies of the model and compare the results of the considered baselines.
Experimental setup
The experiments were conducted on 16 Tesla V100 GPUs. For the GCT-TTE training, the Adam optimizer [32] was chosen with a learning rate of \(5\times 10^{-5}\) and a batch size of 16. For better convergence, we apply a learning rate scheduler with a patience of 10 epochs and a scaling factor of 0.1. The training time for the final configuration of the GCT-TTE model is 6 h for Abakan and 30 h for Omsk.
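The scheduler type is not named in the text. Assuming a reduce-on-plateau policy, which is the usual reading of "patience" and "scaling factor", its behaviour can be sketched as follows; the class is a hypothetical stand-in, not the training code.

```python
class PlateauScheduler:
    """Multiply the learning rate by `factor` after more than `patience`
    consecutive epochs without improvement of the validation loss."""

    def __init__(self, lr, factor=0.1, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

# Toy run with a short patience so that the reduction is visible.
sched = PlateauScheduler(lr=1.0, factor=0.1, patience=2)
lrs = [sched.step(1.0) for _ in range(4)]
```

In the toy run the loss never improves after the first epoch, so the rate is cut once the third stagnant epoch is observed.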
The reported values of the quality metrics were obtained from a 5-fold cross-validation procedure. As the measures of model performance, we use mean absolute error (MAE), root mean squared error (RMSE), and 10% satisfaction rate (SR). Additionally, we compute mean absolute percentage error (MAPE), as it is frequently applied in related studies.
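The four metrics can be computed as in the sketch below. The satisfaction rate is taken here as the percentage of trips whose relative error does not exceed 10%; this definition is an assumption, since it is not spelled out in the text. The function name and sample values are hypothetical.

```python
import math

def tte_metrics(y_true, y_pred, sr_threshold=0.10):
    """Return MAE, RMSE, MAPE (%) and satisfaction rate (%) for travel times."""
    n = len(y_true)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    rel_err = [e / t for e, t in zip(abs_err, y_true)]
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(e * e for e in abs_err) / n),
        "MAPE": 100.0 * sum(rel_err) / n,
        "SR": 100.0 * sum(r <= sr_threshold for r in rel_err) / n,
    }

# Hypothetical travel times in seconds.
metrics = tte_metrics([100, 200, 400, 500], [105, 190, 400, 600])
```

RMSE penalises the single 100 s miss far more than MAE does, which is why the gap between the two metrics is informative about large errors.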
Models comparison and evaluation
The results of the path-blind evaluation are shown in Table 3. Neighbor average (AVG) and linear regression (LR) demonstrated the worst results among the trivial baselines, whereas gradient boosted decision trees (GBDT) clearly outperformed more complex models on the larger city, Omsk. The MURAT model achieved the best score for Abakan in terms of MAE and RMSE, while GCT-TTE has the minimum MAPE among all of the considered architectures.
The demonstrated variability of metric values makes it hard to identify the best model in the path-blind setting. The simplest models are still capable of being competitive with such architectures as MURAT, which was expected to perform tangibly better on both considered datasets. The results of GCT-TTE can be partially explained by its structure, as it was not initially designed for path-blind evaluation.
As can be seen in Table 4, the proposed solution outperformed the baselines in terms of the RMSE value, which indicates the robustness of GCT-TTE against large errors. The comparison of MAE and RMSE for the considered methods has shown a minimal gap between these metrics in the case of GCT-TTE for both cities, signifying the efficiency of the technique with respect to dataset size. Overall, the results have confirmed that GCT-TTE is a more reliable approach than the LSTM-based models: while MAPE remains approximately the same across top-performing architectures, GCT-TTE achieves significantly better MAE and RMSE values. The computational experiments also indicated that DeepI2T and WDR have intrinsic problems with convergence, while GCT-TTE demonstrates smoother training dynamics.
Performance analysis
In the case of both datasets, dependencies between the travelled distance and obtained MAE on the corresponding trips reveal similar dynamics: as the path length increases, the error rate continues to grow, Fig. 4b, d. The prediction variance is inversely proportional to the number of routes in a particular length interval except for the small percentage of the shortest routes. The main difference between the MAE curves is reflected in the higher magnitudes of performance fluctuations in Abakan compared to Omsk.
The temporal dynamics of GCT-TTE errors exhibit rich nonlinear properties during a 24-hour period. The shape of the error curves demonstrates that our model tends to accumulate a majority of errors in the period between 16:00 and 18:00, Fig. 4a, c. This time interval corresponds to the end of the working day, which has a crucial impact on the predictability of the traffic flow.
Despite the mentioned performance outlier, the general behaviour of temporal dependencies allows concluding that GCT-TTE successfully captures the factors influencing the target value in the daytime. With the growing sparsity of data during night hours, it is still capable of producing relevant predictions for Omsk. In the case of Abakan, the GCT-TTE performance drop can be associated with a substantial reduction in the number of intercity trips (which turned out to be an easier target for the model).
Focusing on higher levels of seasonality, day- and week-based temporal dependencies of the error demonstrate explicit periodic behaviour, Fig. 5. The GCT-TTE model performs better at the end of the week for both considered cities, with a pronounced error decrease in the case of Omsk. In contrast, the middle of the week (i.e. Wednesday for Abakan and Tuesday for Omsk) is the most challenging period, with MAE on average 12.48% higher than on Saturday and Sunday.
Sensitivity analysis
In order to achieve better prediction quality, we extensively studied the dependencies between GCT-TTE parameters and model performance in terms of the MAE metric. The best value for the modality coefficient \(\alpha\) was 0.9, which reflects the significant contribution of graph data towards error reduction. For the final model, we utilized 2 graph convolutional layers with hidden size 192, Fig. 6a, b. A lack of aggregation depth can significantly reduce the performance of GCT-TTE, while an excessive number of layers has a less pronounced negative impact on MAE. A similar situation can be observed in the case of the hidden size, where the error approaches a plateau after a certain threshold value.
Along with the graph convolutions, we explored the configuration of the sequence representation part of GCT-TTE. Since the transformer block remains its main component, the computational experiments were focused on the influence of encoder depth on quality metrics, Fig. 6c. As can be derived from the U-shaped dependency, the best number of attention layers is 3.
Demonstration
In order to provide access to the inference of GCT-TTE, we deployed a demonstration web application at http://gctte.online (Fig. 7). The application's interface consists of a user guide, navigation buttons, an erase button, and a comparison button. A potential user can construct and evaluate an arbitrary route by clicking on the map at the desired start and end points: the system's response will contain the shortest path and the corresponding value of the estimated time of arrival.
For additional evaluation of the considered baselines, a limited number of predefined trajectories with known ground truth can also be requested. In this case, the response will contain three random trajectories from the datasets with associated predictions of the WDR, DeepI2T, and GCT-TTE models along with the real travel time.
Conclusion
In this paper, we introduced a multimodal transformer architecture for travel time estimation and performed an extensive comparison with other existing approaches. The obtained results allow us to conclude that transformer-based models can be efficiently utilized as sequence encoders in the path-aware setting. Our experiments with different data modalities revealed the greater importance of graphs compared to map patches. Such an outcome can be explained by the overlap of the main features between modalities, where graph data represents the same properties more explicitly. In further studies, we intend to focus on the design of a more expressive image encoder as well as consider the task of path-blind travel time estimation, which currently remains challenging for the GCT-TTE model.
Availability of data and materials
Considered models and datasets are available in the project’s GitHub repository.
References
1. Jenelius E, Koutsopoulos H. Travel time estimation for urban road networks using low frequency probe vehicle data. Transport Res Part B Methodol. 2013;53:64–81.
2. Wu X, Roy U, Hamidi M, Craig B. Estimate travel time of ships in narrow channel based on AIS data. Ocean Eng. 2020;202:106790.
3. Xuegang J, Ban XJ, Li Y, Skabardonis A, Margulici J. Performance evaluation of travel time estimation methods for real-time traffic applications. Intell Transp Syst J. 2010;14:54–67.
4. Salehi S, Mahmoudabadi A. Estimating the reliability of travel time on railway networks for freight transportation. Urban Stud Public Adm. 2018;1:75. https://doi.org/10.22158/uspa.v1n1p75.
5. Shi C, Chen BY, Li Q. Estimation of travel time distributions in urban road networks using low-frequency floating car data. ISPRS Int J Geo Inform. 2017;6(8):253.
6. Navarro-Espinoza A, López-Bonilla OR, García-Guerrero EE, Tlelo-Cuautle E, López-Mancilla D, Hernández-Mejía C, Inzunza-González E. Traffic flow prediction for smart traffic lights using machine learning algorithms. Technologies. 2022;10(1):5. https://doi.org/10.3390/technologies10010005.
7. Wang Q, Xu C, Zhang W, Li J. GraphTTE: travel time estimation based on attention-spatiotemporal graphs. IEEE Signal Process Lett. 2021;28:239–43.
8. Derrow-Pinion A, She J, Wong D, Lange O, Hester T, Perez L, Nunkesser M, Lee S, Guo X, Wiltshire B, et al. ETA prediction with graph neural networks in Google Maps. In: Proceedings of the 30th ACM International conference on information and knowledge management. 2021. pp. 3767–3776.
9. Chu KF, Lam AYS, Li VOK. Deep multi-scale convolutional LSTM network for travel demand and origin-destination predictions. IEEE Transact Intell Transport Syst. 2020;21(8):3219–32. https://doi.org/10.1109/TITS.2019.2924971.
10. He P, Jiang G, Lam SK, Sun Y, Ning F. Exploring public transport transfer opportunities for pareto search of multicriteria journeys. IEEE Transact Intell Transport Syst. 2022;23(12):22895–908. https://doi.org/10.1109/TITS.2022.3194523.
11. Porvatov V, Semenova N, Chertok A. Hybrid graph embedding techniques in estimated time of arrival task. In: Benito RM, Cherifi C, Cherifi H, Moro E, Rocha LM, Sales-Pardo M, editors. Complex networks & their applications X. Cham: Springer; 2022. p. 575–86.
12. Wang H, Tang X, Kuo YH, Kifer D, Li Z. A simple baseline for travel time estimation using large-scale trip data. ACM Transact Intell Syst Technol (TIST). 2019;10(2):1–22.
13. Wang Y, Zheng Y, Xue Y. Travel time estimation of a path using sparse trajectories. In: Proceedings of the 20th ACM SIGKDD International conference on knowledge discovery and data mining. KDD ’14. New York: Association for Computing Machinery. pp. 25–34.
14. Fu TY, Lee WC. DeepIST: deep image-based spatio-temporal network for travel time estimation. In: Proceedings of the 28th ACM International conference on information and knowledge management. CIKM ’19. New York: Association for Computing Machinery; 2019.
15. Zhang H, Wu H, Sun W, Zheng B. DeepTravel: a neural network based travel time estimation model with auxiliary supervision. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2018. pp. 3655–3661.
16. Sun Y, Fu K, Wang Z, Zhang C, Ye J. Road network metric learning for estimated time of arrival. In: 2020 25th International conference on pattern recognition (ICPR). New York: IEEE; 2021. pp. 1820–1827.
17. Wang Z, Fu K, Ye J. Learning to estimate the travel time. In: Proceedings of the 24th ACM SIGKDD International conference on knowledge discovery and data mining. KDD ’18. New York: Association for Computing Machinery; 2018. pp. 858–866.
18. Cheng HT, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, et al. Wide & deep learning for recommender systems. In: Proceedings of the 1st workshop on deep learning for recommender systems. 2016. pp. 7–10.
19. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
20. Wang D, Zhang J, Cao W, Li J, Zheng Y. When will you arrive? Estimating travel time based on deep neural networks. In: AAAI. 2018.
21. Lan W, Xu Y, Zhao B. Travel time estimation without road networks: an urban morphological layout representation approach. In: Proceedings of the 28th International joint conference on artificial intelligence. IJCAI’19. Washington: AAAI Press; 2019. pp. 1772–1778.
22. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q. LINE: large-scale information network embedding. In: Proceedings of the 24th International conference on world wide web. 2015. pp. 1067–1077.
23. Liu F, Yang J, Li M, Wang K. MCT-TTE: travel time estimation based on transformer and convolution neural networks. Sci Progr. 2022;2022:3235717.
24. Semenova N, Porvatov V, Tishin V, Sosedka A, Zamkovoy V. Logistics, graphs, and transformers: towards improving travel time estimation. In: Amini MR, Canu S, Fischer A, Guns T, Kralj Novak P, Tsoumakas G, editors. Machine learning and knowledge discovery in databases. Lecture notes in computer science, vol. 13718. Springer, Cham; 2023. https://doi.org/10.1007/9783031264221_36.
25. Shen Y, Jin C, Hua J, Huang D. TTPNet: a neural network for travel time prediction based on tensor decomposition and graph embedding. IEEE Transact Knowl Data Eng. 2022;34(9):4514–26. https://doi.org/10.1109/TKDE.2020.3038259.
26. Fan S, Li J, Lv Z, Zhao A. Multimodal traffic travel time prediction. In: 2021 International Joint Conference on Neural Networks (IJCNN). 2021. pp. 1–9. https://doi.org/10.1109/IJCNN52387.2021.9533356.
27. Radosavovic I, Kosaraju RP, Girshick R, He K, Dollar P. Designing network design spaces. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (CVPR). 2020.
28. Goyal P, Duval Q, Seessel I, Caron M, Misra I, Sagun L, Joulin A, Bojanowski P. Vision models are more robust and fair when pretrained on uncurated images without supervision. arXiv. 2022. https://doi.org/10.48550/arXiv.2202.08360.
29. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th international conference on learning representations. Palais des Congrès Neptune, France; 2017.
30. Veličković P, Fedus W, Hamilton WL, Liò P, Bengio Y, Hjelm RD. Deep Graph Infomax. In: International conference on learning representations. 2019.
31. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, editors. Advances in neural information processing systems, vol. 30. Red Hook: Curran Associates; 2017.
32. Kingma D, Ba J. Adam: a method for stochastic optimization. In: International conference on learning representations. 2014.
Acknowledgements
Authors are grateful to Vladislav Zamkovy for the help with application deployment.
Funding
Not applicable.
Contributions
VM, VC, and AI: software, data curation, validation, visualization; VP: software, visualization, conceptualization, methodology, writing (original draft); NS: conceptualization, methodology, supervision, writing (review and editing).
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1:
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mashurov, V., Chopuryan, V., Porvatov, V. et al. GCT-TTE: graph convolutional transformer for travel time estimation. J Big Data 11, 15 (2024). https://doi.org/10.1186/s40537023008411
DOI: https://doi.org/10.1186/s40537023008411