 Research
 Open Access
RTiSR: a review-driven time interval-aware sequential recommendation method
Journal of Big Data volume 10, Article number: 32 (2023)
Abstract
The emerging topic of sequential recommender (SR) systems has attracted increasing attention in recent years; SR focuses on understanding and learning the sequential dependencies of user behaviors hidden in user-item interactions. Previous methods focus on capturing point-wise sequential dependencies while treating interactions as evenly spaced in time. However, in the real world, time and semantic irregularities are hidden in users' successive actions. Meanwhile, with the tremendous increase in users and items, modeling user interests from sparse explicit feedback has become difficult. To this end, we explore the influence of item-aspect review sequences with varied time intervals on sequential modeling. We present RTiSR, a review-driven time interval-aware sequential recommendation framework, which predicts the user's next purchased item by jointly modeling the sequence dependencies of aspect-aware reviews. The main idea is twofold: (1) explicitly learning user and item representations from reviews by assigning different weights to them, and (2) leveraging a hybrid neural network to capture collective sequence patterns with a flexible order from aspect-aware review sequences. We conduct extensive experiments on industrial datasets to evaluate the effectiveness of RTiSR. Experimental results demonstrate the superior performance of RTiSR on different evaluation metrics compared with state-of-the-art competitors.
Introduction
In the big data era, recommender systems (RSs) play an important role in helping users find potential items of interest among massive choices [1, 2]. RSs have been widely applied to online shopping websites, content-sharing platforms, social networks, etc. Traditional RSs, represented by collaborative filtering (CF) [3, 4] and content-based [5] methods, model user-item interactions from a static view and can therefore only learn users' general preferences. Compared with traditional RSs, the emerging sequential recommender systems (SRSs) aim to predict the next item or next few items based on users' sequential interaction behaviors, which makes it easier to capture sequential dependencies for more accurate recommendations. As a result, sequential recommendation has received increasing attention in recent years [6, 7].
The key challenge of SRSs is to dynamically estimate a user's current preference by adaptively modeling the user's sequential patterns in historical interactions. Following this line, various methods have been proposed to learn the sequential patterns in user historical interactions, such as Markov chain (MC)-based models [8, 9] and recurrent neural network (RNN)-based models [10]. MC-based methods use a K-gram Markov chain to model the interactions between users and items in a sequence in order to predict the next item a user will purchase. FM-based methods usually use matrix factorization or tensor factorization to factorize the observed user-item interactions into latent factors of users and items for recommendation. Among DNN-based methods, CNN-based methods use convolutional filters and sliding-window strategies to capture short-term contexts for prediction, while RNN-based methods estimate the next possible interaction by modeling the sequential dependencies over the given interactions, usually adopting gated recurrent unit (GRU) or long short-term memory (LSTM) units to learn the sequential dependencies hidden in user-item interactions.
Although the previous solutions have achieved satisfactory results, they commonly suffer from the following defects:

Difficulty of modeling user preferences and item features from sparse explicit feedback. In most past studies, ratings are the only feedback used to measure how much a user prefers a given item. However, ratings only reflect overall satisfaction with an item without further details. As shown in Fig. 1, on Amazon both users buy the album “California Girls” and leave positive reviews, while user1 also buys the album “Pet Sounds” but gives it a negative review. Without considering the reviews, existing rating-based methods would recommend “Pet Sounds” to user2 because user1 rated it. This turns out to be an improper recommendation, which arises when only user connectivity is considered and the reviews are ignored.

Semantic and time irregularities are hidden in users' successive actions. For most users, the sequential dependencies of their interaction behaviors are not strictly ordered. Because user shopping behavior is uncertain, the main purpose behind a behavior sequence is often unclear. Thus, in the real world, some user behavior sequences are not strictly ordered, i.e., not all adjacent interactions in a sequence are sequentially dependent. For instance, as shown in Fig. 2, consider the historical interaction sequence of a user S = {Music_Player 1, Music_Player 2, Music_Player 3, Sports Tracker}. The first three items in S suggest that the user is more likely to buy a music player next than sports tracker software. However, such a prediction would not be valid, since the user chose the sports tracker app three months later. Hence, this kind of temporal distance deserves specific handling.
This raises the following research question:

R.Q. How can collective sequential dependencies with a flexible order, a key challenge in the sequential recommendation domain, be captured?
To tackle the above defects, we propose a Review-driven Time interval-aware Sequential Recommendation framework, named RTiSR, which predicts the user's next purchased item by modeling the dynamic preferences of users and items extracted from aspect-aware reviews:

First, review texts are not only much more expressive than ratings but also provide a strong tool for explaining the underlying dimensions behind users' decisions. On that basis, we employ reviews to build a sequential recommendation model with more accurate user and item representations. Specifically, user-provided reviews can be viewed from two aspects: user and item. From the user aspect, a user's review set reflects the experience of buying diverse items. From the item aspect, an item's review set includes the reviews written for that item and usually exhibits its various features. Therefore, learning user and item representations from reviews strengthens sequential recommendation. In Fig. 3, the user-aspect reviews show that user u has diverse purchase behavior and reflect the user's preferences more accurately, while the item-aspect reviews from different users exhibit the various features of the specific item v. In this case, the matching aspect reviews significantly help recommendation models predict the user's next purchased item, and u indeed gave v a score of 5.0 after purchasing it.

Second, we also notice that user sequential behaviors follow no strict order or regular timing, owing to the uncertainty of user behavior. Existing time-aware sequential recommenders [11, 12] assume that the items in a sequence are evenly spaced and semantically consistent. However, in practical recommendation scenarios, user behavior sequences are complex. As shown in Fig. 4, the time intervals between adjacent reviews vary, with the maximum interval between two adjacent reviews being about 160 days. Intuitively, two actions within a short time interval tend to share a closer relationship than two actions separated by a long interval, so this kind of temporal distance deserves special handling. To this end, we leverage a set of convolutional filters of varying sizes to effectively learn user and item latent factors with a flexible order.
Specifically, RTiSR consists of two homogeneous hybrid neural networks, named UNet and INet, each combining a recurrent neural network and a convolutional neural network. UNet learns the user representation from the reviews written by the user, while INet captures multiple item features from the reviews written for the item. Then, a user-item interaction model (e.g., a factorization machine) models the complex user-item interactions and outputs the predictions. The contributions of this paper are threefold:

We propose RTiSR, a novel review-driven time interval-aware framework that exploits reviews with time interval information for sequential recommendation and captures sequence dependencies from the user-aspect and item-aspect review sequences, respectively.

We introduce a flexible sequential pattern learning layer to learn collective sequential dependencies with a flexible order. We regard the embedded sequential reviews with explicit time interval information as an image and then employ multi-size convolution filters to capture collective sequential dependencies with a flexible order, rather than in a point-wise way.

We conduct extensive experiments on five real-world datasets. The experimental results demonstrate that RTiSR achieves competitive and superior HR/NDCG results compared with SOTA methods.
The remainder of this paper is organized as follows. The "Related work" section summarizes related work. The "Methodology" section formally states the problem and describes the proposed RTiSR in detail. The "Experiments" section presents the experimental details. The final section concludes the paper.
Related work
Sequential recommender
Users' shopping behaviors usually occur successively in a sequence rather than in isolation. Thus, the key to accurate recommendation is properly capturing users' dynamic preferences from their historical interactions. Many approaches have been proposed to model users' interactions sequentially for prediction or recommendation. The first line of research introduces a K-gram Markov chain to model the interactions between users and items in a sequence in order to predict the next item a user will purchase. For instance, Garcin et al. [13] adopt first-order Markov chains to capture sequential patterns in users' browsing behaviors. Rendle et al. [9] proposed FPMC to predict users' next-basket behaviors by factorizing the Markov chains of user behaviors with a tensor factorization method. He et al. [14] proposed Fossil, which integrates a similarity model into a high-order MC for next-item recommendation. More recently, recurrent neural networks (RNNs) have played a dominant role in sequential recommendation, owing to their success in sequence modeling in the natural language processing (NLP) domain. They estimate the next possible interaction by modeling the sequential dependencies over the given interactions, usually adopting gated recurrent unit (GRU) [15] or long short-term memory (LSTM) [10] units to learn the sequential dependencies hidden in user-item interactions. For example, GRU4Rec [15] was the first to use an RNN to learn sequential patterns. Quadrana et al. [16] proposed a hierarchical RNN to capture cross-session dependencies in user behavior sequences. However, high-order Markov chain models involve only limited historical information, while RNN-based methods are built on a strict-order assumption.
To cope with the shortcomings of MC-based and RNN-based models, several attention-based RNN methods have been proposed to handle noisy or irrelevant interactions in user behavior sequences [7, 17, 18]. To model union-level and point-level sequential patterns, Tang et al. proposed Caser, a convolutional sequence embedding recommendation model, which uses convolutional filters with horizontal and vertical sliding-window strategies to capture short-term contexts for prediction [19]. Li et al. proposed TiSASRec [12], which leverages a self-attention model for sequential recommendation and adaptively assigns varying weights to different items by considering their absolute positions and time intervals in a sequence. Although the above-mentioned solutions have achieved satisfactory results, they seriously overlook the rich semantic information hidden in user-provided reviews. Moreover, these methods have two limitations: (1) they capture only point-wise dependencies and ignore collective dependencies; (2) owing to the strong assumption that any adjacent interactions must be dependent, RNN-based methods perform well on dense datasets but poorly on sparse datasets.
Similar to our work, Li et al. proposed a review-driven neural model that captures union-level and individual-level sequential dependencies by exploiting reviews [20]. Liu et al. [21] proposed an end-to-end neural network model to capture users' long-term and short-term preferences. However, these methods focus on capturing sequence patterns in the user-aspect review sequence and ignore the temporal dynamics of the item-aspect review sequence.
Review-based recommender
User reviews have been introduced as an important means of improving user and item representations and have received much attention in RS research [20, 22, 23]. Early solutions build topic models to extract latent topics from reviews [24,25,26]. For instance, McAuley and Leskovec [24] adopt the Latent Dirichlet Allocation (LDA) model to discover users' and items' latent spaces from reviews. Tan et al. [26] capture user preferences from ratings and reviews by mapping user preferences and item features into a latent topic space. However, these methods treat reviews as bag-of-words representations and ignore the semantic information hidden in them.
Recently, several methods have employed deep learning techniques to capture user preferences from reviews for recommendation. Zheng et al. [27] proposed DeepCoNN, a neural network-based framework that jointly learns user preferences and item features from reviews. Building on DeepCoNN, Catherine et al. [28] proposed TransNets, which inserts an additional latent layer to represent the user-item pair. Tay et al. [29] proposed MPCN, a multi-pointer co-attention neural network that exploits reviews. Besides, Wu et al. [30] proposed CARL, which derives two separate learning components from review data and interaction data. By leveraging user-provided reviews, these methods achieve competitive performance with more accurate user and item representations. However, they focus on the rating prediction task and do not consider enhancing sequential recommendation with user-provided reviews.
Methodology
In this section, we first give a formal problem statement of the review-driven sequential recommendation task and then describe the details of the proposed RTiSR.
Problem statement
For ease of exposition, we first introduce the notation used in this paper, as shown in Table 1. We then formalize the relevant definitions and problems as follows:
Definition 1
(Interaction record) Given the user set \(\textbf{U}=\{u_1, u_2, \ldots , u_{|\textbf{U}|}\}\) and the item set \( \mathcal {V} = \{v_1, v_2, \ldots , v_{|\mathcal {V}|}\} \), an interaction record \( s= (u, v, D^u_v, y^u_v, t^u_v) \) is a five-element tuple, meaning that user u wrote review \( D^u_v \) and gave rating \( y^u_v \) for item v at time \( t^u_v \).
Definition 2
(Interaction sequence) Let \( S^u \) be the chronologically ordered behavior sequence of user u, formally defined as \(S^{u} = \{s^u_1, s^u_2, \ldots , s^u_{|S^u|}\}\), where \(s^u_t\) denotes the interaction record of user u at time step t. Similarly, the behavior sequence of an item v is defined as \( S^{v} = \{s^v_1, s^v_2, \ldots , s^v_{|S^v|}\} \).
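The two definitions map naturally onto a small record type. The following Python sketch is illustrative only: the class `Interaction` and the helpers `build_sequences` and `last_l` are hypothetical names, and timestamps are assumed to be comparable numbers (e.g., Unix seconds). It groups records into chronologically ordered user sequences \(S^u\) and item sequences \(S^v\) and extracts the L most recent records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One interaction record s = (u, v, D^u_v, y^u_v, t^u_v)."""
    user: str
    item: str
    review: str    # review text D^u_v
    rating: float  # rating y^u_v
    time: int      # timestamp t^u_v (assumed comparable, e.g. Unix seconds)


def build_sequences(
    records: List[Interaction],
) -> Tuple[Dict[str, List[Interaction]], Dict[str, List[Interaction]]]:
    """Group records into chronologically ordered S^u (per user) and S^v (per item)."""
    user_seqs: Dict[str, List[Interaction]] = defaultdict(list)
    item_seqs: Dict[str, List[Interaction]] = defaultdict(list)
    for r in records:
        user_seqs[r.user].append(r)
        item_seqs[r.item].append(r)
    for seqs in (user_seqs, item_seqs):
        for seq in seqs.values():
            seq.sort(key=lambda rec: rec.time)  # chronological order
    return dict(user_seqs), dict(item_seqs)


def last_l(seq: List[Interaction], l: int) -> List[Interaction]:
    """Return the L most recent records, i.e. S^{u,L} or S^{v,L}."""
    return seq[-l:] if l > 0 else []
```

The same record feeds both views: a review written by u for v appears once in \(S^u\) and once in \(S^v\), which is what lets UNet and INet consume user-aspect and item-aspect review sequences.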
Definition 3
(Review-driven sequential recommendation) Given the L most recently purchased items of user u and the corresponding review documents \(\mathcal {D}^{u,L}\), the goal is to generate a recommendation list of top-ranked candidate items by computing the likelihood that the user will purchase each of them next.
The architecture of RTiSR
In this work, we seek to exploit user-provided reviews to enhance sequential recommendation with implicit feedback. Figure 5 depicts the network architecture of RTiSR. The objective of RTiSR is to derive user and item collective sequential patterns with a flexible order from the aspect-aware review sequences. To this end, RTiSR consists of two homogeneous neural networks, named UNet and INet, and includes three layers: the time interval-aware review embedding layer, the flexible sequence pattern learning layer, and the prediction layer. Specifically, the review embedding layer organizes the user-provided review set into aspect-aware review sequences and then encodes each review into an embedding vector using a Bidirectional Long Short-Term Memory (BiLSTM) network. After that, a time interval model assigns dynamic weights to different reviews in a sequence. The flexible sequence pattern learning layer then assembles the learned time interval-aware review embeddings into an “image” and adopts a set of convolution kernels of varied sizes to learn flexible sequential patterns. In the prediction layer, a factorization machine (FM) model is adopted to model the complex user-item interactions and produce recommendations.
The time interval-aware review embedding layer
In traditional sequential recommendation models, users and items are generally embedded as one-hot vectors based on their IDs, whose dimension equals the size of the item set. However, this representation suffers from a serious curse of dimensionality and data sparsity [31], especially when the item set reaches millions of items or more.
To address this issue, we consider user-provided reviews a strong supplement for understanding user behavior. Thus, RTiSR learns user and item representations by exploiting reviews. Suppose that a review has m words. Using a word embedding matrix \( \textbf{E}^w \in \mathbb {R}^{d_w \times |\mathcal {V'}|} \) with a table lookup operation, each review can be encoded as a matrix \( \mathcal {D}' \):
where \( \textbf{wd}_i \) is the ith word embedding in a review, \( d_w \) is the dimension of a word vector, and \( \mathcal {V'} \) is the vocabulary of all words. The pre-trained \( \textbf{E}^w \) can be obtained via word embedding methods widely used in NLP, such as word2vec [32] and GloVe [33].
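The table lookup can be illustrated with a short Python sketch. It is not the authors' code: `encode_review`, the zero-vector treatment of out-of-vocabulary words, and the padding/truncation to exactly m words are assumptions made for illustration. `E_w` is stored with \(d_w\) rows and \(|\mathcal{V'}|\) columns, as in the text, so the vector of word j is column j; the returned \(\mathcal{D}'\) has one row per word (\(\textbf{wd}_i\)), which is one possible orientation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def encode_review(tokens: List[str], vocab: Dict[str, int], E_w: Matrix, m: int) -> Matrix:
    """Table lookup: map a tokenized review to an m x d_w matrix D' (row i = wd_i).

    E_w has d_w rows and |V'| columns, so the vector of word j is column j.
    Unknown words and padding positions become zero vectors (an assumption).
    """
    d_w = len(E_w)
    rows: Matrix = []
    for tok in tokens[:m]:                        # truncate long reviews to m words
        j = vocab.get(tok)
        if j is None:
            rows.append([0.0] * d_w)              # hypothetical out-of-vocabulary handling
        else:
            rows.append([E_w[r][j] for r in range(d_w)])
    while len(rows) < m:                          # pad short reviews to m words
        rows.append([0.0] * d_w)
    return rows
```

Fixing every review to m rows keeps the later \(L \times d_c\) image rectangular, which the convolution step relies on.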
Compared with a basic LSTM unit, a BiLSTM unit, which contains forward and backward LSTM units, can better capture both the syntax and the meaning of words. Given the input word \( \textbf{w}_k \) and the previous hidden state \( \textbf{h}_{k-1}\), the sequential update of an LSTM unit can be expressed as:
Let \( \overrightarrow{\textbf{h}}_k \) and \( \overleftarrow{\textbf{h}}_{k} \) represent the hidden states of the forward and backward LSTMs at the kth step; the final hidden state of a BiLSTM unit can then be formulated as:
where \( \widetilde{\textbf{h}}_k \) is the final hidden state of a BiLSTM unit, i.e., \( \widetilde{\textbf{h}}_k = \left[ \overrightarrow{\textbf{h}}_{k}, \overleftarrow{\textbf{h}}_{k} \right] \). Note that we choose BiLSTM as our encoder to balance efficiency and performance; this function could also be realized by other sequence encoders, such as a BiGRU or a Transformer.
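To make the update concrete, here is a dependency-free Python sketch of a standard LSTM cell (input, forget, and output gates plus a candidate cell state, with parameters named after \(\textbf{W}_*\), \(\textbf{U}_*\), \(\textbf{b}_*\)) and of a BiLSTM that concatenates forward and backward states as \(\widetilde{\textbf{h}}_k = [\overrightarrow{\textbf{h}}_k, \overleftarrow{\textbf{h}}_k]\). The gate layout is the textbook LSTM, not necessarily the exact variant used in RTiSR. The helper `review_embedding` assumes that \(\textbf{e}_i\) concatenates all m BiLSTM states; this is consistent with \(d_c = d_l \times m\) later in the paper but is our reading rather than a stated equation. All function names are hypothetical; a practical implementation would use a framework layer such as PyTorch's `nn.LSTM` with `bidirectional=True`.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]
Params = Dict[str, Tuple[Mat, Mat, Vec]]   # gate -> (W_*, U_*, b_*)


def matvec(M: Mat, x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                         # stable branch for negative inputs
    return e / (1.0 + e)


def init_lstm(d_in: int, d_h: int, seed: int = 0) -> Params:
    """Small random weights for the input (i), forget (f), output (o) gates and candidate (c)."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Mat:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(d_h, d_in), mat(d_h, d_h), [0.0] * d_h) for gate in "ifoc"}


def lstm_step(p: Params, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
    """One LSTM update (w_k, h_{k-1}, c_{k-1}) -> (h_k, c_k)."""
    def pre(gate: str) -> Vec:
        W, U, b = p[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("c")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(p: Params, xs: List[Vec], d_h: int) -> List[Vec]:
    """Hidden states of a unidirectional LSTM over a word sequence."""
    h, c = [0.0] * d_h, [0.0] * d_h
    states = []
    for x in xs:
        h, c = lstm_step(p, x, h, c)
        states.append(h)
    return states


def bilstm(p_fwd: Params, p_bwd: Params, xs: List[Vec], d_h: int) -> List[Vec]:
    """h~_k = [forward h_k, backward h_k] for every word position k."""
    fwd = run_lstm(p_fwd, xs, d_h)
    bwd = run_lstm(p_bwd, xs[::-1], d_h)[::-1]   # re-align backward states with positions
    return [hf + hb for hf, hb in zip(fwd, bwd)]


def review_embedding(states: List[Vec]) -> Vec:
    """Assumption: e_i concatenates all m BiLSTM states, giving length d_l * m."""
    return [v for s in states for v in s]
```

With m words and hidden size \(d_h\) per direction, each BiLSTM state has \(d_l = 2d_h\) entries, so the concatenated review vector has \(d_l \times m\) entries.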
Thus, the ith review embedding vector \( \textbf{e}_{i} \) can be obtained:
Time interval model. In real-world recommendation scenarios, users' behavior sequences are complex. As shown in Fig. 4, the time intervals between two adjacent reviews vary. Intuitively, two actions within a short time interval tend to share a closer relationship than two actions separated by a long interval, so this kind of temporal distance deserves special handling. However, previous time-aware sequential recommendation methods [11, 12] assume that the items in a sequence are evenly spaced, so that position is the only factor influencing subsequent items.
In this work, we view the time interval between two consecutive behaviors as the relation between two interactions. To capture the influence of time intervals on review embedding, we introduce a time interval model to assign dynamic weights to different reviews.
Let \(T = \{t_1,t_2, \ldots ,t_{n}\}\) denote the timestamp sequence of user u or item v; the time interval between two consecutive items is \( t_{i+1} - t_{i}, 1 \le i < n \). To obtain a personalized time interval, we divide it by the difference between the longest and the shortest non-zero time interval in the aspect-aware sequence. Specifically, for user u, the personalized interval \( \bigtriangleup {t}^{u}_{i} \) is:
where \( \min(\bigtriangleup t^u) \) and \( \max(\bigtriangleup t^u) \) are the minimum and maximum of all time intervals in user u's behavior sequence.
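The normalization step can be sketched in a few lines of Python. The helper name `personalized_intervals` is hypothetical, and the code follows the prose literally (each gap divided by the difference between the largest and the smallest non-zero gap), since the paper's equation is not reproduced here; the fallback for sequences whose non-zero gaps are all equal is our own choice to avoid division by zero.

```python
from typing import List


def personalized_intervals(timestamps: List[float]) -> List[float]:
    """Scale the raw gaps t_{i+1} - t_i of one user's (or item's) sequence.

    Each gap is divided by max(non-zero gaps) - min(non-zero gaps), following
    the textual description; degenerate cases use a hypothetical fallback.
    """
    ts = sorted(timestamps)                      # chronological order
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    nonzero = [g for g in gaps if g > 0]         # zero intervals are excluded from min/max
    if not nonzero:
        return [0.0] * len(gaps)
    spread = max(nonzero) - min(nonzero)
    if spread == 0:                              # all non-zero gaps equal: avoid dividing by 0
        spread = max(nonzero)
    return [g / spread for g in gaps]
```

For example, timestamps [0, 1, 3, 163] (in days) give gaps [1, 2, 160], each divided by 160 - 1 = 159.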
Thus, the final review embedding \( \textbf{e}'_i \) can be expressed as:
The flexible sequence pattern learning layer
To capture the multiple sequence patterns hidden in user behaviors, we introduce several convolution kernels \( \textbf{F}_* \) to obtain user and item latent features with a flexible order. First, RTiSR constructs the \( L \times d_c \) matrix \( \textbf{E}^{s} \) (with \( d_{c} = d_{l} \times m \)) as the embedding image of the previous L reviews of user u or item v in the sequence, which can be expressed as follows:
As a result, the collective sequence patterns can be considered as the local features of this embedding image \( \mathbf {E^s} \).
Then RTiSR uses a set of convolution filters of different sizes to search for sequential patterns. Figure 6 shows two horizontal filters, represented as \( h\times d_c \) matrices, that capture two union-level sequential patterns. Specifically, filter \( \textbf{F}^1 \) has size \( 2\times d_c\), while filter \( \textbf{F}^2 \) has size \( 3 \times d_c \). They search for hidden sequential patterns by sliding over the rows of \( \mathbf {E^s} \). For instance, when the first filter \( \textbf{F}^1 \) is applied to the sequential pattern “(movie, popcorn)”, the item “drinks” can be recommended. Similarly, the second filter \( \textbf{F}^2 \) picks up the sequential pattern “(iphone, ipad, iwatch) \( \rightarrow \) airpods”, since iphone, ipad, and iwatch have large values in the latent dimensions captured by that filter. More details are explained below.
In this layer, we employ several convolution kernels \( \textbf{F}_* \) of different sizes to learn sequence dependencies with a flexible order. A set of \( n_c \) horizontal filters \( \textbf{F}^k \in \mathbb {R}^{h \times d_c} \) (\( 1 \le k \le n_c \)) is used for this purpose, where \( h \in \{1, \ldots , L\} \) is the filter's height. For instance, if \( L=5 \), one may choose \( n_c = 10 \) filters, two for each h in \( \{1,2,3,4,5\} \). In detail, \( \textbf{F}^k \) slides from top to bottom over \( \mathbf {E^s} \) and interacts with all horizontal dimensions of \( \mathbf {E^s} \) for reviews i, \( 1 \le i \le L-h+1 \). Therefore, the ith convolution value of filter \( \textbf{F}^k \) is:
where \( \mathbf {E^s}_{i:i+h-1} \) is the submatrix from row i to row \( i+h-1 \) of \( \mathbf {E^s} \), and \( \phi _c \) is the activation function.
The final convolution result of \( \textbf{F}^k \) is:
After that, a max operation extracts the maximum value from (9), which represents the most meaningful feature extracted by filter \( \textbf{F}^k \). For the \( n_c \) convolution filters, the output \( \textbf{x} \in \mathbb {R}^{n_c} \) is:
Finally, the user and item representations are obtained by applying a fully connected layer to the output of the convolutional layer:
where \( \textbf{o} \in \mathbb {R}^{d_o}\) is the latent vector of a user or an item with the size of \( d_o \).
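The convolution, max-pooling, and fully connected steps above can be sketched in plain Python. The sketch assumes that \(\textbf{E}^s\) is given as a list of L rows (one time interval-aware review embedding \(\textbf{e}'_i\) per row), that \(\phi_c\) and the fully connected activation are both ReLU, and that dropout (which the paper applies to the fully connected layer during training) acts on the pooled vector \(\textbf{x}\). These choices and all function names are illustrative assumptions; each filter height h must satisfy \(h \le L\).

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def relu(z: float) -> float:
    return max(0.0, z)


def horizontal_conv(E_s: Matrix, F: Matrix) -> List[float]:
    """c^k: slide an h x d_c filter over the rows of the L x d_c image E^s.

    Each value is phi_c (assumed ReLU) of the inner product between F and
    rows i..i+h-1 of E^s.
    """
    L, h = len(E_s), len(F)
    out = []
    for i in range(L - h + 1):          # 0-based version of 1 <= i <= L-h+1
        window = E_s[i:i + h]
        s = sum(e * f for row_e, row_f in zip(window, F) for e, f in zip(row_e, row_f))
        out.append(relu(s))
    return out


def conv_max_features(E_s: Matrix, filters: List[Matrix]) -> List[float]:
    """x in R^{n_c}: keep the maximum of each filter's convolution result."""
    return [max(horizontal_conv(E_s, F)) for F in filters]


def fully_connected(x: List[float], W: Matrix, b: List[float],
                    dropout: float = 0.0, rng: Optional[random.Random] = None) -> List[float]:
    """o = phi(W' x + b'); phi = ReLU and dropout on x are assumptions."""
    if dropout > 0.0:
        rng = rng or random.Random(0)
        keep = 1.0 - dropout
        x = [xi / keep if rng.random() < keep else 0.0 for xi in x]   # inverted dropout
    return [relu(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]
```

With L = 5, the text's example of \(n_c = 10\) corresponds to passing ten filters, two for each height h in {1, ..., 5}, so `conv_max_features` returns a 10-dimensional \(\textbf{x}\), and `W` would then have \(d_o\) rows of length 10.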
In the flexible sequence pattern learning layer, each horizontal convolution filter interacts with every h successive reviews in the embedding matrix \( \mathbf {E^s} \). By sliding horizontal filters of varied heights, significant signals are extracted from the review sequence, including multiple sequence patterns, user preferences, and item features with position and time interval information.
Here we place the user and item representations (i.e., \(\textbf{o}^u\), \(\textbf{o}^v\)) at the output of the fully connected layer for two main reasons: (1) this allows them to generalize to other recommendation models; (2) our model parameters can be initialized with the well-trained parameters of other generalized models. As stated in [34], such pre-training is vital to model performance.
The prediction layer
After the flexible sequence pattern learning layer, we obtain combined representations of reviews, time intervals, and sequence patterns for user u and item v. To estimate the likelihood that item v is the next purchase, we use a factorization-based model [35] to estimate user u's preference score for item v:
where \( \varvec{\Psi } \) is the concatenation of \( \textbf{o}^u \) and \( \textbf{o}^v \) (i.e., \( \varvec{\Psi } = \textbf{o}^u \oplus \textbf{o}^v \)), \( \omega _* \) are the bias terms, \( \left<\textbf{v}_i,\textbf{v}_j\right>\) models the interaction between the ith and jth variables, \( d_r =2 d_o \), and k is the size of the factor vectors \( \textbf{v} \).
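A second-order factorization machine can be evaluated efficiently with the standard identity \( \sum_{i<j}\left<\textbf{v}_i,\textbf{v}_j\right>\Psi_i\Psi_j = \tfrac{1}{2}\sum_{f=1}^{k}\big[(\sum_i v_{i,f}\Psi_i)^2 - \sum_i v_{i,f}^2\Psi_i^2\big] \), which reduces the cost from \(O(k\,d_r^2)\) to \(O(k\,d_r)\). The Python sketch below assumes the usual FM form (a global bias, linear weights, and pairwise factor terms); the function names are hypothetical, and the naive version is included only as a cross-check.

```python
from typing import Sequence


def fm_score(psi: Sequence[float], w0: float, w: Sequence[float],
             V: Sequence[Sequence[float]]) -> float:
    """Second-order FM: w0 + sum_i w_i psi_i + sum_{i<j} <v_i, v_j> psi_i psi_j.

    V holds one k-dimensional factor vector v_i per input dimension (d_r rows).
    The pairwise term uses the O(k * d_r) reformulation.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, psi))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * psi[i] for i in range(len(psi)))
        s2 = sum((V[i][f] * psi[i]) ** 2 for i in range(len(psi)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def fm_score_naive(psi, w0, w, V) -> float:
    """Direct O(k * d_r^2) evaluation, used only to cross-check fm_score."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, psi))
    for i in range(len(psi)):
        for j in range(i + 1, len(psi)):
            total += sum(a * b for a, b in zip(V[i], V[j])) * psi[i] * psi[j]
    return total
```

In Python, \(\varvec{\Psi} = \textbf{o}^u \oplus \textbf{o}^v\) is plain list concatenation (`psi = o_u + o_v`), which is why \(d_r = 2d_o\).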
Model optimization
The model parameters of RTiSR include the review embeddings, the two hybrid neural networks, and the factorization machine. For each user-item pair (u, v), we extract the L successive interactions of user u and item v as a training instance \( l = (S^{u,L},S^{v,L}) \). Following [20], for each training instance \( l_v \) with target item v, we randomly sample \( \zeta \) negative items (i.e., \( v' \notin S^u \)) together with their L successive interactions, denoted as \( \mathcal {N}(v) \). Let \( C^u \) be the collection of all training instances of user u. We transform model output scores into the range (0, 1) with the sigmoid function \( \sigma (x) = \dfrac{1}{1+e^{-x}} \) and adopt binary cross-entropy as the loss function:
Here, \( \Theta = \{\textbf{W}_*,\textbf{U}_*, \textbf{b}_*, \textbf{F}, \textbf{W}', \textbf{b}', \textbf{V}, \varvec{\omega }_*\}\) denotes all model parameters, \( \lambda \) is the regularization coefficient, and \( \Vert \cdot \Vert _F \) is the Frobenius norm. We adopt the Adam optimizer [36] and train on mini-batches to improve training efficiency. To avoid overfitting, dropout is applied to the fully connected layer.
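The objective can be sketched as follows, assuming the usual form of binary cross-entropy with sampled negatives: \(-\log\sigma(\hat{y})\) for each positive instance, \(-\log(1-\sigma(\hat{y}'))\) for each of its \(\zeta\) negatives, plus \(\lambda\) times the squared Frobenius norms of the parameters. The exact normalization in the paper's loss is not reproduced here, so treat the weighting as an assumption; `rtisr_style_loss` is a hypothetical name. The identities \(-\log\sigma(x) = \mathrm{softplus}(-x)\) and \(-\log(1-\sigma(x)) = \mathrm{softplus}(x)\) avoid overflow for large scores.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def sigmoid(x: float) -> float:
    """sigma(x) = 1 / (1 + e^{-x}), mapping a raw score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def rtisr_style_loss(pos_scores: Sequence[float],
                     neg_scores: Sequence[Sequence[float]],
                     params: Sequence[Matrix],
                     lam: float) -> float:
    """Binary cross-entropy over positives and sampled negatives plus an L2 penalty.

    pos_scores[n]: raw FM output y_hat for the n-th training instance (u, v).
    neg_scores[n]: raw outputs for the zeta sampled negatives N(v) of that instance.
    params: parameter matrices whose squared Frobenius norms are weighted by lam.
    """
    data = 0.0
    for s_pos, s_negs in zip(pos_scores, neg_scores):
        data += softplus(-s_pos)                   # -log sigma(y_hat_{u,v})
        data += sum(softplus(s) for s in s_negs)   # -log(1 - sigma(y_hat_{u,v'}))
    reg = sum(x * x for M in params for row in M for x in row)
    return data + lam * reg
```

In a framework such as PyTorch, the data term corresponds to a logits-based binary cross-entropy and the penalty is often delegated to the optimizer's weight decay instead.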
In the recommendation phase, we take u's latest L interactions \(S^{u,L}\) and, for each candidate item v, its latest L interactions \(S^{v,L}\) as input. We obtain the predicted score \( \hat{y}^u_v \) of item v by calling UNet, INet, and FM, and then pick the K items with the highest scores as the recommendation list. The implementation details of UNet and RTiSR are shown in Algorithm 1 and Algorithm 2. The computational complexity of making recommendations for all users is \( O(|U|\,\Gamma\,(L \times T_{BiLstm} + n_c \times T_{Conv})) \), where \(T_{BiLstm}\) and \(T_{Conv}\) are the complexities of the BiLSTM and convolution operations, respectively.
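Because \(\textbf{o}^u\) does not depend on the candidate item, a natural implementation runs UNet once per user, scores every candidate with its INet output, and keeps the top K. The sketch below illustrates only that ranking step; `recommend_top_k` is a hypothetical name, and `score` stands in for the FM layer (any callable taking \(\textbf{o}^u\) and \(\textbf{o}^v\)). Using `heapq.nlargest` avoids sorting the whole candidate set.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Sequence

Vec = Sequence[float]


def recommend_top_k(o_u: Vec, item_reprs: Dict[Hashable, Vec],
                    score: Callable[[Vec, Vec], float], k: int) -> List[Hashable]:
    """Return the K candidate items with the highest predicted scores.

    o_u is computed once per user (UNet output); item_reprs maps each candidate
    item v to its INet output o^v; score stands in for the FM prediction layer.
    """
    return heapq.nlargest(k, item_reprs, key=lambda v: score(o_u, item_reprs[v]))
```

Item representations can likewise be cached across users, since \(\textbf{o}^v\) depends only on the item's own review sequence.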
Experiments
To comprehensively evaluate the performance of RTiSR, we conduct a set of experiments to answer the following research questions:

RQ1: How does RTiSR perform compared with state-of-the-art sequential and review-based recommendation models?

RQ2: What is the influence of aspect reviews, time interval-aware embedding, and the convolutional layer in RTiSR?

RQ3: How do key hyperparameters, such as the dimension (\( d_o \)) of user/item latent factors and the height (h) of convolution filters, affect the performance of RTiSR?
Experimental settings
Dataset: We evaluate RTiSR on five public datasets with different characteristics. Four datasets come from Amazon^{Footnote 1} subcategories: Musical Instruments (MIs), Automotive (Auto), Luxury Beauty (LB), and Beer, while the Yelp^{Footnote 2} dataset from the Yelp Challenge 2019 contains data from Jan. 1, 2019, to Aug. 31, 2019. Moreover, we filter all datasets so that every user and item has at least 20 interactions. The basic statistics of each dataset are described in Table 2.
Baselines: To demonstrate the effectiveness, we compare our proposed RTiSR with the following methods:

BPRMF [37]: It is a matrix factorization-based ranking algorithm that optimizes a pairwise ranking loss with implicit feedback. It is a popular baseline for item recommendation.

DeepCoNN [27]: It is a state-of-the-art review-based recommendation method that leverages convolutional neural networks to jointly model users and items from reviews.

SLRC [38]: It introduces the Hawkes process into collaborative filtering (CF) and explicitly addresses two item-specific temporal dynamics: short-term effects and lifetime effects.

CFKG [39]: It is a knowledge-based representation learning approach that embeds heterogeneous entities for personalized recommendation, incorporating a user-item knowledge-graph structure to improve recommendation performance.

CORE [40]: It is a sequential recommendation method with a representation-consistent encoder that represents sequence embeddings and item embeddings in the same space. It is a state-of-the-art baseline for sequential recommendation.

SINE [41]: It is a state-of-the-art sequential recommendation method that simultaneously considers multiple interests of a user and aggregates them to predict the user's current intention.

LightSANs [42]: It is a novel Transformer-based sequential recommendation method that introduces low-rank decomposed self-attention, which projects the user's historical items into a small constant number of latent interests and leverages item-to-interest interaction to generate the context-aware representation.
Evaluation metrics: To evaluate performance, we adopt two well-known Top-N recommendation metrics: Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [31]. In detail, HR is calculated as:
where a hit occurs when a test item appears in the recommendation list, and GT denotes the ground-truth item set.
Compared with the recall-based metric HR, NDCG measures ranking quality, discounting positions logarithmically. It accounts for the position of a hit by assigning higher scores to hits at top ranks:
where \( Z_K \) is the normalization term and \( r_i \) is a binary value with \( r_i =1 \) if the item at rank i is in the test set and \( r_i =0 \) otherwise. Larger HR and NDCG values indicate better performance. In the evaluation, we compute both metrics for each user and report their average as the final score.
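Under the leave-one-out protocol described next, each user has a single ground-truth item, so HR@K is 1 if that item appears in the top K and NDCG@K reduces to \(1/\log_2(\text{rank}+1)\). A minimal Python sketch of both metrics, assuming binary relevance and taking \(Z_K\) as the inverse of the ideal DCG (the function names are illustrative):

```python
import math
from typing import Iterable, Sequence, Set


def hit_ratio_at_k(ranked: Sequence, ground_truth: Set, k: int) -> float:
    """HR@K = (# ground-truth items in the top K) / |GT|."""
    if not ground_truth:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in ground_truth)
    return hits / len(ground_truth)


def ndcg_at_k(ranked: Sequence, ground_truth: Set, k: int) -> float:
    """NDCG@K with binary r_i; Z_K is taken as 1 / (ideal DCG@K)."""
    # With r_i in {0, 1}, the gain 2^{r_i} - 1 is 1 for a hit and 0 otherwise;
    # position pos (0-based) is discounted by log2(pos + 2) = log2(rank + 1).
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in ground_truth)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(ground_truth), k)))
    return dcg / ideal if ideal > 0 else 0.0


def mean(values: Iterable[float]) -> float:
    """Average per-user scores into the reported metric."""
    vals = list(values)
    return sum(vals) / len(vals) if vals else 0.0
```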
Implementation details: Following [17, 34, 43], we adopt the leave-one-out evaluation method to test the above-mentioned models. For user u, we use the latest interaction \( s^{u}_{|S^u|} \) as the test set and the penultimate interaction \( s^{u}_{|S^u|-1} \) as the validation set, while the remaining interactions are used for training. Specifically, we randomly sample 100 items that user u has not interacted with and rank the test item among these 100 items. To trade off the performance and efficiency of RTiSR, the learning rate of the Adam optimizer is tuned in [0.0001, 0.0005, 0.001, 0.005]. The batch size is tuned in [32, 64, 128, 256]. Furthermore, we tune the dimension of word embeddings in the range [100, 200, 300, 400, 500]. The length L of successive interactions is chosen from \( \{3, \ldots , 9\} \), and the height h of horizontal filters from \( \{1, \ldots , L\} \). For each height h, the number of horizontal filters \( n_c \) is chosen from \( \{4, 8, 12, 16, 20, 24, 28, 32\} \). Note that we tune hyperparameters on the validation set. For a fair comparison, we reuse the reported hyperparameters of the baseline models where available; otherwise, we carefully tune them so that they achieve their best performance. We implement RTiSR in PyTorch^{Footnote 3}.
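The evaluation split and candidate sampling can be sketched as follows. The helpers are hypothetical; the fixed seed and the choice to exclude every previously interacted item from the negative pool are assumptions consistent with the description above. The resulting 101 candidates (the test item plus 100 sampled items) would then be scored, ranked, and evaluated with HR@K and NDCG@K.

```python
import random
from typing import Hashable, List, Sequence, Set, Tuple


def leave_one_out(seq: Sequence[Hashable]) -> Tuple[List[Hashable], Hashable, Hashable]:
    """Chronological sequence -> (training prefix, validation item, test item)."""
    if len(seq) < 3:
        raise ValueError("need at least 3 interactions for train/validation/test")
    return list(seq[:-2]), seq[-2], seq[-1]


def sample_test_candidates(test_item: Hashable, interacted: Set[Hashable],
                           all_items: Sequence[Hashable], n_neg: int = 100,
                           seed: int = 0) -> List[Hashable]:
    """Test item followed by n_neg sampled items the user never interacted with."""
    rng = random.Random(seed)
    pool = [v for v in all_items if v not in interacted and v != test_item]
    return [test_item] + rng.sample(pool, n_neg)
```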
All experiments are conducted on a tower server with an Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50 GHz and 128 GB of RAM, running Ubuntu 20.04.3 LTS with JDK 1.8, Python 3.6, and PyTorch 1.8 installed.
Overall performance (RQ1)
We first evaluate the overall recommendation performance of all models on the different datasets. Table 3 shows the comparison results for varying lengths K of the recommendation list. To aid interpretation, the Friedman test [44] and the win/loss counts are also reported in the last two rows of Table 3. From the table, we make the following observations:

RTiSR consistently outperforms most baselines on all datasets, achieving the best overall performance and the highest F-rank value. Even compared with the strongest baseline on HR or NDCG, RTiSR achieves improvements of varying degrees on all five datasets: on average, it outperforms the strongest baselines by 8.4%, 4.7%, 8.1%, 3.7%, and 9.0% on MIs, Auto, LB, Beer, and Yelp, respectively. This demonstrates the effectiveness of RTiSR, which we attribute to its better capture of union-level sequential dependencies with a flexible order. Moreover, the superior performance of RTiSR supports the rationale for using user reviews and exact time interval information to improve recommendation performance.

In most cases, sequential recommendation methods such as CORE, TiSASRec, and SINE perform better than the CF-based methods (i.e., BPRMF and CFKG) and the review-based method (i.e., DeepCoNN). The main reason for this significant performance gap is that, although DeepCoNN and CFKG use user reviews and knowledge graphs, respectively, to enhance user and item representations, BPRMF, DeepCoNN, and CFKG have limited capacity to model user preferences because they ignore the sequential dependencies hidden in user behaviors.

Our proposed method shows a significant improvement \((p\text{-value} \leqslant 0.05)\) over the SOTA baselines on all datasets. The reasons for this performance gap are as follows. (1) Compared with CF-based methods (e.g., CFKG and SLRC), RTiSR captures users’ dynamic preferences by modeling the sequential patterns in their historical interactions. (2) Compared with sequential recommendation models (e.g., CORE, SINE, and LightSANs), RTiSR introduces aspect review information to improve the quality of user and item embeddings. On this basis, RTiSR treats the embedded review sequence with explicit time interval information as an image and employs multi-size convolution filters to capture collective sequential dependencies with a flexible order, rather than in a point-wise way. (3) Compared with the review-based recommendation method (i.e., DeepCoNN), RTiSR improves sequential recommendation performance by using time interval information to assign dynamic weights to different reviews in the aspect-aware review sequence.
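The summary rows of Table 3 and the significance claim can be illustrated with the sketch below. It assumes a results table indexed as table[dataset][model] and follows the average-rank Friedman statistic of Demšar [44], where rank 1 is the best model on a dataset; the exact F-rank orientation used in Table 3 is not restated in this section. The section also does not name its significance test, so a one-sided paired randomization (sign-flip) test on per-user scores is used here purely as an illustrative assumption. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Table = Dict[str, Dict[str, float]]  # table[dataset][model] -> score (higher is better)


def ranks_with_ties(values: List[float]) -> List[float]:
    """Rank 1 goes to the largest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Grow the group while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg_rank
        pos = end + 1
    return ranks


def friedman(table: Table) -> Tuple[Dict[str, float], float]:
    """Average rank of each model over N datasets and the Friedman chi-square."""
    datasets = list(table)
    models = list(table[datasets[0]])
    n, k = len(datasets), len(models)
    rank_sums = {m: 0.0 for m in models}
    for d in datasets:
        row = [table[d][m] for m in models]
        for m, r in zip(models, ranks_with_ties(row)):
            rank_sums[m] += r
    avg_ranks = {m: s / n for m, s in rank_sums.items()}
    # chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks.values()) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2


def win_loss(table: Table, ours: str, other: str) -> Tuple[int, int]:
    """Number of datasets on which `ours` beats, and loses to, `other`."""
    wins = sum(1 for d in table if table[d][ours] > table[d][other])
    losses = sum(1 for d in table if table[d][ours] < table[d][other])
    return wins, losses


def paired_sign_flip_test(ours: Sequence[float], baseline: Sequence[float],
                          n_permutations: int = 10000, seed: int = 0) -> float:
    """One-sided paired randomization test on per-user score differences.

    Under H0 each difference is equally likely to have either sign, so we
    estimate P(mean of sign-flipped differences >= observed mean).
    """
    diffs = [o - b for o, b in zip(ours, baseline)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if flipped >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return (count + 1) / (n_permutations + 1)
```

Given per-user HR@10 or NDCG@10 values for RTiSR and a baseline, a returned value at or below 0.05 would indicate a significant improvement under this particular test.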
Ablation study (RQ2)
RTiSR introduces two essential extensions to sequential recommendation for capturing user and item representations. In this section, we conduct a set of ablation experiments to analyze the impact of each component.
Analysis of different components. To analyze the influence of the time interval-aware model and the convolution operations on RTiSR, we compare RTiSR with the following variants:

RTiSR-NoTC: Neither the time interval model nor the convolution filter is used in this variant.

RTiSR-NoC: It only uses the time interval model to enhance the user/item representations.

RTiSR-NoT: It only integrates the convolution filter into the RTiSR framework.
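The full model and its three variants form a 2×2 ablation over two switches. A hypothetical configuration encoding is sketched below; the class and flag names are illustrative and are not taken from the RTiSR implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AblationConfig:
    use_time_interval: bool  # time interval-aware model switched on
    use_convolution: bool    # multi-size convolution filters switched on


# Each variant removes one or both components from the full model
ABLATION_VARIANTS: Dict[str, AblationConfig] = {
    "RTiSR": AblationConfig(use_time_interval=True, use_convolution=True),
    "RTiSR-NoTC": AblationConfig(use_time_interval=False, use_convolution=False),
    "RTiSR-NoC": AblationConfig(use_time_interval=True, use_convolution=False),
    "RTiSR-NoT": AblationConfig(use_time_interval=False, use_convolution=True),
}
```

Because the dataclass is frozen, configurations are hashable, which makes it easy to check that the four settings cover every combination exactly once.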
The performance of RTiSR and its three variants on the five datasets is shown in Fig. 7. We observe that incorporating the time interval model improves recommendation performance significantly (i.e., RTiSR-NoC performs better than RTiSR-NoTC). However, RTiSR-NoT achieves even better performance than RTiSR-NoC on all evaluation metrics (i.e., HR@10 and NDCG@10) over the five datasets, and is surpassed only by the full RTiSR. This indicates the necessity of learning collective dependencies with a flexible order in sequential recommendation. Finally, by incorporating both parts, the complete RTiSR outperforms all three variants on all evaluation metrics. Due to limited space, we only show HR and NDCG for top-10 recommendation; other metrics show similar results.
Analysis of different time intervals. In RTiSR, time interval information is introduced to guide the model in capturing temporal patterns in user sequential behaviors, considering their impact from both the user and item perspectives. We therefore also evaluate the effect of the aspect-aware time interval model on the performance of RTiSR, using two variants:

RTiSR-UT: We introduce the time interval-aware model only in the UNet, which is equivalent to considering only the temporal pattern of the user.

RTiSR-IT: Instead of the UNet, this variant applies the time interval-aware model to the INet to explore the temporal pattern of the item.
Figure 8 shows the performance of RTiSR and these variants on the five datasets. Recommendation performance improves significantly when time interval information is inserted into either the UNet or the INet. Specifically, RTiSR-UT performs better than RTiSR-IT on HR and NDCG over MIs and Auto, while RTiSR-IT performs better on LB, Beer, and Yelp. This difference stems from the nature of the datasets: MIs and Auto are much denser than the other three, so more interactions are available for capturing sequence dependencies on the user side. By combining both time interval models, the complete RTiSR performs best.
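The exact function RTiSR uses to turn time intervals into review weights is defined elsewhere in the paper and is not repeated in this section. As a stand-in, the sketch below uses exponential time decay followed by a softmax, so that reviews written closer to the prediction time receive larger weights before the review embeddings are pooled. The decay rate and all function names are assumptions, not the paper's formulation.

```python
import math
from typing import List, Sequence


def time_decay_weights(timestamps: Sequence[float], t_now: float,
                       decay: float) -> List[float]:
    """Softmax over -decay * (t_now - t_i): recent reviews get larger weights."""
    logits = [-decay * (t_now - t) for t in timestamps]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_review_representation(review_embs: Sequence[Sequence[float]],
                                   timestamps: Sequence[float], t_now: float,
                                   decay: float = 0.1) -> List[float]:
    """Pool a sequence of review embeddings into one vector using time-decay weights."""
    weights = time_decay_weights(timestamps, t_now, decay)
    dim = len(review_embs[0])
    return [sum(w * emb[j] for w, emb in zip(weights, review_embs))
            for j in range(dim)]
```

In this sketch, a user-side variant (as in RTiSR-UT) and an item-side variant (as in RTiSR-IT) would differ only in whose review sequence and timestamps are passed in; setting the decay to zero recovers uniform averaging.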
Analysis of different aspect reviews. In RTiSR, user-aspect and item-aspect reviews are used to infer more effective embeddings. To validate the importance of the different aspect reviews for recommendation performance, we introduce three variants:

RTiSR-NoUI: Neither the user-side model nor the item-side model is used in this variant; the user/item representations are generated by an MLP without any aspect review information embedded.

RTiSR-UR: It only uses the user-side model to generate the user representations, which contain the user-aspect review information.

RTiSR-IR: It only uses the item-side model to generate the item representations, which contain the item-aspect review information.
Fig. 9 shows the performance of RTiSR and its three variants on the five datasets. Recommendation performance improves significantly when user- or item-aspect information is embedded in the representation vectors. Specifically, RTiSR-UR performs better than RTiSR-IR, which we attribute to user-aspect reviews containing more information about dynamic preferences, whereas item-aspect reviews mostly describe the item itself. Moreover, aspect review information brings even larger gains on the sparse datasets (e.g., Yelp, Beer, and LB), which shows that when effective interaction data are scarce, reviews can provide the recommender with fine-grained information for refining user and item embeddings and thus improving recommendation accuracy.
Study of RTiSR (RQ3)
As the time interval-aware convolutional layer plays a pivotal role in RTiSR, we investigate its impact on performance. In this section, we first study the influence of the review embedding size \( d_l \) (i.e., the input of this layer). We then study how the incremental depth size (\(\bigtriangleup h \)) of the convolution filters affects performance. Finally, we analyze the influence of the latent factor size (i.e., the output of this layer).
Effect of review embedding size \( d_l \). To investigate whether RTiSR benefits from user-provided reviews, we vary the review embedding size over \( \{60, 80, 100, 120, 140\} \). Figure 10a, b show the experimental results. We have the following observations:

Increasing the review embedding size substantially improves recommendation performance on all datasets in terms of HR@10 and NDCG@10, with consistently larger gains on the dense datasets MIs and Auto than on the other three datasets. We attribute the improvement to the effective use of user reviews for user and item representations: reviews contain valuable sentiment information about user preferences and item features. Thus, user reviews can significantly enhance the performance of RTiSR in sequential recommendation.

However, increasing the review embedding size beyond 100 leads to overfitting on all datasets, possibly because a larger embedding introduces noise into representation learning. This suggests that \( d_{l} = 100 \) is sufficient to represent reviews.
Effect of convolution filters with varied depth size h. We vary h to explore how much RTiSR gains from collective sequential dependencies with a flexible order, while keeping the other hyperparameters at their optimal values; we set \( n_c=20 \). For example, \( h = 3 \) means that RTiSR employs a set of convolutional filters to capture collective sequential dependencies over 3 successive interactions. Figure 10c, d show the experimental results on HR and NDCG, respectively. Increasing h initially yields substantial improvements on all datasets. On the dense datasets MIs and Auto, RTiSR exploits the extra information provided by a larger h, and \( h = 3 \) performs best, suggesting the benefit of learning sequential dependencies in a collective order. However, RTiSR does not consistently benefit from a larger h. This is reasonable, since a larger h brings in extra information but also more noise.
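The role of the filter height h can be illustrated with a plain-Python sketch modeled on the horizontal convolution of Caser [19]: an \( L \times d \) matrix of embedded reviews is treated as an image, each filter of height h spans h successive rows, a ReLU is applied, and max pooling keeps the strongest activation. The time interval information that RTiSR injects into this layer is omitted here, and the function names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def horizontal_conv_maxpool(seq_emb: Matrix, filt: Matrix) -> float:
    """Slide one h x d filter down an L x d sequence 'image', apply ReLU, max-pool."""
    L, h = len(seq_emb), len(filt)
    if not 1 <= h <= L:
        raise ValueError("filter height h must satisfy 1 <= h <= L")
    activations = []
    for start in range(L - h + 1):
        # Sum of element-wise products over rows start .. start+h-1 (h successive items)
        s = sum(filt[r][c] * seq_emb[start + r][c]
                for r in range(h) for c in range(len(filt[r])))
        activations.append(max(0.0, float(s)))  # ReLU activation
    # Max pooling keeps the strongest match anywhere in the sequence
    return max(activations)


def multi_height_features(seq_emb: Matrix,
                          filters_by_height: Dict[int, List[Matrix]]) -> List[float]:
    """Concatenate the pooled outputs of the n_c filters for each height h."""
    features: List[float] = []
    for h in sorted(filters_by_height):
        for filt in filters_by_height[h]:
            features.append(horizontal_conv_maxpool(seq_emb, filt))
    return features
```

A filter of height 1 scores single interactions, while taller filters match a pattern spread over several consecutive interactions at once, which is what "collective" dependencies refers to; with \( n_c \) filters per height, the feature vector has \( n_c \) entries per height used.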
Effect of latent factor number. Finally, we study how the size of the user/item representations affects recommendation performance. Specifically, we vary the latent factor size over \( \{5, 10, 20, 30, 40, 50, 60, 70\} \). Figure 10e, f show the experimental results on HR@10 and NDCG@10, respectively. Increasing the latent factor size substantially enhances recommendation performance on all datasets, which we attribute to the more effective user and item representations afforded by a larger latent factor size. RTiSR achieves its best performance when the latent factor size is 50. Furthermore, on the dense datasets MIs and Auto, RTiSR with a larger latent factor size achieves consistently larger improvements than on the sparser datasets, owing to the extra information the dense datasets provide.
Conclusion
In this paper, we propose a review-driven time interval-aware neural network for sequential recommendation, which captures aspect-aware sequence dependencies by exploiting reviews. We notice that few studies have attempted to enhance sequential recommendation by capturing the temporal dynamics of item-aspect reviews. On this basis, we view the set of user-provided reviews as an aspect-aware review sequence and introduce time intervals to assign dynamic weights to different reviews in the sequence. We leverage a hybrid neural network combining BiLSTM and CNN to exploit collective sequence dependencies with a flexible order from the user-aspect and item-aspect review sequences, respectively. These designs make RTiSR better suited to capturing dynamic changes in user preferences and item features. Finally, extensive experiments show the effectiveness and superiority of RTiSR, which consistently outperforms several SOTA models in terms of HR and NDCG.
Availability of data and materials
All datasets are publicly available without restriction and can be obtained from their respective websites.
References
Resnick P, Varian HR. Recommender systems. Commun ACM. 1997;40(3):56–8.
Roy D, Dutta M. A systematic review and research perspective on recommender systems. J Big Data. 2022;9(1):1–36.
Linden G, Smith B, York J. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 2003;7(1):76–80.
Widiyaningtyas T, Hidayah I, Adji TB. User profile correlation-based similarity (UPCSIM) algorithm in movie recommendation system. J Big Data. 2021;8(1):1–21.
Hidasi B, Tikk D. General factorization framework for context-aware recommendations. Data Min Knowl Disc. 2016;30(2):342–71.
Chen X, Xu H, Zhang Y, Tang J, Cao Y, Qin Z, Zha H. Sequential recommendation with user memory networks. In: Proceedings of the 11th ACM international conference on web search and data mining; 2018. p. 108–16.
Wang D, Zhang X, Xiang Z, Yu D, Xu G, Deng S. Sequential recommendation based on multivariate Hawkes process embedding with attention. IEEE Trans Cybern. 2021;2021:1.
Feng S, Li X, Zeng Y, Cong G, Chee YM, Yuan Q. Personalized ranking metric embedding for next new POI recommendation. In: 24th international joint conference on artificial intelligence; 2015.
Rendle S, Freudenthaler C, Schmidt-Thieme L. Factorizing personalized Markov chains for next-basket recommendation. In: Proceedings of the 19th international conference on world wide web; 2010. p. 811–20.
Wu CY, Ahmed A, Beutel A, Smola AJ, Jing H. Recurrent recommender networks. In: Proceedings of the 10th ACM international conference on web search and data mining; 2017. p. 495–503.
Ying H, Zhuang F, Zhang F, Liu Y, Wu J. Sequential recommender system based on hierarchical attention networks. In: 27th international joint conference on artificial intelligence IJCAI-18; 2018.
Li J, Wang Y, McAuley J. Time interval aware self-attention for sequential recommendation. In: WSDM’20: the 13th ACM international conference on web search and data mining; 2020.
Garcin F, Dimitrakakis C, Faltings B. Personalized news recommendation with context trees. In: Proceedings of the 7th ACM conference on recommender systems; 2013. p. 105–12.
He R, McAuley J. Fusing similarity models with Markov chains for sparse sequential recommendation. In: 2016 IEEE 16th international conference on data mining (ICDM). New York: IEEE; 2016. p. 191–200.
Hidasi B, Karatzoglou A, Baltrunas L, Tikk D. Session-based recommendations with recurrent neural networks. Preprint; 2015. arXiv:1511.06939.
Quadrana M, Karatzoglou A, Hidasi B, Cremonesi P. Personalizing session-based recommendations with hierarchical recurrent neural networks. In: Proceedings of the 11th ACM conference on recommender systems; 2017. p. 130–7.
Kang WC, McAuley J. Self-attentive sequential recommendation. In: 2018 IEEE international conference on data mining (ICDM). New York: IEEE; 2018. p. 197–206.
Yuan W, Wang H, Yu X, Liu N, Li Z. Attention-based context-aware sequential recommendation model. Inf Sci. 2020;510:122–34.
Tang J, Wang K. Personalized top-N sequential recommendation via convolutional sequence embedding. In: Proceedings of the 11th ACM international conference on web search and data mining; 2018. p. 565–73. https://doi.org/10.1145/3159652.3159656.
Li C, Niu X, Luo X, Chen Z, Quan C. A review-driven neural model for sequential recommendation. In: Proceedings of the 28th international joint conference on artificial intelligence; 2019. p. 2866–72.
Liu Y, Zhang Y, Zhang X. An end-to-end review-based aspect-level neural model for sequential recommendation. Discrete Dyn Nat Soc. 2021;2021:1.
Gao J, Lin Y, Wang Y, Wang X, Yang Z, He Y, Chu X. Set-sequence-graph: a multi-view approach towards exploiting reviews for recommendation. In: Proceedings of the 29th ACM international conference on information and knowledge management; 2020. p. 395–404.
Dong X, Ni J, Cheng W, Chen Z, Zong B, Song D, Liu Y, Chen H, De Melo G. Asymmetrical hierarchical networks with attentive interactions for interpretable review-based recommendation. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34; 2020. p. 7667–74.
McAuley J, Leskovec J. Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM conference on recommender systems; 2013. p. 165–72.
Bao Y, Fang H, Zhang J. TopicMF: simultaneously exploiting ratings and reviews for recommendation. In: Proceedings of the AAAI conference on artificial intelligence, vol. 28; 2014.
Tan Y, Zhang M, Liu Y, Ma S. Rating-boosted latent topics: understanding users and items with ratings and reviews. In: IJCAI, vol. 16; 2016. p. 2640–6.
Zheng L, Noroozi V, Yu PS. Joint deep modeling of users and items using reviews for recommendation. In: Proceedings of the 10th ACM international conference on web search and data mining; 2017. p. 425–34.
Catherine R, Cohen W. TransNets: learning to transform for recommendation. In: Proceedings of the 11th ACM conference on recommender systems; 2017. p. 288–96.
Tay Y, Luu AT, Hui SC. Multi-pointer co-attention networks for recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining; 2018. p. 2309–18.
Wu L, Quan C, Li C, Wang Q, Zheng B, Luo X. A context-aware user-item representation learning for item recommendation. ACM Trans Inf Syst (TOIS). 2019;37(2):1–29.
He X, Chen T, Kan MY, Chen X. TriRank: review-aware explainable recommendation by modeling aspects. In: Proceedings of the 24th ACM international on conference on information and knowledge management; 2015. p. 1661–70.
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems; 2013. p. 3111–9.
Pennington J, Socher R, Manning CD. GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP); 2014. p. 1532–43.
He X, Liao L, Zhang H, Nie L, Hu X, Chua TS. Neural collaborative filtering. In: Proceedings of the 26th international conference on world wide web; 2017. p. 173–82.
Rendle S. Factorization machines. In: 2010 IEEE international conference on data mining. New York: IEEE; 2010. p. 995–1000.
Kingma DP, Ba J. Adam: a method for stochastic optimization. In: ICLR (Poster); 2015.
Rendle S, Freudenthaler C, Gantner Z, Schmidt-Thieme L. BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the 25th conference on uncertainty in artificial intelligence; 2009. p. 452–61.
Wang C, Zhang M, Ma W, Liu Y, Ma S. Modeling item-specific temporal dynamics of repeat consumption for recommender systems. In: The world wide web conference; 2019.
Zhang Y, Ai Q, Chen X, Wang P. Learning over knowledge-base embeddings for recommendation. Preprint; 2018. arXiv:1803.06540.
Hou Y, Hu B, Zhang Z, Zhao WX. CORE: simple and effective session-based recommendation within consistent representation space. In: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval; 2022. p. 1796–801.
Tan Q, Zhang J, Yao J, Liu N, Zhou J, Yang H, Hu X. Sparse-interest network for sequential recommendation. In: Proceedings of the 14th ACM international conference on web search and data mining; 2021. p. 598–606.
Fan X, Liu Z, Lian J, Zhao WX, Xie X, Wen JR. Lighter and better: low-rank decomposed self-attention networks for next-item recommendation. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval; 2021. p. 1733–7.
Bayer I, He X, Kanagal B, Rendle S. A generic coordinate descent framework for learning from implicit feedback. In: Proceedings of the 26th international conference on world wide web; 2017. p. 1341–50.
Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7(Jan.):1–30.
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant 62072429, in part by the Chinese Academy of Sciences “Light of West China” Program, in part by the Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008, HZ2021017), and in part by the "Fertilizer Robot" project of Chongqing Committee on Agriculture and Rural Affairs.
Author information
Contributions
All authors contributed to developing the ideas and writing and reviewing this manuscript. All authors read and approved the submitted manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shi, X., Liu, Q., Bai, Y. et al. RTiSR: a review-driven time interval-aware sequential recommendation method. J Big Data 10, 32 (2023). https://doi.org/10.1186/s40537-023-00707-6
DOI: https://doi.org/10.1186/s40537-023-00707-6
Keywords
 Recommender system
 Sequential recommendation
 Review-driven
 Deep learning
 Time-aware model