Analysing the impact of contextual segments on the overall rating in multi-criteria recommender systems

Krishna, Chinta Venkata Murali; Rao, G. Appa; Anuradha, S.

doi:10.1186/s40537-023-00690-y

Research
Open access
Published: 05 February 2023

Chinta Venkata Murali Krishna¹,
G. Appa Rao¹ &
S. Anuradha¹

Journal of Big Data volume 10, Article number: 16 (2023)

Abstract

On websites and other sites that share travel details, an enormous number of reviews are posted every day. To recognize potential target customers quickly and effectively, hotels need to establish a customer recommender system. The data used in this study were provided by Trip Advisor, which permits customers to rate a hotel on six criteria: Service, Sleep Quality, Value, Location, Cleanliness and Room. This study proposes a multi-criteria recommender system to analyse the impact of contextual segments on the overall rating based on trip type and hotel class. An item-item collaborative filtering approach is introduced, in which the adjusted cosine similarity measure is applied to estimate missing context values in the dataset. To select the significant contexts, backward elimination with multiple regression is introduced. Multi-collinearity among predictors is examined using the Variance Inflation Factor (VIF). In the experiments, results are reported by hotel class and trip type. The performance of the multiple regression model is evaluated with the statistical measures R-square, MAE, MSE and RMSE. In addition, an ANOVA study is conducted for different trip types under the 2, 3, 4 and 5 star hotel classes.

Introduction

The tourism industry plays a major role in the growth of a country's economy, and in most countries the internet plays a major role in disseminating tourism information. Today, users in the middle and upper income segments wish to re-energize themselves on vacation by visiting locations all around the globe [1]. Owing to improved socioeconomic conditions, users plan vacations about once a year, and online travel platforms are a great opportunity to fulfil these aspirations. To resolve the information overload issue, recommender systems were introduced to help users by analysing user preference information [2]. Based on the recommendation approach, recommender systems can be categorised into content-based, collaborative filtering, knowledge-based, and hybrid [3, 4]. Content-based recommendation relies only on a user's past preferences to construct their profile and select suggested items. Collaborative filtering approaches identify candidate items by examining the behaviour of similar users [5].

Recommender systems are extensively used on most multimedia websites to improve personalization by targeting media products to the corresponding customers [6, 7]. Because recommender systems are so widespread, many customers receive non-detailed, non-personalized recommendations that resemble old spam emails. From the hotel's point of view, it is therefore essential to recognize customers precisely and increase customer visits. From the customer's point of view, recommendations are wanted only from suitable hotels rather than promotions from many hotels [8, 9]. Hence, a hotel can be promoted efficiently through personalized recommendations to its available customers, which can also maximize the customer order rate and the credibility and recognition of the hotel. Shopping sites encourage users to write review text for purchased products. Reviews from previous reviewers provide useful insight into users and enhance the recommendation ability of a website [10, 11].

Collaborative filtering (CF) is one of the most frequently used methods for recommending items in various fields. In the CF approach, recommendation is based on users and items [13], and user-to-user or item-to-item similarity can be evaluated from ratings. Most CF techniques use a single-criterion rating for recommendation; because of this limitation, users often do not receive accurate recommendation results. To overcome this issue and improve recommendation accuracy, multi-criteria recommender systems (MCRSs) have been developed [14, 15]. Recommender systems can capture a user's opinion of, or preference for, an item. In the tourism industry, the most important and widespread online activity is travellers' information search. Many tourism studies indicate that most users plan trips on the basis of information available on online tourist websites. User-generated information plays a vital role in online travellers' decision making, and most travellers plan their trips based on reviews written by previous users. Users can share their views and ratings of their hotel accommodation experience through Trip Advisor, one of the leading travel opinion platforms.

Hence, this research introduces a multi-criteria recommendation system that combines CF-based item-item filtering with multiple regression backward elimination. The goal of the recommendation system is to provide more accurate recommendations based on user preferences. Several approaches have been devised for recommending on the basis of multiple criteria; still, computational overhead and inaccurate recommendation persist. Thus, a CF-based approach that considers multiple criteria is proposed to obtain accurate recommendations. The major contributions of the research are:

Proposed filtering technique: The input data are pre-processed in two steps, missing-value imputation based on adjusted cosine similarity and filtering. Here, item-item collaborative filtering (CF) is proposed for filtering the significant contexts based on user preference, which helps to enhance rating prediction.
Proposed Multiple Regression Backward Elimination: Multi-criteria recommendation is performed with multiple regression, in which backward elimination removes inappropriate features based on the significance level. Eliminating inappropriate features helps to minimize computation overhead and enhance recommendation accuracy.
Analysis: The recommendation for hotel classes and trip types based on the significant contexts is analysed using R-square, MAE, MSE and RMSE to demonstrate the effectiveness of the introduced recommendation system.

The highlights of the proposed approach are:

A multi-criteria recommendation system based on contextual segments.
Item-item collaborative filtering for filtering the significant contexts based on user preference.
Multi-criteria recommendation based on multiple regression, in which backward elimination removes inappropriate features based on the significance level.
Analysis based on the R-square, MAE, MSE and RMSE evaluation measures to demonstrate the effectiveness of the introduced recommendation system.

The paper is organized as follows: the “Related work” section describes the related work, the “Proposed methodology” section describes the proposed methodology, and the results and conclusions are presented in the “Results and discussions” and “Conclusion” sections.

Related work

Conventional multi-criteria recommendation methods are reviewed below. Hong and Jung [16] proposed a multi-criteria tensor model for tourism recommender systems. Several tourism recommender systems have been proposed, and those systems reflect multi-criteria ratings and cultural differences. Higher Order Singular Value Decomposition (HOSVD) was utilised to predict missing values of the model. The authors in [16] developed two single tensor models, and the tensor model has four dimensions: users, items, multi-criteria ratings (food, service, price and overall) and cultural groups. The rating score is an integer from 1 to 5, ranging from the most negative to the most positive review. In addition, tensor factorization was introduced to predict users' unobserved preferences for restaurants. In the experimental section, root mean square error (RMSE) and mean absolute error (MAE) are evaluated.

Nilashi et al. [17] proposed a multi-criteria collaborative filtering approach for recommending eco-friendly hotels. The authors developed a soft computing model that integrates machine learning techniques to identify the best-matching eco-friendly hotels using several quality factors in TripAdvisor. Both dimensionality reduction and prediction were performed to improve the scalability of the model. For dimensionality reduction, the HOSVD model was introduced. In addition, the TripAdvisor data were clustered with a Self-Organizing Map (SOM). The next, essential stage is feature selection, which was executed with a decision tree technique. An Adaptive Neuro-Fuzzy Inference System (ANFIS) model was used for accurate prediction. In the experimental section, the predictive model was measured with two statistical metrics, RMSE and the adjusted coefficient of determination. In addition, recommendation quality was assessed by evaluating precision, recall and F-measure.

Qusai Shambour [18] developed a deep learning approach for multi-criteria recommender systems. Compared with a single-criterion recommender system, a multi-criteria recommender system attains more accurate outcomes. A deep autoencoder model was introduced for multi-criteria recommender systems: the author built an autoencoder-based multi-criteria recommendation algorithm (AEMC) that employs a deep feed-forward neural network. In the experimental section, TripAdvisor multi-criteria datasets are used and the model is compared with existing methods to demonstrate the efficacy of the deep learning recommendation model. Prediction performance is evaluated with the MAE and RMSE metrics, which yield 0.64 and 0.72, respectively.

Nassar [19] proposed a hybrid deep multi-criteria model for recommender systems, noting that deep learning models have achieved remarkable results in many fields. The model comprises two major stages. In the first stage, prediction is based on the user ID and item ID. In the next stage, prediction is performed by a deep neural network model. A five-fold cross-validation test was conducted, and MAE, recall, precision and F-measure are evaluated.

Sagar et al. [20] proposed a collaborative and regression model for a travel recommender system based on social media reviews during the COVID-19 pandemic. A collaborative filtering approach is introduced to examine the final score, identify the guest type, and replace missing values. The authors analysed only user opinions from the Asian continent. Krishna et al. [21] analysed the contexts of high significance to users with a regression model. The authors collected reviews of hotels with different star ratings in Singapore, one of the most popular tourist locations in the world. In the experimental scenario, statistical tests, the correlation between the users and an ANOVA test are conducted.

Zhuang and Kim [22] proposed a multi-criteria recommendation model based on Bidirectional Encoder Representations from Transformers (BERT) for hotel promotion management. The study introduced a BERT recommendation model to predict six criteria ratings. The proposed model comprises three stages: data collection, BERT fine-tuning and multi-criteria recommendation. In the experimental scenario, Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) are evaluated. Singh et al. [23] proposed an item-based collaborative filtering technique that enhances recommendation using Bhattacharyya coefficients. Here, data processing relies on similarity metrics. In the experimental scenario, RMSE and MAE are evaluated.

Samad [1] introduced supervised and unsupervised machine learning models for analysing customers' online reviews. The author introduced a fuzzy rule-based machine learning model and a clustering approach for recommendation. Introducing the clustering technique boosts the scalability and accuracy of the recommendation system. The Self-Organizing Map (SOM) approach was used to cluster data from Trip Advisor. In the experimental section, MAE, precision and F-measure are evaluated. The surveyed work leaves a research gap when information on users' preferences and priorities is insufficient. Therefore, this research introduces multiple regression backward elimination with an item-item collaborative filtering approach to identify the significant contexts.

Designing a personalized recommendation method for users of IoT services is essential, and such techniques need to enhance the user experience. To this end, Li, W., et al. [24] introduced an algorithm that combines trust relevance with matrix factorization. An effective trust model carefully integrates the social information of each user into the recommendation algorithm so that recommendations follow user preferences. First, direct and indirect trust relationships were considered to introduce a concentric hierarchical architecture of the social network. Then, a matrix factorization based recommendation algorithm that integrates most of the trust information was introduced. Finally, trust and similar-interest factors were comprehensively considered to develop trust relevance. This architecture achieved better prediction accuracy.

Two aspects are major challenges for traditional recommendation algorithms: achieving high QoS during recommendation and managing historical QoS data. To overcome these issues, Lin, W., et al. [25] considered Locality-Sensitive Hashing (LSH) and the location information of users and services. This information was used in a location-aware recommendation framework that also enhances data privacy. The WS-DREAM dataset was used to demonstrate the efficiency of this architecture.

Thus, the review shows that prior recommendation systems face challenges such as:

Failing to consider the significant attributes limits the performance of the model and leads to inaccurate recommendation [16].
The recommendation of eco-friendly hotels devised in [17] fails to reduce the dimension of the data, which elevates the computation overhead. Selecting the significant attributes can reduce the computation complexity.
The deep learning based approach introduced in [18] can produce inaccurate outcomes because the minimal data used for training limits its generalization capability.
The content-based recommendation system devised in [21] recommends the trip type based on user preference; still, the scalability of the method is challenging because information must be updated for each new preference.

Thus, challenges such as inaccurate recommendation and increased computation overhead limit the performance of traditional methods. Using CF filtering together with multiple regression backward elimination enhances recommendation accuracy with minimal computation complexity.

Proposed methodology

Digital technology and social media have brought enormous benefits to human society. Trip Advisor provides a traveller platform that uses user-generated content to share opinions on different aspects of hotels. In the tourism domain, recommendation agents play a significant role in hotel recommendation. In this study, the work proceeds in three stages: data extraction, data pre-processing and rating prediction. In the first stage, the data are extracted from Trip Advisor. In the second stage, data pre-processing is executed by the item-item collaborative filtering approach, with similarity measured by the adjusted cosine similarity metric. The final stage is prediction, in which the multiple regression backward elimination approach is introduced to analyse the impact of contextual segments. Backward elimination discards the irrelevant contexts, and prediction takes place on the remaining contexts. Finally, the MSE, RMSE and MAE performance measures are evaluated and an ANOVA test is conducted. The global architecture of the proposed model is shown in Fig. 1.

Data pre-processing

An essential aspect of most recommendation systems is that each recommendation influences what is learned about the users and items, which determines the accuracy of future recommendations. Recommendation systems rely on two approaches, namely content-based and collaborative filtering. Collaborative filtering (CF) is based on observed user preferences. Usually, a random value is used to identify the nearest neighbour from the item-item similarity matrix. However, using a random value is not a rational approach, since different items may have different values. Another challenging issue in collaborative filtering is sparsity in the dataset.

So in this research, instead of using a random value, the adjusted cosine similarity measure is applied in the item-item CF approach. To compute the similarity between two items, the users who have rated both items are first isolated. After these users are identified, the similarity measure is applied.

There are several measures for calculating the similarity between two items, and this research uses the adjusted cosine similarity measure. The basic difference between user-based and item-based CF is that user-based CF measures similarity along the rows of the rating matrix, whereas item-based CF measures it along the columns. In the item-based case, the traditional cosine-based similarity has a major drawback: it fails to account for differences in rating scale between users. Adjusted cosine similarity overcomes this drawback by subtracting the corresponding user's average rating from each co-rated pair. The similarity between two items ($n$ and $m$) in the item-item CF approach is calculated with the equation below, where $U$ is the set of users who rated both items. Item pairs whose similarity is found to be maximum are taken as the most similar.

$$Sim\left( {n,m} \right) = \frac{{\sum\nolimits_{u \in U} {\left( {R_{u,n} - R_{u} } \right)\left( {R_{u,m} - R_{u} } \right)} }}{{\sqrt[2]{{\sum\nolimits_{u \in U} {\left( {R_{u,n} - R_{u} } \right)^{2} } }}\sqrt[2]{{\sum\nolimits_{u \in U} {\left( {R_{u,m} - R_{u} } \right)^{2} } }}}}$$

(1)

Here, $R_{u}$ represents the average of user’s ratings.
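As an illustration of Eq. (1), the following is a minimal sketch in Python, not the authors' implementation. The function name `adjusted_cosine`, the nested-dictionary data layout and the toy ratings are assumptions made for the example; returning 0.0 when a denominator vanishes is also an assumption, since Eq. (1) is undefined in that case.

```python
from math import sqrt

def adjusted_cosine(ratings, n, m):
    """Adjusted cosine similarity between items n and m (Eq. 1).

    ratings: dict mapping user -> dict mapping item -> rating.
    Only users who rated both items contribute; each rating is centred
    on that user's mean over all items the user rated.
    """
    num = den_n = den_m = 0.0
    for user_ratings in ratings.values():
        if n not in user_ratings or m not in user_ratings:
            continue  # skip users who did not rate both items
        mean_u = sum(user_ratings.values()) / len(user_ratings)
        dn = user_ratings[n] - mean_u
        dm = user_ratings[m] - mean_u
        num += dn * dm
        den_n += dn * dn
        den_m += dm * dm
    if den_n == 0.0 or den_m == 0.0:
        return 0.0  # assumption: treat an undefined similarity as "no evidence"
    return num / (sqrt(den_n) * sqrt(den_m))

# Hypothetical toy data: three users rating three contexts.
ratings = {
    "u1": {"Cleanliness": 5, "Rooms": 4, "Value": 3},
    "u2": {"Cleanliness": 3, "Rooms": 2, "Value": 1},
    "u3": {"Cleanliness": 4, "Rooms": 4, "Value": 1},
}
sim = adjusted_cosine(ratings, "Cleanliness", "Rooms")
print(round(sim, 4))
```

Centring on each user's own mean is what removes the rating-scale difference: users u1 and u2 rate on different levels but contribute identical centred values.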

Usually, similarity can be computed from several sources, such as user ratings, product descriptions, and the co-occurrence of items purchased in the past. C1, C2, C3, C4, C5, and C6 denote the contexts Cleanliness, Location, Value, Rooms, Service, and Sleep Quality, respectively. The general item-item collaborative filtering is shown in Algorithm 1.

Building on Algorithm 1, this research devises a context-context collaborative filtering algorithm to find similar contexts. Similar contexts are computed via the adjusted cosine similarity measure, which addresses the difference in rating scale between different users. The proposed context-context collaborative filtering algorithm is shown in Algorithm 2.
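Algorithm 2 is not reproduced in this text, so the following Python sketch only illustrates one common way a missing context rating could be filled from similar contexts. The similarity-weighted average over positively similar contexts is the standard item-based CF prediction rule and is an assumption here, as are the function name `impute_missing`, the fallback to the user's mean and the toy similarity values.

```python
def impute_missing(user_ratings, target, sims):
    """Estimate one user's missing rating for the context `target`.

    user_ratings: dict context -> rating for a single user (target absent).
    sims: dict (context_a, context_b) -> similarity, e.g. adjusted cosine.
    Uses a similarity-weighted average over positively similar contexts
    and falls back to the user's mean rating when none qualify.
    """
    num = den = 0.0
    for ctx, rating in user_ratings.items():
        # Similarities are symmetric, so accept either key order.
        s = sims.get((target, ctx), sims.get((ctx, target), 0.0))
        if s > 0:
            num += s * rating
            den += s
    if den == 0.0:
        return sum(user_ratings.values()) / len(user_ratings)
    return num / den

# Hypothetical similarities between "Service" and three other contexts.
sims = {
    ("Service", "Rooms"): 0.8,
    ("Service", "Value"): 0.2,
    ("Service", "Location"): -0.5,
}
user = {"Rooms": 5, "Value": 3, "Location": 1}
pred = impute_missing(user, "Service", sims)
print(round(pred, 2))
```

Negatively similar contexts are ignored in this sketch so that the estimate stays inside the range of the observed ratings.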

Regression-based multi-criteria recommendation

Regression techniques are easy to apply because they are built on basic statistical principles. They take little time to reach a good result, and the error incurred during processing is also low. Because of these merits, the proposed architecture adopts regression techniques for recommendation. The evaluated results also indicate that the proposed regression model attains better performance than the other regression models.

The Multiple Regression Backward Elimination (MRBE) algorithm is introduced to identify the significant contexts, which have a high impact on the overall rating. Backward elimination is a feature selection method that starts from the full model, eliminates the least important variables first, and leaves only the essential variables in the model. In the regression model, all variables are initially tested at a significance level of 0.05. If the p-value of a feature or context is greater than the significance level, $p > 0.05$, it is eliminated. The same process is repeated until all remaining features are significant, $p < 0.05$. Finally, a reduced set of features is obtained; this method reduces the training time, diminishes the complexity and improves performance [12].
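The elimination loop described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the paper's Algorithm 3: the helper names are hypothetical, p-values use a normal approximation to the usual t-test (reasonable only for large samples), one predictor is dropped per iteration, and the design matrix is assumed to be non-singular.

```python
from math import erfc, sqrt

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(a)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def ols_p_values(rows, y):
    """Fit y on the predictors (plus intercept); return one p-value per predictor."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    sigma2 = rss / (n - k)  # residual variance estimate
    pvals = []
    for j in range(1, k):  # index 0 is the intercept
        se = sqrt(max(sigma2 * inv[j][j], 0.0))
        # Two-sided p-value under a normal approximation (assumption).
        pvals.append(0.0 if se == 0.0 else erfc(abs(beta[j]) / se / sqrt(2)))
    return pvals

def backward_elimination(data, y, alpha=0.05):
    """data: dict name -> list of values. Returns the predictor names kept."""
    kept = list(data)
    while kept:
        rows = list(zip(*(data[name] for name in kept)))
        pvals = ols_p_values(rows, y)
        worst = max(range(len(kept)), key=lambda j: pvals[j])
        if pvals[worst] <= alpha:
            break  # every remaining predictor is significant
        kept.pop(worst)  # drop the least significant predictor and refit
    return kept

# Hypothetical data: the rating depends on "Service" only; "Location" is irrelevant.
service, location, overall = [], [], []
for noise in (0.3, -0.3):
    for a in range(1, 6):
        for b in range(1, 6):
            service.append(a)
            location.append(b)
            overall.append(1 + 0.8 * a + noise)
kept = backward_elimination({"Service": service, "Location": location}, overall)
print(kept)
```

In practice a statistics library would supply exact t-based p-values; the normal approximation is used here only to keep the sketch self-contained.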

Multiple regression is a statistical technique used to explore the relationship between a dependent variable and two or more independent variables. The multiple regression model with $p$ independent variables and sample size $n$ is given below, first in scalar form and then in matrix notation:

$$y = \beta_{0} + \beta_{1} X_{1} + ... + \beta_{w} X_{w} + \xi$$

(2)

$$y = X\beta + \xi$$

(3)

Here, $y$ describes the dependent variable and $X$ describes the $n \times (p+1)$ design matrix of independent variables, whose first column of ones corresponds to the intercept. $\xi$ symbolizes the residual terms or error vector of the regression model (with covariance taken proportional to the identity matrix $I$), $w$ signifies the number of predictors or features (so $w = p$), and $\beta_{i}$ describes the regression coefficient or parameter of the model.

$$y = \left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ \vdots \\ {y_{n} } \\ \end{array} } \right]\,\,\,\,\,X = \left[ {\begin{array}{*{20}c} 1 & {x_{11} } & \cdots & {x_{1p} } \\ 1 & {x_{21} } & \cdots & {x_{2p} } \\ \vdots & \vdots & \ddots & \vdots \\ 1 & {x_{n1} } & \cdots & {x_{np} } \\ \end{array} } \right]\,\,\,\,\,\beta = \left[ {\begin{array}{*{20}c} {\beta_{0} } \\ {\beta_{1} } \\ \vdots \\ {\beta_{p} } \\ \end{array} } \right]\,\,\,\,\,\xi = \left[ {\begin{array}{*{20}c} {\xi_{1} } \\ {\xi_{2} } \\ \vdots \\ {\xi_{n} } \\ \end{array} } \right]$$

(4)

A cost function is used to train the multiple regression model expressed in Eq. (2) by minimizing the difference between the observed (true) values and the fitted (predicted) values. Root mean square error is the cost function used in this study, and it is expressed below:

$$RMSE = \sqrt {\frac{1}{n}\sum\limits_{j = 1}^{n} {\left( {y_{j} - \hat{y}_{j} } \right)^{2} } }$$

(5)

Here, $n$ defines the number of data points, and $y_{j}$ and $\hat{y}_{j}$ describe the true and predicted values, respectively. The $R^{2}$ value, or coefficient of determination, indicates how well the model fits the observed values and is expressed by Eq. (6).

$$R^{2} = 1 - \frac{{SS_{regression} }}{{SS_{total} }}$$

(6)

Here, $SS_{regression}$ denotes the sum of squared errors of the regression (the squared differences between observed and fitted values) and $SS_{total}$ denotes the total sum of squares (the squared differences between the observed values and their mean). In simple linear regression, the use of $R^{2}$ is perfectly acceptable. In multiple regression, $R^{2}$ never decreases when independent variables are added, so the formula is adjusted for the number of independent variables. The adjusted $R^{2}$ is defined below:

$$Adjusted\,R^{2} = 1 - \left( {1 - R^{2} } \right)\left( {\frac{n - 1}{{n - \left( {w + 1} \right)}}} \right)$$

(7)

Here, $w$ signifies the number of predictors in the regression equation. In addition, the ordinary least squares (OLS) regression model is constructed to minimize the residual sum of squares, and the mathematical expression is given below:

$$RSS\left( \beta \right) = \left( {y - X\beta } \right)^{T} \left( {y - X\beta } \right)$$

(8)
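To make Eqs. (6) to (8) concrete, the sketch below fits an OLS model by solving the normal equations $X^{T}X\beta = X^{T}y$, which is the standard closed-form minimiser of $RSS(\beta)$, and then reports $R^{2}$ and adjusted $R^{2}$. The function names and the toy data are assumptions made for illustration, and a full-rank design matrix is assumed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Return beta (intercept first) minimising RSS(beta) = (y - X beta)^T (y - X beta)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)

def fit_quality(rows, y, beta):
    """Return (R^2, adjusted R^2) following Eqs. (6) and (7)."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    ss_err = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # squared regression errors
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_err / ss_tot
    n, w = len(y), len(beta) - 1  # w = number of predictors
    adj = 1 - (1 - r2) * (n - 1) / (n - (w + 1))
    return r2, adj

# Hypothetical toy data: overall rating against two context ratings.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
overall = [1.5, 2.0, 3.5, 3.5, 5.0]
beta = fit_ols(rows, overall)
r2, adj = fit_quality(rows, overall, beta)
print([round(b, 3) for b in beta], round(r2, 3), round(adj, 3))
```

Adjusted $R^{2}$ is never larger than $R^{2}$ here because the factor $(n - 1)/(n - (w + 1))$ is at least one.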

Identifying multi-collinearity among the independent variables is essential for identifying the significant contexts. Here, the overall rating is selected as the dependent variable, and cleanliness, location, value, rooms, service and sleep quality are selected as the independent variables. Multi-collinearity among the independent variables is checked with the variance inflation factor (VIF), and the mathematical expression is defined below:

$$VIF\left( {\beta_{i} } \right) = \frac{1}{{1 - R_{i}^{2} }}$$

(9)

Here, $R_{i}^{2}$ specifies the coefficient of determination obtained when the $i$-th independent variable is regressed on the remaining independent variables. If the independent variables are uncorrelated, then $R_{i}^{2} = 0$, and in the case of exact collinearity $R_{i}^{2} = 1$. Therefore, $VIF\left( {\beta_{i} } \right)$ ranges from one to infinity. The backward elimination algorithm, with its major steps, is shown in Algorithm 3. Here, it is applied to identify the significant contexts for each hotel class and trip type. This research mainly focuses on the best independent variables for forecasting the predicted overall rating.
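The VIF check in Eq. (9) can be illustrated with a short Python sketch: each predictor is regressed on the remaining predictors and the resulting $R_{i}^{2}$ is plugged into the formula. The function names and toy values are assumptions made for the example; non-constant predictors are assumed, and exact collinearity (a singular system) is not handled.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x

def r_squared(rows, y):
    """R^2 of an OLS regression (with intercept) of y on the columns of rows."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[a] * x[b] for x in X) for b in range(k)] for a in range(k)]
    xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_err = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_err / ss_tot

def vif(data):
    """data: dict name -> list of values. Returns dict name -> VIF (Eq. 9)."""
    result = {}
    for name in data:
        others = [data[o] for o in data if o != name]
        r2 = r_squared(list(zip(*others)), data[name])
        result[name] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return result

# Hypothetical ratings for two strongly correlated contexts.
correlated = vif({"Rooms": [1, 2, 3, 4, 5], "Cleanliness": [1, 2, 3, 4, 6]})
# Hypothetical ratings for two uncorrelated contexts (balanced 3 x 3 design).
independent = vif({"Value": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "Location": [1, 2, 3, 1, 2, 3, 1, 2, 3]})
print(correlated, independent)
```

A VIF near one indicates no collinearity; how large a VIF is tolerated is a modelling choice that this text does not state.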

Results and discussions

To analyse the effectiveness of the proposed model, several experiments were conducted on datasets obtained from the Trip Advisor website (www.tripadvisor.com). Trip Advisor is among the world's largest and most successful travel social networks. On this website, users can rate a hotel on aspects such as cleanliness, location, value, rooms, service and sleep quality. The dataset was extracted from Trip Advisor through web scraping using Beautiful Soup. Ninety-three tourism cities across different continents (Asia, Europe, Australia, Africa, North America, and South America) were chosen according to the tourism rankings given by Master Card and Visa. In total, 60,215 records were collected from 2500 hotels across the 93 cities. The analysis is carried out in two ways, with the dataset classified by hotel class (two, three, four and five stars) and by trip type (Business, Family, Friends, Couple, Solo, and N.A. (no trip type mentioned)), and the results are analysed accordingly. For each hotel class and trip type, the contexts with a significant effect on the overall rating were identified at the continent, country, and city levels to understand user opinions on hotel stays.

Analysis of performance measures based on different hotel classes and trip types

To evaluate the performance of the multiple regression model, statistical accuracy metrics are computed: R-square, mean absolute error (MAE), mean square error (MSE) and root mean square error (RMSE). MAE measures the average absolute error between paired observations (predicted and actual ratings). MSE measures the average of the squared errors, and RMSE, its square root, is a standard way to express the error of a model in the units of the rating. The mathematical expressions of MAE, MSE and RMSE are defined below:

$$MAE = \frac{1}{N}\sum\limits_{u,i}^{N} {\left| {p_{u,i} - r_{u,i} } \right|}$$

(10)

$$RMSE = \sqrt {\frac{1}{N}\sum\limits_{u,i}^{N} {\left( {p_{u,i} - r_{u,i} } \right)^{2} } }$$

(11)

$$MSE = \frac{1}{N}\sum\limits_{u,i}^{N} {\left( {p_{u,i} - r_{u,i} } \right)^{2} }$$

(12)

Here, $N$ specifies the total number of ratings, $p_{u,i}$ describes the predicted rating of user $u$ on item $i$ and $r_{u,i}$ represents the actual rating. These statistical metrics are analysed for the regression models, namely decision tree regression (DTR), linear regression (LR), random forest regression (RF) and support vector regression (SVR), for different hotel classes and trip types. Table 2 shows the overall accuracy of the initial and predicted results for the hotel classes, namely 5, 4, 3 and 2 stars. Tables 3, 4, 5 and 6 show the accuracy analysis of the regression models by trip type (business, couple, friends, family and solo) under each hotel class. The baseline regression models are described in the DTR, LR, RF and SVR subsections.
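The three error metrics can be written in a few lines of Python. This is a minimal sketch with hypothetical function names and toy ratings; MSE is taken as the average squared error, as the text describes, so that RMSE is its square root.

```python
from math import sqrt

def mae(pred, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(pred, actual)) / len(actual)

def mse(pred, actual):
    """Mean squared error: the average of the squared errors."""
    return sum((p - r) ** 2 for p, r in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    """Root mean squared error: the square root of the MSE."""
    return sqrt(mse(pred, actual))

# Hypothetical predicted and actual overall ratings for four reviews.
predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]
print(mae(predicted, actual), mse(predicted, actual), round(rmse(predicted, actual), 4))
```

Because the errors are squared before averaging, the single error of two rating points dominates MSE and RMSE more than it does MAE.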

Table 1 Hyperparameters of different methods

Table 2 Performance analysis for different hotel classes (Initial and predicted results)

Table 3 Performance analysis on 5 star hotel class based on trip type (Initial and predicted results)

Table 4 Accuracy analysis on 4 star hotel class based on trip type (Initial and predicted results)

Table 5 Accuracy analysis on 3 star hotel class based on trip type (Initial and predicted results)

DTR

DTR [27] is a regression model that obtains the predicted output by mapping the input through its attributes. Each interior node of the tree represents an attribute, and an arc between a parent and a child represents a possible value of that attribute. Tree construction begins with the input set at the root node. An attribute is assigned to each node, and a set of values is assigned to each arc and sub-node. The input set is then divided so that each child node receives only the portion of the input set that matches the attribute value specified by its arc. The process is repeated recursively for each child node until the last split is reached.
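A single split of such a tree can be sketched in Python as follows. The text does not state which split criterion is used, so the variance-reduction rule (minimising the summed squared error of the two children, as in CART-style regression trees) is an assumption, as are the function name `best_split` and the toy data. A full tree would apply the same step recursively to each child.

```python
def best_split(xs, ys):
    """Find the threshold on one attribute that best separates the targets.

    Returns (threshold, left_mean, right_mean), where rows with x <= threshold
    go left, or None when the attribute has a single distinct value.
    """
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right child empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]

# Hypothetical data: overall rating jumps once the "Service" rating exceeds 3.
service = [1, 2, 3, 4, 5]
overall = [2, 2, 2, 5, 5]
split = best_split(service, overall)
print(split)
```

Each child predicts the mean target of the rows it receives, which is what makes the tree a regression model rather than a classifier.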

LR

LR [28] is a statistical process used to determine the relationship between the independent variable X and the dependent variable Y. Simple linear regression uses a single independent variable, whereas multiple linear regression uses more than one. Linear regression aggregates the criteria ratings and determines the overall rating based on the weight of each criterion.

RF

RF [29] is an efficient method that works effectively for huge datasets. It can perform recommendation with the available data without deteriorating system performance. RF combines individual learners based on decision trees (DT). A random subset of the training data is used to generate each tree. After the trees are trained, the test rows are passed to every tree in the forest. Each tree generates an output, and the mode of the tree outputs is taken as the output of the RF (for regression, the tree outputs are averaged instead).

SVR

SVR [30] fits as many points as possible within a margin around the hyperplane, so the regression line can be obtained accurately; in contrast to a classifier, which provides only discrete values and cannot be used for continuous prediction problems, SVR predicts continuous values. With SVR, the error between the real and predicted data is kept low because the fit is constrained to lie within a specified threshold. The model can handle large-scale datasets with a fast response; this is achieved by using a kernel.

The hyperparameters of DTR, LR, RF and SVR are shown in Table 1.

The predicted and initial result obtained by different regression approach for different hotel classes are shown in Table 2. The achieved error value predicts the performance of proposed regression model over the other existing techniques. The predicted result indicate that the proposed architecture has shown better performance on recommendation. The proposed architecture learns the similarity between each items and performed recommendation with high accuracy. The different hotel classes are taken into consideration to show that the proposed architecture is feasible for all kinds of hotels. Moreover, this approach is developed with low cost and low architecture design. Therefore, it can be used by all classes of hotels. The predicted RMS value achieved by 5 – star hotel is found to be, 0.987 (DTR), 1.0 (LR), 0.996 (RF), and 0.992 (SVR) respectively.

The initial and predicted results achieved for the 5-star hotel class based on trip type are shown in Table 3. The trip types considered in this analysis are business, couple, family, friends, solo, and Na. For these types, the performance metrics RMS, MSE, MAE, and R-square are evaluated. The results for the 5, 4, 3, and 2 star hotel classes are shown in Tables 3, 4, 5, and 6, respectively. The RMS values achieved by the proposed approach for business, couple, family, friends, and Na are 1.09241E−15, 1.01173E−15, 9.93639E−6, 6.03351E−16, and 8.55791E−16, respectively. These values are better than those of the other existing techniques.
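For reference, the four performance metrics named above can be computed from actual and predicted ratings as in the following minimal sketch. The ratings are hypothetical and the function name is chosen for illustration; RMS is taken here to mean the root mean squared error.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-square for paired rating lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r_square = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R-square": r_square}


# Hypothetical overall ratings and model predictions.
actual = [5.0, 4.0, 3.0, 2.0]
predicted = [4.5, 4.0, 3.5, 2.0]
metrics = regression_metrics(actual, predicted)
```

Lower MAE, MSE and RMSE indicate smaller prediction errors, while an R-square closer to 1 indicates that more of the variance in the overall rating is explained.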

Table 6 Accuracy analysis on 2 star hotel class based on trip type (Initial and predicted results)

The initial and predicted results achieved for the 4-star hotel class based on trip type are shown in Table 4. The performance of the proposed model is compared with that of the existing regression models DTR, RF, and SVR, and the proposed approach performs better than these methods. The R-square values achieved by the proposed and existing algorithms for the solo trip type are 0.971336701 (DTR), 1.0 (LR), 0.99026373 (RF), and 0.985296617 (SVR). The predicted RMS values achieved by LR for business, couple, family, friends, and solo are 1.80781E−15, 1.37259E−15, 1.91758E−15, 2.29432E−15, and 1.47653E−15, respectively.

The initial and predicted results achieved for the 3-star hotel class based on trip type are shown in Table 5. The trip types considered in this analysis are business, couple, family, friends, solo, and Na. For these types, the performance metrics RMS, MSE, MAE, and R-square are evaluated. The predicted RMS values achieved by the proposed approach for business, couple, family, friends, Na, and solo are 9.81162E−16, 1.24224E−15, 1.0953E−15, 1.24075E−15, 1.15922E−15, and 1.19088E−15, respectively. These values are better than those of the other existing techniques.

The initial and predicted results achieved for the 2-star hotel class based on trip type are shown in Table 6. The performance of the proposed model is compared with that of the existing regression models DTR, RF, and SVR, and the proposed approach performs better than these methods. The R-square values achieved by the proposed and existing algorithms for the solo trip type are 0.987618877 (DTR), 1.0 (LR), 0.992388949 (RF), and 0.896803043 (SVR). The predicted RMS values achieved by LR for business, couple, family, friends, Na, and solo are 7.58532E−16, 1.10572E−15, 1.31364E−15, 1.19914E−15, 6.41408E−16, and 1.08903E−15, respectively. These values are better than those of the existing methods.

Overall rating of different hotel classes by continent, country, and city

For each hotel class and trip type, the significant contexts were identified with respect to the overall rating by continent, country, and city in order to capture user opinions on hotel stays. On this basis, the overall ratings by continent (Asia, Australia, Africa, Europe, North America, South America) and country (India, Singapore, Thailand, Germany, US and Brazil) for 5, 4, 3 and 2-star hotels are shown in Table 7 and Fig. 2.

Table 7 Final overall results for each hotel class upon country wise

The overall ratings achieved by 2, 3, 4, and 5 star hotels in different continents are analysed, and the result is shown in Fig. 2. The continents considered are Asia, Australia, Africa, Europe, North America, and South America. Among these, the 5-star hotels of Africa have attained higher ratings than the hotels in the remaining continents. The service attribute of 2-star and 3-star hotels in Australia has attained higher ratings than that of the 2-star hotels in the remaining continents. In all continents, the 3-star and 4-star hotels have attained almost satisfactory ratings. Africa has obtained a rating of ‘0’ for the cleanliness and location attributes. However, the cleanliness of 5-star hotels in all continents has received few ratings, which suggests that 5-star hotels attach great importance to cleanliness. These attributes are analysed by the proposed regression model to recommend highly rated hotels to customers based on hotel class.

The country-wise ratings achieved by 2, 3, 4, and 5 star hotels for each hotel attribute are shown in Table 7. The countries considered are India, Singapore, Thailand, Germany, US, and Brazil. The overall ratings achieved by each country for the various attributes are also determined. The attributes considered for hotel recommendation are cleanliness, location, value, rooms, service, and sleep quality. These attributes are needed to help customers select the best hotels for their trips. If any of the attributes is unsatisfactory, the recommendation rating of the hotel is reduced. Therefore, the ratings provided by customers for the hotels they have visited are essential for guiding other customers and improving the recommendation process.

Overall ratings for different trip types under 5-star hotel class

Figs. 3, 4, 5, 6 and 7 show the ratings given by the different trip types for the 5-star hotel class. The ratings of the significant contexts by continent, country and city are illustrated graphically. Based on the rankings in these figures, 5-star hotel users on business trips give the highest importance to cleanliness. The figures show the results for the different trip types within the same hotel class.

The overall ratings achieved by different continents, countries, and cities for hotel class 5 and trip type (business) are shown in Fig. 3. Business travellers normally hold important meetings in 5-star hotels and spend large amounts to do so. Therefore, it is essential to maintain good cleanliness, rooms, service, sleep quality, and value. These features need to be good in all 5-star hotels for the hotels to be recommended to customers. For that purpose, the reviews provided by customers are very useful. These reviews are analysed by the recommendation algorithm, and hotels are recommended based on the similarity analysis. The service quality provided to business travellers in India, Singapore, and the US is found to be better than in other countries. The 5-star hotels in these countries are highly recommended to business customers for conducting official meetings.

The overall ratings achieved by different continents, countries, and cities for hotel class 5 and trip type (friends) are shown in Fig. 4. The countries compared are India, Singapore, Thailand, US, and Brazil, and the overall performance is also described. The cities considered for the performance analysis are Mumbai and Vishakhapatnam (India). The parameters considered from the hotel class data are cleanliness, location, value, rooms, service, and sleep quality. These are the parameters used in this work for rating and recommending hotels. All of these parameters need to be satisfied by all classes of hotels, so that such hotels can be recommended to a number of users based on ratings.

The overall ratings achieved by different continents, countries, and cities for hotel class 5 and trip type (family) are shown in Fig. 5. Families normally expect high-standard hotels and rooms, because a family may include children, elderly persons, and persons with health conditions. Therefore, it is essential to maintain good cleanliness, rooms, service, sleep quality, and value. These features need to be good in all hotels for the hotels to be recommended to customers. For that purpose, the reviews provided by customers are very useful. These reviews are analysed by the recommendation algorithm, and hotels are recommended based on the similarity analysis. The recommendation analysis is carried out separately for each hotel class. The overall rating provided by families for 5-star hotels in Vishakhapatnam is 0, which may degrade the recommendation of such hotels to customers. Rooms in Thailand hotels have attained higher ratings than in other countries, which makes Thailand hotels popular among families. This analysis shows that hotels in the US (country), North America (continent), and Mumbai (city) are highly recommended to families planning a trip.

The overall ratings achieved by different continents, countries, and cities for hotel class 5 and trip type (couple) are shown in Fig. 6. Couples normally expect high-standard hotels and rooms, so it is essential to improve the quality of hotels. For that purpose, the reviews provided by customers are very useful. These reviews are analysed by the recommendation algorithm, and hotels are recommended based on the similarity analysis. The recommendation analysis is carried out separately for each hotel class. The overall rating provided by couples for 5-star hotels in Vishakhapatnam is 0, which may degrade the recommendation of such hotels to customers. In the continent-wise comparison, Europe has obtained lower ratings than the other continents. This analysis shows that hotels in the US (country), North America (continent), and Mumbai (city) are highly recommended to customers planning a trip.

The overall ratings achieved by different continents, countries, and cities for hotel class 5 and trip type (solo) are shown in Fig. 7. The countries compared are India, Singapore, Thailand, US, and Brazil, and the overall performance is also described. The cities considered for the performance analysis are Mumbai and Vishakhapatnam (India). The parameters considered from the hotel class data are cleanliness, location, value, rooms, service, and sleep quality. These are the parameters used in this work for rating and recommending hotels. All of these parameters need to be satisfied by all classes of hotels, so that such hotels can be recommended to a number of users based on ratings.

ANOVA (Analysis of variance) test

The prime aim of this research is to focus on the trip types of users in order to analyse the significant contexts across the various hotel classes. The ANOVA results for the different hotel classes are shown in Table 8, and the ANOVA results for the different trip types under each hotel class are shown in Tables 9, 10, 11 and 12. By performing ANOVA tests, significant differences between each predictor and the overall user rating are identified for each trip type. In addition, a multi-collinearity test is performed on the independent variables to identify collinearity among them. The VIF is at least 1 by definition; a VIF between 1 and 10 is taken to indicate no multi-collinearity, whereas a VIF greater than 10 indicates that multi-collinearity occurs.
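The multi-collinearity check can be illustrated with the standard definition VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing predictor j on the remaining predictors. The sketch below is a minimal plain-Python version under that definition; the function names and the criteria ratings are hypothetical, and the threshold of 10 is the one stated above.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(rows, target):
    """R-square of an OLS fit (with intercept) of target on the given rows."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    w = _solve(xtx, xty)
    fitted = [sum(wi * di for wi, di in zip(w, d)) for d in design]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Map each predictor name to its VIF; columns maps name -> list of values.

    A perfectly collinear predictor would give R-square = 1 and a division
    by zero, which this sketch does not handle.
    """
    result = {}
    for name in columns:
        others = [n for n in columns if n != name]
        rows = list(zip(*(columns[n] for n in others)))
        result[name] = 1.0 / (1.0 - r_squared(rows, columns[name]))
    return result


# Hypothetical criteria ratings for six reviews.
ratings = {
    "service": [5, 3, 4, 2, 1, 5],
    "cleanliness": [4, 3, 5, 1, 2, 5],
    "location": [3, 5, 1, 4, 2, 3],
}
vif_values = vif(ratings)
flagged = [name for name, value in vif_values.items() if value > 10]
```

Predictors listed in `flagged` would be candidates for removal before the regression is fitted.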

Table 8 Analysis of Variance (ANOVA) based on different hotel classes

Table 9 Analysis of Variance of 5 star hotel class based on trip type

Table 10 Analysis of Variance of 4 star hotel class based on trip type

Table 11 Analysis of Variance of 3 star hotel class based on trip type

Table 12 Analysis of Variance of 2 star hotel class based on trip type

The ANOVA results for 2, 3, 4, and 5 star hotels under the different trip types are shown in Tables 9, 10, 11 and 12. These tables provide a statistical analysis of the performance of the proposed recommendation system. The analysis considers the terms service, rooms, sleep-quality, location, and residuals. Most of the parameters are found to be significant, whereas a few fall under NI. Regression approaches are generally well suited to similarity-based processing because they analyse all of the data statistically before performing the recommendation. For this reason, the regression model is introduced in this work, and it has attained better recommendation results with a lower error rate. The approach mainly concentrates on improving recommendation accuracy by reducing the error rate; however, it does not address the cold-start and data sparsity issues.
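To illustrate the kind of quantity an ANOVA table reports, the sketch below computes a one-way ANOVA F statistic for overall ratings grouped by trip type. This is a simpler, swapped-in variant used only for illustration: the tables in this study report per-predictor terms and residuals from the regression model, and the ratings and the function name below are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the ratings around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical overall ratings for three trip types.
business = [4.0, 5.0, 4.0]
family = [3.0, 2.0, 4.0]
solo = [5.0, 5.0, 4.0]
f_stat, df_between, df_within = one_way_anova_f([business, family, solo])
```

The F statistic is then compared against the F distribution with the two degrees of freedom to decide whether the group means differ significantly.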

Comparative analysis between proposed and existing techniques

The comparative analysis of the proposed and existing hotel recommendation architectures is explained in this section. The comparison of accuracy, precision, recall, f1-score, and MAE is shown in Fig. 8.

The accuracy, precision, recall, f1-score, and MAE of the proposed approach are compared with those of existing techniques to show the efficiency of the proposed recommendation algorithm. The proposed algorithm analyses the cleanliness, service, value, and room-quality attributes to perform efficient recommendation. The recommendation is performed using LR, preceded by a similarity analysis that enhances the performance of the proposed approach. The techniques taken for comparison are SVR, DT (decision tree), RNN (recurrent neural network), and PCR (principal component regression). Among all these techniques, the proposed architecture has achieved the most efficient performance.
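Accuracy, precision, recall and f1-score are classification measures, so predicted ratings have to be turned into a binary decision before they can be computed. The sketch below assumes, purely for illustration, that a hotel counts as relevant when its rating is at least 4; this threshold, the ratings and the function name are hypothetical and are not stated in the study.

```python
def classification_scores(actual, predicted):
    """Return accuracy, precision, recall and f1 for paired boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


THRESHOLD = 4.0  # assumed relevance cut-off, not taken from the study

# Hypothetical actual and predicted overall ratings.
actual_ratings = [5, 4, 2, 3, 4, 1]
predicted_ratings = [4.6, 3.4, 2.2, 4.1, 4.3, 1.5]

actual_relevant = [r >= THRESHOLD for r in actual_ratings]
predicted_relevant = [r >= THRESHOLD for r in predicted_ratings]
scores = classification_scores(actual_relevant, predicted_relevant)
```

A different threshold changes which hotels count as relevant and therefore changes all four scores, so the cut-off should be reported alongside the results.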

The performance comparison between the proposed and existing techniques is shown in Table 13. The performance of the proposed architecture is better than that of the other existing techniques: the MAE of the proposed approach is 0.068, whereas the MAE of DNN is 0.46. This comparison illustrates the efficiency of the proposed regression technique, which is attributed to the ability of regression models to analyse statistical values efficiently. For this reason, the LR model is introduced, and it has attained efficient performance in recommendation.

Table 13 Comparison between proposed and existing techniques

Conclusion

Multi-criteria travel recommender systems represent user ratings for different contextual segments. However, because user preferences for hotel stays vary from one user to another owing to their dynamic behaviour, it is a big challenge for online travel recommenders to make accurate predictions for users. Moreover, due to sparsity and the curse of dimensionality, these recommenders still face many problems in generating accurate recommendations for every user, since each user is interested in only a few segments. In this research, a multi-criteria recommender algorithm is introduced to recommend hotels based on hotel classes and trip types. First, the data was extracted from Trip Advisor across different continents, countries and cities. The second stage is data pre-processing, in which the item-item collaborative approach using adjusted cosine similarity is introduced to replace missing values. Multi-regression backward elimination is introduced to analyse the impact of contextual segments on the overall rating. Here, an ordinary least squares (O.L.S) regression model is designed to minimise the residual sum of squares. To identify the significant contexts, it is essential to check for multi-collinearity among the independent variables, which is done using the variance inflation factor (VIF). In the experimental scenario, the performance measures R-square, MAE, MSE and RMSE are evaluated for several regression techniques. The results are analysed for both hotel class (2, 3, 4 and 5 star) and trip type (Business, Family, Friends, Couple, Solo, N.A), continent-wise and country-wise. In this research, the scalability of the multi-criteria system was not examined; it will be studied in future work using an efficient algorithm. In addition, a few more metrics will be evaluated in future to determine the efficiency of the architecture in recommendation systems.

Availability of data and materials

The dataset was collected from Trip Advisor through web scraping and covers 93 tourism cities across six continents: Asia, Europe, North America, South America, Africa and Australia.

Abbreviations

VIF: Variance inflation factor
ANOVA: Analysis of variance
MAE: Mean absolute error
MSE: Mean squared error
RMSE: Root mean squared error
CF: Collaborative filtering
MCRS: Multi-criteria recommender systems
HOSVD: Higher order singular value decomposition
SOM: Self-organizing map
ANFIS: Adaptive neuro-fuzzy inference systems
BERT: Bidirectional encoder representations from transformers
HR: Hit ratio
NDCG: Normalized discounted cumulative gain
MRBE: Multiple regression backward elimination

References

1. Nilashi M, Ibrahim O, Yadegaridehkordi E, Samad S, Akbari E, Alizadeh A. Travelers decision making using online review in social network sites: a case on TripAdvisor. J Computational Sci. 2018;28:168–79.
2. Tsao H-Y, Chen M-Y, Lin H-C, Ma Y-C. The asymmetric effect of review valence on numerical rating: a viewpoint from a sentiment analysis of users of TripAdvisor. Online Information Rev. 2019. https://doi.org/10.1108/OIR-11-2017-0307.
3. Pereira N, Varma SL. Financial planning recommendation system using content-based collaborative and demographic filtering. In: Panigrahi BK, Trivedi MC, Mishra KK, Tiwari S, Singh PK, editors. Smart Innovations in Communication and Computational Sciences. Singapore: Springer; 2019.
4. Geetha G, Safa M, Fancy C, Saranya D. A hybrid approach using collaborative filtering and content based filtering for recommender system. J Phys Conf Ser. 2018. https://doi.org/10.1088/1742-6596/1000/1/012101.
5. Patro SG, Krishna BK, Mishra SK, Panda RK, Long HV, Tuan TM. Knowledge-based preference learning model for recommender system using adaptive neuro-fuzzy inference system. J Intelligent Fuzzy Systems. 2020. https://doi.org/10.3233/JIFS-200595.
6. Deldjoo Y, Schedl M, Cremonesi P, Pasi G. Recommender systems leveraging multimedia content. ACM Computing Surveys (CSUR). 2020;53(5):1–38.
7. Weismayer C, Pezenka I, Gan CHK. Aspect-based sentiment detection: comparing human versus automated classifications of TripAdvisor reviews. In: Stangl B, Pesonen J, editors. Information and communication technologies in tourism. Berlin: Springer; 2018. p. 365–80.
8. Xiang Z, Du Q, Ma Y, Fan W. Assessing reliability of social media data: lessons from mining TripAdvisor hotel reviews. Information Technology Tourism. 2018;18(1):43–59.
9. Borges-Tiago MT, Arruda C, Tiago F, Rita P. Differences between TripAdvisor and Booking com in branding co-creation. J Business Res. 2021. https://doi.org/10.1016/j.jbusres.2020.09.050.
10. Renjith S, Sreekumar A, Jathavedan M. An extensive study on the evolution of context-aware personalized travel recommender systems. Info Processing Management. 2020;1:102078.
11. Kim J, Choi I, Li Q. Customer satisfaction of recommender system: examining accuracy and diversity in several types of recommendation approaches. Sustainability. 2021;13(11):6165.
12. Anitha J, Kalaiarasu M. Optimized machine learning based collaborative filtering (OMLCF) recommendation system in e-commerce. J Ambient Intell Humaniz Comput. 2021;12(6):6387–98.
13. Aljunid MF, Manjaiah DH. Movie recommender system based on collaborative filtering using apache spark. In: Balas VE, Sharma N, Chakrabarti A, editors. Data management, analytics and innovation. Singapore: Springer; 2019. p. 283–95.
14. Al-Ghuribi SM, Noah SA. Multi-criteria review-based recommender system–the state of the art. IEEE Access. 2019;7:169446–68.
15. Kaur G, Ratnoo S. Adaptive genetic algorithm for feature weighting in multi-criteria recommender systems. Pertanika J Sci Technol. 2019;27(1):123–41.
16. Hong M, Jung JJ. Multi-criteria tensor model for tourism recommender systems. Expert Syst Appl. 2021;170:114537.
17. Nilashi M, Ahani A, Esfahani MD, Yadegaridehkordi E, Samad S, Ibrahim O, Sharef NM, Akbari E. Preference learning for eco-friendly hotels recommendation: a multi-criteria collaborative filtering approach. J Cleaner Production. 2019;215:767–83.
18. Shambour Q. A deep learning based algorithm for multi-criteria recommender systems. Knowl-Based Syst. 2021;211:106545.
19. Nassar NS. A novel hybrid deep multi-criteria model for recommender system. 2021. https://doi.org/10.21203/rs.3.rs-836949/v1.
20. Sagar KD, Arunasri PS, Sakamuri S, Kavitha J, Kamesh DB. Collaborative filtering and regression techniques based location travel recommender system based on social media reviews data due to the COVID-19 pandemic. In: IOP Conference Series: Materials Science and Engineering. IOP Publishing; 2020. 981(2):022009.
21. Krishna CV, Appa Rao G, AnuRadha S. A framework for the identification of significant contexts in tourism domain. Int J Adv Sci Technol. 2020;29(7):1007–29.
22. Zhuang Y, Kim J. A BERT-based multi-criteria recommender system for hotel promotion management. Sustainability. 2021;13(14):8039.
23. Singh PK, Sinha M, Das S, Choudhury P. Enhancing recommendation accuracy of item-based collaborative filtering using Bhattacharyya coefficient and most similar item. Appl Intell. 2020. https://doi.org/10.1007/s10489-020-01775-4.
24. Li W, Zhou X, Shimizu S, Xin M, Jiang J, Gao H, Jin Q. Personalization recommendation algorithm based on trust correlation degree and matrix factorization. IEEE Access. 2019;7:45451–9.
25. Lin W, Zhang X, Qi L, Li W, Li S, Sheng VS, Nepal S. Location-aware service recommendations with privacy-preservation in the Internet of Things. IEEE Transactions on Computational Social Systems. 2020;8(1):227–35.
26. Li W, Mo J, Xin M, Jin Q. An optimized trust model integrated with linear features for cyber-enabled recommendation services. J Parallel Distributed Computing. 2018;118:81–8.
27. Amin MM, Lan JYA, Makhtar M, Mamat AR. A decision tree based recommender system for backpackers accommodations. Int J Eng Technol. 2018. https://doi.org/10.14419/ijet.v7i2.15.11210.
28. Jhalani T, Kant V, Dwivedi P. A linear regression approach to multi-criteria recommender system. In: Tan Y, Shi Y, editors. Data Mining and Big Data. Berlin: Springer; 2016. p. 235–43.
29. Ajesh A, Nair J, Jijin PS. A random forest approach for rating-based recommender system. In: International conference on advances in computing, communications and informatics (ICACCI). IEEE; 2016. p. 1293–1297.
30. Zhang K, Liu X, Wang W, Li J. Multi-criteria recommender system based on social relationships and criteria preferences. Expert Syst Appl. 2021;176:114868.

Acknowledgements

Not applicable.

Funding

Authors did not receive any funding for this study.

Author information

Authors and Affiliations

Department of CSE, GITAM (Deemed To Be University), Vishakapatnam, A.P, India
Chinta Venkata Murali Krishna, G. Appa Rao & S. Anuradha

Contributions

MKC identified the proposed algorithms, obtained the datasets for the research and explored the different methods discussed. AG contributed to the modification of the study objectives and framework, and their rich experience was instrumental in improving the work. AS carried out the literature survey and contributed to writing the paper. All authors contributed to the editing and proofreading. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Chinta Venkata Murali Krishna.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Krishna, C.V.M., Rao, G.A. & Anuradha, S. Analysing the impact of contextual segments on the overall rating in multi-criteria recommender systems. J Big Data 10, 16 (2023). https://doi.org/10.1186/s40537-023-00690-y

Received: 29 January 2022
Accepted: 21 January 2023
Published: 05 February 2023
DOI: https://doi.org/10.1186/s40537-023-00690-y
