A Hybrid Recommender System Based-on Link Prediction for Movie Baskets Analysis

The method proposed in this paper consists of three steps: (1) initial clustering of all users and assignment of each new user to the appropriate cluster; (2) assignment of appropriate weights to users' characteristics; and (3) identification of the new user's adjacent users using hybrid similarity criteria, construction of the adjacency matrix of the adjacent users' movie ratings, and calculation of the new user's rating for each movie from the adjacent users' ratings and the similarity of each adjacent user to the new user. The results show that the mean squared error of the proposed model decreased by 8.59%, 8.67%, 8.45% and 8.15%, respectively, compared to basic models such as Naive Bayes, multi-attribute decision tree and randomized algorithm. In addition, the MAE of the proposed method decreased by 4.5% compared to SVD and by approximately 4.4% compared to ApproSVD, and its RMSE decreased by 6.05% compared to SVD and by approximately 6.02% compared to ApproSVD.


Introduction
Due to the high importance of recommender systems in social networks, real life, e-commerce, shopping cart analysis, etc., a lot of research has been done in recent years [1][2][3].
Recommender systems are among the most popular systems and have attracted the attention of many researchers during the past decade. They are used to filter huge amounts of information, such as users' carts [4], and are applied in a variety of fields such as shops, libraries, restaurants, tourism systems, shopping carts and other environments to provide attractive items such as movie services [5]. These systems play an important role in e-commerce [6]. Given the huge amount of information that exists, providing the most appealing services with high accuracy and in appropriate time is an important issue. A service recommender system enables users to review products by features such as the product's name, manufacturer, production date, brand type, and so on. For new users about whom the system does not have enough information (the cold start problem), the recommender system offers a list of products rated by other users [7]. User cold start is one of the most important challenges of recommender systems [8]. It occurs when the user has no activity or transaction in the system. A variety of recommender systems have been proposed to address it. In general, recommender systems are divided into two categories:

Modern Recommender Systems
A) Demographic-Based Approach [12]
B) Knowledge-Based Approach [13]
The methods that have been studied by various researchers are collaborative and content-based filtering systems. Content-based systems classify users based on their demographic information. Collaborative filtering systems are one of the most widely used recommendation techniques; they offer users the items that have been rated or selected by other similar users [14]. For example, if two users have similar interests and behaviors, the service (film) purchased by one is recommended to the other [15]. In this system, unlike content-based systems, similar users are identified and highly rated items are offered to them.
This method is used to present a list of products to a group of users using data mining (clustering) techniques [16]. Using similarity criteria in collaborative systems to find adjacent users or similar activities is one of the main requirements of making recommendations.
Similarity criteria in recommender systems make it possible to identify similar users or services based on their activity, category and demographic information. In this study, similarity criteria are used in several steps of the collaborative recommender system to determine how similar the new user is to other users, so that items rated by those users can be offered to the new user.
In general, this paper presents a hybrid system that combines content-based and collaborative recommender systems to analyze users' carts in an online movie system. In the content-based recommender system, the DBScan clustering algorithm and a deep neural network are used to determine basic categories for users based on demographic information and to classify new users. One of the most important reasons for using DBScan for the initial clustering of users based on demographic information is its speed and its ability to handle large amounts of information compared to other clustering algorithms. Likewise, the most important reason for using a deep neural network to classify new users is its ability to handle huge amounts of information and many hidden layers compared to other classification methods. The deep neural network enables new users to be assigned to the target group with high accuracy. The collaborative recommender system uses a combination of similarity criteria and the improved FriendLink link prediction algorithm to determine the similarity between new users and other users. With the hybrid similarity criteria, the similarity level between existing users and the new user is calculated with respect to a threshold. The improved FriendLink algorithm is used to provide friend recommendations based on user communication in the online movie system. The remainder of this paper is organized as follows: Section 2 reviews the literature, Section 3 describes the proposed approach and architecture, and Sections 4 and 5 present the results and discuss the conclusions.

Review of the Literature
Several researchers have tried to solve the cold start problem when making recommendations in movie systems. Hung et al. (2011) addressed the cold start problem for both movies and users. They introduced an important traditional collaborative filtering system in which two similarity matrices were used: one showed the similarity between users and movies and the other showed the similarity between the users themselves. Based on the prediction mechanism discussed, they then made recommendations to the users. One weakness of this study was its high memory usage for users and movies, which was due to the construction of several similarity matrices [17]. Bobadilla et al. (2012) used a neural network as a collaborative filtering RS to reduce cold start issues for new users. They evaluated on the Movielens and Netflix datasets and, because the data were non-numeric, used the Jaccard similarity index [18]. Henc (2013) recommended movies to users by clustering movies with the k-means algorithm, based on users' comments about the movies. Henc studied the well-known Movielens dataset and applied the approach to a data collection with 10109 movies rated by 2113 users [19]. Kamvatsus et al. (2014) introduced a model in which classification algorithms such as Naïve Bayes, decision tree, and a random classification algorithm were used as similarity metrics in order to recommend movies to users. They also evaluated on the Movielens dataset [20]. To enhance system performance and solve the cold start problem, Luize et al. (2015) proposed a hybrid method that includes both collaborative filtering and demographic information.
In that study, they used a hybrid co-clustering algorithm and machine learning to solve the cold start problem and evaluated on the Movielens, Jester, and Netflix datasets [21]. Sperlì et al. [22] provided a recommendation system to improve the social networking approach. In that paper, an RS designed for big data applications is used to provide useful recommendations on online social networks. The proposed technique is a collaborative, user-centric approach that exploits, in a new and effective way, the interactions between users and the multimedia content generated on one or more social networks. Experiments on data collected from several online social networks showed the feasibility of the approach for the problem of social media recommendation. Kutty et al. [23] reviewed the challenges and solutions of recommender systems for large social networks. The article states that social networks are crucial for networking, communication, and content sharing. Social networking applications generate a great deal of information on a daily basis, and social networks are the subject of extensive research due to the heterogeneity of their data and structures, their size and their dynamics. When such a large amount of data is used by recommender systems, the results can help to solve social business issues and to improve friend recommendations. It is a review article that compares several trends with each other. Lin et al. [24] developed a neural-network-based recommendation system for recommending movies to users. Challenges such as scalability, dispersion and users' confidence, which have received less research attention so far than cold start, were also addressed there with preprocessing, clustering and classification.
Walek et al. (2020) [25] proposed a hybrid recommender system for recommending suitable movies. This system contains a recommender module combining a collaborative filtering system, a content-based system, and a fuzzy expert system. The number of link prediction studies is about 0.5 times higher than that of recommender systems. The literature review shows that various methods have been proposed to provide users with recommendations in movie services. Each of these methods has its own challenges, such as inadequate accuracy, high error rate and lack of appealing services. This article presents a recommender system based on a combination of content-based and collaborative recommender systems which both solves the cold start problem and addresses the challenge of users' trust.

The Proposed Method
In this section, we describe the proposed method with reference to its flowchart; the items specified in the flowchart are described in full detail in the following sections. As can be seen in Fig. 4, the Movielens dataset is first loaded into the hybrid recommender system. This database has three sections: user communications, demographic information and rated movies. The demographic information and rated movies are used in the content-based recommender system, and the user communication dataset is used in the collaborative system. After the dataset is loaded into the proposed system, the first phase of the proposed method is executed. The first phase involves the content-based recommender system, in which the DBScan clustering algorithm is used for the initial clustering of the users' dataset based on demographic information and a deep neural network is used to classify new users. In this phase, all users are first clustered based on demographic information such as age, occupation and gender using the DBScan algorithm.

[Figure 4. Flowchart of the proposed method; recoverable labels: Movielens Dataset, New Users]
Finally, the results are evaluated. Based on the phases described in the proposed method, the steps of the proposed method are as follows:

Phase 1: Content-Based Recommender System
After the dataset is loaded into the proposed system, the first phase of the proposed method is executed. The first phase involves a content-based recommender system. In the content-based recommender system, DBScan is used for the initial clustering of the users' dataset according to demographic information, and a deep neural network is used to classify new users. In this phase, all users are first clustered based on demographic information such as age, occupation and gender using the DBScan algorithm. Every user in the system is given a cluster label. In phase 1, the content-based recommender system thus clusters users and assigns them to categories.

Initial Clustering of Users with DBScan Algorithm
At this point, the DBScan clustering algorithm separates users and determines the clusters based on their demographic information. One of the most important features of DBScan compared to other clustering algorithms such as K-Means and X-Means is that it identifies and separates heterogeneous data. This reduces the complexity of the model and, in addition to improving processing speed, improves the initial clustering of users in the online movie system. DBScan is a density-based spatial clustering algorithm that can also detect anomalies in the dataset. It requires two user-defined parameters: the epsilon proximity distance (eps) and the minimum number of points (minpts). For a given point, the points within distance eps are called the adjacent points of that point. If the number of adjacent points is greater than minpts, this group of points is called a cluster.
DBScan labels data points as core points, border points and noise (anomalous) points. eps has the lowest rating. The pseudo-code of the DBScan clustering algorithm is given in the following algorithm. Its inputs are the dataset and the user-defined parameter values eps and minpts.

Algorithm 1. DBScan algorithm for initial clustering of users based on their profile information
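As an illustration of the clustering step described above, the following is a minimal pure-Python sketch of the standard DBSCAN procedure. It is not the authors' exact pseudo-code. The function names, the Euclidean distance, and the numeric encoding of users as coordinate tuples are assumptions made for the example, and the sketch uses the common convention that a point is a core point when its eps-neighbourhood (including itself) holds at least minpts points.

```python
from math import dist

NOISE = -1  # label for anomalous points


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including idx itself."""
    return [j for j, q in enumerate(points) if dist(points[idx], q) <= eps]


def dbscan(points, eps, minpts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < minpts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= minpts:  # j is also a core point: expand
                queue.extend(j_neighbours)
    return labels


# Hypothetical, already-scaled demographic vectors for seven users.
users = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1),
         (5.0, 5.0), (5.1, 5.0), (4.9, 5.1),
         (9.0, 0.0)]
labels = dbscan(users, eps=0.5, minpts=3)
print(labels)
```

In this toy input the first three and the next three users form two dense groups, and the last user has no neighbours within eps, so it is labelled as noise.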
Table 2 shows the demographic information of 5 users. As can be seen in Table (2), the second column shows the gender, the third column shows the age and the fourth column shows the user's occupation. Table 3 shows the values of the jobs defined in the users' demographic file. After the clustering algorithm is applied, all users are clustered based on demographic information (age, gender, and occupation). Each user is assigned a cluster label, and these labels are used as the categories of the users. After clustering, users are divided into two groups.
The first category, which accounts for 70% of all users, is used to train deep neural networks and generate models. The second category includes new users who constitute 30% of all users.

Separation of training and test samples
One of the important phases in training the deep neural network is dividing the samples into two main parts. The first part is used to train the deep neural network model and the second part is used as the test set for categorizing new users. Sampling is one of the stages of data mining that is considered in the proposed solution.
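The 70%/30% separation described above can be sketched as follows. This is a minimal illustration, assuming a simple random shuffle; the function name `split_users` and the fixed seed are hypothetical and not taken from the paper.

```python
import random


def split_users(user_ids, train_fraction=0.7, seed=42):
    """Shuffle the user ids and split them into training and test (new) users."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# 6,040 users, as in the Movielens data described in the Results section.
train_users, new_users = split_users(range(1, 6041))
print(len(train_users), len(new_users))
```

The training part carries the cluster labels produced by DBScan, while the test part plays the role of new users whose category has to be predicted.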

Classification of New Users with Deep Neural Network
At this stage, the appropriate category for each new user is found using the demographic information of the new users and the clusters specified in the previous step. Fig. 5 shows the steps of cluster selection using the deep neural network method.
Figure 5. Determination of the new user's class using deep neural network methods
As can be seen in Fig. 5, the training data, which is the output of the clustering stage, is processed by the deep neural network method and the desired model is generated. Then the new user enters the system as test data and is assigned to a category. Once the cluster or category of the new user is determined, its adjacent users, which are the users of that category, are extracted. The comments of the adjacent users are taken into consideration in movie recommendations. One of the most important reasons for using a deep neural network is its support for a large number of hidden layers and its high classification accuracy on huge amounts of data.

Phase 2: Collaborative Recommender System Based on Hybrid Similarity Criterion
After determining the category for new users, phase 2 of the proposed method begins. As can be seen in the first line, the number of users is entered in the algorithm as input.
Line three defines the list of users. In line four, a category is defined for the new user.

Numeric features
For numeric features such as age, a similarity criterion is defined as follows (2): In the equation above, Diff represents the age difference between users and Diffmax is the maximum difference defined by the researcher. If the researcher wants to increase the value of the age weight w_age relative to the value of Diff, β should simply be set to less than 1.

String features
Equation (3) shows how the similarity is calculated for string features.
Depending on whether the feature values of users 1 and 2 are the same or not, the similarity is set to 1 or 0.
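The numeric and string criteria above can be combined into a weighted hybrid similarity, sketched below. The numeric form `1 - (Diff / Diffmax) ** beta` and the weighted average are assumptions chosen to match the description (Diff, Diffmax, β, and a 1/0 match for string features); they are not necessarily the authors' equations (2) and (3). The function names, the feature names and the example weights are hypothetical.

```python
def numeric_similarity(a, b, diff_max, beta=1.0):
    """Assumed form: 1 - (Diff / Diffmax) ** beta, with Diff capped at Diffmax."""
    diff = min(abs(a - b), diff_max)
    return 1.0 - (diff / diff_max) ** beta


def string_similarity(a, b):
    """1 if the two feature values are the same, otherwise 0."""
    return 1.0 if a == b else 0.0


def hybrid_similarity(u, v, weights, diff_max=50, beta=1.0):
    """Weighted average of the per-feature similarities (assumed combination)."""
    parts = {
        "age": numeric_similarity(u["age"], v["age"], diff_max, beta),
        "gender": string_similarity(u["gender"], v["gender"]),
        "occupation": string_similarity(u["occupation"], v["occupation"]),
    }
    total_weight = sum(weights[name] for name in parts)
    return sum(weights[name] * parts[name] for name in parts) / total_weight


new_user = {"age": 25, "gender": "F", "occupation": "student"}
other = {"age": 35, "gender": "F", "occupation": "engineer"}
weights = {"age": 0.5, "gender": 0.3, "occupation": 0.2}  # hypothetical scenario
sim = hybrid_similarity(new_user, other, weights)
print(round(sim, 3))
```

With this assumed form, setting β below 1 makes the age similarity drop faster for small age differences, which gives the age difference more influence on the result.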

Formation of Adjacency Matrix
After the similarity level of the new user to the adjacent users has been obtained using the similarity criteria, the adjacency matrix of the ratings given by the adjacent users is created. Then, using a prediction formula that is explained below, the rating of each desired product is calculated from the adjacent users' ratings and the similarities obtained in the previous step. This predicted rating serves as the success factor of the target product when recommending it to the new user. It should be noted that the highest rated products are recommended as superior products. The general form of the adjacency matrix for a product and user is as follows.

Predicting new user's rating
After the adjacency matrix is formed in the previous step, the ratings given by the users are used to calculate a predicted rating for the new user. So, we make predictions for the new user in the final phase. For each new user nj, the proposed model must predict values for item ib. Rnj,ib is the predicted rating assigned to item ib by the new user nj.
The predicted rating for each user is obtained using the following equation.
ru,ib is the rating given to item ib by user u in the adjacent users' list. Using the aforementioned prediction formula, the rating of each item is predicted approximately, and the item with the highest predicted rating is selected. The TF value, which indicates whether the user is new or old, affects the provided rating. If the user is new, TF = 0; the doubled rating set by the researcher varies from 0.1 to 1 and may affect the predicted total rating. Equation (6) can be used to obtain the degree of similarity between users X and Y. One of the most important features of this criterion is that it calculates the similarity between two users on a social network with great accuracy. This level of accuracy makes the recommended users or friends more attractive.
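The rating prediction step can be illustrated with a similarity-weighted average of the adjacent users' ratings. This is a common collaborative filtering form and is used here as an assumption; the TF factor and the doubled rating for loyal users are not modelled. The names `predict_rating` and `best_item` and the toy data are hypothetical.

```python
def predict_rating(similarities, ratings, item):
    """Similarity-weighted mean of the adjacent users' ratings for one item.

    similarities: {user: similarity to the new user}
    ratings: {user: {item: rating}}, a sparse user-item rating matrix
    Returns None when no adjacent user has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for user, sim in similarities.items():
        rating = ratings.get(user, {}).get(item)
        if rating is None:
            continue  # this adjacent user did not rate the item
        numerator += sim * rating
        denominator += sim
    return numerator / denominator if denominator > 0 else None


def best_item(similarities, ratings, items):
    """Item with the highest predicted rating among those that can be predicted."""
    predicted = {i: predict_rating(similarities, ratings, i) for i in items}
    predicted = {i: p for i, p in predicted.items() if p is not None}
    return max(predicted, key=predicted.get) if predicted else None


similarities = {"u1": 0.8, "u2": 0.2}
ratings = {"u1": {"m1": 5, "m2": 2}, "u2": {"m1": 3}}
p_m1 = predict_rating(similarities, ratings, "m1")
p_m2 = predict_rating(similarities, ratings, "m2")
print(p_m1, p_m2, best_item(similarities, ratings, ["m1", "m2", "m3"]))
```

Dividing by the sum of similarities keeps the prediction on the same scale as the original ratings, so more similar adjacent users simply pull the prediction closer to their own rating.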

Phase 3: Improved Friendlink Algorithm
In this phase, which is the penultimate phase, the improved Friendlink algorithm is run on the dataset to calculate the similarity between users who are connected through links. In this section, we first describe link prediction via Friendlink and then the improved algorithm.
Friendlink algorithm is a link prediction algorithm that is widely used to predict future links, especially dating, on social networks.
ni is a new user who has recently entered the social network graph and uj is the target user.
L: Specifies the path length.
n: Specifies the total number of graph nodes.
1/(i − 1) is a weighting factor that takes effect for paths longer than 2. Suppose that the maximum path length is 2 (L = 2); in this case 1/(i − 1) = 1 for i = 2, and the factor has no effect on the final similarity. When L = 3, this weight becomes 0.5 for paths of length 3 and has a significant effect. The adjacency degree of the nodes is applied to the Friendlink formula to improve the accuracy of link prediction, so the formula changes as follows.
N is the total number of degrees of the nodes, and the remaining term is the degree of the target node. We thus try to improve the results by using the Friendlink algorithm with an improved similarity formula. The resulting method is called Pro-Friendlink.
The important point about the Pro-Friendlink algorithm is that it gives more importance to friends that are more connected to other users and ultimately recommends to the target user those users who have more credibility in the social network. This makes it possible to recommend more attractive and popular users to new users.
An example of the Pro-Friendlink algorithm is presented here. The Pro-Friendlink prediction method calculates the similarity between nodes in a unidirectional graph so that users' credibility is taken into account. The algorithm receives the connections of graph G as input and, after generating the adjacency matrix, calculates the similarity between two nodes and returns it as output. Consequently, friend suggestions can be based on the weights calculated by the Pro-Friendlink prediction algorithm in the adjacency matrix. Figure 3 shows the Pro-Friendlink prediction algorithm. If we want to suggest a new friend to user U1, there is no direct indication of this in the adjacency matrix shown in Table (1). After running the Pro-Friendlink algorithm, we can find the similarity matrix between the nodes of graph G and suggest friends based on importance.
In the proposed method, we first modify the adjacency matrix A displayed in Table (2) so that, instead of holding values of 0 and 1, entry (i, j) is a list of paths from node i to node j.
The basic idea is that if the 0/1 adjacency matrix A of a graph is raised to the power N, then entry (i, j) of the result shows how many paths of length N there are from node vi to node vj. Then, instead of counting the paths, we look for all the actual paths.
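The path-based similarity can be sketched as follows. The sketch assumes the standard FriendLink form, in which the similarity of two nodes is the sum over path lengths i = 2..L of 1/(i − 1) times the number of simple paths of length i between them, divided by the product of (n − j) for j = 2..i. The degree-based weighting of Pro-Friendlink is left out, and the function names and the toy graph are hypothetical.

```python
def count_paths(adj, src, dst, length):
    """Count simple (cycle-free) paths with exactly `length` edges from src to dst."""
    def walk(node, remaining, visited):
        if remaining == 0:
            return 1 if node == dst else 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                total += walk(nxt, remaining - 1, visited | {nxt})
        return total
    return walk(src, length, {src})


def friendlink(adj, x, y, max_len=3):
    """Assumed FriendLink similarity of nodes x and y for path lengths 2..max_len."""
    n = len(adj)  # total number of graph nodes
    score = 0.0
    for i in range(2, max_len + 1):
        possible = 1  # normaliser: product of (n - j) for j = 2..i
        for j in range(2, i + 1):
            possible *= n - j
        if possible <= 0:
            break  # the graph is too small for paths of this length
        weight = 1.0 / (i - 1)  # longer paths contribute less
        score += weight * count_paths(adj, x, y, i) / possible
    return score


# Hypothetical undirected friendship graph given as adjacency sets.
graph = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
sim_1_4 = friendlink(graph, 1, 4)
sim_1_5 = friendlink(graph, 1, 5)
print(round(sim_1_4, 4), round(sim_1_5, 4))
```

In this toy graph, user 1 reaches user 4 through two paths of length 2 and reaches user 5 only through two paths of length 3, so user 4 receives the higher score and would be suggested first.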

Phase 4: Combining Link System and Recommender System
Phase 4 of the proposed method combines the output of the collaborative recommender system with that of the improved Friendlink algorithm. In Phase 4, which is the final phase of the proposed system, the results of Phases 1 and 2 are combined with those of Phase 3, and the users selected by both are sent as the final proposal. Suppose that the results of Phases 1 and 2 were users 1, 2, 4, 5, 6, 9 and 10, and the results of Phase 3 were users 3, 4, 5, 6, 9 and 10.
The two result sets are combined, and the users they have in common, 4, 5, 6, 9 and 10, are suggested as attractive users to the new user.
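The combination in the example above amounts to a set intersection, which can be sketched as follows; the function name `combine_recommendations` is hypothetical.

```python
def combine_recommendations(collaborative_users, link_users):
    """Keep only the users selected by both the recommender and the link system."""
    return sorted(set(collaborative_users) & set(link_users))


phase_1_2 = [1, 2, 4, 5, 6, 9, 10]  # output of Phases 1 and 2
phase_3 = [3, 4, 5, 6, 9, 10]       # output of Phase 3
final_users = combine_recommendations(phase_1_2, phase_3)
print(final_users)
```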

Results
In this paper, a simulation on the Movielens dataset is performed to investigate the issue and evaluate the results. To access the data source, simply refer to [22] and download the desired data from the versions provided. The version used in this study was 2013, whose size is 1 MB. These files contain 1,000,209 records of user ratings, 3,900 movies and 6,040 users, with each user rating at least 20 movies. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used as evaluation criteria.
Here, Pu,i is the predicted rating of user u for movie i and ru,i is the actual rating of user u for movie i. Some of the scenarios are illustrated in the table below. As can be seen, different results are obtained with different weightings for age, sex, and occupation. The scenarios are defined with different weights: a feature with a larger weight receives more focus and has a larger effect on similarity.
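The two evaluation criteria, MAE and RMSE, can be sketched as follows using their standard definitions over pairs of predicted and actual ratings. The function names and the sample ratings are hypothetical.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average of |P(u,i) - r(u,i)| over all rated pairs."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Square Error: square root of the average squared error."""
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


predicted = [4, 3, 5]  # hypothetical predicted ratings
actual = [5, 3, 3]     # hypothetical actual ratings
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
print(mae_value, round(rmse_value, 4))
```

RMSE penalises large individual errors more strongly than MAE, which is why the two criteria are usually reported together.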
The following table reports the mean prediction errors of the proposed method using MAE and RMSE and compares them with those of other methods. The stage considered for the following results is stage 1.
Step 2.
The following table also summarizes the evaluations of steps 1, 2 and 3 with 500 users. The following table also summarizes the evaluations of steps 1, 2 and 3 with 900 users. The comparison covers classification [20] as well as algorithms like SVD and ApproSVD [26].
In a further comparison, we evaluate our method against [24]. Chu-Hsing Lin et al. [24] presented a neural network for a movie recommendation system. Table 6 shows the comparison of the proposed method, using a boosting approach, with Scikit-learn and TensorFlow. Scikit-learn is an early machine learning tool with many ready-made libraries and functions to call, and it has powerful and fast mathematical capabilities. TensorFlow is a tool that has emerged in recent years with the development of deep learning. As can be seen in Table (18), the proposed method performs better than the Scikit-learn and TensorFlow methods with and without applying a neural network. This method also has a lower processing time compared with the other methods.

Discussion
The main purpose of this article is to solve the cold start problem in online movie networks and to introduce appropriate movies to new users with acceptable accuracy. To do this, content-based recommender systems and collaborative filtering as well as clustering techniques and a deep neural network were used. Clustering techniques, a deep neural network, hybrid similarity criteria, and the improved Friendlink algorithm were applied as methods that are more accurate than the methods compared in this study at providing new users with appropriate movies. According to the simulation, the proposed method recommends desired movies in a timely manner to new users who are experiencing a cold start, compared to other methods. Another major issue is the trust issue that arises from disregarding older users. The proposed method consists of four steps: (1) initial clustering of all users and assigning new users to appropriate clusters; (2) assigning appropriate weights to the characteristics of the target cluster users and determining the adjacent users; (3) forming the adjacency matrix of adjacent users' ratings of the existing movies and calculating the new user's ratings considering adjacent users' ratings and the similarity level between the users, a stage at which a doubled rating opportunity is created for loyal users; and (4) using the Friendlink algorithm to introduce similar users to the new user and combining Step 4 with Step 3.
Finally, to evaluate the prediction error of the proposed method in comparison with other similar methods such as C24.5, CM4.5, RCA and the Naïve Bayes method, the MAE and RMSE evaluation criteria were used. The error rate of the proposed method is lower than that of the other similar methods, which indicates acceptable accuracy in introducing movies to new users.

Conclusion
In our future work, we would like to focus on several areas. Here are some recommendations for further research: 1) In the present article, the k-means clustering technique is used to cluster users, with the number of clusters k obtained by trial and error.
Therefore, techniques such as a random-walk clustering algorithm or an improved clustering algorithm can be used, and the results can be compared and evaluated against the current and other clustering methods. 2) In this paper, a deep neural network is used to assign new users to the desired cluster, that is, to select the categories that are most suitable for users. Future work can substitute other techniques and compare the results with the proposed method. This method can also be implemented with fog computing in distributed models for optimal operation.

Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due [REASON WHY DATA ARE NOT PUBLIC] but are available from the corresponding author on reasonable request.