Skip to main content

Towards a folksonomy graph-based context-aware recommender system of annotated books


The emergence of collaborative interactions has empowered users by enabling their interactions through tagging practices that create a folksonomy, also called, classification of the shared resources, any identifiable thing or item on the system. In education, tagging is considered a powerful meta-cognitive strategy that successfully engages learners in the learning process. Besides, the collaborative tagging gathers learners’ opinions, thus, provides more comprehensible recommendations. Still, the abundant shared contents are mostly unorganized which makes it hard for users to select and discover the appropriate items of their interests. Thus, the use of recommender systems overcomes the distressing search problem by assisting users in their searching and exploring experience, and suggesting relevant items matching their preferences. In this regard, this article presents a folksonomy graphs based context-aware recommender system (CARS) of annotated books. The generated graphs express the semantic relatedness between these resources, i.e. books, by effectively modeling the folksonomy relationship between user-resource-tag and integrating contextual information within a multi-layer graph referring to a Knowledge Graph (KG). To put our proposal into shape, we model a real-world application of Goodbooks-10k dataset to recommend books. The proposed approach incorporates spectral clustering to deal with the graph partitioning problem. The experimental evaluation shows relevant performance results of graph-based book recommendations.


The popularity of the social information systems (SocIS) has enabled social interaction leading to tremendous growth in the volume of shared resources. A resource is any identifiable things stored in the system (e.g. books, images, museums, selling items, electronic health records, and other documents). Social information systems SocIS are information systems based on social technologies and open collaboration [1]. The rise of SocIS allows users to easily communicate, learn and coordinate collaboratively. The SocIS platforms use the features of folksonomy to commonly contribute to the management of the shared resources. More contemporary and popular techniques of SocIS include social network analysis, collaborative filtering and social tagging [2]. The arrival of the social web has enabled users to add metadata using their keywords, called tags, to the shared content. Users collaboratively use tags to annotate the various types of resources such as books, places, blogs, images, videos, audios, and so forth. The collaborative tagging enables a collaborative classification, folksonomy, of the shared resources. Folksonomy, unlike a taxonomy, includes terms without a hierarchy [3]. It is a collection of tags from uncontrolled vocabulary. Folksonomy, named also collaborative tagging and social indexing, derives a classification system from the collaboratively generated tags.

Collaborative tagging techniques are useful for enhancing the recommendation of learning resources [4]. More comprehensible learning resources’ recommendations are provided by gathering learners’ opinions. Indeed, the collaborative tagging technique is used for automatic analysis of the user’s preferences to improve recommender system quality [5]. Folksonomy adds semantics to the shared learning resources without the support of manual indexers or automated generated keywords. Tagging is demonstrated to be an adequate meta-cognitive strategy that successfully engages learners in the learning process [6].

The recommender system (RS) overcomes the distressing search problem for the users, it is not only used in the e-commerce domain to suggest products on Amazon, videos on YouTube, movies on Netflix, friends on Facebook, and so on. RS is also helpful in the education domain. It plays the role of supporting teaching, learning [7] and expanding the knowledge through enhancing the information retrieval and suggesting additional information that has not been yet founded through the user’s searching.

Active research in the field of recommender system has been focusing on measuring the similarities between different items. Many Recommender Systems (RSs) utilize traditional filtering techniques to infer similarities between items. Specific designs of graph-based filtering systems unable flexibility in computing the similarity (using co-occurrence, co-rating, co-tagging, co-describing, and so on) among different entities. Besides, for better-personalized user recommendations, RSs leverage the contextual information in their process of recommendation, called context-aware recommendation system (CARS). This paper will mainly respond to the following Research Questions:

  • RQ1: How to model the relational information between resources, in our case study books, by effectively analyzing the folksonomy relationships?

  • RQ2: How to establish a folksonomy graph-based Context-Aware Recommender System CARS?

  • RQ3: How to perform recommendations based on a Knowledge Graph (KG) database?

To put our proposal into shape, we model a real-world application of Goodbooks-10k dataset [8]. The great volume of its relational data provides tagged and rated books, which we aim to model and represent within a multi-layers graph. Where each layer concerns a graph of homogeneous type of nodes related with their weighted edges, i.e., a graph of tags, graph of books, and graph of users. The combination of the three graphs forms a multi-relational or multi-layer graph, where nodes represent entities and edges represent relationships. The resulting multi-relational graph is referred to as a Knowledge Graph (KG).

Our proposal aims to (i) analyze and model the folksonomy relationship to generate graphs of books and tags (ii) integrate the emergent semantic of books within the CARS by establishing a semantic graph-based recommender system methodology (iii) exploring the generated multi-layer graph that incorporates the three graphs of users, books and tags modeled within a knowledge graph database to perform diversity in the recommendation. Besides, This paper presents the use of spectral clustering as a pre-processing step before constructing the semantic graph-based recommendation algorithm to suggest books.

The rest of the paper is organized as follows: Section "Background and related work" presents the background and related works of recent graph-based recommender systems approaches. Section "Proposed approach" depicts the proposed approach of folksonomy graphs based context-aware recommender system using spectral clustering. Section "Evaluation and results" describes the evaluation and experimental results using the Goodbooks-10k [8] dataset for book recommendations. Finally, the conclusion and future directions are delineated in Sect. "Conclusion and perspectives".

Background and related work

Recommender system

The recommender system is a branch of information retrieval and artificial intelligence, a subclass of information filtering systems that aims to assist the user’s search behavior by suggesting items that best meet their interests and preferences. The main commonly distinguished filtering algorithms of recommender systems are collaborative filtering (CF), content-based (CB) and hybrid filtering recommendations approach [9], which rely on the exploitation of items-items similarities and users-items interactions. These approaches come across some challenges that affect the precision of recommendations [10], such as sparsity, cold start, and scalability issues. Besides, the challenges of recommending in real-time within the scalable systems have called the need to enhance the recommendation method. Given the focus of this article, this literature review will mainly be related to graph-related recommendation algorithms.

Recommender system to support learners

The purpose of recommender systems is to help users conduct searches by suggesting resources (items) that best match their interests and preferences [11, 12]. The use of recommender systems in Computer Environments for Human Learning (CEHL) aims to support learners in the learning process to achieve their learning goals [13]. On such platforms, the pedagogical resources are regularly organized and diffused [14] which involves the learners’ concern about which resource to consult. Recommendation systems provide support to learners by recommending content that might be of their interest and which they may not have discovered before.

Graph-based recommender system

Recent works have shown the effectiveness of using graph modeling to enhance the performance of recommender systems. The authors [15] propose three methods for making KG-based recommendations using a general-purpose probabilistic logic system. Linked Open Data has been used as an external knowledge graph adopted within a hybrid graph data model [16]. In [17], they propose a heterogeneous information network-based music recommendation system that uses a graph-based algorithm to generate recommendations. To improve social trust and influence, the authors [18] propose a graph-based model for the social recommendation. To reduce the dimensionality of the recommendation problem, the authors [19] propose a graph-based recommendation system that learns and exploits the geometry of the user space to create clusters in the user domain. In [20], they overcome the major challenges of sparsity and scalability by leveraging a directed graph of jobs connected by multi-edges to recommend jobs. Although these approaches use graphs-based recommendations to solve specific issues, our work differs significantly as we aim to enhance the recommendation’s performance and diversity by capturing the semantic and social interaction through modeling folksonomy relationship into a semantic graph described by weighted graphs then exploring the multi-layer graph so-called KG. Graphs are mathematical structures enabling to encode multiple types of interactions, as we will show in what follows, recommendations can be computed thanks to graphs that create meaningful clusters of entities reducing the dimensionality of the recommendation problem.

Graph database

The use of graph-oriented database technology leverages the building of flexible recommendation engines. A graph database is characterized by its distinct “data model” compared to traditional relational databases [21]. A data model is a set of conceptual tools to represent and manage data. The data model consists of three components [22]: data structure, types, query operators, integrity rules. A graph database stores data and represented it in a graph. Each graph database has its specialized graph query language. For example, Neo4J uses Cypher language, and RDF (Resource Description Framework) databases use SPARQL (SPARQL Protocol and RDF Query Language). The integrity rules in a graph database are based on its graph constraints, rather than an imposed relational schema. The popularity of using graph databases has emerged with the increasing complexity of real-world data and growing needs for graph queries. For example, Neo4J is suited for online transaction processing (OLTP), a widely used open-source graph database, and characterized as an “embedded, disk-based, and fully transactional graph database engine” [23]. Graph databases are gaining a lot of interest with their data modeling tools that provide a closer fit to real-world data. Early adopters of graph technology reimagined their businesses around the value of data relationships. These companies have now become industry leaders: LinkedIn, Google, and Facebook [24]. The graph database model is a set of nodes related with edges described with properties. The graph database represents and stores data using a graph structure. It is based on the NoSQL approach to lunch semantic queries to retrieve data.

Graph databases use a data model that stores the data relationships as edges related to the nodes representing the data. Therefore, the graph data model enables storage, processing, and querying connections between data efficiently. While relational databases compute data relationships through expensive, high-cost, and complex join queries.

Proposed approach

The social web has initiated the use of collaborative tagging over the years. It has called attention to analyzes the inter-connectivity of the user-resource-tag to improve the recommender system.

The proposed approach models the folksonomy characteristics by inferring user-resource-tag in graphs linked to one another to emerge the semantic graph-based recommendations (see Fig. 1). In this paper, the annotated resource are books. Two coherent clusters are obtained using spectral clustering to reduce the scalability issue of the dataset (see Figs. 2 and 3).

Fig. 1
figure 1

Modeling folksonomy relationship into a semantic graph-based CARS of books

Fig. 2
figure 2

K-means clustering in 2 cluster

Fig. 3
figure 3

Spectral clustering in 2 Clusters

The use of spectral clustering will pre-process the construction of the graph of books to reduce the scalability issue. For each graph, we examine the relationship between its entities and identify actionable knowledge. For a community of users U = {u\(_{s}\)}, they annotate a set of books B= {b\(_{k}\)} with a set of tags T= {t\(_{i}\)}. Where, 1\(\leqslant\)s\(\leqslant\)l ; 1\(\leqslant\)k\(\leqslant\)m ; 1\(\leqslant\)i\(\leqslant\)n . And l, m and n are respectively the total number of users, books and tags.

Spectral clustering

The use of spectral clustering stages a pre-processing phase before constructing the graph of resources. Spectral clustering deals with the graph partitioning problem. It transforms the current space to bring connected data points close to each other to form clusters. In this context, the data points to be clustered are 10K books.


Clustering is one of the most widely used techniques for exploratory data analysis. Its goal is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar to each other. Spectral clustering has become increasingly popular due to its promising performance in graph-based clustering. It can be solved efficiently by standard linear algebra software and very often outperforms traditional algorithms such as the k-means algorithm. Spectral clustering does not make assumptions about the shape of clusters. Unlike K-means, which assumes a spherical shape for the resulted clusters. Spectral clustering gives importance to connectivity (within data points) rather than compactness (around cluster centers). The goal of spectral clustering is to cluster data that is connected.


The usefulness of an item depends on the user’s current context and circumstances. Varying the recommendations depends on contextual factors like time, location, mood, user’s actual situation, position, status, and condition. The challenge is to go beyond the representational approach with its predefined and fixed set of observable attributes.

The set of books B= {b\(_{i}\)} represents the data points, where the book b\(_{i}\) denotes data entry. Each b\(_{i}\) \(\in\) B\(^{f}\), where f is the number of features describing b\(_{i}\), like spatial, temporal and static contextual features, tags.

Similarity matrix

Given an enumerated set of data points B, the Similarity or Adjacency Matrix is defined as a symmetric matrix A, where A\(_{ij}\geqslant\) 0 represents a measure of the similarity between b\(_{i}\) and b\(_{j}\). A\(_{ij} \simeq\) 1 when b\(_{i}\) and b\(_{j}\) have the same features. The data points, books, are in the same cluster when there are close, but in different clusters when there are far away. But data points in the same cluster may also be far away or even farther away than points in different clusters. The goal is to transform the f-dimensional space so that when 2 points b\(_{i}\) and b\(_{j}\) are close, they are always in the same cluster, and when they are far apart, they are in different clusters. A common way to define similarity is by using the Gaussian Kernel A\(_{ij}\).

$$\begin{aligned} A_{ij}=e^{-\frac{\parallel b_i - b_j \parallel ^2}{2\sigma ^2 }} \end{aligned}.$$

Unnormalized graph laplacian

The unnormalized Graph Laplacian is a matrix defined as the difference of 2 matrices denoted: L= D - A , where D is the diagonal Degree matrix.

$$\begin{aligned} D_{ii} = \sum _j A_{ij} \ \ \ And \ \ \ \ L_{ij}=D_{ii}- A_{ij} \end{aligned}.$$

Process of spectral clustering

  1. 1

    Construct the similarity matrix using Gaussian Kernel,

  2. 2

    Compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object. It embeds the data points, books {b\(_{i}\)} , in a low-dimensional space in which clusters are more obvious.

  3. 3

    Apply a classical clustering algorithm, like K-means, to partition the books into k classes.

The spectral clustering data entry are books described by their contextual information and most popular tags f= 13. Each book b\(_{i}\) has its defined vector of contextual information CI-b\(_{i}\)={author\(_{i}\), language\(_{i}\), year-of-publication\(_{i}\), 10-most-popular-tags{t\(_{i}\) \(_{1}\),...,t\(_{i}\) \(_{10}\)}}. The spectral clustering defines the 2 clusters of books better than the k-means (see Figs. 2 and 3).

Folksonomy graph-based recommendation:

Graph of books

Each cluster resulting from the spectral clustering has its graph of books. The graph of books emerges from the similarities among books using their descriptive tags. Two books b\(_{i}\), b\(_{j}\) \(\in\) B are semantically similar when their weighted edge W(b\(_{i}\),b\(_{j}\)) is high. The weight W(b\(_{i}\),b\(_{j}\)) computes how strongly two books are semantically related (3). The weighted edges are normalized to re-scales values into a range of [0,1].

$$\begin{aligned} \begin{aligned} W(b_i,b_j) = \sqrt{W_t(b_i,b_j)^2 + W_c(b_i ,b_j)^2 } \ \ \ \ \ \ \\ W_t(b_i,b_j) = \frac{Number\ of \ tags \ describing \ both \ books \ b_i \ and \ b_j}{Total\ Number\ of \ tags} \\ W_c(b_i ,b_j)= \frac{\sum w_c(b_i ,b_j)}{{f}} \\ w_c(b_{i} ,b_{j}) = \left\{ \begin{array}{ll} 1 &{} \text{ if } Matching-Member[CI-b_{i},CI-b_{j}]=True\ \\ 0 &{} \text{ else. } \end{array} \right. \end{aligned} \end{aligned}$$

f= Total Number of books’ contextual features.

Matching-Member[CI-b\(_{i}\),CI-b\(_{j}\)] returns True if an element of CI-b\(_{i}\) matches CI-b\(_{j}\), and False otherwise.

The contextual information of a book b\(_{i}\) : CI-b\(_{i}\)={author\(_{i}\),language\(_{i}\),year-of-publication\(_{i}\), 10-most-popular-tags{t\(_{i}\) \(_{1}\),...,t\(_{i}\) \(_{10}\)}}

Therefore, the recommender system will explore the books-books and tags-tags similarity to do suggestions.

Graph of tags

Tags are clustered in an emergent graph of tags. We consider the relevant tags with higher degree of frequency DF (4).

More comprehensive folksonomies emerge from non-expert or novice users’ tags than from experts’ tags only [25] . Therefore, the proposed approach considers the extraction of tags that are frequently used and understood by many users of the community.

We consider a book, b \(\in\)B described by a set of tags from T. The extraction of relevant tags describing this book b is computed by considering the degree of frequency of each tag t\(_{i}\) , denoted by DF(b,t\(_{i}\)).

$$\begin{aligned} DF(b,t_i) = \sqrt{FT(b,t_i)^2 + FU(b,t_i)^2} \end{aligned}$$


FT (b,t\(_{i}\)) is the Frequency of the tag t\(_{i}\) annotating the book b;

FU (b,t\(_{i}\)) is the Frequency of users who use the tag t\(_{i}\) to annotate the book b.

$$\begin{aligned} FT(b,t_i)= & {} \frac{Number\ of \ times\ the \ tag \ t_i\ is\ used\ to\ describe\ the\ book \ b}{Number \ of \ tags\ used\ to\ describe\ the\ book\ b } \end{aligned}$$
$$\begin{aligned} FU(b,t_i)= & {} \frac{Number\ of \ users \ who\ use\ the \ tag \ t_i\ to\ annotate\ the\ book\ b}{Number\ of\ users\ who\ annotate\ the\ book\ b} \end{aligned}$$

The relevant tags are those with higher degree of frequency.

The graph of tags G\(_{T}\)=(V\(_{T}\),E\(_{T}\)) is drawn up by the common weight W(t\(_{i}\), t\(_{j}\)) relating two tags t\(_{i}\) and t\(_{j}\). The tags represents the nodes V\(_{T}\), linked together by the weighted edges W(t\(_{i}\), t\(_{j}\)). The weight W(t\(_{i}\), t\(_{j}\)) identifies the semantic relationships among tags t\(_{i}\) and t\(_{i}\); it scales how strongly the two tags are semantically related regarding their commonly usage of users’ annotation and books’ description.

The emergent folks’ tags semantic graph enables graph-based reasoning about the relationships between tags attributed to describe different books.

Folksonomy graph-based recommendation algorithm

The recommender system will suggest books based on the emergent graphs of books and tags from the folksonomy.

figure a

Evaluation and results

To evaluate the proposed approach, we use the Goodbooks-10k [8] dataset for book recommendations. The dataset provides ten thousand books tagged and rated by users. The books are described by their metadata like ISBN, authors, year, and title. Besides, the books are rated (from one to five) and tagged by users using 34252 tags. The evaluation of the book’s recommendations is performed using the automatic or offline evaluation that considers the previously rated books (rating \(\geqslant\)3) as the ground truth, true positive Tp, interpreted relevant to the user. That is, evaluating how closely the recommendations match the actual preferences of the user. For each user, we select its previously rated book to generate automatic book recommendations based on a graph-based recommender system algorithm. The accuracy metrics, namely precision P, recall R, and F1-measure, are calculated from the number of books that are either rated or not and either recommended or not. Four possible outcomes are shown in the confusion matrix (see Table 1).

Table 1 Confusion matrix accumulating the possible results of books recommendation

The precision P is the probability that the recommended books are relevant to the user. It is defined as the number of recommended relevant books, that were previously rated above 3, divided by the total number of recommended books. While Recall R is defined as the number of recommended relevant books divided by the total number of existing relevant books. Recall R is the probability that the user’s previously above 3-rated (relevant) books are recommended. The F1-measure F combines precision and recall.

$$\begin{aligned} P = \frac{Tp}{Tp + Fp} \ \ ; R =\frac{Tp}{Tp + Fn} \ \ ; F= \frac{2 \times P \times R}{P + R} \end{aligned}$$

We compute Precision P1 for our proposed approach and Precision P2 for hybrid-based recommender system (hybrid based-RS), Recall R1 for our proposed approach, and Recall R2 for hybrid based-RS. The experimental evaluation (Table 2) contains the results of the three metrics evaluating the recommendations presented to randomly selected 10 active users. The accuracy metrics evaluate whether the folksonomy graph-based recommendations algorithm can properly predict the relevant books that were previously well-rated by the users. The high precision results indicate that almost all the recommendations are indeed relevant to the users.

Table 2 Evaluation: Book Recommendation

The proposed approach of folksonomy graphs-based recommender system is compared to hybrid recommendations using both filtering approaches CB and CF (Figs. 4 and 5). The algorithm of hybrid based-RS recommends books with similar content to the 10 active users. Its recommendation process is based also on the similarity of books with users’ profiles using collaborative filtering. The precision-recall curve shows that the precision and the recall of the proposed approach “folksonomy graph based CARS” are higher than the hybrid based-RS. Indeed, the comparative figure (see Fig. 4) shows that precision P1 and recall R1 of our proposed approach have higher results than the precision P2 and recall R2 of the hybrid based-RS.

Fig. 4
figure 4

Comparison of the proposed approach of folksonomy graphs based CARS with Hybrid based -RS

Fig. 5
figure 5

Precision-Recall Evaluation

Recommendation based on knowledge graph database

The use of graph database enables: Graph storage structure, Flexible data model, Graph querying, Scalability, and Transaction processing [26]:

Graph storage

Providing native processing capabilities where every node is directly linked to its neighbour node. Therefore, each node acts as an index of other nearby nodes. The graph-based indices enable retrieve graphs quickly from a large database. the graph database is appropriate for local graph queries that need one index lookup to traverse nodes relationships. While in Relational Database Management System (RDMS), it probably needs joining more tables through foreign keys and using additional index lookup.

Query performance

Graph databases come from a flexible data model that enables graph storage in the database strongly tied to graph querying. It generally performs great regardless of the number and depth of connections. In practice, the most frequent queries use the index-free adjacency that includes looking for a node and its neighbours with the shortest path comprising their edges, then retrieve attributes, and so on. A complex query is the subgraph and supergraphs queries that output another graph by ordinarily performing a selection or a projection of the original graph. The current most known graph query language is Cypher native graph query language working with Neo4j database. The declarative query language Cypher is inspired by SQL syntax.


Scaling graphs requires the graph partitioning called sharding which may fail under the category of NP-hard problems. The reason is that graph data is connected which avoids distributing its relationships as much as possible called the minimum point-cut problem. The scaling concerns the scaling of large datasets, read and write performance. In practice, the scalability is not causing a problem in the graph databases area. For instance, Neo4j manages the size of the graph arbitrary on an upper limit on the order of \(10^2\) which is enough to support most of the real-world graphs. For example, Neo4j deploys on a cluster more than half of Facebook’s social graph. Besides, Neo4j has focused on reading performance using two levels of caching to improve scalability in highly concurrent workloads. The scaling for writes can be accomplished by scaling vertically which requires distributing data across multiple machines especially for very heavy write loads. For example, OLTP graph database system optimizes concurrent access and updates for thousands of users. The scaling for write performance still represents a real challenge for graph databases.

Transaction processing

The optimization of Graph databases is often focusing on transaction processing. For example, Neo4j retains ACID transactions. The nature of the graph data structure helps to spread the transactions overheard across the graph. Consequently, the transnational conflicts fade out with the graph expansion and growth. Distributed graph processing requires appropriate partitioning and replication by minimizing the need to ship data between different network nodes. For example, Neo4j uses master-slave replication which means once a machine is a master and the other machines are its slaves. The architecture Neo4j enables that all writes passe through the master first towards any machine, and if the master fails, the cluster will elect a new master automatically. Besides, it contains a bulk loader that operates at a throughput of million records per second. The Graph Model in the Neo4j database has the following components:

  • Nodes (equivalent to vertices in graph theory): are the main data elements that are interconnected through relationships. A node can have one or more labels (that describe its role) and properties (i.e. attributes).

  • Labels: are used to group nodes, and each node can be assigned multiple labels. Labels are indexed to speed up finding nodes in a graph.

  • Properties: are attributes of both nodes and relationships. Neo4j allows for storing data as key-value pairs, which means properties can have any value (string, number, or boolean).

A KG is a heterogeneous network since it contains multiple types of nodes and relations. To illustrate the recommender system based on the KG database, making it simple, we do not consider the weighted edges of every graph of the multi-layer graph (horizontal weighted edges) and focus only on the direct edges (Vertical relations) relating the different type of nodes. The generated Knowledge Graph KG (Fig. 6) provides different ways to organize, manage and retrieve information.

Fig. 6
figure 6

Modeling Relational Database GoodBook-10K to Knowledge Graph DataBase using Neo4j

The model chosen for the graph consists in:

  • Three kinds of nodes: User, Book and Tag

  • Two kind of direct edges: Has_Tag expressing the relationship Book-Tag. Has_Rated relating users to books. The rating r(b,u) given by user u to book b is recorded as a property of the edge.

By focusing on tags, we find movies that have similar tags in the Graph Database. For instance, we executed the following Cypher Queries (Cypher Query (1) and Cypher Query (2)) presented in Fig. 7 and Fig. 8. For a given user that has rated the book titled "Zero to One: Notes on Startups, or How to Build the Future", the Cypher Query (3) recommends books that are described with similar tags (see Fig. 9). For reasoning, this strategy recommends books that were not rated before by the target user.

figure b
figure c
Fig. 7
figure 7

Cypher Query (1)-KG: tags and their Frequency describing the book “Zero to One: Notes on Startups, or How to Build the Future”

Fig. 8
figure 8

Cypher Query (2)-KG: Related books to the book “Zero to One: Notes on Startups, or How to Build the Future” having the tag “business”

Fig. 9
figure 9

Cypher Query (3)- KG : Discovery Recommendation Based on Knowledge Graph

In many domains, it is insufficient to predict how much a user will like an item. The recommender system’s performances need to have a good overview of Novelty, diversity, unexpectedness [27] and other components of utility. Presume the utility and usefulness of the recommendations for the user. The challenge is finding the exact mix of novelty, diversity and familiarity and inclination to recommend items. For a given user that has rated the book titled “Zero to One: Notes on Startups, or How to Build the Future”, the bellow Cypher Query (3) recommends books that are described with similar tags. For reasoning, this strategy recommends books that were not rated before by the target user.

figure d

Conclusion and perspectives

The emergence of collaborative interactions has scaled up the growth and diversity of shared resources. With the advent of social information systems, users share, annotate and rate resources (e.g, books, movies, articles, etc.) any identifiable thing or item on the system. This abundance of shared resources is mostly unorganized, thus making it tough for users to do their searching and exploring experiences. The arrival of the social web has enabled users to annotate the shared resources with their tags, which creates a collaborative classification or folksonomy. The proposed approach aims to explore the contextual information coming from the application domain as well as analyzing the folksonomy relationship to generate graphs of resources and tags which create the ground of knowledge of the recommender system. the purpose of this article is to enhance the recommendation’s performance and diversity by capturing the semantic and social interaction through modeling folksonomy relationship into a semantic graph described by weighted graphs then exploring the multi-layer graph so-called knowledge graph (KG). The recommender system’s purpose is to help users in finding and discovering books of their interest. This paper describes a folksonomy graph-based context-aware recommender system of annotated books. Our proposal presents three contributions: (i) analyze and model the folksonomy relationship to generate graphs of books and tags (ii) integrate the emergent semantic of books within the CARS by establishing a semantic graph-based Recommender system methodology (iii) exploring the generated multi-layer graph that incorporates the three graphs of users, books and tags modeled within a knowledge graph database to perform diversity in recommendations. Besides, This paper presents the use of spectral clustering as a pre-processing step before constructing the semantic graph-based recommendation algorithm to suggest books. The Goodbooks-10k dataset has been conducted in this study as a real-world use case. The evaluation has shown the accuracy and effectiveness of the proposed approach. The experimental evaluation has provided higher accuracy measures attesting the relevancy of the proposed folksonomy graphs based CARS algorithm compared to hybrid based RS. Future works will extend the experiment to the online evaluation. Future perspectives will focus on integrating additional contextual information to improve the description of resources, also covering the graph theory and network analysis for generating and adjusting the graph of resources to enhance their recommendations.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.



Social Information Systems


Context-Aware Recommender System


Knowledge Graph


Recommender System


Recommender Systems


Relational Database Management System




Collaborative Filtering


Computer Environments for Human Learning


Online Transactional Processing


Atomicity, consistency, isolation, durability


  1. Tilly R, Posegga O, Fischbach K, Schoder D. Towards a conceptualization of data and information quality in social information systems. Bus Inform Syst Eng. 2017;59(1):3–21.

    Article  Google Scholar 

  2. Hafidi M, Abdelwahed EH, Qassimi S. Graph-based tag recommendations using clusters of patients in clinical decision support system. Concurrency and Computation: Practice and Experience n/a(n/a), 5624.

  3. Qassimi S, Abdelwahed EH. The role of collaborative tagging and ontologies in emerging semantic of web resources. Computing. 2019.

    Article  Google Scholar 

  4. Klašnja-Milićević A, Ivanović M, Vesin B, Budimac Z. Enhancing e-learning systems with personalized recommendation based on collaborative tagging techniques. Appl Intell. 2018;48(6):1519–35.

    Article  Google Scholar 

  5. Klašnja-Milićević A, Vesin B, Ivanović M, Budimac Z, Jain LC. Folksonomy and tag-based recommender systems in e-learning environments. Cham: Springer. pp. 77–112; 2017.

  6. Cao Y, Kovachev D, Klamma R, Jarke M, Lau RWH. Tagging diversity in personal learning environments. J Comp Edu. 2015;2(1):93–121.

    Article  Google Scholar 

  7. Rivera AC, Tapia-Leon M, Lujan-Mora S. Recommendation systems in education: A systematic mapping study. In: Rocha Á, Guarda T (eds) Proceedings of the International Conference on Information Technology & Systems (ICITS 2018). Cham: Springer; 2018. pp. 937–947.

  8. Zajac Z. Goodbooks-10k: a new dataset for book recommendations. FastML 2017.

  9. Abbas A, Zhang L, Khan SU. A survey on context-aware recommender systems based on computational intelligence techniques. Computing. 2015;97(7):667–90.

    Article  MathSciNet  Google Scholar 

  10. Khalid O, Khan MUS, Khan SU, Zomaya AY. Omnisuggest: a ubiquitous cloud-based context-aware recommendation system for mobile social networks. IEEE Trans Services Comput. 2014;7(3):401–14.

    Article  Google Scholar 

  11. Nassar N, Jafar A, Rahhal Y. Multi-criteria collaborative filtering recommender by fusing deep neural network and matrix factorization. J Big Data. 2020;7(1):34.

    Article  Google Scholar 

  12. Srifi M, Oussous A, Ait Lahcen A, Mouline S. Evaluation of recent advances in recommender systems on Arabic content. J Big Data. 2021;8(1):35.

    Article  Google Scholar 

  13. Drachsler H, Verbert K, Santos OC, Manouselis N. Panorama of recommender systems to support learning; 2015. pp. 421–451.

  14. Manouselis N, Drachsler H, Verbert K, duval e. Recommender Systems for Learning; 2012.

  15. Catherine R, Cohen W. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach. In: Proceedings of the 10th ACM Conference on Recommender Systems. RecSys ’16, pp. 325–332. Association for Computing Machinery, New York, NY, USA 2016.

  16. Musto C, Lops P, de Gemmis M, Semeraro G. Semantics-aware recommender systems exploiting linked open data and graph-based features. Knowl-Based Syst. 2017;136:1–14.

  17. Wang R, Ma X, Jiang C, Ye Y, Zhang Y. Heterogeneous information network-based music recommendation system in mobile networks. Comput Commun. 2020;150:429–37.

    Article  Google Scholar 

  18. Bathla G, Aggarwal H, Rani R. A graph-based model to improve social trust and influence for social recommendation. J Supercomput. 2020;76(6):4057–75.

    Article  Google Scholar 

  19. Yang K, Toni L. Graph-based recommendation system. In: 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018; pp.798–802.

  20. Shalaby W, AlAila B, Korayem M, Pournajaf L, AlJadda K, Quinn S, Zadrozny W. Help me find a job: A graph-based approach for job recommendation at scale. In: 2017 IEEE International Conference on Big Data (Big Data), 2017; pp. 1544–1553.

  21. Angles R, Gutierrez C. Survey of graph database models. ACM Comput Surv. 2008;40(1):1–1139.

    Article  Google Scholar 

  22. Codd EF. Data models in database management. SIGMOD Rec. 1980;11(2):112–4.

    Article  Google Scholar 

  23. Park Y, Shankar M, Park B-H, Ghosh J. Graph databases for large-scale healthcare systems: A framework for efficient data management and data services; 2014. pp. 12–19.

  24. Unal Y, Oguztuzun H. Migration of data from relational database to graph database. In: Proceedings of the 8th International Conference on Information Systems and Technologies. ICIST ’18, pp. 6–165. ACM, New York, NY, USA 2018.

  25. Kang J, Lerman K. Leveraging user diversity to harvest knowledge on the social web. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing; 2011. pp. 215–222.

  26. Pokorný J. Graph databases: their power and limitations. In: Saeed K, Homenda W, editors. Comput Inform Syst Ind Manag. Cham: Springer; 2015. p. 58–69.

    Google Scholar 

  27. Castells P, Wang J, Lara R, Zhang D. Introduction to the special issue on diversity and discovery in recommender systems. ACM Trans Intell Syst Technol. 2014;5(4):52–1523.

    Article  Google Scholar 

Download references


Not applicable.


Not applicable.

Author information

Authors and Affiliations



All authors contributed to this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sara Qassimi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Qassimi, S., Abdelwahed, E.H., Hafidi, M. et al. Towards a folksonomy graph-based context-aware recommender system of annotated books. J Big Data 8, 67 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: