The social web has popularized collaborative tagging over the years. This has drawn attention to analyzing the interconnectivity of users, resources, and tags in order to improve recommender systems.
The proposed approach models the folksonomy by representing users, resources, and tags as graphs linked to one another, from which semantic graph-based recommendations emerge (see Fig. 1). In this paper, the annotated resources are books. Two coherent clusters are obtained using spectral clustering to mitigate the scalability issue of the dataset (see Figs. 2 and 3).
Spectral clustering serves as a pre-processing step before the graph of books is constructed, which reduces the scalability issue. For each graph, we examine the relationships between its entities and identify actionable knowledge. A community of users U = {u\(_{s}\)} annotates a set of books B = {b\(_{k}\)} with a set of tags T = {t\(_{i}\)}, where 1\(\leqslant\)s\(\leqslant\)l, 1\(\leqslant\)k\(\leqslant\)m, and 1\(\leqslant\)i\(\leqslant\)n, and where l, m, and n are respectively the total numbers of users, books, and tags.
Spectral clustering
The use of spectral clustering stages a pre-processing phase before constructing the graph of resources. Spectral clustering deals with the graph partitioning problem. It transforms the current space to bring connected data points close to each other to form clusters. In this context, the data points to be clustered are 10K books.
Clustering
Clustering is one of the most widely used techniques for exploratory data analysis. Its goal is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar to each other. Spectral clustering has become increasingly popular due to its promising performance in graph-based clustering. It can be solved efficiently by standard linear algebra software and very often outperforms traditional algorithms such as k-means. Unlike k-means, which assumes a spherical shape for the resulting clusters, spectral clustering makes no assumptions about the shape of clusters. It gives importance to connectivity (between data points) rather than compactness (around cluster centers): its goal is to group data points that are connected.
Background
The usefulness of an item depends on the user’s current context and circumstances. Varying the recommendations depends on contextual factors like time, location, mood, user’s actual situation, position, status, and condition. The challenge is to go beyond the representational approach with its predefined and fixed set of observable attributes.
The set of books B = {b\(_{i}\)} represents the data points, where each book b\(_{i}\) is a data entry. Each b\(_{i}\) \(\in\) B\(^{f}\), where f is the number of features describing b\(_{i}\), such as spatial, temporal, and static contextual features and tags.
Similarity matrix
Given an enumerated set of data points B, the similarity (or adjacency) matrix is defined as a symmetric matrix A, where A\(_{ij}\geqslant\) 0 represents a measure of the similarity between b\(_{i}\) and b\(_{j}\). A\(_{ij} \simeq\) 1 when b\(_{i}\) and b\(_{j}\) have the same features. Ideally, the data points (books) are in the same cluster when they are close and in different clusters when they are far apart. In the original space, however, data points in the same cluster may be far apart, even farther apart than points in different clusters. The goal is to transform the f-dimensional space so that when two points b\(_{i}\) and b\(_{j}\) are close, they are always in the same cluster, and when they are far apart, they are in different clusters. A common way to define the similarity A\(_{ij}\) is the Gaussian kernel (1).
$$\begin{aligned} A_{ij}=e^{-\frac{\parallel b_i - b_j \parallel ^2}{2\sigma ^2 }} \end{aligned}.$$
(1)
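The kernel in (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming each book has already been encoded as a numeric feature vector; the function name `gaussian_similarity`, the sample vectors, and the choice sigma = 1 are hypothetical and not taken from the paper.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Return A with A[i][j] = exp(-||b_i - b_j||^2 / (2 * sigma^2))."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between the two feature vectors.
            sq_dist = sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
            # The matrix is symmetric, so fill both entries at once.
            A[i][j] = A[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return A

# Hypothetical, numerically encoded book vectors (illustrative values only).
books = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
A = gaussian_similarity(books, sigma=1.0)
```

The bandwidth sigma controls how quickly similarity decays with distance: the two nearby vectors obtain a similarity close to 1, while the distant one obtains a similarity close to 0.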
Unnormalized graph laplacian
The unnormalized graph Laplacian is a matrix defined as the difference of two matrices, L = D - A, where D is the diagonal degree matrix.
$$\begin{aligned} D_{ii} = \sum _j A_{ij} \quad \text{and} \quad L_{ij}=D_{ij}- A_{ij} \end{aligned}.$$
(2)
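As a small worked illustration (the values are hypothetical and chosen only for readability), take three books with the similarity matrix A below. The degree matrix D collects the row sums of A, and each row of L = D - A then sums to zero:
$$\begin{aligned} A = \begin{pmatrix} 1 & 0.8 & 0.1 \\ 0.8 & 1 & 0.2 \\ 0.1 & 0.2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1.9 & 0 & 0 \\ 0 & 2.0 & 0 \\ 0 & 0 & 1.3 \end{pmatrix}, \quad L = D - A = \begin{pmatrix} 0.9 & -0.8 & -0.1 \\ -0.8 & 1.0 & -0.2 \\ -0.1 & -0.2 & 0.3 \end{pmatrix} \end{aligned}$$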
Process of spectral clustering
1. Construct the similarity matrix using the Gaussian kernel.
2. Compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object. This embeds the data points, the books {b\(_{i}\)}, in a low-dimensional space in which clusters are more obvious.
3. Apply a classical clustering algorithm, such as k-means, to partition the books into k classes.
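The three steps can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: books are assumed to be numerically encoded, "the first k eigenvectors" is taken to mean those with the k smallest eigenvalues of the unnormalized Laplacian, and the k-means step is a plain Lloyd iteration with a deterministic farthest-point initialization. The function name, parameters, and sample points are hypothetical.

```python
import numpy as np

def spectral_clustering(X, k=2, sigma=1.0, n_iter=50):
    """Cluster the rows of X into k groups (illustrative sketch)."""
    X = np.asarray(X, dtype=float)

    # Step 1: Gaussian-kernel similarity matrix A.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Unnormalized graph Laplacian L = D - A, with D the diagonal degree matrix.
    L = np.diag(A.sum(axis=1)) - A

    # Step 2: eigh returns eigenvalues in ascending order, so the first
    # k columns are the eigenvectors of the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]

    # Step 3: k-means on the embedded rows (assumed farthest-point init).
    centers = [embedding[0]]
    while len(centers) < k:
        d = np.min([np.sum((embedding - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(embedding[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((embedding[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            embedding[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Two well-separated groups of hypothetical book vectors.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.2, 5.1], [4.9, 5.3]]
labels = spectral_clustering(points, k=2, sigma=1.0)
```

For a collection on the order of 10K books, a dense eigendecomposition of this kind becomes costly, so a sparse similarity graph and a sparse eigensolver would likely be preferable in practice.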
The inputs to spectral clustering are books described by their contextual information and most popular tags, giving f = 13 features. Each book b\(_{i}\) has its vector of contextual information CI-b\(_{i}\)={author\(_{i}\), language\(_{i}\), year-of-publication\(_{i}\), 10-most-popular-tags{t\(_{i}\) \(_{1}\),...,t\(_{i}\) \(_{10}\)}}. Spectral clustering separates the 2 clusters of books better than k-means (see Figs. 2 and 3).
Folksonomy graph-based recommendation:
Graph of books
Each cluster resulting from the spectral clustering has its own graph of books. The graph of books emerges from the similarities among books based on their descriptive tags. Two books b\(_{i}\), b\(_{j}\) \(\in\) B are semantically similar when the weight of their edge W(b\(_{i}\),b\(_{j}\)) is high. The weight W(b\(_{i}\),b\(_{j}\)) measures how strongly two books are semantically related (3). The edge weights are normalized to rescale their values into the range [0,1].
$$\begin{aligned} W(b_i,b_j) &= \sqrt{W_t(b_i,b_j)^2 + W_c(b_i,b_j)^2} \\ W_t(b_i,b_j) &= \frac{\text{Number of tags describing both books } b_i \text{ and } b_j}{\text{Total number of tags}} \\ W_c(b_i,b_j) &= \frac{\sum w_c(b_i,b_j)}{f} \\ w_c(b_i,b_j) &= \left\{ \begin{array}{ll} 1 & \text{if Matching-Member}[\text{CI-}b_i,\text{CI-}b_j]=\text{True} \\ 0 & \text{otherwise.} \end{array} \right. \end{aligned}$$
(3)
f= Total Number of books’ contextual features.
Matching-Member[CI-b\(_{i}\),CI-b\(_{j}\)] returns True if an element of CI-b\(_{i}\) matches CI-b\(_{j}\), and False otherwise.
The contextual information of a book b\(_{i}\) : CI-b\(_{i}\)={author\(_{i}\),language\(_{i}\),year-of-publication\(_{i}\), 10-most-popular-tags{t\(_{i}\) \(_{1}\),...,t\(_{i}\) \(_{10}\)}}
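The weight in (3) can be illustrated with a short Python sketch. Several details are assumptions made for the example and are not stated in the source: w\(_c\) is evaluated once per contextual feature of b\(_{i}\), with author, language, and year compared directly and each popular tag counted as a match when it also appears among the popular tags of b\(_{j}\); the "total number of tags" is passed in as a parameter; and the final normalization to [0,1] is omitted because its exact form is not specified. The dictionary keys and sample books are hypothetical, and only two popular tags per book are used for brevity (so f = 5 here rather than 13).

```python
import math

def contextual_weight(book_a, book_b):
    """W_c: share of the f contextual features of book_a matched in book_b."""
    matches = 0
    for key in ("author", "language", "year"):
        matches += int(book_a[key] == book_b[key])
    top_b = set(book_b["top_tags"])
    matches += sum(1 for tag in book_a["top_tags"] if tag in top_b)
    f = 3 + len(book_a["top_tags"])  # total number of contextual features
    return matches / f

def tag_weight(book_a, book_b, total_tags):
    """W_t: tags describing both books, divided by the total number of tags."""
    return len(set(book_a["tags"]) & set(book_b["tags"])) / total_tags

def edge_weight(book_a, book_b, total_tags):
    """W: Euclidean combination of the tag-based and contextual weights."""
    return math.hypot(tag_weight(book_a, book_b, total_tags),
                      contextual_weight(book_a, book_b))

# Hypothetical books (illustrative values only).
book_a = {"author": "A", "language": "en", "year": 2001,
          "top_tags": ["fantasy", "magic"],
          "tags": ["fantasy", "magic", "classic"]}
book_b = {"author": "A", "language": "en", "year": 2005,
          "top_tags": ["fantasy", "dragons"],
          "tags": ["fantasy", "dragons"]}

w = edge_weight(book_a, book_b, total_tags=10)
```

In this example one tag out of ten is shared (W\(_t\) = 0.1) and three of the five contextual features match (W\(_c\) = 0.6), so the contextual part dominates the edge weight.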
Therefore, the recommender system will explore the book-book and tag-tag similarities to make suggestions.
Graph of tags
Tags are clustered in an emergent graph of tags. We retain the relevant tags, that is, those with the highest degree of frequency DF (4).
More comprehensive folksonomies emerge from non-expert or novice users’ tags than from experts’ tags only [25] . Therefore, the proposed approach considers the extraction of tags that are frequently used and understood by many users of the community.
Consider a book b \(\in\) B described by a set of tags from T. The relevant tags describing b are extracted using the degree of frequency of each tag t\(_{i}\), denoted by DF(b,t\(_{i}\)).
$$\begin{aligned} DF(b,t_i) = \sqrt{FT(b,t_i)^2 + FU(b,t_i)^2} \end{aligned}$$
(4)
where
FT (b,t\(_{i}\)) is the Frequency of the tag t\(_{i}\) annotating the book b;
FU (b,t\(_{i}\)) is the Frequency of users who use the tag t\(_{i}\) to annotate the book b.
$$\begin{aligned} FT(b,t_i)= & {} \frac{Number\ of \ times\ the \ tag \ t_i\ is\ used\ to\ describe\ the\ book \ b}{Number \ of \ tags\ used\ to\ describe\ the\ book\ b } \end{aligned}$$
(5)
$$\begin{aligned} FU(b,t_i)= & {} \frac{Number\ of \ users \ who\ use\ the \ tag \ t_i\ to\ annotate\ the\ book\ b}{Number\ of\ users\ who\ annotate\ the\ book\ b} \end{aligned}$$
(6)
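Equations (4) to (6) can be sketched in Python as follows. The input format, a list of (user, tag) pairs for a single book, is an assumption made for the example, as is the reading of "number of tags used to describe the book" as the total number of tag assignments. The function name and sample annotations are hypothetical.

```python
import math
from collections import Counter

def degree_of_frequency(annotations):
    """Return {tag: DF(b, tag)} for one book from its (user, tag) pairs."""
    # Number of times each tag was assigned to the book.
    tag_counts = Counter(tag for _, tag in annotations)
    total_assignments = len(annotations)

    # Distinct users per tag, and distinct users who annotated the book.
    users_per_tag = {}
    for user, tag in annotations:
        users_per_tag.setdefault(tag, set()).add(user)
    total_users = len({user for user, _ in annotations})

    df = {}
    for tag, count in tag_counts.items():
        ft = count / total_assignments               # FT(b, t), Eq. (5)
        fu = len(users_per_tag[tag]) / total_users   # FU(b, t), Eq. (6)
        df[tag] = math.hypot(ft, fu)                 # DF(b, t), Eq. (4)
    return df

# Hypothetical annotations of one book (illustrative values only).
annotations = [("u1", "fantasy"), ("u2", "fantasy"),
               ("u2", "magic"), ("u3", "fantasy")]
df = degree_of_frequency(annotations)
```

Here "fantasy" is used by all three users and accounts for three of the four assignments, so its DF is higher than that of "magic", which a single user applied once.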
The relevant tags are those with higher degree of frequency.
The graph of tags G\(_{T}\)=(V\(_{T}\),E\(_{T}\)) is built from the common weight W(t\(_{i}\), t\(_{j}\)) relating two tags t\(_{i}\) and t\(_{j}\). The tags represent the nodes V\(_{T}\), linked together by the weighted edges W(t\(_{i}\), t\(_{j}\)). The weight W(t\(_{i}\), t\(_{j}\)) identifies the semantic relationship between tags t\(_{i}\) and t\(_{j}\); it measures how strongly the two tags are semantically related with regard to their common usage in users' annotations and books' descriptions.
The emergent folks’ tags semantic graph enables graph-based reasoning about the relationships between tags attributed to describe different books.
Folksonomy graph-based recommendation algorithm
The recommender system will suggest books based on the emergent graphs of books and tags from the folksonomy.