 Research
 Open access
 Published:
Discovering topweighted ktruss communities in large graphs
Journal of Big Data volumeÂ 9, ArticleÂ number:Â 36 (2022)
Abstractï»¿
Community Search is the problem of querying networks in order to discover dense subgraphscommunitiesthat satisfy given query parameters. Most community search models consider link structure and ignore link weight while answering the required queries. Given the importance of link weight in different networks, this paper considers both link structure and link weight to discover topr weighted ktruss communities via community search. The topweighted ktruss communities are those communities with the highest weight and the highest cohesiveness within the network. All recent studies that considered link weight discover topweighted communities via global search and indexbased search techniques. In this paper three different algorithms are proposed to scaleup the existing approaches of weighted community search via local search. The performance evaluation shows that the proposed algorithms significantly outperform the existing stateoftheart algorithms over different datasets in terms of search time by several orders of magnitude.
Introduction
Community search is a major problem in graph model which had recently gained excessive attention from researchers. Community Search problem is to search a graph to discover a community that satisfies certain query parameters. For example, a community that contains a certain vertex or a set of keywords is required to be discovered. There are many studies over community search especially on large graphs [1,2,3,4,5,6,7,8,9]. Most of studies within community search usually ignore edge weight. The edge weight is playing an important role where it is used to represent the strength of the relationship between any two vertices. There are many applications that clarify the importance of edge weight:

The edge weight in a coauthorship network may indicate how many papers the two linked authors had coauthored together [10]. Considering the edge weight during community search would ensure that authors within discovered communities have strong coauthoring relationship between them.

The edge weight in social network may represent the similarity, or interactions between users [10]. Considering the edge weight in the resulted community would ensure the discovery of highly interacted and similar group of users.

Corporate ownership networks (CON), this is a weighted economic network that links 406 different countries, and its weights represent the business ties among countries [11]. Considering the edge weight within the discovered communities would reveal and ensure the business ties between countries.
A sample of social network is illustrated in Fig. 1 where vertices represent users and edge between any two vertices represents the friendship relation. In such social network the edge weight plays an important role describing the social interactions between users. The two communities in Fig. 1a and in Fig. 1b are densely connected in terms of the number of edges between vertices. Besides, the two communities in Fig. 1b show the top interacting groups of users based on the weights on the edges between them. The community with minimum edge weight 20 is considered as the top weighted community of interacting users while the community with minimum edge weight 7 is considered as the second top weighted community. In this example, edge weight was able to distinguish between different groups of users according to their interaction level. In addition, all three communities in Fig. 1a and b are densely connected where their trussness is equal to three. The trussness level of three ensures that every two connected nodes have one common neighbor and consequently ensures a high level of structural similarity between nodes.
Inspired by the importance of edge weight, this paper considers the edge weight to discover weighted communities. More specifically, the proposed models in this paper utilize edge weight and ktruss model in order to discover top weighted ktruss communities. ktruss community is the densest subgraph in which each edge resides in at least k2 triangles [1], where triangle models the cyclic relationship between 3 vertices.
Querying top weighted ktruss communities has been studied recently in the literature using different methods like online method and index based method [10]. Both methods discovers top weighted ktruss communities using global search where the whole graph resides in the main memory and all edges are required to be visited. Another direction to discover top weighted communities utilizes local search method [4, 12]. Local search discovers the community of a given vertex using the neighbouring vertices and their edges. Local search is more efficient than global search as it searches a small portion of the graph to discover the required communities. On the other hand, local search techniques proposed in the literature donâ€™t consider edge weigh and discover coherent kcore local communities only. The main challenge of local search while considering edge weight is to find the small portion of the graph that certainly contains the required top weighted communities.
This paper builds on the concept of local search in order to obtain the same output communities discovered by global search. The utilization of local search technique is motivated by their search strategy that tends to check the neighborhood of a node rather than checking the whole graph. Using such a search strategy to obtain the same results obtained by global search would guarantee an extremely less search time while having the required results. This paper utilizes local search in three different methods in order to optimize the required search time to find the topweighted ktruss communities. Local search is performed by checking the vicinity of the edge with the highest weight while processing edges in a descending order based on their weights. Once a community is found in the vicinity of the edge being checked between the edge and its neighbors, a local solution can be confirmed. The local solution is an evidence for the existence ofat leastone global solution within the portion of the graph checked so far. Then, the global solutions can be found by checking this portion of the graph instead of the whole graph.
The main contribution of this paper is utilizing local search technique in three proposed algorithms to discover topweighted ktruss communities. More specifically, the main contributions are as follow:

A LOCAL kTRUSS ALGORITHM (LKA): is proposed as a base solution to apply local search. The algorithm processes edges with the highest weight in sequence. With each edge, the vertices of the edge and their common neighbouring that are processed so far are checked to find out if they form a local ktruss community.

A DEGREEBASED LOCAL kTRUSS ALGORITHM (DBLKA): based on the fact that any ktruss community must be (k1)core community and not vice versa, DBLKA is proposed. Similar to LKA, edges with the highest weight are processed in sequence. With each edge, the vertices of the edge and their common neighbouring that are processed so far are checked to find out if they form a (k1)core first before checking if they form a local ktruss community. Consequently, the number of local communities that need to be checked for ktruss existence are decreased which improves search time as shown in "Performance evaluation" section.

A MULTIPLE CANDIDATE LOCAL kTRUSS ALGORITHM (MCLKA): Similar to DBLKA, the algorithm check the existence of (k1) core before checking the the existence of local ktruss. On the other hand, the algorithm does the checking process on multiple edges vertices and their common neighbours at once rather than checking each edge case on its own. The collective checking process for multiple candidates has led to a dramatic improvement in terms of search time as shown in "Performance evaluation" section
For all proposed algorithms; once the number of local ktruss communities reaches the required number of communities, the edges processed so far are examined using enumeration algorithm [10] in order to find global communities resides in them.
The rest of this paper is organized as follows: "Related work" section presents related work. "Preliminaries" section overviews some of the basic concepts used in the paper including weighted graphs, edge support, and weighted ktruss communities. In "Proposed algorithms" section the proposed algorithms are presented. "Performance evaluation" presents the empirical results and discusses them. Finally, "Conclusion" section concludes the paper and highlights possible directions for future work.
Related work
There are several models in the literature which address the problem of discovering cohesive subgraph in terms of structure which is called community detection task. Community detection models [13,14,15,16] are used to discover group of vertices that are strongly connected to each others and weakly connected to outside vertices. The most commonly used and familiar techniques to discover dense subgraphs are cliques [17, 18], quasiclique [19], kcore [20, 21, 21], edge density [22, 23], edge connectivity [24, 25], and ktruss [1, 2]. Recently, authors in [26] proposed a new model called KTMiner to detect ktruss communities in a distributed manner using MapReduce framework on Apache Spark environment.
Community search is another task where the goal is to discover cohesive group of vertices but in terms of search according to a certain query in the graph rather than detection of all existing communities. Community search is proposed to address the problem of discovering group of vertices that contains a specific vertex, set of vertices, or set of keywords. The community search problem is well studied in the literature [4,5,6, 27,28,29,30]. A global search procedure is proposed by the authors in [27] to search for a subgraph that contains a query vertex by iteratively removing vertices with the minimum degree which can be computed in a linear time. In [4] an efficient local search procedure for the same problem is proposed by the authors where the algorithm starts from the vertex query and expand the search to its neighbours in order to find the best community that query vertex resides into. A novel \(\alpha\)adjacency \(\gamma\)quasikclique model was proposed by the authors in [28] to study the overlapping community search problem. In [5, 29], the community search problem is studied by utilizing the ktruss model, where the maximal connected ktruss component containing a query vertex is considered as a community.
Another category of community search algorithms are weightbased community search. Influential community search is an example of weightbased community search where the each node in the graph is weighted with its influence. In [3], two algorithms;online and index are proposed to ed discover the top weighted influential communities where kcore model is utilized. In [8], the authors extended online algorithm of[3] in two ways namely Backward and Forward algorithms. Backward algorithms it starts the search by adding vertices with the highest weight and verify the component if it is a kcore or not; if it is a kcore a solution is returned otherwise it proceeds to add more vertices. Forward algorithm iteratively removes vertices with the minimum weight until the graph becomes disconnected and the top communities are returned. The authors in [12] proposed localoptimal algorithm which considered is the state of art according to its performance. Localoptimal algorithm build its search space incrementally by adding subsets of vertices with the highest weights until the required top weighted communities are discovered. All these techniques are nodeweight based which utilize the kcore model as their cohesion measure. Another weightbased community search algorithm is proposed in [10] which utilize ktruss as its cohesion model. It differs from other algorithms as it considers an edgeweighted approach rather than nodeweighted. The authors proposed two different techniques to retrieve top weighted ktruss communities, the first one is discovering communities online; The procedure starts by discovering the maximal ktruss of the original graph, and iteratively removes edges with the minimum weight and with each removal a maximal connected component procedure is run to find the next ktruss connected component. The online approach cannot scale for large graph as the whole graph should be resides in main memory. The other approach is the index based where all the weighted ktruss communities are indexed separately for each k. The required communities are returned directly from the index. The index based approach is more efficient than the online base one, it suffers from the large size of the index which requires much time to traverse that index. In addition, the maintenance of the index would be time consuming. Recently authors in [31] proposed two online algorithms namely BACKWARD ALGORITHM, and WEIGHTSENSITIVE LOCAL SEARCH ALGORITHM (WSLSA). The main idea for the two proposed algorithms is iteratively attaching the edge with the highest weight. The two proposed algorithms are, the BACKWARD ALGORITHM, and WEIGHTSENSITIVE LOCAL SEARCH ALGORITHM (WSLSA) overcome the drawbacks of the online search algorithms proposed in [10]. The BACKWARD ALGORITHM algorithm detects the topr weighted ktruss communities by iteratively attaching the edges with the highest weight after reducing the graph to its ktruss. On the other hand, the WEIGHTSENSITIVE LOCAL SEARCH ALGORITHM (WSLSA) detects the topr communities by visiting only the highest weighted edges in the graph without the need to reduce the graph into its ktruss. The drawback of BACKWARD ALGORITHM and WEIGHTSENSITIVE LOCAL SEARCH ALGORITHM (WSLSA) is their failure to process large graphs especially when k approaches the max level of trussness. When k approaches the max level of trussness, more candidate solutions have to be verified where candidate size gets bigger with each cycle. Consequently, the algorithm fails when it has to verify candidates with very large size. kcore is a community detection model which discovers a connected subgraph where each vertex has degree no less than k. Another community detection model is called \(ktruss\) which is defined based on the concept of triangle where each edge in a connected \(ktruss\) subgraph resides in at least \(k2\) triangles. All previous models that addressed the problem of top weighted community search have focused mainly on global search solutions on a way or another. Except for BACKWARD ALGORITHM and WEIGHTSENSITIVE LOCAL SEARCH ALGORITHM (WSLSA), all models have either used a global search or built an index structure to reduce search time. On the other hand, models presented in [31] utilize local search but suffer from limitations while processing large graphs. The local search paradigm utilized in this paper presents an opportunity to perform community search while having the least search time.
Preliminaries
Table 1 describes the notions that are utilized within the paper.Â Given undirected and edge weighted graph G(V,Â E,Â W), where V, E, and W represent vertices, edges, and vector of weights respectively. Each entry in the vector of weights is assigned a different weight value for each edge. Each edge is denoted by e(u,Â v), its weight denoted by \(\omega (e)\), and the set of neighbors of a vertex \(\upsilon\) are denoted by nb(\(\upsilon\)), i.e., nb(\(\upsilon\)) = { u \(\in\) V: \(\exists\) e (u, \(\upsilon\)) \(\in\) E}, and degree of \(\upsilon\) is denoted by d(\(\upsilon\)) = nb(\(\upsilon\)). A triangle denoted by \(\bigtriangleup _{u\upsilon {w}}\) is a cyclic relationship between three vertices u,Â v,Â w such that \((u,v),(u,w),(v,w) \in E\). Given an induced subgraph \(H(V_H,E_H)\) from G where \(V_H \subseteq V_G\) and \(E_H \subseteq E_G\), the support of an edge \(e(u,v) \in H\) is defined as the number of triangles that edge resides in, and denoted by sup(e,Â H). The edge trussness is the edge support increased by 2.
Definition 1
(ktruss) A subgraph \(H(V_{H},E_{H},W_{H})\) is a connected ktruss iff each \(e_{H}\) has sup(e,Â H) at least k2. A subgraph H is called a maximal ktruss if there is no other subgraph \(H\prime\) contains H.
The trussness of a subgraph H denoted as \(\tau (H)\) is the minimum support of the all edges in subgraph H incremented by 2, e.g. \(\tau\)(H) = min{sup(e, H): e \(\in\) \(\hbox {E}_{\mathrm{H}}\}+2\).
In this paper weighted graph is considered where the weight of the subgraph H is defined as the minimum weight of the set of edges in subgraph H.
Definition 2
Subgraph Weight: The weight of subgraph H denoted by f(H) is the minimum weight of the edgesweights in H, e.g. \(f(H) = {min}_{\mathrm{e}\in E_{H}}\{\omega ({e})\}\). The edge with minimum weight in subgraph H is called the keyedge of H.
The rational behind the minimum weight is that each edge in the subgraph H has at least this minimum weight as discussed in [3]. In addition, minimum weight would be robust to outliers than average weight.
Based on the previous definitions of ktruss and subgraph weight, the weighted ktruss community is defined as follow.
Definition 3
Weighted ktruss Community: Given undirected and edge weighted graph \(G=(V, E, W)\), and trussness level k, a subgraph H \(\subset G\) is weighted ktruss community satisfies the following constraints:

Connectivity H is a connected subgraph

ktruss The minimum sup(e,H) is at least k2.

Maximal There is no other subgraph \(H^\prime\) contains H and the \(f(H^\prime )\)= f(H).
By applying the three conditions of weighted ktruss community while extracting resulting communities, the output communities are guaranteed to be ktruss and not a subset from other weighted ktruss community with the same weight.
Example 1
Consider the graph in Fig. 2. Suppose for instance k = 4, as clarified in Definition 3 the original graph is a weighted 3truss community with minimum weight of value = 1. In addition, two weighted communities with higher weights reside in the original graph; the top1 4truss subgraph shown in Fig. 2 highlighted by a red rectangle with weight value 37 of the edge \(e(v_1,v_3)\). The highlighted subgraph by green rectangle shown in the same Fig. 2 is the top2 4truss community with weight 35 of the edge \(e(v_2,v_4)\). The subgraph induced by the set of edges {(\(v_1,v_2),(v_2,v_3),(v_1,v_3),(v_3,v_4),(v_1,v_4),(v_2,v_4)\)} also has weight 35; however it is not weighted \(4truss\) community sine it is already contained in the subgraph highlighted by green rectangle with the same weight 35.
Problem Definition G = (V, E, W) is an undirected and edgeweighted graph where r and k are the two query parameters. The problem is defined as the task to discover the topr weighted ktruss communities from G = (V, E, W)
Example 2
Consider the example illustrated in Fig. 2, suppose k = 4 and r = 2, the top2 weighted 4truss communities are highlighted in red and green rectangles in Fig. 2. The top1 4truss community is the one highlighted by red rectangle with weight 37 where each edge in the community resides in two triangles. The top2 4truss community shown in the same Figure highlighted by green rectangle with weight value 35. The top2 4truss community contains the top1 community but it has smaller weight than the top1 community.
Proposed algorithms
This section discusses the proposed algorithms to discover topr weighted ktruss communities. ktruss community detection model is used to measure the cohesiveness of the resulting communities. Since ktruss is defined based on the concept of triangle; ktruss model main advantage is related to its ability to ensures a high level of cohesiveness. In addition, a community with certain ktruss is also a community with (k1)core on the same time but not vice versa which guarantees the high cohesiveness level of ktruss. ktruss community is a (k1)core community since it is (k1)edge connected and any deletion of no fewer than k1 edges will not disconnect ktruss. Also, ktruss is a diameter bounded algorithm where a subgraph of n vertices has a diameter no more than \([2n2/k]\). All these properties are indicators for the cohesivness of the resulting communities from the the ktruss model [32]. For self completeness of this paper truss decomposition algorithm introduced in [2] is outlined in Algorithm 1.
A local search procedure is used in the proposed algorithms. Mainly a local vicinity of the edge e with highest weight w(e) is built from the edges with weights higher then w(e) and checked whether it is a connected ktruss component or not. Suppose that two vertices u, v, and an edge e(u,Â v) is considered as the edge being checked, a local vicinity of this edge is built from the set of edges between common neighbors of u and v and e(u,Â v) itself. A local vicinity is defined as follow:
Definition 4
local vicinity H\(_{e(u,v)}\) is the set of edges E\({_{S}}_{(u,v)} \leftrightarrow {u,v}\) where each edge E\(_{i}\) in E is defined as \((v_1,v_2): v_1 \in S_{u,v}, v_2 \in {u,v}\) and \(S_{(u,v)}= nb(u) \cap nb(v)\) is the common neighbors of u,Â v.
Local Vicinity H of the edge is checked whether it has connected ktruss component iff sup(e,H) at least k2, \(nb(u) \cap nb(v)\)=k and degree of each vertex in \(S_{(u,v)}\) at least k1 which is formalized in 1.
Property 1
A ktruss connected component should have at least k vertices with degree at least k1.
This paper argues that the existence of a \(ktruss\) component in the local subgraph of edge vicinity would prove the existence of at least one global community that would be discovered by decomposing the larger subgraph containing the local graph. The existence of community in the local subgraph is used as an evidence that the edges processed so far has a community within them. Thus, In the first step; the edges with the highest weights are processed until the required number of communities is discovered through local communities subgraphs. Afterwards, global communities can be discovered by decomposing the subgraph containing only the edges that was processed during the first step. Such procedure would allow the proposed algorithms to discover the top weighted communities while processing a small subset of the edges with the highest weights rather than the whole graph edges. Formally, we define the paper argument in the following property followed by an example to illustrate the idea of local communities versus global communities.
Property 2
Given a local vicinity subgraph \(H \subseteq G\), if there is a ktruss \(H^\prime \in H\) then there is ktruss component \(H^{\prime \prime } \in\) G and \(H^\prime \subseteq H^{\prime \prime }\).
Example 3
Consider the graph G shown in Fig. 3a contains set of the highest weight edges where \(e(v_1,v_3)\) is the one with the least weight of 27 and considered for local vicinity subgraph checking. Property 1 is satisfied within the local vicinity of edge \(e(v_1,v_3)\) and the local subgraph is built as shown in Fig. 3b where the goal is to extract top weighted 4truss communities. However, the local subgraph has no 4truss community. Then, the next edge \(e(v_2,v_4)\) with weight 25 is considered for local vicinity subgraph checking . Property 1 is satisfied within the local vicinity of of edge \(e(v_2,v_4)\) and the local subgraph is built as shown in Fig. 3d. The local subgraph is found to have a local 4truss community. Thus, the local exploration of the edges has led to the discovery of one local community. However, the graph in its current form as shown in Fig. 3c has two top weighted global communities that can be discovered only through ktruss decomposition of graph 3c. The two top weighted global communities are themselves the two graphs in Fig. 3a and Fig. 3c with weights 27 and 25. The local graph in Fig. 3b didnâ€™t have a local community where the graph in Fig. 3a that the local subgraph was extracted from has a global community. This example shows that the local exploration of the edges local vicinity can provide an evidence for communities existence rather than truly discovering them. In addition, the existence of one local community is an evidence of the existence of at least one global community where multiple global communities can exist in the original graph. Thus, it is a necessity to analyze the graph that local graphs are extracted from in order to discover the set of global communities.
This paper proposes three algorithms which utilize property 1 and property 2 to discover topr weighted ktruss communities; LOCAL kTRUSS ALGORITHM (LKA), DEGREEBASED LOCAL kTRUSS ALGORITHM (DBLKA), MULTIPLE CANDIDATE LOCAL kTRUSS ALGORITHM (MCLKA) where each of them is explained in the following subsections.
Local ktruss algorithm
This algorithm consists of three main steps, all three main steps work together to eventually define a subgraph y that contains the required topr weighted communities. Through this way, only a small subgraph y can be used to discover the communities rather than the original graph G. The three main steps are listed below and explained in more details:

Build temp graph.

Temp graph verification.

Enumerating the required topr weighted ktruss communities.
Local ktruss algorithm takes as an input the weighted graph G along with k, r; The weighted graph G edges are presorted in descending order and stored on disk. The algorithm starts by processing the edges with the highest weight in sequence. The processed set of edges are added into subgraph Y. With each edge being processed, the effect of this edge is checked according to property 1. If the edge e(u,Â v) has k2 common neighbors i.e. \(S_{u,v} \ge k2\) and each with degree at least \(k1\) in subgraph Y, a ktruss component is suspected. Then, a temp graph X is constructed, X= \(E_{S_{(u,v)}} \leftrightarrow {\{u,v\}}\) where the set of vertices are the two vertices u,Â v in addition to the set of vertices of \(S_{(u,v)}\)=\(nb(u) \cap nb(v)\) that reside in subgraph Y. Finally, edges between \(S_{(u,v)}\) in Y are added to temp graph X.
The next stepâ€”Temp graph verificationâ€”is to check if the temp graph X has a ktruss component. The algorithm impose count triangle procedure in order to remove edges with support less than k2. Then, detect connected component procedure is run to check whether there is a ktruss connected component in X or not as outlined in Algorithm 1. Once a community in X is discovered, the number of verified communities is incremented by 1. The decomposition of X is expected to be fast in terms of time as the decomposition is done only on a small number of edges in subgraph X.
Finally, once the number of verified local communities reach the required number of the ktruss weighted communities r, the global communities should be discovered from subgraph y as realization of property 2. The enumeration procedure firstly decomposes the subgraph y that contains rcommunities and iteratively removes edges with the minimum weight that represent the keyedges of the community. Before each keyedge removal, a community with the weight of the keyedge is retrieved and stored. The removal of the edges continues till graph y is no longer connected. Then, top r weighted communities are retrieved and considered as the output. The pesudo code of LOCAL kTRUSS ALGORITHM (LKA) is outlined in Algorithm 2.
Complexity analysis Local ktruss Algorithm complexity can be described as the decomposition cost of each candidate subgraph in addition to the decomposition cost of the subgraph containing the global solutions. Given the number of candidates subgraphs as \(\alpha\), the edges in each candidate subgraph X as \(m_{x}\) and the decomposition cost of edges as the number of edges to the power of 1.5 as mentioned in [33], the complexity of the first phase of the algorithm (from line 2 to line 10 in Algorithm 2) can be formally defined as \(O(\alpha * m_{x}^{1.5})\). Similarly, the decomposition of the subgraph Y containing the global solutions (line 12 in algorithm 2) would be \(O({m_{y}}^{1.5})\). The total complexity of the Local ktruss Algorithm would be \(O(\alpha * m_{x}^{1.5}) + O(m_{y}^{1.5})\).
Example ï»¿4
This example shows a detailed explanation for local ktruss algorithm where \(k=4\) and \(r=2\) are considered as the search parameters. The subgraph Y contains the highest weight edges processed so far from graph G in Fig. 2, upon processing edge \(e(v_1,v_3)\), property 1 is satisfied where \(e(v_1,v_3)\) has four vertices \(\{v_2,v_4,v_5,v_9\}\) as a common neighbours with degree 3. Then, temp graph X for \(e(v_1,v_3)\) is built as shown in Fig. 4b. Afterwards, decompose ktruss procedure is run over \(X_{(v1,v3)}\) where the subgraph highlighted by green circle will be removed as the support of all its edges is equal to 1. The subgraph highlighted by red rectangle is a 4truss community. Figure 4c shows the same subgraph Y where the edge \(e(v_2,v_4)\) is added and checked according to property 1 the temp graph \(X_{(v2,v4)}\) is built as shown in Fig. 4d andÂ a \(4truss\) is found. Thus, subgraph Y with edge \(e(v_2,v_4)\) has at least two global 4truss communities according to property 2 as two local communities are discovered. The top2 4truss communities are highlighted in Fig. 4e, the top1 4truss community is highlighted by7 red rectangle and top2 4truss community is highlighted by green rectangle.
Degree based local kTruss Algorithm (DBLKA)
In this subsection DBLKA is proposed to extract local communities more efficiently. It follows the same steps similar to LKA except for the verification step which introduces kcore filtration as an extra step during the verification process. The three main steps are listed below and explained in more details:

Build temp graph.

Temp graph verification.

Enumerating the required topr weighted ktruss communities.
All these three steps work together to mainly extract topr weighted communities. The topr weighted ktruss communities are extracted from a subgragh Y rather than original graph G. The DBLKA takes as an input original graph G, k, and r where the set of edges are presorted in descending order in terms of edge weights. The set of edges are processed iteratively starting from the edge with the highest weight. It starts with building the temp graph X following the same steps as explained in the first step in section "Local ktruss algorithm". After the temp graph X is built a verification step is performed. The main idea of the verification step here is based on the fact that any \(ktruss\) component must be \((k1)core\) component. Therefore, the temp graph X is checked if it is \((k1)core\) before performing ktruss decomposition. Ktruss decomposition is performed only if temp graph X is found to be \((k1)core\). Otherwise, the edge being processed is ignored and next edge with the highest weight is considered for processing.
Temp graph X is checked if it is \((k1)core\) by removing any vertex with degree less than \(k1\). Thus, the remaining vertices if any represents a \((k1)core\) component. If a \((k1)core\) component is found, its vertices and edges are checked against two different cases; First, if the number of the remaining vertices are greater than k then the remaining vertices and their edges should be processed to check if they form a \(ktruss\) component or not. Second, if the number of remaining vertices are equal to k vertices with degrees equal to \(k1\) then the remaining vertices with their edges is a \(kclique\) component and consequently a \(ktruss\) component.
Finally the third step is performed to extract the \(topr\) weighted \(ktruss\) communities from the subgraph Y following the same steps as explained in the third step in section "Local ktruss algorithm". The pesudo code of Degree Based LOCAL kTRUSS ALGORITHM (DBLKA) is outlined in Algorithm 3 and a detailed explanation for the algorithm is showed in Example 5.
Complexity analysis DBLKA complexity can be described as the decomposition cost of each candidate subgraph in addition to the decomposition cost of the subgraph containing the global solutions. Given the number of degreebased filtered candidates subgraphs as \(\beta\), the edges in each candidate subgraph X as \(m_{X}\) and the decomposition cost of edges as the number of edges to the power of 1.5 as mentioned in [33], the complexity of the first phase of the algorithm (from line 2 to line 19 in Algorithm 3) can be formally defined as \(O(\beta * m_{x}^{1.5})\). Similarly, the decomposition of the subgraph Y containing the global solutions (line 20 in algorithm 3) would be \(O(m_{y}^{1.5})\). The total complexity of the Degree Based Local ktruss Algorithm would be \(O(\beta * m_{x}^{1.5}) + O(m_{y}^{1.5})\). Itâ€™s noted that the number of degreebased filtered candidates subgraphs \(\beta\) is expected to be less than the number of candidates subgraphs \(\alpha\) generated by the Local ktruss Algorithm.
Example ï»¿5
Consider the graph G in Fig. 2 where \(k=4\) and \(r=2\) are considered as the search parameters. The subgraph Y contains the highest weight edges processed so far, upon processing edge \(e(v_1,v_3)\), property 1 is satisfied where \(e(v_1,v3_)\) has four vertices \(\{v_2,v_4,v_5,v_9\}\) as a common neighbours with degree 3. Then, temp graph X for \(e(v_1,v_3)\) is built as shown in Fig. 5b where the two vertices \(\{v_2,v_9\}\) are excluded while building X due to \(k1\) core filtration step. The the two vertices \(\{v_2,v_9\}\) are not a part from \(3core\) where the degree of each them is 2. The remaining component is highlighted by red in Fig. 5b is a \(4truss\) community without calling decompose ktruss procedure as \(X_{(v1,v3)}\) contains 4 vertices with degree 3 which turns it to as a clique clarified in the proposed algorithm. Figure 5c shows the same subgraph Y where the edge \(e(v_2,v_4)\) is added and checked according to property 1. Then, the temp graph \(X_{(v2,v4)}\) is built as shown in Fig. 5d and a \(4truss\) is found directly given the number of vertices and their degrees where no vertices will be removed in \(k1\) core filtration step and the remaining number of vertices with their degrees form a clique as clarified in the proposed algorithm. Thus, subgraph Y with edge \(e(v_2,v_4)\) has at least two global 4truss communities according to property 2 as two local communities are discovered. The top2 4truss communities are highlighted in Fig. 5e, the top1 4truss community is highlighted by red rectangle and top2 4truss community is highlighted by green rectangle.
Multiple candidates local ktruss algorithm(MCLKA)
In this subsection the third algorithm MCLKA is proposed where multiple temp graphs of multiple edges with the highest weight and satisfying property 1 are generated as one multiple candidates graph. The generated multiple candidates graph is verified once at a time instead of verifying each temp graph separately. In generating multiple candidates graph, edges with the highest weight are processed in sequence where the vertices and their common neighbours of each edge satisfying property 1 is added to multiple candidates graph Z. Once the number of processed edges satisfying property 1 reach r, temp graph Z is built and its vertices are connected. Then, a verification step is performed over the multiple candidates graph Z. The verification of multiple candidates at once should allow a faster processing for the edges and consequently a faster discovery for the \(topr\) weighted \(ktruss\) communities. The three main steps are listed below and explained in more details:

Generate Multiple Candidates Graph.

Multiple Candidates Graph verification.

Enumerating the required topr weighted ktruss communities.
In the first stepâ€”Generate Multiple Candidates Graphâ€”the edges are processed in sequence from the highest weight edge where the processed set of edges are added into subgraph Y. For each edge e(u,Â v) being processed, the edge is checked to find out if it satisfies property 1 or not. If the edge was found to be satisfying for property 1, the set of affected vertices with degree >= kâˆ’1; the two vertices u,Â v in addition to the set of vertices of \(S_{(u,v)}\)=\(nb(u) \cap nb(v)\) that reside in subgraph Y are added to graph Z. In addition, e(u,Â v) is considered as a keyedge for the set of the affected vertices that was added to subgraph Z and saved in keyedges list for later processing. The processing of the edges continue until the number of key edges reaches r. Once the number of keyedges reaches r, the edges between vertices in graph Z which exist in graph Y are added to graph Z.
In the second and the third stepâ€”Multiple Candidates Graph verification and Enumerating the required topr weighted ktruss communities, the subgraph Z is verified to check whether there is a connected \(ktruss\) component in it or not. The count triangle procedure is performed over all multiple candidates in subgraph Z to remove any edge with support less than \(k2\). If a \(ktruss\) component is found, then the existence of the keyedges from the keyedges list are checked in the decomposed version of subgraph Z. If r keyedges are found in decomposed version of subgraph Z, then the subgraph Y is decomposed to extract and enumerate topr weighted ktruss communities. On the other hand, if less than r keyedges existed on the \(ktruss\) component or no \(ktruss\) component was found in the decomposed version of subgraph Z, the algorithm gets back and starts to process edges with the highest weights for another cycle. The pesudo code of Multiple Candidates Local ktruss Algorithm (MCLKA) is outlined in Algorithm 4 followed by a detailed explanation in Example 6.
Complexity analysisMCLKA complexity can be described as the decomposition cost of each multiple candidate subgraph in addition to the decomposition cost of the subgraph containing the global solutions. Given the number of multiple candidate subgraphs as \(\mu\), the edges in each multiple candidate subgraph X as \(m_{X}\) and the decomposition cost of edges as the number of edges to the power of 1.5 as mentioned in [33], the complexity of the first phase of the algorithm (from line 2 to line 21 in Algorithm 4) can be formally defined as \(O(\mu *m_{x}^{1.5})\). Similarly, the decomposition of the subgraph Y containing the global solutions (line 22 in Algorithm 4) would be \(O(m_{y}^{1.5})\). The total complexity of Multiple Candidates Local kTruss Algorithm would be \(O(\mu *m_{x}^{1.5})+ O(m_{y}^{1.5})\). Itâ€™s noted that the number of multiple candidate subgraphs \(\mu\) is expected to be less than the number of degreebased filtered candidates subgraphs \(\beta\) generated by Degree Based Local ktruss Algorithm.
Example ï»¿6
Consider the graph G in Fig. 2 where \(k=4\) and \(r=2\) are the search parameters. The subgraph Y contains the highest weight edges processed so far from graph G, upon processing edge \(e(v_1,v_3)\), property 1 is satisfied where \(e(v_1,v_3)\) has four vertices \(\{v_2,v_4,v_5,v_9\}\) as a common neighbours with degree 3. Then, the list of affected vertices of \(e(v_1,v_3)\) are added to the temp graph Z as shown in Fig. 6b where the two vertices \(\{v_2,v_9\}\) are excluded due to \(k1\) core filtration step. The the two vertices \(\{v_2,v_9\}\) are not a part from \(3core\) where the degree of each them is 2. Figure 6c shows the same subgraph Y where the edge \(e(v_2,v_4)\) is added and found to be satisfying to property 1. Accordingly, the temp graph Z is updated as shown in Fig. 6d by adding the vertex \(v_2\) and the list of keyedges is updated by adding the current one \(e(v_2,v_4)\). The temp graph Z is then updated by adding the edges between the set of vertices in graph Z as shown in Fig. 6e. The edges between vertices are extracted from graph y in Fig. 6c. The temp graph Z is decomposed where the decomposed version is found to contain the two keyedges. The existence of two keyedges in the decomposed version of Z is an evidence to decompose the subgraph Y to extract the required top2 4truss communities. The top2 4truss communities are highlighted in Fig. 6f, the top1 4truss community is highlighted by red rectangle and top2 4truss community is highlighted by green rectangle.
Performance evaluation
In this section the proposed algorithms are evaluated to find out their performance in terms of execution time and prove their efficiency against the stateoftheart algorithms. The execution time was considered as evaluation metric since it was used while evaluating similar community search models in the literature [4, 6, 8, 10, 31]. The proposed algorithms are evaluated against four different algorithms the BFSbased Online Search Algorithm proposed in [10] ,the LocalSearchP Algorithm proposed in [12] and both algorithms Backward Algorithm and Weight Sensitive Local Search Algorithm proposed in [31]. All algorithms discover the weighted communities in an online manner. The BFSbased Online Search Algorithm, backward Algorithm, and Weight Sensitive Local Search algorithm are designed to find weighted ktruss communities. The LocalSearchP Algorithm utilizes the concept of local community search to discover the topk influential communities where kcore and vertexweight are considered. In order to ensure fairness and completeness of the experiment, LocalSearchP Algorithm is implemented following the same strategy but with ktruss model and weighted edges instead of kcore model and vertexweight. All algorithms were explained in details in related work section. All experiments are conducted on seven public datasets as shown in Table 2 and are run over Python environment. In addition, all experiments are conducted on a machine with an Intel i5 2.5GHz CPU and 8 GB main memory.
Datasets
The proposed algorithms are evaluated using seven datasets shown in Table 2 and availed in [34]. The datasets are in different sizes which range from small to large size. As shown in Table 2, each dataset has set of parameters to identify its size where V represents the number of vertices, E represents the number of the edges, and \(k_{max}\) represents the max \(ktruss\) that can be extracted from the dataset. As the weightededge graph is considered in this paper, the weight is calculated as the common neighbors between each two vertices i.e. \(w(e(u,v))=nb(u)\cap nb(v)\). The proposed methods are not affected whether the weight is calculated or given as the weighted communities will be extracted correctly.
Experimental results
The proposed algorithms are evaluated against the BFSalgorithm, the optimal version of the LocalSearchP algorithm and the state of the art algorithms Backward algorithm and Weight Sensetive Local Search algorithm. There are two varying query parameters where the algorithms are evaluated against them. The query parameters are K and r where k represents the trussness level and varying from 5 to largest k for each dataset and r represents the number of required top communities and varying from 10 to 1000 in large datasets. Figures 7 and 9 show the evaluation results where k equals a default value 10 and r is varying while Figs. 8 and 10 show the evaluation results where r equals the default value 10 and k is varying. Generally Figs. 7 and 8 show that (MCLKA) performs better than the other algorithms. The processing time is linearly proportional to the size of the graph the algorithm visits.
With small values of both r and k, all the proposed algorithms perform the same and require the same time to discover the output communities where the algorithms usually verify a small subset of the graph. When r increases in Figs. 7 and 9, (MCLKA) shows a better performance as it has a faster verification process than the other two proposed algorithms (LKA) and (DBLKA). Following the same behavior in Figs. 8 and 10 when k increases, (MCLKA) shows a better performance as it verifies a set of local candidates altogether at once rather than individual candidates verification. Verifying as set of local candidate saves the times required to do the verification steps for each candidate individually. Consequently, it has the ability to discover local communities with higher k efficiently than other algorithms.
As shown in Figs. 7 and 8, the BFS algorithm takes constant time for different r as it discovers all the possible communities in the whole graph and then outputs the top ones. The BFS algorithm couldnâ€™t run for the large graphs WikiTalk, LiveJournal, Skitter, and Orkut dataset due to the large size of the graphs and the inability to decompose such graphs in main memory. These graphs donâ€™t fit in a small main memory which is used during performance evaluation.
On the other hand, the localsearchp algorithm performs in a different way where the search time increases in leaps as the search time is constant while discovering a number of communities before leaping in time to discover the next set of communities. For example, the localsearchp algorithm examines a subset of graph that contains 100 communities while trying to discover only the top 10 weighted communities. Then, when the algorithm try to discover the top 20 weighted communities, it will also examine the same subset of the graph containing 100 communities. Consequently, the search time will be constant until the number of required communities is greater than 100. By then, the search time will have a leap as it will examine a bigger subset of the graph. The linear paradigm is better than leaps especially in large k and r.
Figures 9 and 10 indicate that MCLKA performs better than Backward and Weight Sensitive Local Search algorithms. MCLKA outperformance is related to its discovery method as it processes a small portion of graph that has the required rcommunities. During this experiment, BACWARD has the worst performance as it first decompose the original graph into ktruss to prune all set of edges that donâ€™t reside in ktruss. WSLSA yeilds a close performance to MCLKA at different values of trussness level with constant number of communities to retrieve of 10 communities. However, WSLSA fails at high levels of trussness unlike MCLKA and other local search based algorithms. Mostly, BACWARD and WSLSA algorithms failed to discover weighted communities at large values of K and large values of r unlike localsearch based algorithms which did succeed at the same case.
Conclusion
In this paper community search problem is investigated to discover topr weighted communities where the link weight is considered and the ktruss model is considered as the cohesiveness model. Three different algorithms which utilize the concept of local community search in order to discover the global community search results were proposed. The three different algorithms are LKA, DBLKA, and MCLKA. LKA where these algorithms consist of three main steps. LKA imposes a traditional procedure for ktruss model that count triangles and decompose local graphs in order to find ktruss local graphs before finding the global community search results. DBLKA imposes a more filtration step over the local graph to remove vertices which wont ever belong to ktruss community and consequently find ktruss local graphs in a faster way. MCLKA does a verification step over rgenerated local graphs at once rather than considering each of them on its own. All these algorithms are evaluated against the (BFA), LocalSearchp, Backward and Weight Sensetive Local Search algorithms . Experimental results showed that (MCLKA) is the superior algorithm in terms of execution time against all other algorithms. One of the main challenges to consider as future work is to ensure that the difference between the number of discovered candidates and the number of truly existing communities will be low in order to enhance the search time. In addition, the proposed algorithms could be extended to add the vertices properties as an extra dimension to find homogeneous topweighted communities. Extending the algorithms to find top weighted communities where each edge is assigned multiple weights is another future work direction.
Availability of data and materials
The datasets used in the experiments are available online and referenced in the paper.
References
Cohen J. Trusses: Cohesive subgraphs for social network analysis. Natl Secur Agency Tech Rep. 2008;16:3â€“1.
Wang J, Cheng J. Truss decomposition in massive networks. Proc VLDB Endow. 2012;5(9):812â€“23.
Li RH, Qin L, Yu JX, Mao R. Influential community search in large networks. Proc VLDB Endow. 2015;8(5):509â€“20.
Cui W, Xiao Y, Wang H, Wang W. Local Search of Communities in Large Graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. SIGMOD â€™14. New York, NY, USA: ACM; 2014. p. 991â€“1002. http://doi.acm.org/10.1145/2588555.2612179.
Huang X, Cheng H, Qin L, Tian W, Yu JX. Querying ktruss community in large and dynamic graphs. In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM; 2014. p. 1311â€“1322.
Akbas E, Zhao P. Trussbased community search: a trussequivalence based indexing approach. Proc VLDB Endow. 2017;10(11):1298â€“309.
Huang X, Lakshmanan LV. Attributedriven community search. Proc VLDB Endow. 2017;10(9):949â€“60.
Chen S, Wei R, Popova D, Thomo A. Efficient computation of importance based communities in webscale networks using a single machine. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM; 2016. p. 1553â€“1562.
Fortunato S. Community detection in graphs. Phys Rep. 2010;486(3â€“5):75â€“174.
Zheng Z, Ye F, Li RH, Ling G, Jin T. Finding weighted ktruss communities in large networks. Inf Sci. 2017;417:344â€“60.
Garas A, Argyrakis P, Rozenblat C, Tomassini M, Havlin S. Worldwide spreading of economic crisis. New J Phys. 2010;12(11): 113043.
Bi F, Chang L, Lin X, Zhang W. An optimal and progressive approach to online search of topk influential communities. Proc VLDB Endow. 2018;11(9):1056â€“68.
Chang L, Li W, Qin L, Zhang W, Yang S. Fast and exact structural graph clustering. IEEE Trans Knowl Data Eng. 2017;29(2):387â€“401.
Shao J, Han Z, Yang Q, Zhou T. Community Detection Based on Distance Dynamics. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD â€™15. New York, NY, USA: ACM; 2015. p. 1075â€“1084. http://doi.acm.org/10.1145/2783258.2783301.
Yang J, Leskovec J. Defining and evaluating network communities based on groundtruth. Knowl Inf Syst. 2015;42(1):181â€“213.
Huang X, Lu W, Lakshmanan LV. Truss decomposition of probabilistic graphs: Semantics and algorithms. In: Proceedings of the 2016 International Conference on Management of Data; 2016. p. 77â€“90.
Cheng J, Ke Y, Fu AWC, Yu JX, Zhu L. Finding maximal cliques in massive networks. ACM Trans Database Syst. 2011;36(4):21.
Cheng J, Zhu L, Ke Y, Chu S. Fast algorithms for maximal clique enumeration with limited memory. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2012. p. 1240â€“1248.
Tsourakakis C, Bonchi F, Gionis A, Gullo F, Tsiarli M. Denser than the densest subgraph: extracting optimal quasicliques with quality guarantees. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2013. p. 104â€“112.
Cheng J, Ke Y, Chu S, Ã–zsu MT. Efficient core decomposition in massive networks. In: 2011 IEEE 27th International Conference on Data Engineering. IEEE; 2011. p. 51â€“62.
Khaouid W, Barsky M, Srinivasan V, Thomo A. Kcore decomposition of large networks on a single PC. Proc VLDB Endow. 2015;9(1):13â€“23.
Charikar M. Greedy approximation algorithms for finding dense components in a graph. In: International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer; 2000. p. 84â€“95.
Goldberg AV. Finding a maximum density subgraph. In: Tech. Report No. UCB CSD 84/171. Computer Science Division (EECS), University of California, Berkeley, CA, 1984.
Chang L, Yu JX, Qin L, Lin X, Liu C, Liang W. Efficiently computing kedge connected components via graph decomposition. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM; 2013. p. 205â€“216.
Zhou R, Liu C, Yu JX, Liang W, Chen B, Li J. Finding maximal kedgeconnected subgraphs from a large graph. In: Proceedings of the 15th International Conference on Extending Database Technology. ACM; 2012. p. 480â€“491.
Alemi M, Haghighi H. KTMiner: distributed ktruss detection in big graphs. Inf Syst. 2019;83:195â€“216.
Sozio M, Gionis A. The communitysearch problem and how to plan a successful cocktail party. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM; 2010. p. 939â€“948.
Cui W, Xiao Y, Wang H, Lu Y, Wang W. Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM; 2013. p. 277â€“288.
Wu Y, Jin R, Li J, Zhang X. Robust local community detection: on free rider effect and its elimination. Proc VLDB Endow. 2015;8(7):798â€“809.
Zhu Y, He J, Ye J, Qin L, Huang X, Yu JX. When Structure Meets Keywords: Cohesive Attributed Community Search. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management; 2020. p. 1913â€“1922.
Habib WM, Mokhtar HM, ElSharkawi ME. WeightBased KTruss Community Search via Edge Attachment. IEEE Access. 2020;8:148841â€“148852.
Shao Y, Chen L, Cui B. Efficient cohesive subgraphs detection in parallel. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data; 2014. p. 613â€“624.
Latapy M. Mainmemory triangle computations for very large (sparse (powerlaw)) graphs. Theor Comput Sci. 2008;407(1â€“3):458â€“73.
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Authors and Affiliations
Contributions
Authors contributed equal share in this research. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Habib, W.M.A., Mokhtar, H.M.O. & ElSharkawi, M.E. Discovering topweighted ktruss communities in large graphs. J Big Data 9, 36 (2022). https://doi.org/10.1186/s40537022005881
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s40537022005881