Structural and functional analytics for community detection in large-scale complex networks
 Pravin Chopade†^{1} and
 Justin Zhan†^{2}
DOI: 10.1186/s40537-015-0019-y
© Chopade and Zhan; licensee Springer. 2015
Received: 8 May 2015
Accepted: 12 June 2015
Published: 8 July 2015
Abstract
Community structure is thought to be one of the main organizing principles in most complex networks. Big data and complex networks are an area that researchers are analyzing worldwide. Of special interest are groups of vertices within which connections are dense. In this paper we begin by discussing community dynamics and exploring complex network structural parameters. We put forward structural and functional models for analyzing complex networks under perturbation. We introduce modified adjacency and modified Laplacian matrices. We further introduce a network or degree centrality (weighted Laplacian centrality) based on the modified Laplacian, together with a weighted microcommunity centrality. We discuss its robustness and its importance for microcommunity detection in social and technological complex networks with overlapping communities. We also introduce ‘k-clique subcommunity’ overlapping community detection based on degree and weighted microcommunity centrality. The proposed algorithms use the optimal partition of the k-clique subcommunity for modularity optimization. We establish a relationship between degree centrality and modularity. The proposed method, built on the modified adjacency matrix, helps us address this NP-hard problem.
Keywords
Community, Big data, Complex network, Laplacian, Centrality, Robustness, Modularity
Introduction
The last decade has witnessed the birth of a new field of interest and research in the study of complex networks, i.e. networks whose structure is irregular, complex and dynamically evolving in time, with the main focus moving from the analysis of small networks to that of systems with thousands or millions of nodes, and with a renewed attention to the properties of networks of dynamical units. Networks are all around us, and we are ourselves, as individuals, the units of a network of social relationships of different kinds and, as biological systems, the delicate result of a network of biochemical reactions. Networks can be tangible objects in the Euclidean space, such as electric power grids, the Internet, highways or subway systems, and neural networks. Or they can be entities defined in an abstract space, such as networks of acquaintances or collaborations between individuals [1].
Network construction from general, real-world data presents several unexpected challenges owing to the data domains themselves, e.g., information extraction and preprocessing, and to the data structures used for knowledge representation and storage. The increased availability of large-scale, real-world sociographic data has ushered in a new era of research and development in social network analysis. The quantity of content-based data created every day by traditional and social media, sensors, and mobile devices provides great opportunities and unique challenges for automatic analysis, prediction, and summarization in the era of what has been dubbed “Big Data” [2].
Centrality is one of the most studied concepts in social network analysis, where it is used to characterize social power and structural influence [3]. When studying faults and fault propagation in physical complex networks such as smart grids and communication, highway and traffic networks, centrality plays a somewhat different role than in social networks [4].
In this paper we discuss the structural and functional analysis of complex technological and social networks. We first review various complex network structural parameters and then put forward a new community detection method based on network or degree centrality. The major contribution of this work is a modified relationship between the adjacency and Laplacian matrices. We use this modified relationship to define a new degree centrality and a new modularity, with which we are able to detect micro-level overlapping community structures. We introduce a network or degree centrality (weighted Laplacian centrality) based on the modified Laplacian and a weighted microcommunity centrality, and discuss their robustness and importance for microcommunity detection in social and technological complex networks with overlapping communities. We also introduce ‘k-clique subcommunity’ overlapping community detection based on degree and weighted microcommunity centrality. These new matrices and algorithms are helpful for identifying hidden-level vulnerabilities. The paper is organized as follows. The “Background and literature review” section reviews existing community detection methods and algorithms. The “Research design” section discusses community dynamics, the research approach and complex network structural parameters. The “Methodology” section discusses the analysis of unweighted networks (structural analysis) and weighted networks (functional analysis), where we introduce the modified relationship between the adjacency, degree and Laplacian matrices. Using this relationship we define the weighted Laplacian centrality, the weighted microcommunity centrality and related algorithms. We also introduce algorithms for the k-clique subcommunity and for the optimal partition of the k-clique subcommunity for weighted modularity optimization and overlapping community detection. In the “Results and discussion” section, we analyse real-world complex networks and compare different community detection algorithms.
Lastly, we discuss the computational complexity of our proposed algorithms and conclude the paper with the major findings and future work.
Background and literature review
Community detection is a fundamental component of network analysis for sensor systems and is an enabling technology for higher level analytical applications such as behavior analysis, prediction, and identity and pattern-of-life analysis [2]. In both commercial industry and academia, significant progress has been made on problems related to the analysis of community structure; however, traditional work in social networks has focused on static situations (i.e., classical social network analysis) or dynamics in a large-scale sense (e.g., disease propagation) [2].
Communities are of interest for a number of reasons. They have intrinsic interest because they may correspond to functional units within a networked system [5]. The aim of community detection in graphs is to identify the modules and, possibly, their hierarchical organization, by only using the information encoded in the graph topology. Community detection is important for other reasons, too. Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules. So, vertices with a central position in their clusters, i.e. sharing a large number of edges with the other group partners, may have an important function of control and stability within the group; vertices lying at the boundaries between modules play an important role of mediation and lead the relationships and exchanges between different communities [6]. Fortunato [6] discussed various crucial issues of community detection like the significance of clustering and its application to real networks. This paper triggered a big activity in the field, and many new methods have been proposed in the last years.
With the aim of explaining and understanding common principles and properties of real networks, three general network models have been intensely researched: the random network [7], the small-world network [8] and the scale-free network [9], though these models cannot explain all phenomena observed in real networks. A random network has a binomial or Poisson degree distribution [10], so it is rather robust: it is a homogeneous network in which the majority of vertices have almost the same number of edges. However, real networks do not show random degree distributions and properties. A small-world network lies between a lattice and a random network: it has a small average path length like a random network but a large clustering coefficient like a lattice. Rather unexpectedly, the degree distribution of a small-world network is described by the same binomial distribution as that of a random network. Most real networks, by contrast, have a power-law degree distribution [11] rather than a Poisson one. Such networks are called scale-free networks. They are sensitive to the intentional removal of vertices but robust against random removal, because the power-law distribution makes the network heterogeneous: a small number of vertices, called hubs, have a very large number of edges and play an important role in the connectivity of the network [12].
Centrality measures the relative importance of a node or a link in terms of network efficiency and the utilization of network resources. Koschutzki et al. [13] discuss centrality indices based on degree, considering distances and neighborhoods as well as shortest paths. They present some of the more influential, ‘classic’ centrality indices; they do not strive for completeness but provide a catalog of basic centrality indices with some of their main applications.
Borgatti [14] claimed that centrality measures can be regarded as generating expected values for certain kinds of node outcomes (such as speed and frequency of reception) given implicit models of how traffic flows. Borgatti regarded the formulas for centrality concepts like betweenness and closeness as generating the expected values, under specific unstated flow models, of certain kinds of node participation in network flows. As such, they do not actually measure node participation at all but rather indicate the expected participation if things flow in the assumed way. One contribution of Borgatti’s paper is to make explicit the assumptions behind each measure, and then to test each measure’s deconstruction via simulation. Node-centric measures are more convenient for computation and interpretation, and hence more common than edge-centric measures.
The problem of community detection requires the partition of a network into communities of densely connected nodes, with the nodes belonging to different communities being only sparsely connected. Precise formulations of this optimization problem are known to be computationally intractable. Several algorithms have therefore been proposed to find reasonably good partitions in a reasonably fast way [15]. One of them is the greedy method for optimizing modularity Q [16]. It is an agglomerative hierarchical clustering method, in which groups of vertices are successively joined to form larger communities such that modularity increases after each merge. The greedy optimization method attempts to optimize the “modularity” of a partition of the network. The optimization is performed in two steps. First, the method looks for “small” communities by optimizing modularity locally. Second, it aggregates nodes belonging to the same community and builds a new network whose nodes are the communities. These steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced.
By assumption, high values of modularity Q indicate good partitions. So, the partition corresponding to its maximum value on a given graph should be the best or at least a very good one. This is the main motivation for modularity maximization, by far the most popular class of methods to detect communities in graphs. An exhaustive optimization of Q is impossible, due to the huge number of ways in which it is possible to partition a graph, even when the latter is small. Besides, the true maximum is out of reach, as it has been recently proved that modularity optimization is an NP-complete problem [17], so it is probably impossible to find the solution in a time growing polynomially with the size of the graph. However, there are currently several algorithms able to find fairly good approximations of the modularity maximum in a reasonable time [6].
Integer linear programming algorithms solve the modularity maximization problem for small graphs [16, 18]. Brandes et al. [18] have given an integer linear programming formulation for modularity clustering and established that the formal problem is, in the worst case, NP-hard.
Gregori et al. [19] presented a novel, parallel k-clique community detection method, based on an innovative technique which enables the connected components of a network to be obtained from those of its subnetworks. The method has an unbounded, user-configurable, and input-independent maximum degree of parallelism, and hence is able to make full use of computational resources. Chen et al. [20] introduced two novel fine-tuned community detection algorithms that iteratively attempt to improve community quality measurements by splitting and merging the given network community structure, but they did not consider the optimal number of clusters or subnetworks, or the concept of modularity, for community detection.
Considering the importance of the community detection problem, this work aims to identify hidden-layer microcommunities, overlapping communities and the related functional dynamics by using the concept of modified adjacency and modified Laplacian matrices.
Research design and methodology
Research design
Complex network structural parameters
Structural parameters are tools of complex network analysis that are useful for understanding salient properties of complex systems. Some of the important local and global structural parameters are discussed below.
Node degree distributions, correlations and assortativity
The degree distribution, usually denoted by \(P(k)\), is the probability that a vertex chosen uniformly at random has degree k, or equivalently, the fraction of vertices in the network with degree k. In many real networks it has been found that the degree distribution follows a power law, i.e. \(P(k) \sim k^{-\alpha}\), where the scaling coefficient α is typically between 1 and 3 [21]. A large number of real networks are correlated in the sense that the probability that a node of degree k is connected to another node of degree, say, \(k^{\prime}\) depends on k. These degree correlations are formally characterized by the conditional probability \(P(k^{\prime} \mid k)\). Some networks (including the Internet and the World Wide Web) have degree distributions in the form of a power law: that is, the probability that a node has degree k is given as \(P(k) \sim k^{-\alpha}\) [22]. Assortativity is the correlation between the degrees of connected nodes. Positive assortativity indicates that high-degree nodes tend to connect to each other.
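As an illustration of these definitions, the following minimal Python sketch computes the empirical degree distribution \(P(k)\) and the degree assortativity from an undirected edge list. The function names and the small star-shaped example graph are illustrative assumptions and are not taken from the paper. Assortativity is computed here as the Pearson correlation of the degrees found at the two ends of every edge.

```python
from collections import Counter


def degree_distribution(edges):
    """Return ({node: degree}, {k: P(k)}) for an undirected edge list.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    pk = {k: c / n for k, c in counts.items()}
    return dict(degree), pk


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree, _ = degree_distribution(edges)
    xs, ys = [], []
    # Use every edge in both directions so that the measure is symmetric.
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys contain the same values
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return cov / var if var > 0 else float("nan")


# Hypothetical example: a star with hub 0 and three leaves.
star = [(0, 1), (0, 2), (0, 3)]
degree, pk = degree_distribution(star)
r = degree_assortativity(star)
```

In the star example the hub connects only to low-degree leaves, so the assortativity is negative, which is the opposite of the positive case described above.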
Shortest path lengths or characteristic path length
The distance between two vertices is defined as the number of edges along the shortest path connecting them, and the average (characteristic) path length is the mean of this distance over all pairs of vertices. Many complex networks, despite their often large size, have a relatively short average path length between any two vertices.
where \(N_{c_{1}}\) is the number of resource nodes in the subcommunity 1, and \(N_{c_{2}}\) is the number of nodes in the subcommunity 2.
When two nodes are not connected at all, or become disconnected due to attacks, their shortest path length \(d_{ij}\) becomes infinite, and then \(\frac{1}{d_{ij}}\) is zero. If \(X\left(G_{c_{1}}\cap G_{c_{2}}\right)\) is large, this indicates that the network is well connected and has high efficiency [24].
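The efficiency idea described above can be sketched in Python as follows. The exact expression for \(X\left(G_{c_{1}}\cap G_{c_{2}}\right)\) is not reproduced here, so this sketch assumes the form \(\frac{1}{N_{c_{1}}N_{c_{2}}}\sum_{i\in c_{1}}\sum_{j\in c_{2}}\frac{1}{d_{ij}}\), with unreachable pairs contributing zero. The function names and the example graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def cross_efficiency(adj, group1, group2):
    """Assumed form: sum of 1/d_ij over i in group1, j in group2,
    divided by len(group1) * len(group2).

    Disconnected pairs have infinite distance and therefore add 0.
    """
    total = 0.0
    for i in group1:
        dist = bfs_distances(adj, i)
        for j in group2:
            if i != j and j in dist:
                total += 1.0 / dist[j]
    size = len(group1) * len(group2)
    return total / size if size else 0.0


# Hypothetical example: path 0-1-2-3 plus an isolated node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
x_connected = cross_efficiency(adj, [0, 1], [2, 3])
x_isolated = cross_efficiency(adj, [0], [4])
```

The isolated node yields an efficiency of zero, which matches the convention that \(\frac{1}{d_{ij}}\) vanishes for disconnected pairs.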
Local and global clustering coefficient
If the nearest neighbours of a node are also directly connected to each other they form a cluster. The clustering coefficient quantifies the number of connections that exist between the nearest neighbours of a node as a proportion of the maximum number of possible connections [8]. Interactions between neighbouring nodes can also be quantified by counting the occurrence of small motifs of interconnected nodes [25]. The distribution of different motif classes in a network provides information about the types of local interactions that the network can support [26].
The local clustering coefficient is \(CC_{i} = 2m_{i}/\left(k_{i}(k_{i}-1)\right)\). Here \(CC_{i}\) is the local clustering coefficient, \(m_{i}\) is the number of edges that exist between the neighbors of vertex i and \(k_{i}\) is the number of neighbors of vertex i. The denominator \(k_{i}(k_{i}-1)/2\) is the maximum possible number of edges that can exist between the neighbors of vertex i.
The global clustering coefficient CC is the ratio of the number of triangles in a network versus the number of paths of length 2. This ratio is typically high in social networks, whose generative processes tend to close triangles. In contrast, the clustering coefficient is close to 0 for random graphs.
N represents the number of vertices or the number of nodes in the network. A general problem of network measures, such as the clustering coefficient, is whether sampling or perturbations change the values of these measures. Network measures are frequently used for the classification of different networks [27] or of topological changes (addition or deletion of nodes or edges) within the same network.
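The clustering measures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the graph is simple and undirected, it is stored as a dictionary of neighbor sets, and the global coefficient is computed as the fraction of paths of length 2 that are closed (each triangle closes three such paths). The function names and the example graph are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, i):
    """CC_i = 2 * m_i / (k_i * (k_i - 1)); defined as 0 when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # m_i: number of edges among the neighbors of i
    m = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * m / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all N vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def global_clustering(adj):
    """Closed paths of length 2 divided by all paths of length 2."""
    closed = 0
    paths = 0
    for i in adj:
        for u, v in combinations(adj[i], 2):
            paths += 1
            if v in adj[u]:
                closed += 1
    return closed / paths if paths else 0.0


# Hypothetical example: triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cc_local = {i: local_clustering(adj, i) for i in adj}
cc_avg = average_clustering(adj)
cc_global = global_clustering(adj)
```

The two network-level values differ in general, because the average of local coefficients weights every vertex equally while the global ratio weights every path of length 2 equally.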
Network centrality and robustness
Centrality measures are used in network science to rank the relative importance of vertices and edges in a graph. Within graph theory and network analysis, there are various measures of the centrality of a vertex or an edge. Centrality indices are quantifications of the fact that some nodes/edges are more central or more important in a network than others [28]. Our algorithm uses the network centrality known as degree centrality to find overlapping community structure.
Degree centrality
Here \(2n_{E}\) is used as a normalization factor. In order to make better comparisons between graphs of different sizes, the degree is standardized by dividing by \(2n_{E}\), the maximum possible degree of any node.
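A minimal Python sketch of this normalization is given below. It assumes that \(n_{E}\) denotes the number of edges and that the centrality of node i is its degree divided by \(2n_{E}\); both are assumptions made for illustration, and the function name is hypothetical.

```python
def degree_centrality(edges):
    """Degree of each node divided by 2 * n_E (assumed: n_E = number of edges)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    norm = 2 * len(edges)
    return {node: k / norm for node, k in degree.items()}


# Hypothetical example: a star with hub 0 and three leaves.
centrality = degree_centrality([(0, 1), (0, 2), (0, 3)])
```

Under this assumption the centralities of all nodes sum to one, since every edge contributes to the degree of exactly two nodes.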
Modularity
Network density or cost
Network or Connection density is the actual number of edges in the graph as a proportion of the total number of possible edges and is the simplest estimator of the physical cost, for example, the energy or other resource requirements, of a network.
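As a small worked example, the density of a simple graph with N nodes and E edges is \(2E/(N(N-1))\) in the undirected case and \(E/(N(N-1))\) in the directed case. The Python sketch below applies this to the node and edge counts reported for the U.S. Power Grid in the results table of this paper (4941 nodes, 6594 edges). The function name is an illustrative assumption.

```python
def network_density(num_nodes, num_edges, directed=False):
    """Actual edges as a fraction of all possible edges in a simple graph."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2  # each undirected pair is counted once
    return num_edges / possible if possible else 0.0


# Node and edge counts taken from the U.S. Power Grid column of the results table.
power_grid_density = network_density(4941, 6594)
```

The resulting value is well below one percent, which illustrates how sparse such technological networks are.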
We use the complex network structural parameters discussed above for supervised community detection. The research methodology and algorithms are discussed in the next section.
Methodology
As briefly discussed in the “Background and literature review” section, our research process and methodology consist of structural and functional analysis. For the structural analysis, we have already discussed various complex network structural parameters in the “Research design” section.
Analysis of unweighted network (structural analysis)
Here \(A_{ij}=k\) if there are k parallel edges from i to j. Moreover, \(D_{i}\geq 0\), and \(D_{ii}\) is the number of edges connected to node i. Note that the diagonal elements of the Laplacian are assumed to be positive.
Eigenvalues of matrices associated with a graph, especially the adjacency matrix, the Laplacian matrix and the normalized Laplacian matrix, reflect structural properties of the graph. For instance, the adjacency matrix is useful for counting paths of a certain length in a graph, the number of spanning trees and connected components can be determined from the Laplacian, and the normalized Laplacian enables recognition of connected components and bipartite structures [32].
Analysis of weighted networks (functional analysis)

The adjacency matrix A is real, symmetric, and zero on the diagonal, with entries being either 0 or 1. Since the trace is zero, some of the eigenvalues must be positive and others negative (for any graph with at least one edge), and hence this matrix is not sign-definite. It is obtained from the Laplacian matrix by zeroing its diagonal elements and reversing the sign of the remaining entries.

The Laplacian matrix L is real and symmetric, and the sum of each row is zero. The diagonal elements are non-negative, and the off-diagonal elements are non-positive, either 0 or −1.

The degree matrix D is a diagonal matrix whose diagonal elements are the node degrees, i.e. non-negative integers.
If parallel links are allowed between nodes, then non-zero entries can have integer magnitudes higher than 1, with the same sign as before. If self-loops are allowed, the adjacency matrix can have non-negative integer diagonal elements. In any case, the matrices are related by Eq. 12, \(L = D - A\).
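The relation \(L = D - A\) and the listed matrix properties can be checked with a short Python sketch. The function name and the example graph, which contains one pair of parallel edges and no self-loops, are illustrative assumptions.

```python
def laplacian_from_edges(n, edges):
    """Build A, D and L = D - A for an undirected multigraph without self-loops."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1  # parallel edges accumulate
        A[j][i] += 1
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = sum(A[i])  # degree = row sum of A
    L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    return A, D, L


# Hypothetical example: two parallel edges between nodes 0 and 1, one edge 1-2.
A, D, L = laplacian_from_edges(3, [(0, 1), (0, 1), (1, 2)])
row_sums = [sum(row) for row in L]
```

Every row of L sums to zero, the diagonal is non-negative, and the off-diagonal entry for the parallel pair is −2 rather than −1.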

The pseudo-adjacency matrix \(\tilde{A}\) is real, symmetric, and zero on the diagonal, with non-negative entries. Since the trace is zero, some of the eigenvalues must be positive and others negative, and hence this matrix is not sign-definite.

The pseudo-Laplacian matrix \(\tilde{L}\) is real and symmetric, and the sum of each row is zero. The diagonal elements are non-negative, and the off-diagonal elements are non-positive.

The pseudo-degree matrix \(\tilde{D}\) is diagonal with non-negative diagonal elements. The sum of the diagonal elements is twice the total susceptance of all the lines in the system.
Note however that the entries need not be integers, let alone restricted to −1, 0 or 1.
where I is an N×N identity matrix (with ones on the diagonal, other elements being zero).
where \(D_{O}\) is the diagonal matrix of out-degrees (the row sums of A) and \(D_{I}\) is the diagonal matrix of in-degrees (the column sums of A).
where \(\tilde{A}\) is the weighted adjacency matrix of a directed network, \(\tilde{D}_{O}\) is the weighted diagonal matrix of out-degrees (the row sums of \(\tilde{A}\)) and \(\tilde{D}_{I}\) is the weighted diagonal matrix of in-degrees (the column sums of \(\tilde{A}\)).
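The row-sum and column-sum definitions of the weighted out-degree and in-degree matrices can be sketched as follows. This illustrates only those two definitions and not the directed Laplacian expressions themselves. The function name and the example arcs are hypothetical, and the convention that entry (i, j) holds the weight of the arc from i to j is an assumption.

```python
def in_out_degree_matrices(n, arcs):
    """Return the weighted adjacency matrix and the diagonal out/in-degree matrices.

    arcs: iterable of (i, j, w), meaning an arc from i to j with weight w.
    """
    A = [[0.0] * n for _ in range(n)]
    for i, j, w in arcs:
        A[i][j] += w
    out_deg = [sum(A[i]) for i in range(n)]                       # row sums
    in_deg = [sum(A[i][j] for i in range(n)) for j in range(n)]   # column sums
    D_out = [[out_deg[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    D_in = [[in_deg[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return A, D_out, D_in


# Hypothetical example with three weighted arcs.
A_w, D_out, D_in = in_out_degree_matrices(3, [(0, 1, 2.0), (0, 2, 1.0), (2, 1, 0.5)])
out_diag = [D_out[i][i] for i in range(3)]
in_diag = [D_in[i][i] for i in range(3)]
```

The traces of the two degree matrices coincide, because both equal the total weight of all arcs.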
To incorporate functional analysis, i.e. to account for flow or functional dynamics in the network, we propose a modified relationship between the adjacency and Laplacian matrices of a graph, given by Eq. 16.
This modified relationship turns modularity maximization into a spectral graph partitioning problem using the modified Laplacian matrix. A nice feature of the modified Laplacian is that, for graphs which are not too small, it can be approximated (up to constant factors) by the transition matrix \(\tilde {A}_{x}\), obtained by normalizing \(\tilde {A}\) such that the sum of the elements of each row equals one.
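The row normalization that produces the transition matrix \(\tilde{A}_{x}\) can be sketched as follows. The sketch covers only this normalization step and not the modified Laplacian of Eq. 16. The function name is hypothetical, and leaving rows of isolated nodes as all zeros is an assumption made here.

```python
def transition_matrix(A):
    """Divide every row of A by its sum so that each non-empty row sums to one."""
    P = []
    for row in A:
        s = sum(row)
        # Assumption: rows of isolated nodes (sum 0) are left as zeros.
        P.append([a / s for a in row] if s > 0 else [0.0] * len(row))
    return P


# Hypothetical weighted adjacency matrix of a three-node network.
A_tilde = [[0.0, 2.0, 2.0],
           [2.0, 0.0, 0.0],
           [2.0, 0.0, 0.0]]
A_x = transition_matrix(A_tilde)
```

Each entry of the result can be read as the probability that a random walker at node i moves to node j in one step.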
Weighted Laplacian centrality
Using Eq. 22 we obtain the centrality of the functional network. We use this functional degree centrality to determine the robustness of the network.
Definition 1.
(MicroCommunity). A microcommunity is a small dense group, subgraph or isolated node that consists of one or more connected dense networks or pairs with a certain energy.
where \(V_{l_{1}}\) are the vertices of the sub or dense network, \(E_{l_{1}}\) are its edges and \({ne}_{l_{1}}\) is the energy of the local sub or dense network.
A given community network is then partitioned into a certain number of smaller subcommunities, or subsets, called clusters.
Weighted microcommunity or subcommunity centrality
Smaller subcommunities are given more weight than larger ones, which makes this measure appropriate for characterizing network motifs. The subcommunity centrality can be obtained mathematically from the spectra of the weighted adjacency matrix of the network. The subcommunity centrality of a node is a weighted sum of closed walks of different lengths in the network starting and ending at the node. This function returns a vector of subcommunity centralities for each node of the network [33, 34].
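The idea of a weighted sum of closed walks can be illustrated with the classic subgraph centrality, in which closed walks of length k starting and ending at node i are weighted by 1/k!, so that the centrality equals the i-th diagonal entry of the matrix exponential of the adjacency matrix. The sketch below uses this classic, unweighted form with a truncated series; it is an assumption made for illustration and is not the weighted microcommunity centrality defined in this paper. The function names are hypothetical.

```python
import math


def mat_mul(X, Y):
    """Plain dense matrix product for small square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def subgraph_centrality(A, terms=30):
    """SC(i) = sum over k of (A^k)_ii / k!, truncated after `terms` terms."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sc = [1.0] * n  # k = 0 term: the identity matrix
    factorial = 1.0
    for k in range(1, terms):
        power = mat_mul(power, A)  # power now holds A^k
        factorial *= k
        for i in range(n):
            sc[i] += power[i][i] / factorial
    return sc


# Hypothetical example: the path graph 0-1-2.
path = [[0.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0]]
sc = subgraph_centrality(path)
expected_middle = math.cosh(math.sqrt(2.0))
```

The 1/k! factor is what gives short closed walks, and hence small subgraphs, more weight than long ones. For this path graph the middle node has centrality cosh(√2), which the truncated series reproduces closely.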
Definition 2.
(MicroCommunity Centrality).
For all methods and approaches discussed above, the MicroCommunity Centrality (MCC) network robustness algorithm is developed. The overall process of MCC is described in Algorithm 1. For any given large-scale community network \(G_{n}\), it first identifies the type of network, i.e. Directed Unweighted (DU), Directed Weighted (DW), Undirected Unweighted (UU) or Undirected Weighted (UW). Depending on the type of network, it then calculates all required statistical parameters from the adjacency, Laplacian and degree matrices A, L and D, or from their weighted counterparts \(\tilde{A}\), \(\tilde{L}\) and \(\tilde{D}\). Next it calculates the network energy, the microcommunities and the microcommunity clusters. With these parameters it calculates the weighted Laplacian centrality and the weighted microcommunity centrality. Finally, using the algebraic connectivity, it checks the robustness of the network, i.e. whether the network is strongly or weakly connected.
K-clique subcommunity: degree and weighted microcommunity centrality based overlapping community algorithm
Most real networks typically contain parts in which the nodes (units) are more highly connected to each other than to the rest of the network. The sets of such nodes are usually called clusters, communities, cohesive groups, or modules [35]. Most real networks are characterized by well defined statistics of overlapping and nested communities. Such a statement can be demonstrated by the numerous communities each of us belongs to, including those related to our scientific activities or personal life (family, work, college) and so on [35], as illustrated in Fig. 3.
Definition 3.
A typical community consists of several complete (fully connected) subcommunities that tend to share many of their nodes. Thus, we define a subcommunity, or more precisely, a k-clique subcommunity, as a union of all k-cliques (complete subgraphs of size k) that can be reached from each other through a series of adjacent k-cliques (where adjacency means sharing k−1 nodes).
The proposed algorithms (Algorithms 2 and 3) first extract all complete weighted subcommunities of the network that are not parts of larger complete subcommunities. A maximal clique is a clique that is not a subset of any other clique in a community network [36]. These maximal complete subgraphs are simply called cliques, and the difference between k-cliques and cliques is that k-cliques can be subsets of larger complete subcommunities. Once the cliques are located, the clique-clique overlap matrix is prepared [37]. In this symmetric matrix each row (and column) represents a clique; the off-diagonal elements are equal to the number of common nodes between the corresponding two cliques, and the diagonal entries are equal to the size of the clique. The intersection of two cliques is always a complete subcommunity. The k-clique communities for a given value of k are equivalent to the connected clique components in which neighbouring cliques are linked to each other by at least k−1 common nodes. The advantage of this method is that the clique-clique overlap matrix encodes all the information necessary to obtain the communities for any value of k; once it is constructed, the k-clique communities for all possible values of k can be obtained very quickly [35]. Algorithm 2 describes the process of finding the maximum s-size k-cliques in the community network. It uses the degree sequence to find the largest possible clique size.
For detecting overlapping communities, Algorithm 3 is developed. It uses the weighted adjacency matrix, the weighted microcommunity centrality and the maximum s-size k-cliques in the community network (from Algorithm 2). First it generates the clique-clique overlap matrix. It then extracts the k-clique matrix kM from the clique-clique overlap matrix, and the k-clique subcommunities cc from the k-clique matrix kM.
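The clique-clique overlap construction described above can be sketched in Python as follows. This is a minimal, unweighted illustration and not the authors' Algorithms 2 and 3: maximal cliques are enumerated with a basic Bron-Kerbosch recursion, the degree-sequence bound and the weighting by microcommunity centrality are omitted, and all function names and the example graph are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with a basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def k_clique_communities(adj, k):
    """Return (communities, overlap matrix) for a given clique size k."""
    # Keep only maximal cliques that can contain a k-clique.
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    n = len(cliques)
    # Overlap matrix: diagonal = clique size, off-diagonal = shared nodes.
    overlap = [[len(cliques[a] & cliques[b]) for b in range(n)] for a in range(n)]
    communities, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Connected component of cliques linked by at least k - 1 common nodes.
        seen.add(start)
        stack, members = [start], set()
        while stack:
            a = stack.pop()
            members |= cliques[a]
            for b in range(n):
                if b not in seen and overlap[a][b] >= k - 1:
                    seen.add(b)
                    stack.append(b)
        communities.append(members)
    return communities, overlap


# Hypothetical example: two triangles that share node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
communities_k3, overlap_k3 = k_clique_communities(adj, 3)
communities_k2, _ = k_clique_communities(adj, 2)
```

For k = 3 the two triangles share only one node, so they form two communities that overlap in node 2. For k = 2 a single shared node is enough to link them, and the whole graph becomes one community.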
Modified weighted modularity: optimal partition of k-clique subcommunity
where the network is fully subdivided into a set of non-overlapping communities n, and \(e_{ij}\) is the proportion of all links that connect nodes in community i with nodes in community j.
e: The N×N symmetric weighted matrix of the partition C.
\(e_{ij}\): The fraction of edges between clusters \(C_{i}\) and \(C_{j}\).
\(e_{ii}\): The fraction of edges in cluster \(C_{i}\) (i.e. the portion of edges that connect vertices inside community \(C_{i}\)).
where \(k_{i}\) and \(k_{j}\) are the degrees of vertex i and vertex j.
As with the Laplacian matrix, for any network the vector (1,1,1,…) is an eigenvector of the modularity matrix with eigenvalue zero. However, the eigenvalues of the modularity matrix are not necessarily all of one sign, i.e. the matrix has both positive and negative eigenvalues [39].
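The quantities defined above can be illustrated with the standard (unmodified) Newman modularity, which this sketch assumes: \(Q=\sum_{i}\left(e_{ii}-a_{i}^{2}\right)\) with \(a_{i}=\sum_{j}e_{ij}\), and modularity matrix \(B_{ij}=A_{ij}-k_{i}k_{j}/(2m)\). These standard forms are assumptions for illustration and do not include the modifications proposed in this paper. The function names and the example graph are hypothetical.

```python
def modularity(A, communities):
    """Return (Q, e) for a partition of an undirected network.

    e[c][d] is the fraction of edge ends joining community c to community d,
    so an edge between two communities is split equally between e[c][d] and e[d][c].
    """
    n = len(A)
    two_m = float(sum(sum(row) for row in A))
    label = {node: c for c, members in enumerate(communities) for node in members}
    nc = len(communities)
    e = [[0.0] * nc for _ in range(nc)]
    for i in range(n):
        for j in range(n):
            e[label[i]][label[j]] += A[i][j] / two_m
    a = [sum(row) for row in e]
    q = sum(e[c][c] - a[c] ** 2 for c in range(nc))
    return q, e


def modularity_matrix(A):
    """B_ij = A_ij - k_i * k_j / (2m)."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = float(sum(k))
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


# Hypothetical example: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
Q, e = modularity(A, [{0, 1, 2}, {3, 4, 5}])
B = modularity_matrix(A)
B_row_sums = [sum(row) for row in B]
```

Every row of B sums to zero, which is exactly the statement that the all-ones vector is an eigenvector of the modularity matrix with eigenvalue zero.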
where \(\tilde{k}_{i}\) and \(\tilde{k}_{j}\) are weighted degrees.
A measure of modified modularity, referred to as \({Q_{M}^{W}}\) (weighted modified modularity), is proposed to quantify the overlapping community structure. With the measure \({Q_{M}^{W}}\), the overlapping community structure can be identified by finding an optimal partition of the k-clique subcommunity, i.e. the one with the maximum \({Q_{M}^{W}}\). \({Q_{M}^{W}}\) is based on a maximal clique view of the original network. A maximal clique is a clique (i.e. a complete subgraph) which is not a subset of any other clique in a network. The maximal clique view rests on the reasonable assumption that a maximal clique cannot be shared by two communities because it is highly connected. To find an optimal partition, we construct a maximal clique network from the original network. We then prove that the optimization of \({Q_{M}^{W}}\) on the original network is equivalent to the optimization of the modularity on the maximal clique network. Thus the overlapping community structure can be identified by partitioning the maximal clique network with an efficient modularity optimization algorithm [40].
where \(C_{i}\) and \(C_{j}\) are the numbers of overlapping communities to which node i and node j belong. A high value of \(\vec{Q}_{M\max}^{W}\) indicates a significant overlapping community structure.
In our implementation of Algorithm 4, given below, we used the fast greedy algorithm of Newman for modularity optimization [41] with modified functional parameters. In order to detect community structure efficiently using the complex network structural and functional parameters listed above, we developed Algorithm 5 for modified modularity for overlapping community detection.
Algorithm 5 measures the modularity variation for each candidate partition in which a pair of clusters is merged, and merges the pair of clusters that maximizes modularity Q using Algorithm 4. For each cluster formed, it splits the community and updates the corresponding Q. For each subcommunity it then measures the subcommunity energy ne and the microcommunity centrality using the overlapping community detection Algorithm 3. It then selects the subcommunity with the highest Q and the highest ne to find the k-clique subcommunity network and form microcommunity clusters \(\mu cc\). This microcommunity cluster formation continues until the value of Q is 0, i.e. the leading eigenvalue is zero, which means that the subgraph is indivisible. The overall process of modified modularity for overlapping communities is described in Algorithm 5.
This modified modularity for overlapping community algorithm has several advantages. First, its steps are intuitive and easy to implement. Moreover, the algorithm is fast: network simulations on large-scale ad hoc modular networks found that its complexity is linear on typical, sparse data. The experimental evaluation of these algorithms on complex technological and social networks is discussed in the next section, “Results and discussion”.
Results and discussion
Analysis of real-world large-scale complex networks
Analysis of complex social and technological networks
Analysis parameters ↓ / Networks → | PhD’s CS | | SciMet | U.S. Power Grid
Type of network | Directed | Directed Weighted | Directed multigraph | Undirected
V | 1882 | 1899 | 3084 | 4941
E | 1740 | 20296 | 10399 | 6594
Avg k | 40.913 | 5.6962 | 16.6402 | 260.0526
\(CC_{global}\) | 0.0051 | 0.1107 | 0.1703 | 0.0801
ac | 20.2106 | 115.9189 | 77.3748 | 15.0674
ne | 34.18 | 109.3126 | 96.1957 | 35.5106
c | 189 | 512 | 391 | 35
cc (oc) | 4 | 353 | 650 | 307
Comparison of different community detection algorithms
Modularity comparison
Networks ↓ | Size ↓ | Q FN | Q DGA | Q FD | Q MSTAB | Q MMOC (our method)
PhD’s in CS | 1882 | 0.9610 | 0.9610 | 0.9295 | 0.9601 | 0.9755
 | 1899 | 0.2717 | 0.2567 | 0.3751 | 0.3742 | 0.3860
SciMet | 3084 | 0.5469 | 0.5949 | 0.6146 | 0.6146 | 0.6502
US Power Grid | 4941 | 0.9341 | 0.9358 | 0.9347 | 0.9348 | 0.9587
Modularity maximization achieved with the MMOC algorithm helps detect dense, hidden micro-level communities. These results indicate the importance of modularity maximization even though it is an NP-complete problem.
These results show that community centrality appears to be related to vertices that are central in their local communities. Centrality is correlated with degree, but for a few overlapping communities the two are not perfectly correlated; in particular, some vertices have quite high centrality while having relatively low degree. High centrality indicates individuals who have more connections than expected within their neighborhood and hence potentially make a large contribution to the modularity, rather than individuals who simply have many connections.
Computational complexity
Determining the full set of cliques of a network is widely believed to be a non-polynomial problem. In spite of this, the proposed algorithm proves to be very efficient when applied to the graphs of the real systems investigated. Our method consists of five stages: finding the degree sequence; computing the microcommunity centrality; finding the maximal cliques; constructing the maximal clique network and the overlapping community network matrix; and partitioning the maximal clique network by modularity maximization and then finding the overlapping communities.
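Two of the stages listed above, the degree sequence and the maximal cliques, can be sketched as follows. The paper does not say which clique enumeration procedure is used, so the Bron-Kerbosch algorithm with pivoting is an assumption here, as are the helper names `degree_sequence`, `maximal_cliques`, and `overlap_vertices` and the adjacency-set input format. Its worst-case running time is exponential, consistent with the remark that full clique enumeration is believed to be non-polynomial.

```python
from collections import Counter


def degree_sequence(adj):
    """Vertex degrees sorted in non-increasing order."""
    return sorted((len(nbrs) for nbrs in adj.values()), reverse=True)


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting).

    adj maps each vertex to the set of its neighbours (no self-loops).
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def overlap_vertices(cliques):
    """Vertices that belong to more than one maximal clique."""
    counts = Counter(v for c in cliques for v in c)
    return {v for v, n in counts.items() if n > 1}


# Two triangles that share vertex 2 (illustrative data only).
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3, 4},
    3: {2, 4},
    4: {2, 3},
}
cliques = list(maximal_cliques(adj))
shared = overlap_vertices(cliques)
```

Vertices that appear in several maximal cliques, such as vertex 2 here, are the natural candidates for membership in more than one community.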
We analyze the computational complexity of MMOC and the other algorithms (Algorithms 1 to 5). Finding an exact solution to a partitioning task of this kind is believed to be an NP-complete problem, which makes it prohibitively difficult to solve for large-scale networks, but a wide variety of heuristic algorithms have been developed that give acceptably good solutions in many cases. The first algorithm of the modern age of community detection, introduced by Newman and Girvan, has complexity O(N^3) on sparse networks, and the other existing algorithms mentioned for detecting community structure give qualitatively similar results. The fast implementation of the Newman algorithm (Fast Newman algorithm) has a worst-case running time of O((m+N)N), or O(N^2) on a sparse network with N nodes and m edges. Experimental evaluation on real-world complex technological and social community networks shows that the MMOC algorithm achieves the best performance when compared with the other existing methods listed in Table 2. The time complexity of the MMOC algorithm and the other algorithms is O(N log N), which makes them scalable. For the MMOC algorithm, running time is consumed by maximizing modularity and by forming the overlapping community matrix based on subcommunity energy. For directed weighted networks, running time is also consumed by the computation of large eigenvalues. Our method is very efficient on real-world networks. In future work we will modify the MMOC algorithm for better run-time performance.
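To give a feel for how far apart the growth rates quoted above are, the following sketch evaluates the three expressions for the U.S. Power Grid sizes reported earlier (4941 vertices, 6594 edges). These are leading-order terms only: constant factors are ignored, the base-2 logarithm is an assumption, and the numbers are not measured running times. The function name `rough_costs` is hypothetical.

```python
import math


def rough_costs(n, m):
    """Leading-order operation counts for N nodes and m edges (constants ignored)."""
    return {
        "N^3": n ** 3,                # Newman-Girvan on sparse networks
        "(m+N)N": (m + n) * n,        # Fast Newman, worst case
        "N log N": n * math.log2(n),  # complexity stated for MMOC
    }


costs = rough_costs(4941, 6594)
for name, value in costs.items():
    print(f"{name:>8}: {value:.3e}")
```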
Conclusions
In this paper we have discussed community dynamics and reviewed complex network structural parameters. We highlighted the importance of network centrality, or degree centrality, and of network robustness for community detection. Centrality is correlated with degree. We discussed network or degree centrality (weighted Laplacian centrality) based on the modified Laplacian, and weighted microcommunity centrality. We also introduced an algorithm for k-clique subcommunities and for the optimal partition of k-clique subcommunities, used for weighted modularity optimization and for overlapping community detection based on degree and weighted microcommunity centrality. These new matrices and algorithms are helpful in identifying hidden-level vulnerabilities. We analyzed real-world large-scale complex networks and compared different community detection algorithms. Our results indicate a relationship between degree centrality and modularity optimization. Network centrality and robustness will help with supervised community detection in overlapping communities. The proposed algorithms will be useful for finding communities of densely connected vertices in network data. The computational complexity of the proposed algorithms compares favorably with that of the other existing algorithms considered. The scalable nature of this algorithm is valuable for analyzing more complex large-scale networks.
The selection of the parameter k in our method is also an interesting problem, and we will further investigate how to determine an appropriate k for a given network. In future work we will put forward the functional dynamics of complex networks by incorporating network centrality and the weighted clustering coefficient to identify micro-level communities and their associated relationships.
Declarations
Acknowledgment
We are thankful to the United States Department of Defense (DoD) for its financial support of this project. This work is supported by the United States Department of Defense (DoD grant #W911NF1310130).
We are also thankful to the United States National Science Foundation (NSF) and the United States National Consortium for Data Science (NCDS). This research is partially supported by the following grants: NSF No. 1137443, NSF No. 1247663, NSF No. 1238767, DoD No. W911NF1410119, and the Data Science Fellowship Award by the National Consortium for Data Science.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.