Vaccination allocation in large dynamic networks
 Justin Zhan†^{1}Email author,
 Timothy Rafalski†^{1},
 Gennady Stashkevich†^{2} and
 Edward Verenich†^{2}
DOI: 10.1186/s4053701600614
© The Author(s) 2017
Received: 26 October 2016
Accepted: 9 December 2016
Published: 13 January 2017
Abstract
Network infections that are already in progress cause challenges to those officers trying to preserve those nodes not yet infected. Static solutions can take advantage of global knowledge of the network to produce quick and approximate answers for those members who should be vaccinated. In dynamic situations however, small changes can severely alter those static solutions making them irrelevant. Yet in dynamic situations it can not be known with certainty which small changes will affect the solution and those that will not. Computational resources are wasted recalculating a global solution for the entire network, when a local recalculation may be enough. This paper presents a dynamic node vaccination solution that seeks to take advantage of these local recalculations.
Keywords
Dynamic networks Immunization Dominance treeBackground
Event propagation over a network is a complex and frequently studied human phenomena [1–3]. Historical examples of information exchange between humans in a social network, past and present, can be seen during colonial expansion of the British Empire [4], memes passed between friends on any assortment of social networks on the internet [1] and transmission of human or animal pathogens such as the virus H1N1 over airways [5]. In the information age computer networks can mirror these types of information exchange, with event information being passed along from node to node when network neighbors communicate through various network protocols [6].
This information can be useful, whether in the form of postal delivery of early colonial directives and parcels [4], neutral, shared irrelevant Facebook posts [7], or detrimental, such as human pathogens or computer attacks [5, 8]. In this paper we will use consistent terminology of a negative event for the purpose of visualization for the reader. Some examples may include: how to spread, as in infection, or how to prevent the spread, as in vaccination, of this negative event, or attack. Malicious attacks can take many forms, such as, malware, viruses, undesirable information, or even memes that might violate a networks posting policy [8, 9]. In terms of this paper “vaccination”? is defined as a set of nodes that can be inoculated, removed, and/or protected from these malicious attacks. The main assumption is that if these specific sets of nodes are given preemptive measures then the attack will only be effective on a minimum number of nodes in the network.
The development and growth of networks has led to a diverse number of academic domains with shared properties and/or unique properties. Network traversal by means of graph theory [10], network communication protocols [6], static networks [11], dynamic networks [9] and network security [12] are network domains with similar terminology and visualizations. If a similarity exists across such diversity then it is possible to explore a similar solution that models important aspects borrowed from similar domains. This paper examines networks with nodes that are sparsely connected that may have a hierarchical structure that can be exploited using parts of graph theory to develop a solution.
Large networks can suffer disastrous results if an attack is allowed to migrate over a network. Various problems have been considered in the literature regarding this issue. Stopping a known attack at a source node before it can infect its neighbor nodes is a trivial problem. Once the attack spreads beyond the source node, however, its complexity rises as more nodes are exposed and infected through available edges [13]. Another trivial situation for an active infection network is if the amount of vaccine available is much larger than the number of noninfected adjacent nodes to any infected nodes. The more typical case occurs when noninfected nodes outnumber the amount of vaccine by a large degree [11]. This limitation causes a costbenefit solution for which choices must be made in order to create an outcome that is favorable to the network administrators. Giving system administrators a variety of implementations allows them to choose the best option, or possible combinations of near optimal solutions, for their system [14].
Several solutions exist to prevent the spread of attacks over closed static systems; however these solutions have limited application to dynamic systems. An effective solution for arbitrary static networks with sparse connectivity is called DataAware Vaccine Allocation [11] or DAVA. DAVA is a framework that treats infected subgraphs as super nodes and all its infected node neighbors as rooted bidirectional subtrees that has a representational optimal solution in the form of a set of adjacent neighbors of the infected root super node. The framework determines this set by using a benefit system derived from a probability of propagation set, an infection set, and the main network graph. Thoroughly discussed by Zhang and Prakash the DAVA algorithm has been proven effective for “vaccine” distribution over an arbitrary network given prior information [11]. However this solution has limited application if new nodes are being added to the network over a consistent time interval.
Large networks that are dynamic have vulnerabilities because of the changes made to the network in real time [9]. Complications arise when new nodes vary in their importance to the networks due to location of edges and varying probability to infection [8]. For example a node could be added to the network that has edges to nodes that were not connected before or that create a cycle that was not present before the edge was added. This addition may or may not change the overall benefit of the subtree that contains the node due to probability changes or dominance changes caused by this addition. In a static system this addition or removal now would represent a new graph which requires the entire network to be examined from the start. This paper seeks to make the distinction that these new nodes will not necessarily alter the current benefit solution of its associated subtrees thus would not alter the optimal solution; it would require less work than recalculating the entire tree with new benefits and optimal solution. As a bonus, overall network information would be gained that could be applied to machine learning to enhance the information about this network. This gain might be exploited when examining local or global solutions.
Similarly to a multitask learning problem [2], dynamic network node vaccination, or removal, has a main task, namely the optimal solution and then sub tasks to calculate and determine the overall benefit for topk vaccination order subgroups. Individual nodes that have outlier tendencies might alter any current branch benefit, thus altering the topk benefit order. It is from this concept that the idea of local solutions are of greater interest than global solutions for dynamic networks. If these local changes could be determined to have little or no changes to the benefits of the global subtree to which the local solution belongs, then it maybe possible to exploit this information to make an algorithm more efficient.
This paper presents a dynamic network algorithm, VAILDN, that is able to receive nodes sequentially from a network and then devise an optimal set of vaccination nodes to ensure maximum protection of the network. While performing effectively on static solutions DAVA group of algorithms slow down due to the fact that the global solution always must be determined, even though changes may occur only for local solutions. This paper will show that it is possible to adjust the optimal solution without examining the entire network which will save time over static algorithms. In addition this paper will show that VAILDN has the same level of accuracy as do DAVA type algorithms.
Dynamic solutions using dominator trees also present challenges that are not considered when using static solutions. These challenges include the addition or removal of edges to the graph that may redraw the dominator tree if a cycle is created or destroyed. In a dynamic environment it is possible that children may be detected before the parents in terms of the overall graph. This will not affect the global solution however, as edges are added, local connections and dominance are built that could be exploited for time saving when any parent does arrive.
In this paper first related works are discussed in “Preliminaries” section. The main algorithm is broken into three phases and presented in “Experiments” section. “Results” section describes the experiments in which this algorithm was compared to random baseline algorithms and vaccination methods for static networks. The final sections discuss the experimental results and the conclusions, including suggestions for future work.
Related work
This paper combines several resources into a cohesive solution that would be able to deal with active infections in a large dynamic network. Static solutions for networks having prior information were examined [15] and interesting solutions are present that could be expanded to dynamic networks by using a multitask approach. Topics for consideration include, epidemiology, multitask learning, dataaware vaccination allocation, dominance trees, network hierarchy, and large dynamic network optimization.
The topic of population vaccination has been well studied with a variety of models based on different approaches. Sometimes the global solution may not be obvious [16] or maybe difficult to obtain [17]. A localized approach could target specific groups that are known to have higher propagation probabilities or more frequent contact with infected. A global static approach could be too costly due to number of nodes in the network [14]. A dynamic solution that examines edges could be beneficial in calculating the optimal solution [16].
In references to “DataAware Vaccination Allocation” [11] this paper demonstrates that the Data Aware Vaccination problem is NPhard using a reduction from the MinKU set problem [18]. In order to demonstrate the variety of the problem and the general application of DAVA two common discretetime models were used for virus infection, the susceptibleinfectedremoved (SIR) model and the immune response (IR) model. The SIR model [19] used for the general case, is based on chickenpoxlike infections, during which a time cycle consists of two phases, an independent infection probability and a recovery probability \(\delta \). If an infection is transferred from an infected node to a healthy node successfully (the probability of the infection is given by the weights of the edges) then that healthy node will be infected during next round. If recovery of an infected node occurs, then that node is “cured” and no longer will able to participate in the epidemic. The second model, the IR model [11], is a specific case of the SIR model during which the interaction between an infected and a healthy node happens only once, and the recovery probability, \(\delta \) = 1. The solution algorithms suggested by Zhang and Prakash the DAVA, DAVAprune, and DAVA fast algorithms were demonstrated to perform better within static networks in which an infection already was in progress when compared with five different static network algorithms.
This paper presents the propagation probabilities based on population simulations. Due to the related natures of networks and human communities [20] it was worth exploring to simulate how human communities interact in order to assign nonrandom meaning to the propagation. A normalization of a population simulation yields sufficient data values to test the research. People can be considered nodes and their contact can be considered edges. Much like human populations contact among nodes is dynamic and should benefit from a dynamic solution. Contacts of certain individuals may have more weight than others thus the importance of calculating a local solution needs to be stressed. For example the elderly or infirm might be of concern only to their local group associations rather than to the network as a whole [17].
Network exploration and node importance are two fundamental concepts to this work. Research in the field of network exploration has resulted in many good algorithms and two popular methods include depthfirstsearch (dfs) [21] and breadthfirstsearch (bfs) [10]. Dfs can be used to maximize propagation results as well as to define subdominators and dominators [11] as a dominator tree is built. Dfs is an effective tool for network exploration when the entire network is known and is relatively sparse. Highly connected graphs will benefit more from bfs due to the exploration of individual levels before the depth of each subtree is explored. This distinction is important for this study because the assumption for a dominance tree is based around the fact that the nodes are sparsely connected. Dominator trees are used to discover the interconnectedness of nodes in a sparse network [21]. A dominator tree will yield subtrees that show the relative connection between nodes, thus can yield maximum solutions based on those subtrees [11]. If the structure of a dominator tree is correct [22] a new relational mapping of a network can be built quickly to determine which parent nodes singularly expose their children to infection. A dominator tree shows that if there is a nonunique cycle between two nodes x and y then x and y will be the only common nodes to all paths. As nodes in a network become more and more connected, dominance will lose its significance as many nodes begin to share the same dominator. This is why the assumption was made in this study that the graphs had some hierarchical structure present and that they were sparsely connected.
The building of the dominator tree becomes more challenging in a dynamic setting. Shortest path algorithms were shown to be useful to determine dominance in linear time [23]. In a dynamic setting it is not simply the shortest path alone which determines dominance. Any edge that creates a cycle between nodes in a subtree will upset the dominance and create the need to recalculate dominance for any nodes who are siblings to any node in the cycle. Another consideration for dominance is when an edge is removed the original dominance must be restored to all nodes that were either in the shortest path to the dominator or those neighbors of the shortest path nodes.
A multitask solution is necessary for large dynamic networks due to the number of subproblems created by sequential node entry into the network. Collaborative Online Multitask Learning (COML) [2], an important aspect considered in this study was written to deal with the discovery of individual distributions that may not be properly recognized by the main task. It is the dynamic nature of the subtrees that are formed with sequential node entry into the network that can form these subtasks. COML demonstrates that subtasks can be learned effectively at the same time as the main task. Thus information about individuals and the group are maintained independent of each other. Predictions made from using these learned distributions could be useful in making classifications for unknown instances that enter the system.
One of the main libraries used for implementation was the networkX library. This package is a library of graph objects and a variety of associated functions beneficial to this study. It was possible to use the graph representations as a cornerstone for the building, analysis, and traversal of the network. This graph could then be dissected into its representative dominance tree in order to yield the final solution [24].
Preliminaries
Notation

graph represents a networkx graph object

infectedSet is a list of nodes all who can propagate the infection

InfectedSuper is the name for the node, \(I^{\prime }_{x}\), that will represent the node with all infected edges and will serve as the root of the dominance tree

x is each member of the infected set

y is a variable that represents the other vertex that shares an edge with x

\(p_{I_{x},y}\) is the propagation probability of that edge

newGraph is the resultant graph with all infected nodes merged into one super infected node

graph is the newGraph from infectedSetMerge with the infected super node infectedSuper

changedsubtrees is a list of subtrees from the super node that have been altered by the addition or removal of an edge

Tree and domTree are the dominance tree being built with the infected super node as root

Tree.benefit is the associated benefit dictionary that is based off of domTree

k is the number of available vaccines

The table of benefits is retrieved from the dominance tree.
Algorithm outline

Phase1 The InfectionSetMerge phase,during which all connected infected nodes are combined into one supernode that will server as the root in Phase 2.

Phase2 The Tree Building phase, during which the nodes are entered sequentially and the subtrees are built, propagation probabilities are set, and subtrees are marked if changed; and

Phase3 The Benefit and Solution Decision phase during which, if necessary, the benefits are recalculated and compared to the current solution set. If there is a difference all benefits are sorted and the solution set of the topk benefits are chosen.
InfectionSetMerge
Tree Building
Solution Decision
Vaccine allocation in large dynamic network
Complexity analysis
The time complexity analysis for Phase 1 mainly is dependent on the number of nodes in the infections set. Each recursion from InfectionSetMerge Line 14 would be a subset of the main Infection Set thus the time complexity would be O(I\(_{V}\)  + I\(_{E}\)). BuildTree is based upon taking the graph, building the dominance tree representation of the graph and then calculating the dictionary of benefits for the individual subtrees. This makes the worst time O(V + E) with the additional constant costs of calculating the benefits. At this point, there is a significantly large difference between a static approach and a dynamic approach. Static algorithms must repeat this process for each benefit needed regardless if it is new or not. In the real time solution the average time is dependent on the number of subtrees changed by the addition or removal of an edge. There are constant multipliers that are dependent on the number of subtrees changed; however in sparsely connected graphs this number always should be much lower than the total of the subtrees in the graph. Also another factor that may change the solution is how close the subtrees are to the infected supernode. Changes to nodes close to the infected super node will require rebuilding large sections of the current dominance tree. Again, the overall changes to the graph should not be close to the infected set (Note For the Gnutella set only 400 edges are adjacent to the infected super node out of 20,000 edges in total). The third phase is a linear sorting of the benefits dictionary. A rare but worst case for sorting for this would be O(n); however this is an extreme exception [25]. The average case is O(1).
Experiments

How does the algorithm developed in this study compare speed and accuracy with the baseline algorithms?

How does the speed of this algorithm compare with DAVA Prune and DAVA fast?

How does the accuracy of this algorithm compare to DAVA fast?
Settings
The baseline algorithms used were: krandom, krandom adjacent, and krandom tier 1. The algorithm krandom chooses the krandom nodes for vaccination. The adjacent krandom algorithm randomly chooses nodes that are neighbors of the infected super node using the graph as a basis. krandom tier 1 randomly chooses nodes that are subtrees of the infected super node. Due to dominance, in the study there have might have been edges between the infected supernode and its neighbors that were not present in the graph. In addition the DAVA, DAVAfast, and DAVAprune algorithms were implemented for comparison. Due to the real time nature of the data the original code as presented by Li et al. [2] was not used for tests in this study. However the algorithms were maintained by means of an implementation that used their paper as a guide. If a dominance tree was needed for an algorithm, krandom tier 1 or DAVAprune for example, the tree was built using the BuildTree algorithm from this paper in order to maintain consistency and reduce variability. The first set of tests used the IR model and a propagation of 1. Later tests used a variety of propagations, either 0.5 or 0.9, in order to simulate the propagation of negative information across the network.
The tests were ran at the Supercomputing Center, Cherry Creek, at University of Nevada Las Vegas (UNLV). The trials were run on a single node with two Intel Xeon E52697v2 12core, 128Gb ram [26].
Dataset information
Set name  Total nodes  Infected set  infectedSuper adj 

Random  4000  300  525 
Gnutella  6301  53  401 
EmailEnron  36,692  200  1932 
The trial driver used to test the algorithms attempted to simulate a real time network. The network was created, either by means of the random driver or by using a .csv file of edges obtained from SNAP. Periods then were created to simulate changes in the network over time. For the purposes of this experiment a dynamic solution was presented in which changes might occur to the network as the solution was being run. Changes that occured that may affect the network included both adding and removing edges. The changes for each period were randomized for each trial but were kept consistent while the algorithms were run. Changes were made to the graph and then a copy of that graph was passed to each of the algorithms. The changes were chosen randomly using standard randomization functions (Python™). The percent of node change and variance for each period were fixed in order to demonstrate the concept and to reduce difference in performance. The time and infection results were averaged over 100 runs for each algorithm.
Results
Discussion
These experiments demonstrate how a real time solution for a dynamic network can outperform a static solution. VAILDN is able to compute the solution as edges are examined in real time. This suggests that a local solution to the vaccination problem may be a better overall solution as the network grows in size or has a global solution that is difficult to calculate. Random vaccination is fast but it is not as effective as VAILDN.
The main assumption for algorithms that use dominance as a basis is that either there are sparse connections or there is some overall hierarchical structure to the network. Network analysis in a static situation can reveal this type of information with many of the basic tools already established, such as depth first or breadth first searches. Dynamic networks, however present a complex field of parameters that can confound a single method of finding a solution. A dynamic network that is relatively stable might mimic a static environment; however edges can still be presented that could disrupt the assumptions of dominance or sparsity.
In VAILDN adding edges allowed the exploitation of the current dominance tree in order to establish new dominance of siblings and children. The removal of edges presented a specific challenge because after removing an edge the current tree structure gave little information about how to restore dominance. This is in contrast to adding edges where the dominance tree structure is already present. The method that was used relied on determining the siblings of each level for all the grandchildren of the common dominator. Removing edges was the most time consuming part of the algorithm, if it could be improved the efficiency will increase even more.
Conclusion and future work
A real time solution is able to deal with the reality of a modern and changing network dynamic. A solution can be determined quickly, as only the information local to the incoming, or removed, edge is considered. The accuracy of VAILDN is comparable to the static solution with diminishing losses. This means a network can make real time decisions about how the information is being distributed over the network. Dynamic networks offer difficult challenges with regard to issue detection. This algorithm is able to determine a real time solution with the assumption that the network is sparse.
Densely connected graphs create a low level dominance tree that makes finding the optimal solution challenging. To compensate for this a few future directions are open for exploration. This solution can be further refined by adding a factor that will calculate how much effect an edge propagation might have on its subtree solution. This will allow no change to the solution set if a small benefit is added, thus saving time. This could represent highly unlikely, yet present, connections between network members.
To continue this research, community clusters, rather than dominance trees, might be another structure for detecting the optimal solution. Even if a community is densely connected, the communities themselves maybe sparsely connected. This implies that with community detection, a local outbreak solution maybe able to be determined quickly offering a better solution than the global solution. A machine learning approach could be applied to determine which solution, dominance tree or community clustering, presents the most beneficial to the host network in that configuration.
This paper used terminology that indicated the information that was propagated was negative in nature (Table 2). This may not always be the case. The development of algorithms for information distribution with limited resources may be able to build on this algorithm as a real time solution for information distribution. The real time solution presented here is a heuristic solution that demonstrates the value of information propagation over a dynamic network.
Acronym list
Acronym  Meaning 

bfs  Breadthfirstsearch 
COML  Collaborative Onlinie Multitask Learning 
DAVA  Data Aware Vaccine Allocation 
dfs  Depthfirstsearch 
IR  Immune Response (model) 
NPHard  Nondeterministic Polynomialtime Hard 
SIR  SusceptibleInfectedRemoved (model) 
SNAP  Stanford Network Analysis Project 
UNLV  University of Nevada, Las Vegas 
Notes
Declarations
Acknowledgements
We gratefully acknowledge the United States Department of Defense (W911NF1410119, W911NF1310130, W911NF1610416), the National Science Foundation (1560625, 1137443, 1247663, 1238767), United Healthcare Foundation (#1592), Oak Ridge National Laboratory (#4000144962), and National Consortium for Data Science for their support. Through these financial support, the following articles have been published [28–42].
Authors’ contributions
JZ, TR, GS, and EV developed the concept for the manuscript. JZ and TR drafted the manuscript. JZ, GS, and EV revised the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Leskovec J, Backstrom L, Kleinberg J. Memetracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’09. New York: ACM; 2009. p. 497–506.
 Li G, Hoi SCH, Chang K, Liu W, Jain R. Collaborative online multitask learning. IEEE Trans Knowl Data Eng. 2014;26(8):1866–76.Google Scholar
 Wu Y, Deng S, Huang H. Information propagation through opportunistic communication in mobile social networks. Mob Netw Appl. 2012;17(6):773–81.Google Scholar
 Nkwi WG, de Bruijn M. “Human Telephone Lines”: flag post mail relay runners in british southern cameroon (1916–1955) and the establishment of a modern communications network. Int Rev Soc Hist; 59:211–35, 12 2014. CopyrightCopyright © Internationaal Instituut voor Sociale Geschiedenis 2014; Last updated—20150725; SubjectsTermNotLitGenreTextCameroon; United Kingdom–UK.
 Hwang GM, Mahoney PJ, James JH, Lin GC, Berro AD, Keybl MA, Goedecke DM, Mathieu JJ, Wilson T. A modelbased tool to predict the propagation of infectious disease via airports. Travel Med Infect Dis. 2012;10(1):32–42.Google Scholar
 Memon I. A secure and efficient communication scheme with authenticated key establishment protocol for road networks. Wirel Pers Commun. 2015;85(3):1167–91.Google Scholar
 Pempek TA, Yermolayeva YA, Calvert SL. College students’ social networking experiences on facebook. J Appl Dev Psychol. 2009;30(3):227–38.Google Scholar
 Cai F, Qingfeng H, LanSheng H, Li S, XiaoYang L. Virus propagation power of the dynamic network. EURASIP J Wirel Commun Netw. 2013;2013(1):1–14.Google Scholar
 Wang J, Liu Y, Deng K. Modelling and simulating worm propagation in static and dynamic traffic. IET Intell Transp Syst. 2014;8(2):155–63.Google Scholar
 Merrill D, Garland M, Grimshaw A. Highperformance and scalable gpu graph traversal. ACM Trans Parallel Comput. 2015;1(2):14:1–30.Google Scholar
 Zhang Y, Prakash BA. Dataaware vaccine allocation over large networks. ACM Trans Knowl Discov Data. 2015;10(2):20:1–32.Google Scholar
 Battiti R, Cigno RL, Sabel M, Orava F, Pehrson B. Wireless LANs: from warchalking to open access networks. Mob Netw Appl. 2005;10(3):275–87.Google Scholar
 EmmertStreib F, Dehmer M. Exploring statistical and population aspects of network complexity. PLoS ONE. 2012;7(5):1–17.Google Scholar
 Dewri R, Ray I, Poolsappasit N, Whitley D. Optimal security hardening on attack tree models of networks: a costbenefit analysis. Int J Inform Secur. 2012;11(3):167–88.Google Scholar
 Yaesoubi R, Cohen T. Dynamic health policies for controlling the spread of emerging infections: influenza as an example. PLoS ONE. 2011;6(9):1–11.Google Scholar
 Cohen R, Havlin S, ben Avraham D. Efficient immunization strategies for computer networks and populations. Phys Rev Lett. 2003;91:247901.Google Scholar
 Dushoff J, Plotkin JB, Viboud C, Simonsen L, Miller M, Loeb M, et al. Vaccinating to protect a vulnerable subpopulation. PLoS Med. 2007;4:05.Google Scholar
 Vinterbo SA. Privacy: a machine learning view. IEEE Trans Knowl Data Eng. 2004;16(8):939–48.Google Scholar
 Hethcote HW. The mathematics of infectious diseases. SIAM Rev. 2000;42:599.MathSciNetMATHGoogle Scholar
 NDSSL. Synthetic data products for societal infrastructures and protopopulations: Data set 2.0. ndssltr07003. 2007. http://ndssl.vbi.vt.edu/Publications/ndssltr07003.pdf.
 Lengauer T, Tarjan RE. A fast algorithm for finding dominators in a flowgraph. ACM Trans Program Lang Syst. 1979;1(1):121–41.MATHGoogle Scholar
 Georgiadis L, Tarjan RE. Dominator tree certification and divergent spanning trees. ACM Trans Algorithms. 2015;12(1):11–42.MathSciNetGoogle Scholar
 Buchsbaum AL, Kaplan H, Rogers A, Westbrook JR. Corrigendum: a new, simpler lineartime dominators algorithm. ACM Trans Program Lang Syst. 2005;27(3):383–7.Google Scholar
 Aric A, Hagberg, Daniel A. Schult, Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena; 2008. p. 11–5.
 python. Time complexity. 2015. https://wiki.python.org/moin/TimeComplexity. Accessed 1 Aug 2016.
 NSCEE. How to run jobs with pbs/pro on cherrycreek. 2016. http://wiki.nscee.edu. Accessed 1 Aug 2016.
 Leskovec J, Krevl A. SNAP Datasets: Stanford large network dataset collection. 2014. http://snap.stanford.edu/data.
 Chopade P, Zhan J (2016) Towards a framework for community detection in large networks using gametheoretic modeling. IEEE Trans Big Data (accepted)
 Pirouz M, Zhan J, Tayeb S (2016) An optimized approach for community detection and ranking. J Big Data 3:22Google Scholar
 Gan W, Lin CW, Chao HC, Zhan J (2016) Data mining in distributed environment: a survey. WIREs Data Min Knowl Disc (accepted)
 Wu JMT, Zhan J, Lin JCW (2017) An ACObased approach to mine highutility itemsets. KnowlBased Syst 116:102–113Google Scholar
 Lin JCW, Yang L, FournierViger F, Wu MT, Hong TP, Wang LSL, Zhan J (2016) Mining highutility itemsets based on particle swarm optimization. J Eng Appl Artif Intell 55(C):320–330Google Scholar
 Lin JCW, Wu TY, FournierViger P, Lin G, Zhan J, Voznak M (2016) Fast algorithms for hiding sensitive highutility itemsets in privacypreserving utility mining. J Eng Appl Artif Intell 55(C):279–284Google Scholar
 Lin JCW, Li T, FournierViger P, Hong TP, Wu JMT, Zhan J (2016) Efficient mining of multiple fuzzy frequent itemsets. Int J Fuzzy Syst 18:19MathSciNetGoogle Scholar
 Lin JCW, Li T, FournierViger P, Hong TP, Zhan J, Voznak M (2016) An efficient algorithm to mine high averageutility itemsets. Adv Eng Inform 30(2):233–243Google Scholar
 Lin JCW, Liu Q, FournierViger P, Hong TP, Voznak M, Zhan J (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. J Eng Appl Artif Intell 53(C):1–18Google Scholar
 Zhan J, Gudibande V, Parsa SPK (2016) Idenfication of TopK influential communities in large networks. J Big Data 3:16Google Scholar
 Pirouz M, Zhan J (2016) Optimized relativity search: node reduction in personalized page rank estimation for large graphs. J Big Data 3:12Google Scholar
 Selim H, Zhan J (2016) Towards shortest path identification on large networks. J Big Data 3:10Google Scholar
 Chopade P, Zhan J (2016) Structural and functional analytics for community detection in largescale complex networks. J Big Data 2(1):1Google Scholar
 Fang X, Zhan J (2016) Sentiment analysis using product review data. J Big Data 2(1):1Google Scholar
 Zhan J, Fang X (2014) A computational framework for detecting malicious actors in communities. Int J Inf Priv Secur Integr 2(1):1–20Google Scholar