Query Optimal k-Plex Based Community in Graphs

Data Science and Engineering, Nov 2017

The community search problem, which is to find good communities given a set of query nodes in a graph, has attracted increasing research interest recently. Although various measurement models have been proposed to define and solve the community search problem, few of them define a community concisely and yield high-quality query results. They either involve additional constraints for modeling communities, such as size and diameter, or suffer from the free rider effect, i.e., they include irrelevant subgraphs. In this paper, we propose a new k-plex based community model for community search. We show that our model is not only simple and clear, but also meets the basic requirements of defining a community search problem. We formulate the maximum k-plex community query (MCKPQ) problem, that is, given a set of query nodes Q, searching for an optimal k-plex containing Q. We prove that MCKPQ is NP-hard and that it is hard to approximate within any constant factor. We first give exact solutions. Then, we propose an efficient branch-and-bound (B&B) method and design an effective upper bound function and a pruning strategy. Furthermore, we optimize the basic B&B by fast candidate generation. We also give a fast heuristic solution, which produces high-quality results in practice. The effectiveness of our community model and the efficiency of our methods are verified by extensive experiments.


Yue Wang, Jian Xun, Zhenhua Yang, Jia Li

Affiliations: Huawei Technologies Co. Ltd., Xi'an, China; Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

Keywords: Community search; Graph algorithm; K-plex

1 Introduction

A community, defined as a set of nodes densely connected internally, is considered an important structure in networks and plays a significant role in graph mining.
Community detection is the task of finding communities in a graph, and it has many applications in different fields, such as (1) mining sets of highly correlated stocks [27]; (2) detecting DNA motifs in bioinformatics [25]; and (3) finding communities of Web sites that share similar topics [23]. Numerous works, based on different community models, have been developed to detect communities [13, 32]. Since graph data have become large and dynamic, much research attention has shifted from the community detection problem to the community query problem [10, 11, 17, 28]. Unlike community detection, which enumerates all communities, community query aims to find the communities that contain a set of query nodes Q. Compared with community detection, community search by query nodes avoids (a) the time consumed to find all communities of an entire large graph; (b) the inflexible global parameter for the community criterion; and (c) the difficulty of dealing with dynamically evolving graphs [10]. In addition, community search for specific query nodes has many applications of its own in real-life networks:

• Social group finding: In an event-based social network such as Meetup (footnote 1), a common task is to search for a group of people with strong social ties in order to organize interesting activities.
• Friends recommendation: In popular social network applications, e.g., Facebook (footnote 2), Twitter (footnote 3), and Instagram (footnote 4), community search can help predict potential friendship relationships for users, especially for new users with few friends.
• Frequent pattern mining: When transactions and items are modeled as a bipartite graph, dense communities group items that are frequently bought together in different transactions. Given some specific items, community search helps identify frequent item sets and thereby benefits marketing.

It is not straightforward to define what a community is and to formulate the community search problem precisely.
There are several aspects to consider, the most important of which are how to define a community and how to measure its goodness. Although there is no standard answer, we summarize common requirements that a formulated community search problem should meet:

1. Cohesive: A community should be cohesive. Each member within it should be familiar with the others. Many cohesiveness measurements have been proposed, such as subgraph density [2, 9], minimum degree [11], and subgraph diameter [28].
2. Number of query nodes: When $|Q| = 0$, community query is identical to community detection. Some works limit $|Q| = 1$, a single query node, which is reasonable but not general. Sometimes $|Q| \ge 1$ is required. For example, Alice and Bob are holding a party, and they want to invite friends who have strong social ties with both of them.
3. Avoid free rider effect: Many community measurements lead to the free rider effect [30], which means including unrelated subgraphs in query results and making communities impure. Thus, the query results are required not to exhibit the free rider effect.
4. Connectivity: Suppose there is a community H for the community query. Then H should not only contain Q, but should also be connected. However, the latter requirement is not guaranteed in every model.
5. Size of community: In some scenarios, we are interested not only in cohesive communities for a given query, but also in those with a size bound. As shown in [21], an activity may not be triggered if the size of the social group is less than the activity's lower bound on the number of participants. Sozio and Gionis [28] also study the problem of querying the optimal community with an upper bound on the size.
6. Few parameters: In spite of the above constraints, as indicated in [17], problem definitions that are simple to formulate and have few parameters are preferable.

Footnotes: 1 https://www.meetup.com/; 2 https://www.facebook.com/; 3 https://twitter.com/; 4 https://www.instagram.com/.

Currently, several community models have been proposed to solve the community query problem [4, 11, 17, 18, 28, 31]. However, none of them satisfies all the above requirements simultaneously, which means that more constraints must be added to the proposed models to cover the missing requirements, making the problems more complicated. Take the k-truss model [17] as an example: Huang et al. [17] use the k-truss property to ensure community cohesiveness and edge connectivity to ensure community connectivity. The k-truss model is further improved in [18] by restricting the community diameter and avoiding the free rider effect. The k-core based (minimum degree) model is also used to define communities in [4, 11, 22, 28]. Sozio and Gionis [28] first introduce minimum degree as the goodness function of a community and give a linear-time algorithm for the unbounded-size problem. Cui et al. [11] improve the unbounded-size version using a local search strategy, but leave the bounded-size problem unsolved. Sozio and Gionis [28] also add bounded-diameter and size constraints to keep the community small. Cui et al. [10] introduce a quasi-clique based community to discover overlapping communities for a single query node, but the model includes too many parameters, which makes it hard to choose appropriate values, as pointed out in [17]. To address the free rider effect, which is common in most community models, Wu et al. [31] propose a query-density based model. In [31], a random walk is used to measure the density of each node with respect to the query nodes, and the problem is to find a community that maximizes the query-based density. However, the resulting communities have no size guarantee under this measurement.

To address the above problems, in this paper we propose a novel k-plex based community model for querying communities.
The k-plex, first introduced in [26], is a relaxation of a clique. A graph H is called a k-plex if each node $v \in H$ has at least $|H| - k$ neighbors in H. In other words, each member has at most k non-neighbors in H (counting itself). A clique is the special case of a k-plex with $k = 1$. We model a community as a k-plex because it can meet all the above requirements for the community search problem: a k-plex community H has a good cohesive structure, a bounded diameter, and a guaranteed upper bound on its size. At the same time, the connectivity of H is guaranteed once H becomes large. The properties of a k-plex H are summarized below:

1. the diameter of H is bounded by k;
2. the density of H is bounded from below by $\frac{|H| - k}{2}$, which makes the community more cohesive as its size increases;
3. any k-plex larger than $k + 1$ is connected;
4. there is an upper bound for $|H|$;
5. when H is optimal (defined in Sect. 2) for the query nodes, it avoids the free rider effect;
6. since a k-plex is also a $(k + 1)$-plex, k-plex communities form a hierarchical structure for the query nodes when queried with increasing k.

The detailed analysis is given in Sect. 2. In spite of the listed properties, querying an optimal k-plex community only requires specifying Q and k, which makes issuing a query as easy as possible.

Although the description of the k-plex community query problem is simple and clear, we show that it is NP-hard. In addition, it is even NP-hard to approximate within an O(n) factor in polynomial time. Thus, to search for k-plex communities efficiently, we first introduce two enumeration methods that give the exact solution, the latter of which prevents repeated enumeration. Next, we come up with an efficient branch-and-bound framework. Furthermore, we reduce the search space by carefully defining an upper bound function. In addition, we introduce global pruning and local pruning techniques to filter out unqualified neighbors.
Then, we present two optimization methods, partial branching and incremental generation, to accelerate the candidate generation process. Finally, we present heuristic solutions that generate good-quality results quickly in practice.

The rest of the paper is organized as follows. We give the problem definition in Sect. 2. We present baseline solutions in Sect. 3. In Sect. 4, we propose an efficient branch-and-bound framework. In Sect. 5, we introduce two optimization techniques to accelerate the B&B framework. In Sect. 6, we discuss a heuristic solution. Experiments are reported in Sect. 7. We discuss related work in Sect. 8 and conclude in Sect. 9.

2 Problem Statement

In this section, we give some preliminaries and formulate our k-plex based community problem. We then compare our model with existing ones and present an analysis of the proposed problems.

2.1 Problem Definition

In this paper, we consider simple undirected graphs $G(V, E)$ without weights on nodes or edges. Let $n = |V(G)|$ and $m = |E(G)|$. We denote the neighbors of a node $v \in V$ by $N(v)$, so the degree of v is $\deg_G(v) = |N(v)|$. For any $H \subseteq V$, the subgraph induced by H is denoted as $G[H]$, with nodes $V(G[H]) = H$ and edges $E(H) = \{(v_1, v_2) \mid v_1, v_2 \in H,\ (v_1, v_2) \in E(G)\}$. Sometimes we write H instead of $G[H]$ if the context is clear (Table 1).

As a relaxation of the clique, the k-plex was first introduced in [26]. We now give the definition of a connected k-plex.

Definition 1 (Connected k-plex) Given a graph G and a constant k, a subgraph $G[H]$ is said to be a connected k-plex if $G[H]$ is connected and, for any $v \in H$, $\deg_{G[H]}(v) \ge |H| - k$.

Lemma 1 If G is a k-plex, then any vertex-induced subgraph of G is also a k-plex.

Compared with the k-core model, whose lower bound on the minimum degree is a constant k, the minimum degree of a k-plex increases with its cardinality. The larger a k-plex is, the more cohesive it is. However, a k-plex defined by a single k can take structurally different forms.
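As a minimal sketch (not the authors' implementation), the two conditions of Definition 1 can be checked directly from adjacency sets. The helper names `is_k_plex`, `is_connected`, `is_connected_k_plex`, the dictionary-of-sets graph representation, and the toy graph are assumptions made for illustration only.

```python
from collections import deque


def is_k_plex(adj, nodes, k):
    """True if every node has at least |nodes| - k neighbors inside `nodes`."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)


def is_connected(adj, nodes):
    """Breadth-first search restricted to the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v] & nodes:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == nodes


def is_connected_k_plex(adj, nodes, k):
    """Both conditions of Definition 1: degree condition and connectivity."""
    return is_k_plex(adj, nodes, k) and is_connected(adj, nodes)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

On this toy graph, the whole node set fails the check for k = 2 (node d has only one neighbor) but passes for k = 3, and the pair {a, d} satisfies the degree condition for k = 2 while failing the connectivity condition.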
Consider $k = 1$, the minimum k we can provide: both a single edge and a large clique satisfy the 1-plex property. Clearly the former is less interesting than the latter. We therefore define the k-plex community problem with a size constraint.

Definition 2 (Connected k-plex query, CKPQ) Given a graph G, a set of query nodes $Q \subseteq V(G)$, a constant k, and a size constraint c, find a subgraph $G[H]$ such that:

1. $Q \subseteq H \subseteq V(G)$ and $G[H]$ is a connected k-plex.
2. $|H|$ is no less than c.

The number of results of CKPQ can be exponential. Consider a clique of size n, an arbitrary single query node q, $k = 1$, and $c = 1$: there are $2^{n-1}$ candidate results (every subgraph of size at least 1 that contains q is a feasible solution). We now give the optimization version of the k-plex community search problem.

Definition 3 (Maximum connected k-plex query, MCKPQ) Given a graph G, a constant k, and a set of query nodes $Q \subseteq V(G)$, find a subgraph $G[H]$ whose size is maximized among all the solutions to CKPQ.

Theorem 1 For each $q \in Q$, the maximum connected k-plex containing Q must be one of the maximal k-plexes of q.

Proof Clearly the optimal solution H containing Q must be maximal; otherwise we could add extra nodes to the current solution to make it larger, which would give a better solution and contradict the optimality of H. Next, $q \in Q$ and $Q \subseteq H$ imply that H is a maximal k-plex of q. □

2.2 Compare with k-core Based Community

There are already some works on searching for communities with a minimum degree guarantee [11, 28]. We compare them with the k-plex community here and show the difference. For a community H, previous works mainly focus on making $\delta(H)$ large, where $\delta(H)$ is the lower bound on the number of neighbors of each node in H. In contrast, a k-plex imposes an upper bound on the number of neighbors each node may miss. As a consequence, the larger H is, the more neighbors each node has.
As a result, a k-plex community is "denser."

Definition 4 (k-core) Given a graph $G(V, E)$, a subgraph $G[H]$ is called a k-core, or a core of order k, iff $\forall v \in H,\ \deg_H(v) \ge k$ and H is a maximum subgraph with this property.

Without a size bound, deciding whether there exists an H such that $\delta(H) \ge k$ can be done in linear time, as shown in [11, 28]. With a size bound, however, the problem becomes NP-hard. The authors of [11] provided an efficient solution for the unbounded version and left the bounded problem, shown below, unsolved.

Definition 5 (mCST [11]) Given a graph $G(V, E)$, a query node $v_0 \in V$, and a constant k, find $H \subseteq V$ such that (1) $\delta(G[H]) \ge k$; (2) $G[H]$ is connected; (3) $|H|$ is minimized.

Note that CKPQ(Q, k, c) is identical to the decision version of mCST when setting $|Q| = 1$, $k' = c - k$, $c' = c$. In this paper, we focus only on the optimization problem MCKPQ, since a solution of MCKPQ helps solve the CKPQ problem.

2.3 Hardness Results

In this section, we show that CKPQ, the decision version of MCKPQ, is NP-complete. We further show that MCKPQ is also hard to approximate within any constant factor.

Theorem 2 The CKPQ problem is NP-complete for any constant k and $|Q| \ge 1$.

Proof Clearly, given Q, k, c, and a candidate subgraph H, we can test in polynomial time whether H is a k-plex of size no less than c that contains Q. Next, we reduce from the k-plex problem, which is shown to be NP-complete for any positive integer k in [3]. Given a graph $G(V, E)$ and k, the k-plex problem is to decide whether there exists a k-plex of size c in G. We construct an instance of CKPQ by building a new graph $G'(V', E')$ as follows: we create an arbitrary set of nodes Q and add them to V, so $V(G') = Q \cup V(G)$; each node v in Q is connected to all the other nodes, i.e., $E(G') = E(G) \cup \{(v_1, v_2) \mid v_1 \in Q,\ v_2 \in V',\ v_2 \ne v_1\}$. Obviously $G'[Q]$ is a clique. We finish the construction by setting $c' = c + |Q|$ and taking Q as the query nodes.
We show that the k-plex problem is a Yes-instance iff the decision version of MCKPQ is a Yes-instance. Suppose the subgraph $G[H]$ is a solution of the k-plex problem; then $H' = G'[H \cup Q]$ is a solution of MCKPQ: (1) each pair of nodes in $H'$ is adjacent or connected through at least one node in Q; (2) $|H'| \ge |Q| + c = c'$; (3) for any node $v \in H$, $\deg_{G'[H']}(v) = \deg_{G[H]}(v) + |Q| \ge |H| - k + |Q| = |H'| - k$, and for any node $v \in Q$, $\deg_{G'[H']}(v) = |Q| - 1 + |H| = |H'| - 1 \ge |H'| - k$. So $G'[H']$ is a k-plex. For the other direction, if we have a connected k-plex $G'[H']$ of size $c'$, then $G[H' \setminus Q] = G'[H' \setminus Q]$ is a k-plex by Lemma 1 and has size $c' - |Q| = c$, which completes the proof. □

Theorem 3 For any $\epsilon > 0$, it is hard to approximate the MCKPQ problem in polynomial time within a factor of $n^{1-\epsilon}$.

Proof The work in [16] shows that it is $n^{1-\epsilon}$-hard to approximate the maximum clique problem (MCP), which aims to find the largest clique in a given graph. Given an instance G of MCP, we can perform a gap-preserving reduction to MCKPQ by setting $G' = G$, $Q = \emptyset$, and $k = 1$, whose solution is identical to that of MCP. So there is no $n^{1-\epsilon}$-approximation algorithm for MCKPQ that runs in polynomial time, where $n = |V(G)|$. □

2.4 Problem Analysis

In this subsection, we first show that the solution of the MCKPQ problem avoids the free rider effect. We then present the properties of our proposed community model and show that it meets the requirements for the community search problem summarized in Sect. 1.

2.4.1 Free Rider Effect (FRE)

In the community detection problem, the free rider effect means that, under some goodness metric, the results of community search may include irrelevant subgraphs [18, 30]. According to [18], most community metrics suffer from the free rider effect, including minimum degree, graph density, and subgraph modularity. We first give an example of FRE and its formal definition based on [18], and then we show that the formulation of MCKPQ avoids the free rider effect.
Minimum degree $\delta(H)$ is commonly used as a community measurement in [4, 11, 22, 28]: the bigger, the better. Take the graph G in Fig. 1 as an example and suppose $Q = \{g\}$. Then most subgraphs of G containing $\{g, e, h\}$ could be returned as results of this query with the optimal value 2, for example the subgraph with $\delta(\{g, h, e, d, k, j\}) = 2$. Obviously, however, it should not be returned as the best community, since k and j are too far away from g and make no contribution to the optimal value. The same scenario also exists in the k-truss model of [17].

Definition 6 (Free Rider Effect [18]) Suppose H is a solution to a community definition according to a goodness function $f(\cdot)$. The community definition is said to suffer from the free rider effect if, whenever there is an optimal solution $H^*$ to the community detection problem, $f(H \cup H^*) \le f(H)$. This means that the combination of communities H and $H^*$ is no worse than H.

Theorem 4 The definition of MCKPQ avoids the free rider effect.

Proof Given Q and k, the optimal community $H^*$ is the largest k-plex containing Q. Consider any other k-plex H with $Q \subseteq H$; here $f(\cdot)$ is simply $|H|$. There are two cases for H and $H^*$:

1. $H \subseteq H^*$. Then $f(H \cup H^*) = f(H^*) = |H^*| > |H| = f(H)$.
2. $H \not\subseteq H^*$. Since $H^*$ is optimal and maximal, $H \cup H^*$ cannot be a k-plex (otherwise $H^* = H \cup H^*$, since $H \cup H^*$ is larger) and is no longer feasible. We cannot evaluate $H \cup H^*$ by $f(\cdot)$.

In both cases, we cannot get $f(H \cup H^*) \le f(H)$. So the definition of MCKPQ avoids FRE. □

2.4.2 Properties of k-plex

In this section, we examine the properties of the k-plex and describe how it meets the requirements of the community search problem stated previously. These properties include bounded size, bounded diameter, and connectivity.

Given a single k, any k-plex community has a globally bounded size.
Theorem 5 Given a connected graph $G(V, E)$, for any node v, its optimal k-plex size satisfies

$$|H^*| \le \frac{(k + 2) + \sqrt{(k + 2)^2 - 8(|V| - |E|)}}{2}$$

Proof Suppose H is a k-plex. Then $\delta(H) \ge |H| - k$ and $|E(H)| \ge \frac{|H|(|H| - k)}{2}$. We contract H to one node in G to get $G'$. Since G is connected, $G'$ is also connected, so $|E(G')| \ge |V(G)| - |H| + 1 - 1 = |V| - |H|$. Moreover, $|E|$ is an upper bound on the total number of edges of the two graphs H and $G'$:

$$|E| \ge |E(H)| + |E(G')| \ge \frac{|H|(|H| - k)}{2} + |V| - |H|$$

Solving this inequality for $|H|$ gives the result. □

If we take Q into consideration, we obtain a local bound on the community size:

Theorem 6 Given an instance of MCKPQ(G, Q, k), the optimal size satisfies $|H^*| \le \min_{q \in Q}(|N_G(q)|) + k$.

Proof Suppose H is one of the solutions. Then $\delta(H) \ge |H| - k$, and it follows that $|N_H(q)| \ge |H| - k$ for any $q \in Q$. Since $H \subseteq G$, $|N_G(q)| \ge |N_H(q)|$. So we have $|H| \le k + |N_G(q)|$. Because q is arbitrary, $|H| \le \min_{q \in Q}(|N_G(q)|) + k$. □

We next discuss the connectivity of a k-plex. We first show that, for a disconnected k-plex, the upper bound of its size is related to the number of its disjoint components and k. Based on that, we show that when a k-plex is larger than a threshold, it must be connected.

Theorem 7 If a disconnected k-plex $H = \bigcup_{i=1}^{a} C_i$ has $a$ disjoint components ($a > 1$), then $|H| \le \frac{a}{a - 1}(k - 1)$.

Proof Suppose $C_i$ is the smallest component of H; then $|C_i| \le \frac{|H|}{a}$. Each node in $C_i$ has at least $(a - 1)$ non-neighbors, so $a - 1 \le k - 1 \Rightarrow a \le k$. Furthermore,

$$|H| \le |C_i| + (k - 1) \;\Rightarrow\; |H| \le \frac{|H|}{a} + (k - 1) \;\Rightarrow\; |H| \le \frac{a}{a - 1}(k - 1)$$

□

This bound is in fact tight. Consider the disconnected 3-plex $V(H) = \{g, h, j, k\}$ in Fig. 1 with two components: $|H| = 2 \cdot (3 - 1) = 4$.

Corollary 1 If a k-plex H has size larger than $(k + 1)$, then H must be connected.

Proof We prove by contradiction. Suppose H is disconnected; then it has $1 < a \le k$ components, and $|H| \le \frac{a}{a - 1}(k - 1)$. That implies $k + 1 < \frac{a}{a - 1}(k - 1)$ by Theorem 7.
However, $k + 1$ […] $\frac{a}{a - 1}(k - 1) = (a$ […] $k)$ […]. □

Theorem 8 If G is a connected k-plex, then $|G|$ […]

Proof Since G is connected, every node in G has at least one neighbor, which implies $|G|$ […] $k$ […] $1$. □

From the above we can see that when H is larger than $(k + 1)$, it must be connected. As a result, while finding a k-plex community, we do not need to check connectivity for most candidates, which is a main step in most community search algorithms [28, 31]. If we assume that for each k-plex community query there always exists an H such that $|H| > k + 1$, connectivity is no longer a concern.

Diameter is another additional constraint for community query [18, 28]. Next, we show that the diameter of any k-plex is bounded by k.

Theorem 9 If a connected k-plex G has diameter d, then $k \ge d$.

Proof Suppose $v_1 v_2 \cdots v_{d+1}$ is the longest shortest path in G. For node $v_1$, there are at least $d - 1$ non-neighbors on this path, which implies $k - 1 \ge d - 1$. □

Corollary 2 All nodes of an optimal solution of MCKPQ are within the k-hop neighborhood of Q.

In this section, we have justified the k-plex community search problem. Even though the problem is simple to model, it has nice properties regarding size, connectivity, diameter, and the free rider effect, all of which are major requirements for most community search problems. Next, we discuss solutions to the MCKPQ problem. First, we give two basic enumeration algorithms as baselines. Next, we improve on them by presenting a branch-and-bound based algorithm and introduce the upper bound function and pruning strategies. Furthermore, we present two methods to accelerate the basic B&B algorithm.

3 Baseline Methods

We first give the baseline method for solving the MCKPQ problem. Since MCKPQ is NP-hard and hard to approximate by any linear function in polynomial time, we use the generate-and-verify method to explore the whole search space.
Algorithm 1 describes this basic framework. It enumerates all k-plexes containing Q (the enumeration methods differ, but all of them use the neighbors of the k-plex currently being enumerated), keeps those of the largest size (note that there may be multiple largest k-plexes of the same size), and returns a connected one among them. Otherwise, it reports that no solution exists for the query. The search space of the enumeration decreases with increasing query size, based on Theorem 10.

Theorem 10 Given $Q = \{q_1, \ldots, q_n\}$ and $Q_i = \{q_1, \ldots, q_i\}$ $(1 \le i < n)$, let $M_i$ denote the set of all maximal k-plexes of $Q_i$. Then for any $1 \le i \le j$, $M_j \subseteq M_i$.

Next, we show that if the maximum k-plexes are not connected, then no k-plex containing Q is connected.

Theorem 11 If all maximum k-plexes of Q are disconnected, then there does not exist a connected k-plex containing Q.

Proof We prove by contradiction. Suppose $H = \bigcup_{i=1}^{a} C_i$ is a maximum k-plex of Q and is disconnected, and there exists another connected k-plex $H'$ that satisfies $|H'| < |H|$. Since the enumeration is based on neighbors of Q and H is disconnected, $G[Q]$ is disconnected as well; at the same time, each component $C_i$ contains at least one of the query nodes. $H'$ includes all query nodes, so $H'$ intersects each component of H. Writing $I_i = H' \cap C_i$, $H'$ can be represented as $H' = \left(\bigcup_{i=1}^{a} I_i\right) \cup (H' \setminus H)$. We can now construct a connected k-plex $H^*$ whose size is equal to $|H|$ by the following steps. Initially set $H^* = H$; at each iteration, remove the node with minimum degree among all the components and add one node from $H'$, until all nodes in $H' \setminus H$ have been added to $H^*$. Then $H^*$ is a connected k-plex, contradicting the assumption that all maximum k-plexes are disconnected. □

To satisfy both the maximization and the connectivity constraint of MCKPQ, based on Algorithm 1 and Theorem 11, we only need to check the connectivity of the k-plexes of the largest size. NaiveEnum (shown in Algorithm 2) is used to generate all k-plexes of Q.
Starting with Q, it examines each of Q's neighbors to check whether it is valid for extending Q, until no such neighbor remains. Even though NaiveEnum generates all maximal k-plexes, it can result in repeated generation, i.e., it may enumerate identical k-plexes multiple times.

Example 1 When $Q = \{g\}$ and $k = 2$ in Fig. 1, the maximal 2-plex $\{g, a, e, h\}$ can be enumerated by $g \to e \to a \to h$ and by $g \to h \to a \to e$. Hence, redundant computation may happen in naive enumeration.

To avoid this drawback and reduce the computation, we design an algorithm based on the Bron–Kerbosch algorithm [8], which recursively generates all maximal cliques of a given graph. Here, we adapt it to enumerate all maximal k-plexes containing a set of specific query nodes. At each iteration, three sets R, P, X are fed to Maximal Search (MS), shown in Algorithm 3. R is the k-plex found so far, which is to be extended; P is the set of candidate nodes, each of which can enlarge R; and the nodes in X can also extend R but have already been used in a previous search. When no candidate can be used to enlarge R, R is maximal. If all candidates have already been used in a previous search, i.e., X is not empty, the maximal k-plex containing R has already been enumerated. Otherwise, we perform a DFS over all candidates. To get the maximal k-plexes of Q, MS is initialized with $R = Q$, $P = \{v : v \in N(Q),\ \{v\} \cup Q \text{ is a k-plex}\}$, $X = \emptyset$.

4 Branch-and-Bound

The general generate-and-test procedure shown above is costly. First, for each k-plex found, it expands all its neighbors regardless of whether they can lead to a better solution. Second, all candidates are enumerated equally, and each neighbor is chosen with equal probability to expand the current solution. However, some candidates and neighbors have more potential to enlarge the current k-plex. To reduce the search space and improve efficiency, we develop the branch-and-bound (B&B) based algorithm shown in Algorithm 4.
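For reference, the Bron–Kerbosch-style enumeration baseline of Sect. 3, which the branch-and-bound method aims to improve on, can be sketched as follows. This is a minimal illustration rather than the authors' Algorithm 3: the function names, the adjacency-set representation, and the toy graph are assumptions, and candidates are drawn from all nodes rather than only from neighbors, so disconnected k-plexes are reported as well.

```python
def is_k_plex(adj, nodes, k):
    """True if every node has at least |nodes| - k neighbors inside `nodes`."""
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)


def maximal_k_plexes(adj, query, k):
    """Enumerate each maximal k-plex containing `query` exactly once."""
    query = frozenset(query)
    if not is_k_plex(adj, query, k):
        return []
    results = []

    def extendable(base, pool):
        # Nodes of `pool` that keep the k-plex property when added to `base`.
        return {u for u in pool if is_k_plex(adj, base | {u}, k)}

    def search(r, p, x):
        # r: current k-plex, p: unused candidates, x: already explored candidates.
        if not p and not x:
            results.append(r)  # nothing can extend r, so r is maximal
            return
        for v in sorted(p):
            r2 = r | {v}
            search(r2, extendable(r2, p - {v}), extendable(r2, x))
            p = p - {v}  # v is now fully explored ...
            x = x | {v}  # ... and blocks re-enumeration of the same k-plex

    search(query, extendable(query, set(adj) - query), set())
    return results


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

The set x plays the role of X in the text: a maximal k-plex reached along a second insertion order is suppressed because one of its nodes is still in x. The sketch relies on Lemma 1 (every subset of a k-plex is a k-plex), which is what makes the filtered candidate sets sufficient.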
The B&B paradigm is widely used for solving large-scale NP-hard combinatorial problems. Explicit enumeration for a hard problem is normally time consuming because the number of potential solutions grows exponentially. Using bounds on the function to be optimized, together with the score of the current best solution, enables B&B to search only parts of the candidate space.

Algorithm 4 computes the maximum k-plex of Q. It is a DFS-based branch-and-bound, an approach that has proved efficient in practice for various hard problems. At each iteration, H is the current k-plex; we select the valid nodes B from H's neighbors (candidate neighbors are generated by cand_gen) and branch on each node of B. Algorithm 4 also generates maximal k-plexes of Q; however, it searches only part of the search space instead of all of it, as is done in NaiveEnum and B&KEnum. This is achieved by the following strategies:

1. Define an effective upper bound function for candidates: if the current candidate k-plex cannot improve on the current best one, it is discarded, and the search space starting from this branch is removed.
2. For each candidate to be extended, select its high-priority neighbors first in order to reach the optimal k-plex early.
3. Prune invalid neighbors.

4.1 Upper Bound Function

The upper bound function is crucial in a B&B algorithm; a loose upper bound has no ability to prune the search space. To derive an efficient upper bound function for the current k-plex H, we consider two cases.

Definition 7 (tight nodes) A node $v \in H$ is called tight if $\deg_H(v) = |H| - k$.

The upper bound function of H is defined as follows:

1. There exist tight nodes in H. Suppose $V_t = \{v : v \in H,\ v \text{ is tight}\}$. Then at most $\left|\bigcap_{v \in V_t} N(v)\right|$ nodes can be added to H. This means every node added to H must be a neighbor of every tight node v (any non-neighbor of v is unqualified because v already has $k - 1$ non-neighbors in H).
2. There exist no tight nodes in H.
In this case, the upper bound is defined as $|H| + \min_{v \in H}\big(|N_{G \setminus H}(v)| + (k + |N_H(v)| - |H|)\big)$. Each node v can contribute $|N(v)|$ neighbors and a limited number of non-neighbor candidates, and the maximum size to which H can be expanded depends on the node $v \in H$ with the fewest candidates.

The upper bounds in both cases involve $|N(v)|$. We further improve the bound quality by replacing $N(v)$ with $\{u : u \in N(v),\ \delta(H \cup \{u\}) \ge |H| + 1 - k\}$. By not counting neighbors that cannot be used to expand H, the gap between the actual limit of H and the bound function is reduced.

4.2 Prune Unqualified Nodes

Unqualified nodes are those that cannot produce a maximum k-plex of Q, or even a k-plex of Q. We introduce two strategies for pruning unqualified nodes: one is based on the query distance, the other on the core number.

Definition 8 (Query Distance) Given a graph $G(V, E)$, a set of query nodes $Q \subseteq V$, and an arbitrary node $v \in V$, the query distance between v and Q is $dist(v, Q) = \min_{q \in Q} dist(v, q)$, where $dist(v, q)$ is the length of the shortest path between v and q.

Based on Theorem 9, the query distance between any node v and Q is no greater than k. So the candidate node set is $V_c = \bigcap_{i=1}^{|Q|} \{v : v \text{ is a k-hop neighbor of } q_i\}$. We only perform the search in the subgraph $G' = G[V_c]$.

Furthermore, not all neighbors of the current candidate H need to be branched on. Let the current maximum k-plex be denoted by $H^*$. Neighbors of H are pruned by the following strategies: degree pruning and core number pruning. Degree pruning is straightforward: if, for a node $v \in N(H)$, $\deg_{G'}(v) < |H^*| - k$, then it is unqualified for branching, since the minimum degree of any subgraph containing v is no greater than $\deg(v)$.

Definition 9 (core number) The core number $c_v$ of node v is the highest order of a core that contains this node.

Theorem 12 For any subgraph $H \subseteq G$, if $v \in H$, then $\delta(H) \le c_v$.

Proof This can be proved by contradiction.
Suppose $\delta(H) > c_v$ and let $H'$ denote the $\delta(H)$-core of G. Then $H \subseteq H'$; otherwise, $H' \cup H$ would be a larger subgraph whose minimum degree is no less than $\delta(H)$, contradicting the maximality of $H'$. Since $H \subseteq H'$ and $v \in H$, we have $c_v \ge \delta(H') = \delta(H)$. This contradicts $\delta(H) > c_v$. □

By Theorem 12, a node v cannot be included in any subgraph whose minimum degree is larger than $c_v$. So we remove a node v from N(H) if $c_v < |H^*| - k$, where $H^*$ is the current best solution. Note that calculating the core number of each node is a preprocessing step, which can be done in linear time by core decomposition using the method in [5].

We now analyze the time complexity of basic candidate generation. Testing whether H is a k-plex can be done in O(|H|) time. Algorithm 5 runs in $O(d_{\max} |H|^2)$ time, where $d_{\max} = \max_{v \in H} \deg_G(v)$.

5 Optimization on B&B

5.1 Partial Branching

In the basic branch-and-bound algorithm, given a k-plex H, we simply extend H by branching on all valid neighbors B of H until every branch becomes maximal or is cut off by the upper bound function. That is, we enumerate all branches and give them the same priority. The following two observations lead us to a more refined branching scheme. First, the optimal k-plex has the largest minimum degree among the candidates. Second, the bottleneck in extending a current k-plex H always lies in the nodes that have the minimum degree. So the improved candidate generation method, partial branching, makes the following changes:

• Instead of representing each current k-plex H as a set of nodes, it uses a min-heap with node v as key and $\deg_H(v)$ as value.
• At each iteration, we only consider neighbors of the node with minimum degree in H and use them to expand H.

Next, we show that partial branching does not miss any feasible solution.

Theorem 13 Given any two k-plexes $H_0$ and $H_k$ in G such that $H_0 \subseteq H_k$ and $H_k$ is connected, there always exists a sequence $v_1, \ldots, v_k$ such that $\forall i \in [0, k - 1]$: $v_{i+1} \in H_k \setminus H_0$, $H_{i+1} = H_i \cup \{v_{i+1}\}$, and $(v_{i+1}, u_i) \in E(G)$, where $u_i = \arg\min_{u \in H_i,\ \exists v \in H_k \setminus H_i,\ (u, v) \in E(G)} \deg_{G[H_i]}(u)$.

Proof We can construct this sequence by the following steps. First, we partition $G[H_k]$ into two parts, $A = G[H_0]$ and $B = G[H_k \setminus H_0]$. Then, at each step, we move one node from B to A by choosing, among all crossing edges (one endpoint in A, the other in B), the one whose endpoint in A has minimum degree. The process always succeeds until B is empty. Suppose that at some step, with parts $A'$ and $B'$, there is no crossing edge and $B'$ is not empty. This would imply that $G[A' \cup B'] = G[H_k]$ is not connected, which contradicts the assumption. □

We now analyze the complexity of partial candidate generation. Since we only extend neighbors of the node v with minimum degree in H and H is a min-heap, fetching v takes O(1) time. Let $d_{\min}$ denote the degree of v. Up to $d_{\min}$ updates are performed on H during expansion, and each heap update takes O(log |H|) time. In total, the time complexity is $O(d_{\min} |H|)$.

5.2 Incremental Candidate Update

We now introduce an alternative way to reduce the cost of generating candidate neighbors. In the basic B&B algorithm, the valid neighbors are generated from scratch in each round, given H. As a result, the same node may be generated multiple times in the subsequent steps starting from H.

Example 2 Consider the graph in Fig. 1 again, and suppose $H = \{a\}$, $k = 2$. Figure 2 shows one of the search paths of the basic B&B algorithm, extending H until it becomes maximal. At each round, it first collects the neighbors of H and keeps the valid ones. From $B_1$ (the candidates in round 1) to $B_5$, node c is generated and tested three times and node d twice. The same applies to invalid nodes. For example, g is rejected in $N_3(H)$, but is still tested in the next round in $N_4(H)$.

The above example shows the redundant computation in the candidate generation step.
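To make this from-scratch behaviour concrete, the following minimal Python sketch shows a basic B&B that regenerates the valid neighbors in every round. It is an illustration under stated assumptions rather than the authors' Algorithm 4: the graph is a plain adjacency dictionary, the names build_graph, is_kplex, upper_bound and max_kplex are hypothetical, the upper bound follows the two cases of Sect. 4.1 (counting only nodes outside H), and the query-distance and core-number pruning of Sect. 4.2 are omitted.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency dictionary (hypothetical helper)."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def is_kplex(g: Graph, nodes: Set[Hashable], k: int) -> bool:
    """Every node needs at least |nodes| - k neighbours inside nodes."""
    return all(len(g[v] & nodes) >= len(nodes) - k for v in nodes)


def cand_gen(g: Graph, h: Set[Hashable], k: int) -> Set[Hashable]:
    """From-scratch generation: neighbours u of H with H + {u} still a k-plex."""
    neighbours = set().union(*(g[v] for v in h)) - h
    return {u for u in neighbours if is_kplex(g, h | {u}, k)}


def upper_bound(g: Graph, h: Set[Hashable], k: int) -> int:
    """Upper bound on the size of any k-plex that extends H."""
    tight = [v for v in h if len(g[v] & h) == len(h) - k]
    if tight:
        # Case 1: an added node must be adjacent to every tight node.
        return len(h) + len(set.intersection(*(g[v] - h for v in tight)))
    # Case 2: outside neighbours plus the remaining non-neighbour slack.
    return len(h) + min(
        len(g[v] - h) + (k + len(g[v] & h) - len(h)) for v in h
    )


def max_kplex(g: Graph, q: Set[Hashable], k: int) -> Set[Hashable]:
    """DFS branch-and-bound for a largest k-plex containing Q (sketch)."""
    best: Set[Hashable] = set()

    def search(h: Set[Hashable]) -> None:
        nonlocal best
        if len(h) > len(best):
            best = set(h)
        if upper_bound(g, h, k) <= len(best):
            return  # this branch cannot beat the current best
        for u in cand_gen(g, h, k):  # recomputed from scratch each round
            search(h | {u})

    start = set(q)
    if start and is_kplex(g, start, k):
        search(start)
    return best


if __name__ == "__main__":
    toy = build_graph([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (4, 5)])
    print(max_kplex(toy, {1}, 2))
```

On this toy graph the call with k = 2 returns the node set {1, 2, 3, 4}. Because cand_gen is re-evaluated at every level of the recursion, the same neighbor is tested repeatedly along a search path, which is the overhead the incremental update is designed to avoid.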
Here, we propose a more efficient way (shown in Algorithm 6) to generate the candidates $B_{t+1}$, which makes use of the result $B_t$ of the previous round. We define B as a hash table whose key is a node id and whose value is the number of non-neighbors of that node in H plus 1. First, a node v is removed from $B_t$ and added to H. The flag can expand indicates whether new members can be added. Then, we increase by one the value of every node in $B_t$ that is not a neighbor of v, and filter out the nodes whose number of non-neighbors in H exceeds k. Next, we add new members x from $N_{G \setminus (B_t \cup H)}(v)$ if can expand is still positive. There are three cases in which we cannot extend the current candidates $B_t$ from the residual graph $G \setminus (B_t \cup H)$:

1. Some node u of $B_t$ becomes tight. In this case, u blocks every new node x. Because $u \in B_t$, u must have at least one neighbor in H, which implies $|\overline{N}_H(x)| = |H| > |H| - 1 \ge |\overline{N}_H(u)| \ge k$, where $\overline{N}_H(\cdot)$ denotes the non-neighbors in H.
2. Some node w of H becomes tight. If $x \in N_{G \setminus (B_t \cup H)}(v)$, adding x would cause $|\overline{N}_{H \cup \{x\}}(w)| > |\overline{N}_H(w)| = k$.
3. $|H| > k$. Then, any new member would have at least k non-neighbors in H.

In the above three cases, can expand is toggled to negative and $|B_t|$ decreases in the following iterations.

We now analyze the time complexity of the candidate update. Updating the values of the hash table costs $O(|B_t|)$ time. Checking for tight nodes in H costs $O(|H||B_t|)$ time. Adding a new member x costs O(|N(x)|) time if $B_t$ can be extended. Since $|B_t|$ is always smaller than |N(H)|, the total time complexity is O(|H||N(H)|).

6 Heuristic Solutions

Since MCKPQ is hard for both optimization and approximation, in this section we propose heuristic algorithms that are fast and produce high-quality results, although they have no theoretical guarantee. At each step, we greedily choose one node from $N_G(H)$ to extend H, until H becomes maximal. The new node is chosen based on a quality function f. Intuitively, given a set of neighbor nodes with which to extend H, the larger f(v) is, the more promising it is that $H \cup \{v\}$ can reach a large k-plex.
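The greedy loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the graph is assumed to be an adjacency dictionary, Q is assumed to be a k-plex already, the names is_kplex, greedy_kplex and edge_count are hypothetical, and edge_count is only a stand-in scoring function so that the sketch runs; any quality function with the same signature can be passed in.

```python
from typing import Callable, Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]
Quality = Callable[[Graph, Set[Hashable], Hashable], float]


def is_kplex(g: Graph, nodes: Set[Hashable], k: int) -> bool:
    """Every node needs at least |nodes| - k neighbours inside nodes."""
    return all(len(g[v] & nodes) >= len(nodes) - k for v in nodes)


def edge_count(g: Graph, h: Set[Hashable], v: Hashable) -> float:
    """Stand-in quality (an assumption): edges induced by H + {v}."""
    s = h | {v}
    return sum(len(g[u] & s) for u in s) / 2


def greedy_kplex(
    g: Graph, q: Set[Hashable], k: int, quality: Quality = edge_count
) -> Set[Hashable]:
    """Extend Q greedily, one best-scoring valid neighbour at a time."""
    h = set(q)
    while True:
        neighbours = set().union(*(g[v] for v in h)) - h
        cands = [u for u in neighbours if is_kplex(g, h | {u}, k)]
        if not cands:
            return h  # no valid neighbour is left, so H is maximal
        h.add(max(cands, key=lambda u: quality(g, h, u)))


if __name__ == "__main__":
    toy: Graph = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2, 5}, 5: {4}}
    print(greedy_kplex(toy, {1}, 2))
```

Passing the scoring function as a parameter keeps the loop independent of how f is defined, so different quality functions can be compared on the same queries.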
How to define the quality function is therefore the key issue affecting the results. Given two k-plexes H and $H'$ of the same size, there are two aspects by which to compare their ability to be extended:

1. Number of missing links. A k-plex requires each of its nodes to have at most k non-neighbors, so H is more promising to expand to a larger size if $|E_G(H)| > |E_G(H')|$.
2. Neighbors of H. This is straightforward: if $|N(H)| > |N(H')|$, H has a higher chance of becoming a larger community.

So we define the quality function of v as follows:

$$f_H(v) = |E_G(H \cup \{v\})| + \frac{|N(H \cup \{v\})|}{\max_{u \in C} |N(H \cup \{u\})|} \qquad (6)$$

The intuition is to find the neighbor that achieves the largest increment in edges, and to break ties by choosing the one that adds more neighbors to the current solution.

7 Experiment

We have conducted extensive experiments to test the effectiveness and efficiency of our proposed solutions in finding k-plex communities. We first show by case study the effectiveness of the k-plex community model in capturing the cohesive membership structure and common interests of members in the search results. Then, we report the experimental results of the efficiency tests using different problem settings on different datasets.

We use four real-world datasets provided by the Stanford network dataset collection (https://snap.stanford.edu/): (1) DBLP collaboration network; (2) Amazon product co-purchasing network; (3) Google-Web graph; (4) Arxiv collaboration network of condensed matter. The statistical information of the datasets is shown in Table 2. All algorithms are implemented in Python and run on a PC with an Intel(R) Core(TM) i7 CPU 860 @ 2.8 GHz and 8 GB of memory.

7.1 Case Study

Fig. 3 Communities of $Q = \{\text{Jiawei Han}\}$, $k = 2$. (a) 2-plex community $H_1$ and (b) 2-plex community $H_2$

Since different community query models differ in their objective goodness functions (such as degree, density, and diameter), a uniform quantitative quality evaluation is beyond this paper's scope.
Instead, we show the results of our k-plex community search by case studies. We use the DBLP coauthor graph, in which two scientists are linked if they have worked on the same paper. We issue the query $Q = \{\text{Jiawei Han}\}$, $k = 2$ and obtain the optimal communities shown in Fig. 3, both of which are maximum communities. As for effectiveness, the members of $H_1$ are all data mining scientists. Jiawei Han and Philip S. Yu have worked on about 50 papers together, and Jian Pei and Ke Wang have worked on more than 20 papers together. $H_1$ and $H_2$ also indicate the ability of the k-plex model to extract overlapping communities. By increasing k by 1 and using the same query, we get another result, shown in Fig. 4a, which is a supergraph of $H_1$; this reflects the hierarchical structure of the k-plex community model. We also issue a query with multiple nodes, $Q = \{\text{Xuemin Lin}, \text{Jeffrey Xu Yu}\}$, $k = 3$. From the result shown in Fig. 4b, we can see that the members of this community have all published papers on graph databases and graph processing, which is one of the research interests of Xuemin Lin and Jeffrey Xu Yu; this is verified by checking their webpage profiles.

Next, we study the time efficiency of the different methods. There are two parameters for querying a k-plex community: Q and k. We vary |Q| from 1 to 4 and k from 1 to 4. To generate the query sets, some preprocessing steps are conducted. For each graph, we first sort all nodes by degree. Then, the nodes are partitioned into five buckets in degree order so that each bucket has the same number of nodes. For each pair (|Q|, k), we randomly sample 20 queries from each degree bucket, so for each query setting we get 100 queries to evaluate. The diameter of the query is also taken into account: let $Q_k$ denote the query nodes for k; queries are generated under the restriction $\mathrm{diam}_G(Q_k) \le k$, since no feasible solution exists when $\mathrm{diam}_G(Q_k) > k$.
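The query generation procedure above can be sketched as follows. This is a hedged illustration only: the names within_hops, degree_buckets and sample_queries are hypothetical, and the sketch assumes that all nodes of a query are drawn from the same degree bucket and that unsuitable draws are simply rejected, details the text does not specify.

```python
import random
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def within_hops(g: Graph, src: Hashable, k: int) -> Set[Hashable]:
    """Nodes reachable from src in at most k hops (breadth-first search)."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == k:
            continue  # do not expand beyond k hops
        for nb in g[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen


def degree_buckets(g: Graph, n_buckets: int = 5) -> List[List[Hashable]]:
    """Sort nodes by degree and cut them into equally sized buckets.

    Any remainder goes into the last bucket.
    """
    nodes = sorted(g, key=lambda v: len(g[v]))
    size = len(nodes) // n_buckets
    cuts = [i * size for i in range(n_buckets)] + [len(nodes)]
    return [nodes[cuts[i]:cuts[i + 1]] for i in range(n_buckets)]


def sample_queries(
    g: Graph,
    q_size: int,
    k: int,
    per_bucket: int = 20,
    max_tries: int = 10_000,
    seed: int = 0,
) -> List[FrozenSet[Hashable]]:
    """Draw per_bucket query sets of size q_size from each degree bucket."""
    rng = random.Random(seed)
    queries: List[FrozenSet[Hashable]] = []
    for bucket in degree_buckets(g):
        found = tries = 0
        while found < per_bucket and tries < max_tries and len(bucket) >= q_size:
            tries += 1
            q = rng.sample(bucket, q_size)
            # Keep Q only if its members are pairwise within k hops in G.
            if all(set(q) <= within_hops(g, v, k) for v in q):
                queries.append(frozenset(q))
                found += 1
    return queries
```

With the defaults of five buckets and 20 queries per bucket, a call such as sample_queries(g, q_size, k) yields up to 100 queries per (|Q|, k) setting; fewer are returned if a bucket cannot supply enough query sets that satisfy the diameter restriction within max_tries draws.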
We denote B&K enumeration by bk_enum, the naive branch-and-bound method by bb_basic, the partial branching method by bb_pb, and the incremental candidate generation method by bb_inc. Since most runs of the naive enumeration method do not return within hours and thus cannot be collected, we omit this method from our comparison.

7.2.1 Varying |Q|

We evaluate the query time for different query sizes |Q|. For each |Q|, we aggregate the query times and calculate the average. The results on the different datasets are shown in Fig. 5. On all datasets, B&K enumeration always has the highest query delay, and the branch-and-bound based methods reduce the query delay significantly. The simple greedy heuristic has the lowest time cost. Among the B&B based methods, bb_inc outperforms bb_basic and bb_pb since its incremental candidate generation reduces the repeated evaluation of neighbors. The improvement of bb_pb over bb_basic is not notable. The reason is that even though partial branching decreases the cost of selecting valid neighbors, i.e., it narrows down the number of children of a branching node in each step, the size of the search tree is not reduced.

Fig. 6 Amazon: query time varying |Q|. (a) k = 1, (b) k = 2, (c) k = 3, and (d) k = 4

We can also observe that the query time decreases with increasing |Q|. This is because the search space is reduced by adding query nodes, as illustrated by Theorem 10. When |Q| = 1, searching for the maximum k-plex costs hundreds of seconds. When |Q| is increased to 4, the query time decreases to between 0.01 and 0.1 s.

The average query time for each |Q| in Fig. 5 is aggregated over various k. Thus, to investigate the effect of varying |Q| more clearly, we also fix k for the various query sizes. The results on the Amazon dataset are shown in Fig. 6; the evaluation on the other datasets is omitted due to limited space.
For k = 1, the task is equivalent to searching for a maximum clique, and B&K enumeration is the most costly method, while the other methods incur near-zero cost for clique finding.

7.2.2 Varying k

We test the query performance for various k. Even though there is no limit on k in k-plex search, k is usually set to small values to keep the subgraph cohesive; 2 and 3 are frequently used in practice [3, 6]. We vary k from 1 to 4. For each specific k, different query sizes |Q| are evaluated and we take the average query time. The results on the different datasets are shown in Fig. 7. The query time increases with increasing k. This is because the larger k is, the larger the upper bound on the number of non-neighbors, which results in a larger search space. B&K enumeration is the most costly method for all settings of k. The B&B based algorithms perform better, due to their pruning techniques and early termination. Despite the improvement brought by the branch-and-bound technique, the complexity is still exponential in k, since it is an exact solution. The heuristic greedy algorithm always terminates fast and outperforms the others.

For a fixed k, the results of the different query sizes all contribute to the average query time. Since queries with small |Q| contribute a large share of the query time, we also test the results for different fixed |Q| to get a more accurate evaluation of various k, shown in Fig. 8. The results on the DBLP dataset are listed and the others are omitted. As we can see from Fig. 7, when k = 1, the query time is no larger than 0.1 s. When k is increased to 4, the time consumed increases to hundreds of seconds. This is due to the exponential time complexity of MCKPQ in k.

7.2.3 Quality of Heuristic

The quality of the output of the heuristic greedy algorithm is also tested, as shown in Fig. 9. Since all other methods output the optimal k-plex community, only one exact algorithm, bb_basic, is compared. For each specific k, the average size of the optimal k-plex is calculated.
We feed the same set of queries to the heuristic solution and compute the average size of its output. The results show that the heuristic solution reaches, on average, more than 50% of the optimal size. The results also indicate the influence of k on the size of the output community. On average, a smaller k results in a smaller cohesive community.

8 Related Work

8.1 Densest Subgraph Problem

The densest subgraph problem is a major research topic in graph analysis. Given a graph G, the task is to find the densest subgraph. Average degree is one of the most frequently used measurements in dense subgraph mining. [15] first gives a polynomial time algorithm, O(mn), for the densest subgraph problem, by transforming it into a min-cut instance and using binary search to find the optimal density. A 2-approximation algorithm is proposed in [9]; it works by greedily removing the node of minimum degree. This greedy algorithm runs in $O(m + n)$ time. [19] first introduces the densest subgraph problem for directed graphs and gives an $O(\log n)$-approximation algorithm. [20] then gives the first max-flow based exact algorithm and improves the greedy one to $O(m + n)$.

Fig. 8 DBLP: query time varying k. (a) |Q| = 1, (b) |Q| = 2, (c) |Q| = 3, and (d) |Q| = 4

Even though the densest subgraph problem (DSP) admits polynomial time complexity, it becomes NP-hard when there is a size constraint. There are three variants: (a) the k-densest subgraph problem (DkS), which requires a subgraph with $|S| = k$; (b) densest subgraph at least k (DalkS), which requires a subgraph with $|S| \ge k$; and (c) densest subgraph at most k (DamkS), which requires a subgraph with $|S| \le k$. It is hard to approximate the DkS and DamkS problems within a constant factor. [12] introduces an $O(n^{1/3})$-approximation algorithm for DkS. However, DalkS can be approximated within a constant factor. A 3-approximation solution is shown in [2]. Furthermore, [20] provides a 2-approximation algorithm for DalkS. Our work differs in that it is query specific and its goodness metric is the size of the k-plex, instead of subgraph density.
8.2 Community Search

The community search problem was recently proposed in [28]. The task is to find a high-quality community given initial query members. There is no uniform goodness metric for this problem. The minimum degree measurement is used in [4, 11, 28]. K-truss based variants are introduced in [17, 18]. [17] requires the results to satisfy edge connectivity in order to avoid disjoint communities. The authors further define the optimal k-truss community by the largest k and smallest diameter in [18]. Since the problem becomes NP-hard, they present a $(2 - \varepsilon)$-approximation algorithm to solve it. [30] addresses the free rider effect and proposes a query-density based community query problem. [29] studies the triangle densest community problem and gives a max-flow based algorithm. However, the connectivity of the answer community is not guaranteed. [10] studies searching for overlapping communities given a single query node. Their model is based on the quasi-clique. Our work differs from the above works in the definition of the objective community and in the unrestricted size of the query node set.

8.3 Community Detection

Community detection is a well-studied topic in graph and social network analysis. The task is to identify and list all communities in a given graph. A commonly used measurement is modularity [24]. Modularity maximization is shown to be NP-hard in [7]. [1] introduces a rounding technique and efficient algorithms. However, optimizing modularity in large networks has a limited ability to resolve small communities, as shown in [14]. The above works focus on detecting disjoint communities, where graphs are partitioned. Since each person can be involved in multiple groups in a social network, there are also works on detecting all overlapping communities [32]. Our work differs in that it is query oriented, and the query result can be different from that of maximizing modularity (a globally optimal graph partition may not be a locally optimal community for the query set).

9 Conclusion

In this paper, we study the problem of querying the optimal k-plex community, that is, given a set of nodes Q, finding the optimal k-plex community containing Q. We show that our proposed community model can theoretically guarantee the good quality of query results. Based on the fact that the problem is NP-hard and hard to approximate, we design an efficient branch-and-bound method and further improve it with fast candidate generation techniques. We then give a heuristic solution with low time cost. Experiments show the effectiveness of our k-plex model and the efficiency of our proposed methods.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Agarwal G, Kempe D (2008) Modularity-maximizing graph communities via mathematical programming. Eur Phys J B 66(3):409-418
2. Andersen R, Chellapilla K (2009) Finding dense subgraphs with size bounds. In: Proceedings of algorithms and models for the web-graph, 6th international workshop, WAW 2009, Barcelona, Spain, February 12-13, 2009, pp 25-37
3. Balasundaram B, Butenko S, Hicks IV (2011) Clique relaxations in social network analysis: the maximum k-plex problem. Oper Res 59(1):133-142
4. Barbieri N, Bonchi F, Galimberti E, Gullo F (2015) Efficient and effective community search. Data Min Knowl Discov 29(5):1406-1433
5. Batagelj V, Zaversnik M (2003) An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049
6. Berlowitz D, Cohen S, Kimelfeld B (2015) Efficient enumeration of maximal k-plexes.
In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, Victoria, Australia, May 31-June 4, 2015, pp 431-444
7. Brandes U, Delling D, Gaertler M, Gorke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172-188
8. Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575-577
9. Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: Proceedings of approximation algorithms for combinatorial optimization, third international workshop, APPROX 2000, Saarbrücken, Germany, September 5-8, 2000, pp 84-95
10. Cui W, Xiao Y, Wang H, Lu Y, Wang W (2013) Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, ACM, pp 277-288
11. Cui W, Xiao Y, Wang H, Wang W (2014) Local search of communities in large graphs. In: International conference on management of data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pp 991-1002
12. Feige U, Kortsarz G, Peleg D (2001) The dense k-subgraph problem. Algorithmica 29(3):410-421
13. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3):75-174
14. Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Nat Acad Sci 104(1):36-41
15. Goldberg AV (1984) Finding a maximum density subgraph. University of California, Berkeley, Berkeley
16. Håstad J (1999) Clique is hard to approximate within $n^{1-\varepsilon}$. Acta Math 182(1):105-142
17. Huang X, Cheng H, Qin L, Tian W, Yu JX (2014) Querying k-truss community in large and dynamic graphs. In: International conference on management of data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pp 1311-1322
18. Huang X, Lakshmanan LVS, Yu JX, Cheng H (2015) Approximate closest community search in networks. PVLDB 9(4):276-287
19. Kannan R, Vinay V (1999) Analyzing the structure of large graphs. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn
20. Khuller S, Saha B (2009) On finding dense subgraphs. In: Proceedings of automata, languages and programming, 36th international colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Part I, pp 597-608
21. Li K, Lu W, Bhagat S, Lakshmanan LVS, Yu C (2014) On social event organization. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '14, New York, NY, USA, August 24-27, 2014, pp 1206-1215
22. Li R, Qin L, Yu JX, Mao R (2015) Influential community search in large networks. PVLDB 8(5):509-520
23. McDermott R (2000) Knowing in community. IHRIM 19
24. Newman ME (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066133
25. Rowe L, Nadeau J, Turner R, Frankel W, Letts V, Eppig J, Ko M, Thurston S, Birkenmeier E (1994) Maps from two interspecific backcross DNA panels available as a community genetic mapping resource. Mamm Genome 5(5):253-274
26. Seidman SB, Foster BL (1978) A graph-theoretic generalization of the clique concept. J Math Sociol 6(1):139-154
27. Song DM, Tumminello M, Zhou WX, Mantegna RN (2011) Evolution of worldwide stock markets, correlation structure, and correlation-based graphs. Phys Rev E 84(2):026108
28. Sozio M, Gionis A (2010) The community-search problem and how to plan a successful cocktail party. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 939-948
29. Tsourakakis C (2015) The k-clique densest subgraph problem. In: Proceedings of the 24th international conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp 1122-1132
30. Wu Y, Jin R, Li J, Zhang X (2015a) Robust local community detection: on free rider effect and its elimination. Proc VLDB Endow 8(7):798-809
31. Wu Y, Jin R, Li J, Zhang X (2015b) Robust local community detection: on free rider effect and its elimination. PVLDB 8(7):798-809
32. Xie J, Kelley S, Szymanski BK (2013) Overlapping community detection in networks: the state-of-the-art and comparative study. ACM Comput Surv (csur) 45(4):43

Yue Wang, Jian Xun, Zhenhua Yang, Jia Li. Query Optimal k-Plex Based Community in Graphs, Data Science and Engineering, 2017, 1-17, DOI: 10.1007/s41019-017-0051-3