Query Optimal k-Plex Based Community in Graphs
Yue Wang 0 1
Jian Xun 0 1
Zhenhua Yang 0 1
Jia Li 0 1
K-plex 0 1
0 Zhenhua Yang
1 Huawei Technologies Co. Ltd. , Xi'an , China
The community search problem, which is to find good communities given a set of query nodes in a graph, has attracted increasing research interest recently. Although various measurement models have been proposed to define and solve the community search problem, few of them define a community concisely while also yielding query results of good quality. They either involve additional constraints for modeling communities, such as size and diameter, or suffer from the free rider effect, i.e., they include irrelevant subgraphs. In this paper, we propose a new k-plex based community model for community search. We show that our model is not only simple and clear, but also meets the basic requirements of defining a community search problem. We formulate the maximum k-plex community query (MCKPQ) problem, that is, given a set of query nodes Q, searching for an optimal k-plex containing Q. We prove that MCKPQ is NP-hard and that it is hard to approximate within any constant factor. We first give exact solutions. Then, we propose an efficient branch-and-bound (B&B) method and design an effective upper bound function and a pruning strategy. Furthermore, we optimize the basic B&B by fast candidate generation. We also give a fast heuristic solution,
Community search; Graph algorithm
Hong Kong University of Science and Technology,
Clear Water Bay, Hong Kong
which produces high-quality results in practice. The
effectiveness of our model of community and the efficiency
of our methods are verified by elaborate experiments.
A community, defined as a set of nodes that are densely connected internally, is considered an important structure in networks and plays a significant role in graph mining. Community detection is the task of finding the communities of a graph, and it has many applications in different fields, such as: (1) mining sets of highly correlated stocks; (2) detecting DNA motifs in bioinformatics; (3) finding communities of Web sites sharing similar topics.
Numerous works, based on different community models, have been developed to detect communities. Since graph data have become large and dynamic, much research attention has shifted from the community detection problem to the community query problem [10, 11, 17, 28]. Unlike community detection, which enumerates all communities, community query aims to find the communities that contain a set of query nodes Q.
Compared with community detection, community search by query nodes avoids: (a) the time consumed in finding all communities of an entire large graph; (b) the inflexible global parameter for the community criterion; (c) the difficulty of dealing with dynamically evolving graphs. In addition, community search has many applications of its own in real-life networks, for example:
• Social group finding In an event-based social networks
such as Meetup,1 a common task is to personally search
for a group with strong social ties among people, to
organize interesting activities.
• Friends recommendation In popular social network
applications, e.g., Facebook,2 Twitter,3 and Instagram,4
community search could help predict potential
friendship relationships for users, especially for new users
with few friends.
• Frequent pattern mining When transactions and items
are modeled as a bipartite graph, then dense
communities group items which are bought together frequently
in different transactions. Given some specific items,
community search would help identifying frequent item
sets and further benefiting marketing.
It is not straightforward to define what a community is or to formulate the community search problem precisely. Several aspects must be considered, among which the most important are how to define a community and how to measure its goodness. Although there is no standard answer, we summarize common requirements that a formulated community search problem should meet:
1. Cohesive A community should be cohesive: each member should be familiar with the others. Many cohesiveness measures have been proposed to ensure the cohesiveness of a community, such as subgraph density, minimum degree, and subgraph diameter.
2. Number of query nodes When $|Q| = 0$, community query is identical to community detection. Some works limit $|Q| = 1$, a single query node, which is reasonable but not general enough for all cases. Sometimes $|Q| \ge 1$ is required. For example, Alice and Bob are holding a party, and they want to invite friends who have strong social ties with both of them.
3. Avoid free rider effect Many community measurements lead to the free rider effect, which means that unrelated subgraphs are included in query results, making communities impure. Thus, the query results are required to be free of the free rider effect.
4. Connectivity Suppose H is a community returned for the community query; then H should not only contain Q but also be connected. However, the latter requirement is not always guaranteed by different models.
5. Size of community In some scenarios, we are interested not only in cohesive communities given a query, but also in those with a size bound. As shown in previous work, an activity may not be triggered if the size of the social group is less than the activity's lower bound on the number of participants. Sozio and Gionis study the problem of querying the optimal community with an upper bound on the size.
6. Few parameters Besides the above constraints, problem definitions that are simple to formulate and have few parameters are preferable.
1 https://www.meetup.com/. 2 https://www.facebook.com/. 3 https://twitter.com/. 4 https://www.instagram.com/.
Currently, several community models have been proposed to solve the community query problem [4, 11, 17, 18, 28, 31]. However, none of them satisfies all of the above requirements simultaneously, which means that more constraints have to be added to the proposed models to cover the missing requirements, making the problems more complicated. Take the k-truss model as an example: Huang et al. use the k-truss property to ensure community cohesiveness and edge connectivity to ensure community connectivity. The k-truss model is further improved by restricting the community diameter and avoiding the free rider effect. The k-core based (minimum degree) model is also used to define communities in [4, 11, 22, 28].
Sozio and Gionis first introduce minimum degree as the goodness function of a community and give a linear-time algorithm for the unbounded-size problem. Cui et al. improve the unbounded-size version using a local search strategy, but leave the bounded-size problem unsolved. Sozio and Gionis also add bounded-diameter and size constraints to keep the community small. Cui et al. introduce a quasi-clique based community model to discover overlapping communities for a single query node, but it involves too many parameters, which makes appropriate parameter values hard to set. To address the free rider effect, which is common in most community models, Wu et al. propose a query-density based model: a random walk is used to measure the density of each node with respect to the query nodes, and the problem is to find a community that maximizes the query-based density. However, the resulting communities have no size guarantee under this measurement.
To address the above problems, in this paper we propose a novel k-plex based community model for community queries. A k-plex is one kind of relaxation of a clique. A graph H is called a k-plex if every node $v \in H$ has at least $|H| - k$ neighbors in H. In other words, each member is non-adjacent to at most k nodes of H, counting itself. A clique is the special case of a k-plex with $k = 1$.
We model a community as a k-plex because it can meet all of the above requirements for the community search problem: a k-plex community H has a cohesive structure, a bounded diameter, and an upper bound on its size. At the same time, the connectivity of H is guaranteed once H becomes large. The properties of a k-plex H are summarized below:
1. the diameter of H is bounded by k;
2. the density of H is at least $\frac{|H| - k}{2}$, which makes the community more cohesive as its size increases;
3. any k-plex larger than $k + 1$ is connected;
4. there is an upper bound for |H|;
5. when H is optimal (defined in Sect. 2) for the query nodes, it avoids the free rider effect;
6. since a k-plex is also a $(k + 1)$-plex, k-plex communities form a hierarchical structure for the query nodes when queried with increasing k.
The detailed analysis is given in Sect. 2. Besides the listed properties, querying the optimal k-plex community only requires specifying Q and k, which makes issuing a query easy.
Although the description of the k-plex community query problem is simple and clear, we show that it is NP-hard. In addition, it is even hard to approximate in polynomial time within a factor of $n^{1-\epsilon}$ for any $\epsilon > 0$.
Thus, to search for k-plex communities efficiently, we first introduce two enumeration methods that compute the exact solution, the latter of which avoids repeated enumeration. Next, we present an efficient branch-and-bound framework, in which we reduce the search space by carefully defining an upper bound function. In addition, we introduce global and local pruning techniques to filter out unqualified neighbors. Then, we present two optimization methods, partial branching and incremental generation, to accelerate the candidate generation process. Finally, we present heuristic solutions that quickly generate results of good quality in practice.
The rest of the paper is organized as follows. We give
the problem definition in Sect. 2. We present baseline
solutions in Sect. 3. In Sect. 4, we propose an efficient
branch-and-bound framework. In Sect. 5, we introduce two
optimization techniques to accelerate the B&B framework.
In Sect. 6, we discuss a heuristic solution. Experiments are
reported in Sect. 7. We discuss related works in Sect. 8 and
conclusion in Sect. 9.
2 Problem Statement
In this section, we give some preliminaries and formulate
our k-plex based community problem, and then we
compare our model with current ones and present the analysis
of our proposed problems.
2.1 Problem Definition
In this paper, we consider simple undirected graphs G(V, E) with no weights on nodes or edges. Let $n = |V(G)|$ and $m = |E(G)|$. We denote the neighbors of a node $v \in V$ by N(v), so the degree of v is $deg_G(v) = |N(v)|$. For any $H \subseteq V$, the subgraph induced by H is denoted as G[H], with nodes $V(G[H]) = H$ and edges $E(H) = \{(v_1, v_2) \mid v_1, v_2 \in H, (v_1, v_2) \in E(G)\}$. We sometimes write H for G[H] when the context is clear (Table 1).
As a relaxation of the clique, the k-plex was introduced in earlier work. We now give the definition of a connected k-plex.
Definition 1 (Connected k-plex) Given a graph G and a constant k, a subgraph G[H] is said to be a connected k-plex if G[H] is connected and, for any $v \in H$, $deg_{G[H]}(v) \ge |H| - k$.
Lemma 1 If G is a k-plex, then any vertex-induced subgraph of G is also a k-plex.
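As a minimal sketch of Definition 1 and Lemma 1, the snippet below checks the degree condition and connectivity on a small hand-made graph (a 4-cycle, not the graph of Fig. 1). The adjacency-dictionary representation and the helper names `is_kplex` and `is_connected` are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations

def is_kplex(adj, nodes, k):
    """True if every node of `nodes` has at least |nodes| - k neighbours inside `nodes`."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected (DFS restricted to `nodes`)."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

# Hypothetical example: a 4-cycle a-b-c-d, where each node misses exactly one other node.
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
H = set(adj)

# Lemma 1 on this example: every vertex-induced subgraph of the 2-plex H is a 2-plex.
hereditary = all(
    is_kplex(adj, sub, 2)
    for r in range(1, len(H) + 1)
    for sub in combinations(sorted(H), r)
)
```

Note that the induced subgraph on {a, c} is still a 2-plex but is not connected, which is why Definition 1 states connectivity as a separate condition.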
Compared with the k-core model, whose lower bound on the minimum degree is a constant k, the minimum degree of a k-plex increases with its cardinality: the larger a k-plex, the more cohesive it is.
However, a k-plex defined by k alone can take structurally different forms. Consider $k = 1$, the minimum k we can provide; then both a single edge and a large clique satisfy the 1-plex property. Clearly, the former is of less interest than the latter. We therefore define the k-plex community problem with a size constraint.
Definition 2 (Connected k-plex query, CKPQ) Given a graph G, a set of query nodes $Q \subseteq V(G)$, a constant k and a size constraint c, find a subgraph G[H] such that:
1. $Q \subseteq H \subseteq V(G)$ and G[H] is a connected k-plex.
2. |H| is no less than c.
The number of results of CKPQ can be exponential. Consider a clique of size n with an arbitrary single query node q, $k = 1$ and $c = 1$; then there are $2^{n-1}$ candidate results (every node subset containing q induces a feasible solution). We now give the optimization version of the k-plex community search problem.
Definition 3 (Maximum connected k-plex query, MCKPQ) Given a graph G, a constant k and a set of query nodes $Q \subseteq V(G)$, find a subgraph G[H] whose size is maximized among all the solutions to CKPQ.
Theorem 1 For each $q \in Q$, the maximum connected k-plex containing Q must be one of the maximal k-plexes of q.
Proof Clearly, the optimal solution H containing Q must be maximal; otherwise we could add extra nodes to the current solution to make it larger, which would yield a better solution and contradict the optimality of H. Next, $q \in Q$ and $Q \subseteq H$ imply that H is a maximal k-plex of q. □
2.2 Compare with k-core Based Community
There are already some works on searching communities with a minimum degree guarantee. We compare them with the k-plex community here and point out the difference. For a community H, previous works mainly focus on making $\delta(H)$ large, where $\delta(H)$ is the lower bound on the number of neighbors of each node in H. In contrast, a k-plex imposes an upper bound on the number of nodes that each node may miss. As a result, the larger H is, the more neighbors each node has, and a k-plex community is ‘‘denser.’’
Definition 4 (k-core) Given a graph G(V, E), a subgraph G[H] is called a k-core, or a core of order k, iff $\forall v \in H$, $deg_H(v) \ge k$ and H is a maximum subgraph with this property.
Without a size bound, deciding whether there exists an H such that $\delta(H) \ge k$ can be done in linear time. With a size bound, however, the problem becomes NP-hard. Previous work provided an efficient solution for the unbounded version and left the bounded problem, stated below, unsolved.
Definition 5 (mCST) Given a graph G(V, E), a query node $v_0 \in V$ and a constant k, find $H \subseteq V$ such that (1) $\delta(G[H]) \ge k$; (2) G[H] is connected; (3) |H| is minimized.
Note that CKPQ(Q, k, c) is identical to the decision version of mCST when setting $|Q| = 1$, $k' = c - k$, $c' = c$. In this paper, we focus only on the optimization problem MCKPQ, since a solution of MCKPQ helps solve the CKPQ problem.
2.3 Hardness Results
In this section, we show that CKPQ, the decision version of MCKPQ, is NP-complete. We further show that MCKPQ is hard to approximate within any constant factor.
Theorem 2 The CKPQ problem is NP-complete for any constant k and $|Q| \ge 1$.
Proof Clearly, given Q, k, c and a candidate subgraph H, we can test in polynomial time whether H is a connected k-plex of size at least c that contains Q.
Next, we reduce from the k-plex problem, which has been shown to be NP-complete for any positive integer k. Given a graph G(V, E) and k, the k-plex problem is to decide whether there exists a k-plex of size c in G. We construct an instance of CKPQ on a new graph $G'(V', E')$ as follows. We create an arbitrary set of new nodes Q and add them to V, so $V(G') = Q \cup V(G)$. Each node v in Q is connected to all the other nodes, i.e., $E(G') = E(G) \cup \{(v_1, v_2) \mid v_1 \in Q, v_2 \in V', v_2 \ne v_1\}$; obviously $G'[Q]$ is a clique. We finish the construction by setting $c' = c + |Q|$, $k' = k$, and Q as the query nodes. We show that the k-plex problem is a Yes-instance iff the constructed CKPQ instance is a Yes-instance. Suppose the subgraph G[H] is a solution of the k-plex problem; then $H' = H \cup Q$ induces a solution $G'[H']$ of CKPQ: (1) each pair of nodes in H' is adjacent or connected through at least one node in Q; (2) $|H'| \ge |Q| + c = c'$; (3) for any node $v \in H$, $deg_{G'[H']}(v) = deg_{G[H]}(v) + |Q| \ge |H| - k + |Q| = |H'| - k'$, and for any node $v \in Q$, $deg_{G'[H']}(v) = |Q| - 1 + |H| = |H'| - 1 \ge |H'| - k'$. So $G'[H']$ is a connected k-plex. For the other direction, if we have a connected k-plex $G'[H']$ of size c', then $G[H' \setminus Q] = G'[H' \setminus Q]$ is a k-plex by Lemma 1 and has size $c' - |Q| = c$, which completes the proof. □
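The construction used in the reduction can be sketched as follows. This is only an illustration under stated assumptions: graphs are adjacency dictionaries, the function name `build_ckpq_instance`, the parameter `num_query` and the fresh labels `("q", i)` are hypothetical, and the fresh labels are assumed not to clash with existing node names.

```python
def is_kplex(adj, nodes, k):
    """True if every node of `nodes` has at least |nodes| - k neighbours inside `nodes`."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

def build_ckpq_instance(adj, c, num_query=2):
    """Return (adj', Q, c'): add |Q| new nodes adjacent to every other node, set c' = c + |Q|."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    query = [("q", i) for i in range(num_query)]  # hypothetical fresh labels
    for q in query:
        new_adj[q] = set()
    for q in query:
        for v in new_adj:          # only set values are mutated, not the dict keys
            if v != q:
                new_adj[q].add(v)
                new_adj[v].add(q)
    return new_adj, set(query), c + num_query

# Hypothetical k-plex instance: the path 1-2-3 is a 2-plex of size c = 3.
adj = {1: {2}, 2: {1, 3}, 3: {2}}
H = {1, 2, 3}
new_adj, Q, c_new = build_ckpq_instance(adj, c=3, num_query=2)
H_new = H | Q  # the corresponding solution of the constructed instance
```

In this toy instance, each original node gains |Q| neighbours while the size also grows by |Q|, so the degree condition carries over with the same k, mirroring step (3) of the proof.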
Theorem 3 For any $\epsilon > 0$, MCKPQ is hard to approximate in polynomial time within a factor of $n^{1-\epsilon}$.
Proof It is known that the maximum clique problem (MCP), which aims to find the largest clique in a given graph, is hard to approximate within a factor of $n^{1-\epsilon}$. Given an instance G of MCP, we can perform a gap-preserving reduction to MCKPQ by setting $G' = G$, $Q = \emptyset$, and $k = 1$, whose solution is identical to that of MCP. So there is no polynomial-time $n^{1-\epsilon}$-approximation algorithm for MCKPQ, where $n = |V(G)|$. □
2.4 Problem Analysis
In this subsection, we first show that the solution of
MCKPQ problem can avoid the free rider effect, then we
present the properties of our proposed community model,
and show that it can meet the requirements for community
search problem, as summarized in Sect. 1.
2.4.1 Free Rider Effect(FRE)
In the community detection problem, the free rider effect means that, under some goodness metric, the results of community search admit irrelevant subgraphs. Most community metrics suffer from the free rider effect, such as minimum degree, graph density, subgraph modularity and so on. We first give an example of FRE and its formal definition, and then we show that the formulation of MCKPQ avoids the free rider effect.
Minimum degree $\delta(H)$ is commonly used as a community measurement in [4, 11, 22, 28]: the bigger, the better. Take graph G in Fig. 1 as an example and suppose $Q = \{g\}$. Then most subgraphs of G containing $\{g, e, h\}$ could be returned as results of this query with the optimal value 2, for example the subgraph with $\delta(\{g, h, e, d, k, j\}) = 2$. Obviously, it should not be returned as the best community, since k and j are too far away from g and make no contribution to the optimal value. The same scenario also exists in the k-truss model.
Definition 6 (Free Rider Effect) Suppose H is a solution to a community definition according to a goodness function f(.). The community definition is said to suffer from the free rider effect if, whenever there is an optimal solution $H^*$ to the community detection problem, $f(H \cup H^*) \le f(H)$. This means the combination of communities H and $H^*$ is no worse than H.
Theorem 4 The definition of MCKPQ avoids the free rider effect.
Proof Given Q and k, the optimal community $H^*$ is the largest k-plex containing Q, H is any other k-plex with $Q \subseteq H$, and f(.) is simply |H|. There are two cases for H and $H^*$:
1. $H \subset H^*$. Then $f(H \cup H^*) = f(H^*) = |H^*| > |H| = f(H)$.
2. $H \not\subset H^*$. Since $H^*$ is optimal and maximal, $H \cup H^*$ cannot be a k-plex (otherwise $H \cup H^*$ would be a larger k-plex than $H^*$) and is no longer feasible, so we cannot evaluate $H \cup H^*$ by f(.).
In both cases, we cannot get $f(H \cup H^*) \le f(H)$. So the definition of MCKPQ avoids FRE. □
2.4.2 Properties of k-plex
In this section, we dive into properties of k-plex and
describe how it can meet requirements of community
search problem stated previously. These properties include
bounded size, bounded diameter, connectivity.
Given k alone, any k-plex community has a global upper bound on its size.
Theorem 5 Given a connected graph G(V, E), for any node v, the size |H| of its optimal k-plex satisfies
$|H| \le \frac{(k + 2) + \sqrt{(k + 2)^2 + 8(|E| - |V|)}}{2}$.
Proof Let H be a k-plex. Then $\delta(H) \ge |H| - k$ and $|E(H)| \ge \frac{|H|(|H| - k)}{2}$. We contract H to one node in G to get G'. Since G is connected, G' is also connected, so $|E(G')| \ge |V(G)| - |H| + 1 - 1 = |V| - |H|$. Moreover, |E| is an upper bound on the total number of edges of the two graphs H and G':
$|E| \ge |E(H)| + |E(G')| \ge \frac{|H|(|H| - k)}{2} + |V| - |H|$.
Solving this inequality for |H| gives the result. □
If we take Q into consideration, we obtain a local bound on the community size, stated as follows:
Theorem 6 Given an instance of MCKPQ(G, Q, k), the size of any solution H satisfies $|H| \le \min_{q \in Q}(|N_G(q)|) + k$.
Proof Suppose H is one of the solutions. Then $\delta(H) \ge |H| - k$, and it follows that $|N_H(q)| \ge |H| - k$ for any $q \in Q$. Since $H \subseteq G$, $|N_G(q)| \ge |N_H(q)|$. So we have $|H| \le k + |N_G(q)|$. Because q is arbitrary, $|H| \le \min_{q \in Q}(|N_G(q)|) + k$. □
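The two size bounds can be evaluated numerically. The sketch below assumes that the global bound is the positive root obtained by solving $|E| \ge |H|(|H| - k)/2 + |V| - |H|$ for |H|, i.e. $((k+2) + \sqrt{(k+2)^2 + 8(|E| - |V|)})/2$, and that the local bound is $\min_{q \in Q} |N_G(q)| + k$. The function names and the adjacency-dictionary input are illustrative assumptions.

```python
import math

def global_size_bound(n, m, k):
    """Largest |H| satisfying |H|^2 - (k + 2)|H| - 2(m - n) <= 0 for a connected graph."""
    return ((k + 2) + math.sqrt((k + 2) ** 2 + 8 * (m - n))) / 2

def local_size_bound(adj, query, k):
    """Query-dependent bound: smallest query-node degree plus k."""
    return min(len(adj[q]) for q in query) + k

# Hypothetical example: the complete graph on 5 nodes (n = 5, m = 10).
adj = {v: set(range(5)) - {v} for v in range(5)}
n = len(adj)
m = sum(len(nbrs) for nbrs in adj.values()) // 2

global_bound = global_size_bound(n, m, k=1)
local_bound = local_size_bound(adj, {0}, k=1)
```

On this clique both bounds evaluate to 5 for k = 1, which matches the size of the only maximal 1-plex, so neither bound can be improved in general.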
We next discuss the connectivity of k-plexes. We first show that the size of a disconnected k-plex is bounded in terms of the number of its disjoint components and k. Based on that, we show that a k-plex larger than a threshold must be connected.
Theorem 7 If a disconnected k-plex $H = \bigcup_{i=1}^{a} C_i$ has a disjoint components ($a > 1$), then $|H| \le \frac{a}{a-1}(k - 1)$.
Proof Suppose $C_i$ is the smallest component of H; then $|C_i| \le \frac{|H|}{a}$. Each node in $C_i$ has at least $a - 1$ non-neighbors, so $a - 1 \le k - 1 \Rightarrow a \le k$. Moreover, a node $v \in C_i$ has all of its neighbors inside $C_i$, so $|H| - k \le deg_H(v) \le |C_i| - 1 \le \frac{|H|}{a} - 1$, which rearranges to $|H| \le \frac{a}{a-1}(k - 1)$. □
This bound is tight. Consider the disconnected 3-plex $V(H) = \{g, h, j, k\}$ in Fig. 1 with two components: $|H| = 2 \cdot (3 - 1) = 4$.
Corollary 1 If the size of a k-plex H is larger than $(k + 1)$, then H must be connected.
Proof We prove by contradiction. Suppose H is disconnected; then it has $1 < a \le k$ components and $|H| \le \frac{a}{a-1}(k - 1)$ by Theorem 7. That implies $k + 1 < \frac{a}{a-1}(k - 1)$. However,
k þ 1
1Þ ¼ ða
Theorem 8 If G is a connected k-plex, then jGj
Proof Since G is connected, then every node in G would
at least have one neighbor, which implies jGj k 1. h
From the above, we can see that when H is larger than $(k + 1)$, it must be connected. As a result, while searching for a k-plex community, we can skip for most candidates the connectivity check, which is a main step in most community search algorithms. If we assume that for each k-plex community query there always exists an H such that $|H| > k + 1$, connectivity is no longer a concern.
Diameter is another additional constraint for community queries. Next, we show that the diameter of any connected k-plex is bounded by k.
Theorem 9 If a connected k-plex G has diameter d, then $k \ge d$.
Proof Suppose $v_1 v_2 \cdots v_{d+1}$ is the longest shortest path in G. Node $v_1$ has at least $d - 1$ non-neighbors on this path, which implies $k - 1 \ge d - 1$. □
Corollary 2 All nodes of the optimal solution of MCKPQ are within k hops of Q.
In this section, we have argued for the rationality of the k-plex community search problem. Even though the problem is simple to model, it has nice properties regarding size, connectivity, diameter and the free rider effect, all of which are major requirements of most community search problems. Next, we discuss solutions to the MCKPQ problem. First, we give two basic enumeration algorithms as baselines. Next, we improve on them with a branch-and-bound based algorithm and introduce the upper bound function and pruning strategies. Furthermore, we present two methods to accelerate the basic B&B algorithm.
3 Baseline Methods
We first give the baseline method for the MCKPQ problem. Since MCKPQ is NP-hard and hard to approximate in polynomial time, we use a generate-and-verify method to explore the whole search space. Algorithm 1 describes this basic framework. It enumerates all k-plexes containing Q (the enumeration methods differ, but all of them use the neighbors of the k-plex currently being enumerated), keeps those of the largest size (note that there may be multiple largest k-plexes of the same size), and returns a connected one among them. Otherwise, it reports that no solution exists for the query.
The search space of the enumeration shrinks as the query size increases, based on Theorem 10.
Theorem 10 Given $Q = \{q_1, \ldots, q_n\}$ and $Q_i = \{q_1, \ldots, q_i\}$ $(1 \le i < n)$, let $M_i$ denote the set of all maximal k-plexes of $Q_i$. Then for any $1 \le i \le j$, $M_j \subseteq M_i$.
Next, we show that if the maximum k-plexes of Q are not connected, then no k-plex containing Q is connected.
Theorem 11 If no maximum k-plex of Q is connected, then there does not exist a connected k-plex containing Q.
Proof We prove by contradiction. Suppose $H = \bigcup_{i=1}^{a} C_i$ is a maximum k-plex of Q that is disconnected, and there exists a connected k-plex H' with $|H'| < |H|$. Since the enumeration is based on neighbors of Q and H is disconnected, G[Q] is disconnected as well, and each component $C_i$ contains at least one query node. H' includes all query nodes, so H' intersects every component of H. Now H' can be represented as $H' = (\bigcup_{i=1}^{a} I_i) \cup (H' \setminus H)$, where $I_i$ is the intersection of H' with $C_i$. We can now construct a connected k-plex $\hat{H}$ whose size is equal to |H| by the following steps. Initially set $\hat{H} = H$; at each iteration, remove the node with minimum degree among all the components and add one node from $H' \setminus H$, until all nodes in $H' \setminus H$ have been added. The resulting $\hat{H}$ is a connected k-plex, contradicting the assumption that all maximum k-plexes of Q are disconnected. □
To satisfy both the maximization and the connectivity constraints of MCKPQ, based on Algorithm 1 and Theorem 11, we only need to check the connectivity of the k-plexes of largest size. NaiveEnum (shown in Algorithm 2) is used to generate all k-plexes of Q. Starting with Q, it examines each neighbor of Q to check whether it is valid for extending Q, until no such neighbor remains. Even though NaiveEnum generates all maximal k-plexes, it can result in repeated generation, i.e., it may enumerate an identical k-plex multiple times.
Example 1 When $Q = \{g\}$ and $k = 2$ in Fig. 1, the maximal 2-plex $\{g, a, e, h\}$ can be enumerated by both $g \to e \to a \to h$ and $g \to h \to a \to e$. Hence, redundant computation may happen in naive enumeration.
To avoid this drawback and reduce the computation, we design an algorithm based on the Bron–Kerbosch algorithm, which recursively generates all maximal cliques of a given graph. Here, we adapt it to enumerate all maximal k-plexes containing a set of specific query nodes. At each iteration, three sets R, P and X are fed to Maximal Search (MS), shown in Algorithm 3. R is the k-plex found so far, which is to be extended; P is the set of candidate nodes, each of which can enlarge R; and the nodes in X can also extend R but have already been used in a previous search. When no candidate is left to enlarge R, i.e., P is empty, R cannot be extended within the current branch. If X is empty as well, R is maximal and is reported; if X is not empty, the maximal k-plex containing R has already been enumerated. Otherwise, we perform a DFS over all candidates. To get the maximal k-plexes of Q, MS is initialized with $R = Q$, $P = \{v : v \in N(Q), \{v\} \cup Q \text{ is a } k\text{-plex}\}$, $X = \emptyset$.
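The R/P/X recursion can be sketched in a few lines. This is not the paper's Algorithm 3 but a simplified illustration under stated assumptions: candidates are drawn from all nodes rather than only from N(Q), so disconnected k-plexes are reported as well and connectivity would have to be checked separately; the function names and the example graph (a triangle a, b, c with a pendant node d attached to c) are hypothetical. The sketch relies on Lemma 1: if R plus u is not a k-plex, no superset is, so u can be dropped from P and X.

```python
def is_kplex(adj, nodes, k):
    """True if every node of `nodes` has at least |nodes| - k neighbours inside `nodes`."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

def maximal_kplexes(adj, query, k):
    """Enumerate each maximal k-plex containing `query` exactly once (Bron-Kerbosch style)."""
    results = []

    def search(R, P, X):
        if not P:
            if not X:  # nothing can extend R and R was not covered by an earlier branch
                results.append(frozenset(R))
            return
        for v in list(P):
            R2 = R | {v}
            P2 = {u for u in P if u != v and is_kplex(adj, R2 | {u}, k)}
            X2 = {u for u in X if is_kplex(adj, R2 | {u}, k)}
            search(R2, P2, X2)
            P.remove(v)  # v is done: later branches must not regenerate sets containing it
            X.add(v)

    Q = set(query)
    if not is_kplex(adj, Q, k):
        return results
    P = {v for v in adj if v not in Q and is_kplex(adj, Q | {v}, k)}
    search(Q, P, set())
    return results

# Hypothetical example graph: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques_with_c = set(maximal_kplexes(adj, {"c"}, 1))
two_plexes_with_c = set(maximal_kplexes(adj, {"c"}, 2))
```

Because every finished candidate is moved from P to X, a branch that could only rediscover an already reported set ends with a non-empty X and reports nothing, which is how the repeated generation of NaiveEnum is avoided.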
The general generate-and-test procedure shown above is costly. First, for each k-plex found, it expands all of its neighbors, no matter whether they can lead to a better solution or not. Second, all candidates are enumerated equally, and each neighbor is chosen with equal probability to expand the current solution, although some candidates and neighbors have more potential to enlarge the current k-plex. To reduce the search space and improve efficiency, we develop the branch-and-bound (B&B) based algorithm shown in Algorithm 4.
The B&B paradigm is widely used for solving large-scale NP-hard combinatorial problems. Explicit enumeration for a hard problem is normally time-consuming due to the exponentially increasing number of potential solutions. Using bounds on the function to be optimized, together with the score of the current best solution, enables B&B to search only parts of the candidate space.
Algorithm 4 computes the maximum k-plex of Q. It is a DFS-based branch-and-bound, which has been shown to be efficient in practice for various hard problems.
At each iteration, H is the current k-plex; we select the valid nodes B among the neighbors of H (the candidate neighbors are generated by cand_gen) and branch on each node of B. Algorithm 4 also generates maximal k-plexes of Q; however, it searches only part of the search space rather than all of it, as NaiveEnum and B&KEnum do. This is achieved by the following strategies:
1. define an effective upper bound function for candidates: if the current candidate k-plex cannot improve on the current best one, it is discarded, and the search space starting from this branch is removed;
2. for each candidate to be extended, select the neighbors with high priority first, so as to reach an optimal k-plex early;
3. prune invalid neighbors.
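The three strategies can be illustrated with a compact DFS branch-and-bound. This is a simplified sketch, not the paper's Algorithm 4: the bound used here is the crude "current size plus number of remaining candidates", the priority is the degree in G, candidates are taken from all nodes, and connectivity is not enforced. All names and the example graph are assumptions made for illustration.

```python
def is_kplex(adj, nodes, k):
    """True if every node of `nodes` has at least |nodes| - k neighbours inside `nodes`."""
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

def max_kplex_bb(adj, query, k):
    """Return a largest k-plex containing `query`, or None if `query` itself is not a k-plex."""
    Q = frozenset(query)
    if not is_kplex(adj, Q, k):
        return None
    best = Q

    def branch(H, cand):
        nonlocal best
        if len(H) > len(best):
            best = H
        # Strategy 1 (bound): even adding every remaining candidate cannot beat `best`.
        if len(H) + len(cand) <= len(best):
            return
        # Strategy 2 (priority): try high-degree candidates first.
        order = sorted(cand, key=lambda v: len(adj[v]), reverse=True)
        for i, v in enumerate(order):
            H2 = H | {v}
            # Strategy 3 (pruning): keep only later candidates that still yield a k-plex.
            cand2 = [u for u in order[i + 1:] if is_kplex(adj, H2 | {u}, k)]
            branch(H2, cand2)

    start = [v for v in adj if v not in Q and is_kplex(adj, Q | {v}, k)]
    branch(Q, start)
    return set(best)

# Hypothetical example graph: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
best_clique = max_kplex_bb(adj, {"c"}, 1)
best_2plex_with_d = max_kplex_bb(adj, {"d"}, 2)
```

Restricting each child to the candidates that come later in the ordering means every node set is explored at most once, and the hereditary property of k-plexes (Lemma 1) ensures that no feasible superset is lost by the filtering step.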
4.1 Upper Bound Function
The upper bound function is crucial in a B&B algorithm: a loose upper bound has no ability to prune the search space. To derive an effective upper bound function for the current k-plex H, we consider two cases.
Definition 7 (Tight nodes) A node $v \in H$ is called tight if $deg_H(v) = |H| - k$.
The upper bound function of H is defined as follows:
1. There exist tight nodes in H. Let $V_t = \{v : v \in H, v \text{ is tight}\}$. Then at most $|\bigcap_{v \in V_t} N(v)|$ nodes can be added to H, because every node added to H must be a neighbor of every tight node v (any non-neighbor of v is unqualified because v already has $k - 1$ non-neighbors in H).
2. There exist no tight nodes in H. In this case, the upper bound is defined as $|H| + \min_{v \in H}(|N_{G \setminus H}(v)| + (k + |N_H(v)| - |H|))$. Each node v can bring in its neighbors outside H plus a limited number of non-neighbor candidates, and the maximum size to which H can be expanded depends on the node $v \in H$ with the fewest candidates.
The upper bounds of both cases involve |N(v)|. We further improve the bound by replacing N(v) with $\{u : u \in N(v), \delta(H \cup \{u\}) \ge |H| + 1 - k\}$. By not counting neighbors that cannot be used to expand H, the gap between the true limit of H and the bound function is reduced.
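A minimal sketch of the two-case bound is given below. It is one possible reading of the text: in the tight case the bound is taken as |H| plus the number of outside nodes adjacent to every tight node, and in the other case as |H| plus the smallest "outside neighbours plus remaining non-neighbour budget" over the nodes of H. The refinement that filters N(v) is omitted, and the function name and example graph are assumptions.

```python
def upper_bound(adj, H, k):
    """Upper bound on the size of any k-plex that extends the k-plex H."""
    H = set(H)
    size = len(H)
    tight = [v for v in H if len(adj[v] & H) == size - k]
    if tight:
        # Case 1: every added node must be adjacent to all tight nodes.
        common = set.intersection(*(adj[v] - H for v in tight))
        return size + len(common)
    # Case 2: outside neighbours of v plus the non-neighbours v can still tolerate.
    return size + min(len(adj[v] - H) + (k + len(adj[v] & H) - size) for v in H)

# Hypothetical example graph: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

bound_single_k1 = upper_bound(adj, {"c"}, 1)             # c is tight for k = 1
bound_single_k2 = upper_bound(adj, {"c"}, 2)             # no tight node for k = 2
bound_triangle_k1 = upper_bound(adj, {"a", "b", "c"}, 1)
```

For the triangle with k = 1 the bound equals the current size, so a branch-and-bound search could stop extending it immediately, while the bounds for the single node {c} are loose, which illustrates why the tighter neighbour filter described above helps.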
4.2 Prune Unqualified Nodes
Unqualified nodes are those that cannot produce a maximum k-plex of Q, or even a k-plex of Q. Next, we introduce two strategies for pruning unqualified nodes: one is based on the query distance, the other is based on node degree.
Definition 8 (Query distance) Given a graph G(V, E), a set of query nodes $Q \subseteq V$, and an arbitrary node $v \in V$, the query distance between v and Q is $dist(v, Q) = \min_{q \in Q} dist(v, q)$, where dist(v, q) is the length of the shortest path between v and q.
Based on Theorem 9, the query distance between Q and any node v of a solution is no greater than k. So the candidate node set is $V_c = \bigcup_{i=1}^{|Q|} \{v : v \text{ is within } k \text{ hops of } q_i\}$. We only perform the search in the subgraph $G' = G[V_c]$.
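The distance-based restriction amounts to a depth-limited breadth-first search from each query node. The sketch below takes $V_c$ to be the set of nodes whose query distance is at most k, which is an interpretation of the text; the function names and the example path graph are hypothetical.

```python
from collections import deque

def within_k_hops(adj, source, k):
    """Nodes at shortest-path distance at most k from `source` (depth-limited BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def candidate_nodes(adj, query, k):
    """Nodes whose query distance min_q dist(v, q) is at most k."""
    return set().union(*(within_k_hops(adj, q, k) for q in query))

def induced_subgraph(adj, nodes):
    """Adjacency dictionary of the subgraph induced by `nodes`."""
    return {v: adj[v] & nodes for v in nodes}

# Hypothetical example: the path 1-2-3-4-5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
vc_single = candidate_nodes(adj, {1}, 2)
vc_pair = candidate_nodes(adj, {1, 5}, 1)
g_prime = induced_subgraph(adj, vc_single)
```

Since every node of a connected k-plex lies within k hops of every other node, intersecting the per-query neighbourhoods instead of uniting them would give an even smaller candidate set; the union is used here only because it matches the minimum-based query distance of Definition 8.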
Furthermore, not all neighbors of the current candidate H need to be branched on. Let the current maximum k-plex be denoted by $H^*$. Neighbors of H are pruned by the following strategies: degree pruning and core number pruning.
Degree pruning is straightforward: if a node $v \in N(H)$ satisfies $deg_{G'}(v) < |H^*| - k$, then it is unqualified for branching, since the minimum degree of any subgraph containing v is no greater than deg(v).
Definition 9 (Core number) The core number $c_v$ of a node v is the highest order of a core that contains v.
Theorem 12 For any subgraph $H \subseteq G$, if $v \in H$, then $\delta(H) \le c_v$.
Proof This can be proved by contradiction. Suppose $\delta(H) > c_v$ and let H' denote the $\delta(H)$-core of G. Then $H \subseteq H'$; otherwise, $H' \cup H$ would be a larger subgraph whose minimum degree is no less than $\delta(H)$, contradicting the maximality of H'. Since $H \subseteq H'$ and $v \in H$, we have $c_v \ge \delta(H') = \delta(H)$. This contradicts $\delta(H) > c_v$. □
By Theorem 12, a node v cannot be included in any subgraph whose minimum degree is larger than $c_v$. So we remove the nodes v in N(H) with $c_v < |H^*| - k$, where $H^*$ is the current best solution. Note that calculating the core number of each node is a preprocessing step, which can be done in linear time by core decomposition.
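Core numbers can be obtained by repeatedly peeling a node of minimum remaining degree. The sketch below uses a plain minimum search per step for readability, so it is quadratic rather than the linear-time bucket-based decomposition referred to in the text; the function names `core_numbers` and `prune_by_core` and the example graph are assumptions.

```python
def core_numbers(adj):
    """Core number of every node, computed by peeling minimum-degree nodes."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    current = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        current = max(current, degree[v])  # core numbers never decrease during peeling
        core[v] = current
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

def prune_by_core(neighbors, core, best_size, k):
    """Keep only neighbours v with c_v >= |H*| - k, following the pruning rule above."""
    return [v for v in neighbors if core[v] >= best_size - k]

# Hypothetical example graph: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
core = core_numbers(adj)
kept = prune_by_core(["a", "d"], core, best_size=3, k=1)
```

With a current best solution of size 3 and k = 1, the pendant node d (core number 1) is discarded before any k-plex test is run on it.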
We now analyze the time complexity of basic candidate generation. Testing whether H is a k-plex can be done in O(|H|) time. Algorithm 5 runs in $O(d_{max}|H|^2)$ time, where $d_{max} = \max_{v \in H} deg_G(v)$.
5 Optimization on B&B
5.1 Partial Branching
In the basic branch-and-bound algorithm, given a k-plex H, we simply extend H by branching on all the valid neighbors B of H until all branches become maximal or are cut off by the upper bound function. That is, we enumerate all branches and give them the same priority. The following two observations lead us to devise a more subtle branching scheme. First, the minimum degree of an optimal k-plex is the largest among all k-plexes. Second, the bottleneck of extending the current k-plex H always lies in the nodes with the minimum degree.
So the improved candidate generation method, partial branching, makes the following improvements:
• Instead of representing the current k-plex H as a set of nodes, it uses a min-heap with node v as key and $deg_H(v)$ as value.
• At each iteration, we only consider the neighbors of the node with minimum degree in H and use them to expand H.
Next, we show that partial branching does not miss any solution.
Theorem 13 Given any two k-plexes $H_0$ and $H_k$ in G that satisfy $H_0 \subset H_k$ with $H_k$ connected, there always exists a sequence $v_1, \ldots, v_k$ such that $\forall i \in [0, k-1]$: $v_{i+1} \in H_k \setminus H_0$, $H_{i+1} = H_i \cup \{v_{i+1}\}$ and $(v_{i+1}, u_i) \in E(G)$, where $u_i = \arg\min_{u \in H_i,\ \exists v \in H_k \setminus H_i,\ (u, v) \in E(G)} deg_{G[H_i]}(u)$.
Proof We can construct this sequence by the following steps.
First, we partition $G[H_k]$ into two parts,
$A = G[H_0]$ and $B = G[H_k \setminus H_0]$. Then at each step, we move
one node from B to A by choosing the crossing edge (one
endpoint in A, the other in B) whose endpoint in A has
minimum degree among all crossing edges. The
process always succeeds until B is empty. Suppose at
some step $A', B'$ there is no crossing edge and $B'$ is not
empty; this would imply that $G[A' \cup B'] = G[H_k]$ is not
connected, which contradicts the assumption. $\square$
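The construction used in the proof can be written down directly. The following sketch assumes an adjacency dictionary `adj` and node sets `H0` and `Hk` with `H0` contained in `Hk`; the function name `expansion_sequence` is hypothetical. It returns the pairs $(u_i, v_{i+1})$ in the order in which nodes are moved from B to A.

```python
def expansion_sequence(adj, H0, Hk):
    """Order the nodes of Hk minus H0 as in the proof: always cross an edge whose
    endpoint inside the current set A has minimum degree in G[A]."""
    A, B = set(H0), set(Hk) - set(H0)
    sequence = []
    while B:
        # Nodes of A that still have a neighbor in B (endpoints of crossing edges).
        boundary = [u for u in A if adj[u] & B]
        if not boundary:
            raise ValueError("G[Hk] is not connected")
        u = min(boundary, key=lambda w: len(adj[w] & A))
        v = next(iter(adj[u] & B))      # any crossing edge at u will do
        sequence.append((u, v))
        A.add(v)
        B.remove(v)
    return sequence
```

The `ValueError` branch corresponds to the contradiction in the proof: it can only be reached when the induced subgraph on `Hk` is disconnected.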
We now analyze the complexity of partial candidate
generation. Since we only extend H by neighbors of the node v with
minimum degree in H, and H is a min-heap, fetching v can
be done in O(1) time. Let $d_{\min}$ denote the degree of node v.
Up to $d_{\min}$ updates are performed on H due to
expansion, and updating the heap H can be done in $O(\log|H|)$ time.
In total, the time complexity is $O(d_{\min}|H|)$.
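A minimal sketch of one partial branching step is shown below, using Python's standard `heapq` module as the min-heap. The function name `partial_branch_candidates` and the adjacency dictionary `adj` are assumptions for illustration, and node labels are assumed to be comparable so that ties in degree can be broken. The heap is rebuilt on every call for simplicity, whereas the method described above maintains it across iterations.

```python
import heapq


def partial_branch_candidates(adj, H, k):
    """Return (u, candidates): u is a minimum-degree node of G[H] and candidates
    are the neighbors x of u outside H such that H ∪ {x} remains a k-plex."""
    H = set(H)
    heap = [(len(adj[v] & H), v) for v in H]
    heapq.heapify(heap)                  # min-heap ordered by degree inside H
    _, u = heap[0]                       # peek at the minimum in O(1)
    size = len(H) + 1
    candidates = set()
    for x in adj[u] - H:
        members = H | {x}
        # k-plex test: every member keeps at least size - k neighbors.
        if all(len(adj[w] & members) >= size - k for w in members):
            candidates.add(x)
    return u, candidates
```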
5.2 Incremental Candidate Update
We now introduce an alternative way to reduce the cost of
generating candidate neighbors. In the basic B&B algorithm, the
valid neighbors are generated from scratch in each round given
H. As a result, the same node may be generated
multiple times in the steps that follow, starting from H.
Example 2 Consider the graph in Fig. 1 again, suppose
H ¼ fa; g; k ¼ 2. Figure 2 shows one of search paths of
basic B&B algorithm, extending H until it becomes
maximal. At each round, it first collects the neighbors of H and keeps
the valid ones. From $B_1$ (the candidates in round 1) to $B_5$, node c
is generated and tested three times and node d twice. The
same holds for invalid nodes. For example, g is
rejected in $N_3(H)$, but still tested in the next round in
The above example shows the redundant computation in the
candidate generation step. Here, we propose a more efficient
way (shown in Algorithm 6) to generate the candidates $B_{t+1}$,
which makes use of the result $B_t$ of the last round. We define B as
a hash table whose key is a node id and whose value is the number of
non-neighbors of that node in H plus 1. First, we remove a node v from $B_t$ and add it to
H. The flag can_expand indicates whether new members can be
added. Then, we increase by one the value of every node in $B_t$ that is not a
neighbor of v, and filter out the nodes whose number of
non-neighbors in H exceeds k. Next, we add new members x
from $N_{G \setminus (B_t \cup H)}(v)$ if can_expand is still positive. There are
three cases in which we cannot extend the current candidates $B_t$
from the residual graph $G \setminus (B_t \cup H)$:
1. Some node u of $B_t$ becomes tight. In this case, it
blocks every new member x. Because
$u \in B_t$, u must have at least one neighbor in H, so
$|\bar{N}_H(x)| = |H| > |H| - 1 \ge |\bar{N}_H(u)| \ge k$, where $\bar{N}_H(\cdot)$ denotes non-neighbors in H.
2. Some node w of H becomes tight. If $x \in N_{G \setminus (B_t \cup H)}(v)$,
adding x would cause $|\bar{N}_{H \cup \{x\}}(w)| > |\bar{N}_H(w)| = k$.
3. $|H| > k$. Then, any new member would have at least
k non-neighbors in H.
In the above three cases, can_expand is toggled to
negative and $|B_t|$ decreases in the following
We now analyze the time complexity of the candidate update.
Updating the values of the hash table costs $O(|B_t|)$ time. Checking
tight nodes in H costs $O(|H||B_t|)$ time. Adding a new
member x costs $O(|N(x)|)$ time if $B_t$ can be extended. Since
$|B_t|$ never exceeds $|N(H)|$, the total time complexity
is $O(|H||N(H)|)$.
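The incremental update can be illustrated with the simplified sketch below. It is not a transcription of Algorithm 6: the names `initial_candidates`, `filter_valid`, and `update_candidates` are hypothetical, the graph is assumed to be an adjacency dictionary `adj`, and the can_expand flag is replaced by an explicit validity filter, which is easier to check but recomputes the tight nodes of H in every round. The table B stores, for each candidate, its number of non-neighbors in H plus 1, as described above.

```python
def initial_candidates(adj, H, k):
    """Build B from scratch: node -> (number of non-neighbors in H) + 1."""
    H = set(H)
    frontier = set().union(*(adj[v] for v in H)) - H
    B = {x: len(H - adj[x]) + 1 for x in frontier}
    return filter_valid(adj, H, B, k)


def filter_valid(adj, H, B, k):
    """Keep x only if both x and every tight node of H tolerate adding x."""
    # A node w of H is tight when it already has k non-neighbors in H (itself included).
    tight = [w for w in H if len(H - adj[w]) == k]
    return {x: c for x, c in B.items()
            if c <= k and all(x in adj[w] for w in tight)}


def update_candidates(adj, H, B, v, k):
    """Move v from B into H and derive the next candidate table from B."""
    H = set(H) | {v}
    # Old candidates gain one non-neighbor exactly when they are not adjacent to v.
    nxt = {u: c + (0 if u in adj[v] else 1) for u, c in B.items() if u != v}
    # New members can only come from neighbors of v outside the old B and H.
    for x in adj[v] - H - set(B):
        nxt[x] = len(H - adj[x]) + 1
    return H, filter_valid(adj, H, nxt, k)
```

Because every subset of a k-plex is again a k-plex, a node that was invalid for H stays invalid for any extension of H, so the table returned by `update_candidates` agrees with the one that `initial_candidates` would compute from scratch.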
6 Heuristic Solutions
Since MCKPQ is hard both to optimize and to
approximate, in this section we propose heuristic algorithms
that are fast and produce high-quality results, although they have
no theoretical guarantee. At each step, we greedily choose one node
from $N_G(H)$ to extend H until H becomes
maximal. The new node is chosen based on the quality function
f. Intuitively, given a set of neighbor nodes to extend H, the
larger f(v) is, the more likely $H \cup \{v\}$ is to reach a
k-plex of large size.
How to define the quality function is therefore the key issue
that influences the results. Given two k-plexes $H$ and $H'$ of the same
size, there are two aspects by which to compare their ability to
expand:
1. Number of missing links. A k-plex requires that each node in
it has at most k non-neighbors, so H is more
promising to expand to a larger size if
$|E_G(H)| > |E_G(H')|$.
2. Neighbors of H. This is straightforward because if
$|N(H)| > |N(H')|$, H has a higher chance to
become a larger community.
So we define the quality function of v as follows:
$$f_H(v) = |E_G(H \cup \{v\})| + \frac{|N(H \cup \{v\})|}{\max_{u \in C} |N(H \cup \{u\})|}$$
The intuition is to find the neighbor that achieves the largest
increment in edges and to break ties by choosing the one that
adds more neighbors to the current solution.
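The greedy heuristic with this quality function can be sketched as follows. The function name `greedy_kplex` and the adjacency dictionary `adj` are assumptions for illustration; C is taken to be the set of valid neighbors of the current H, and the sketch recomputes edge and neighbor counts from scratch in every step rather than maintaining them incrementally.

```python
def greedy_kplex(adj, Q, k):
    """Greedily grow a k-plex from the query set Q using the quality function f_H."""
    H = set(Q)

    def is_kplex(S):
        return all(len(adj[v] & S) >= len(S) - k for v in S)

    def edges(S):
        return sum(len(adj[v] & S) for v in S) // 2

    def neighbors(S):
        return set().union(*(adj[v] for v in S)) - S

    if not is_kplex(H):
        return None                      # Q itself is not a k-plex
    while True:
        C = [v for v in neighbors(H) if is_kplex(H | {v})]
        if not C:
            return H                     # no valid neighbor left: H is maximal
        # Normalizer of the tie-breaking term; "or 1" avoids division by zero.
        max_nbrs = max(len(neighbors(H | {v})) for v in C) or 1
        H.add(max(C, key=lambda v: edges(H | {v})
                  + len(neighbors(H | {v})) / max_nbrs))
```

Since the second term of $f_H$ never exceeds 1, it only decides between candidates that add the same number of edges, which matches the tie-breaking intuition stated above.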
We have conducted extensive experiments to test the
effectiveness and efficiency of our proposed solutions in
finding k-plex communities. We first show, by case study,
the effectiveness of the k-plex community model in capturing
the cohesive membership structure and common interests of
members in the search results. Then, we report the
experimental results of the efficiency tests using different
problem settings on different datasets. We use four real-world
datasets provided by the Stanford network dataset collection:
(1) DBLP collaboration network; (2) Amazon product
purchasing network; (3) Google-Web graph; (4) Arxiv
collaboration network of condensed matter. The statistical
information of the datasets is shown in Table 2. All algorithms
are implemented in Python and run on a PC with an
Intel(R) Core(TM) i7 CPU 860 @ 2.8 GHz with
Table 2 Statistical information of datasets
7.1 Case Study
Since different community query models differ in their
objective goodness functions (such as degree, density, and
diameter), a uniform quantitative quality evaluation is
beyond this paper's scope. Instead, we show the results of
our k-plex community search by case studies.
Fig. 3 Communities of $Q = \{\text{Jiawei Han}\}$, $k = 2$. a 2-plex community $H_1$ and b 2-plex community $H_2$
We use the DBLP coauthor graph, in which two
scientists are linked if they have worked on the same
paper. We issue the query $Q = \{\text{Jiawei Han}\}$, $k = 2$ and
obtain the optimal communities shown in Fig. 3, both of which
achieve the maximum size. To see the effectiveness,
note that the members of $H_1$ are all data mining scientists.
Jiawei Han and Philip S. Yu have worked on about 50
papers together, and Jian Pei and Ke Wang have worked on
more than 20 papers together. $H_1$ and $H_2$ also indicate the
ability of the k-plex model to extract overlapping
communities. By increasing k by 1 and using the same query, we
get another result, shown in Fig. 4a, which is a supergraph
of $H_1$; this implies the hierarchical structure of the k-plex
community model. We also issue a query with multiple
nodes, $Q = \{\text{Xuemin Lin}, \text{Jeffrey Xu Yu}\}$, $k = 3$. From the
result shown in Fig. 4b, we can see that the members of this
community have all published papers on graph databases and
processing, which is one of the research interests of Xuemin
Lin and Jeffrey Xu Yu, and this is verified by checking their
Next, we study the time efficiency of different methods.
There are two parameters for querying a k-plex community:
Q and k. We vary |Q| from 1 to 4 and k from 1 to 4. To
generate query sets, we apply the following preprocessing
steps. For each graph, we first sort all
nodes by degree. Then, the nodes are partitioned
into five buckets in degree order so that each bucket has the
same number of nodes. For each pair (|Q|, k), we
randomly sample 20 queries from each degree bucket. So for
each query setting, we get 100 queries to evaluate.
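The bucketed sampling can be sketched as below. The text does not say how the nodes of a multi-node query are combined, so this example assumes, purely for illustration, that all nodes of one query are drawn from the same degree bucket; the function name `sample_queries` and its parameters are hypothetical, and the diameter restriction on queries is not applied here.

```python
import random


def sample_queries(adj, q_size, per_bucket=20, buckets=5, seed=0):
    """Sample query sets of size q_size from equal-sized degree buckets."""
    rng = random.Random(seed)
    ordered = sorted(adj, key=lambda v: len(adj[v]))   # nodes in degree order
    step = len(ordered) // buckets
    queries = []
    for b in range(buckets):
        # The last bucket absorbs the remainder when the split is uneven.
        end = (b + 1) * step if b < buckets - 1 else len(ordered)
        bucket = ordered[b * step:end]
        for _ in range(per_bucket):
            queries.append(rng.sample(bucket, q_size))
    return queries
```

With the default settings this yields 5 × 20 = 100 queries per (|Q|, k) setting, matching the count given above.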
The diameter of a query is also taken into account. Let $Q_k$ denote the query nodes
for k; queries are generated with the restriction
$\mathrm{diam}_G(Q_k) \le k$, since for any $\mathrm{diam}_G(Q_k) > k$ there does
not exist a feasible solution. We denote B&K enumeration
by bk_enum, the naive branch-and-bound method by
bb_basic, the partial branching method by bb_pb, and the
incremental candidate generation method by bb_inc. Since
most runs of the naive enumeration method do not
return within hours and thus cannot be collected, we ignore
this method in our comparison.
7.2.1 Varying |Q|
We evaluate the query time for different query sizes |Q|.
For each |Q|, we aggregate the query time over all queries of that size and calculate
the average. The results on different datasets are shown in
Fig. 5. For all datasets, B&K enumeration always has the
highest query delay, and the branch-and-bound based methods
reduce the query delay significantly. The simple greedy
heuristic has the lowest time cost. Among the
B&B based methods, bb_inc outperforms bb_basic and
bb_pb, since its incremental candidate generation reduces
the repeated evaluation of neighbors. The improvement of
bb_pb over bb_basic is not notable. The reason is that even
though partial branching decreases the cost of selecting
valid neighbors, i.e., it narrows down the number of
children of a branching node in each step, the size of the search
tree does not shrink. We can also observe that the query time
decreases with increasing |Q|. This is because the search space is
reduced by adding query nodes, as illustrated by
Theorem 10. When $|Q| = 1$, searching for the maximum
k-plex costs hundreds of seconds. When |Q| is
increased to 4, the query time decreases to
between 0.01 and 0.1 s. The average query time for each |Q| in
Fig. 5 aggregates the results over various k. Thus, to
investigate the effect of varying |Q| more clearly, we also fix k
for various query sizes. The results on the Amazon dataset are
shown in Fig. 6; the evaluation on the other datasets is omitted
due to limited space. For $k = 1$, the task is equivalent to searching
for a maximum clique; B&K enumeration is the most
costly compared with the other methods, which incur
near-zero cost for clique finding.
Fig. 6 Amazon: query time varying |Q|. a $k = 1$, b $k = 2$, c $k = 3$, and d $k = 4$
7.2.2 Varying k
We test the query performance for various k. Even though there
is no limitation on k in k-plex search, k is usually set to
small values to keep the subgraph cohesive; 2 and 3 are
frequently used in practice [
]. We vary k from 1 to 4. For
each specific k, different query sizes |Q| are evaluated and we
take the average query time. The results on different datasets
are shown in Fig. 7. The query time increases with
increasing k. This is because the larger k is, the larger the upper
bound on the number of non-neighbors, which results in a larger
search space. B&K enumeration is the most costly under all
k settings. The B&B based algorithms perform better, due to their
pruning technique and early termination. In spite of
the improvement brought by the branch-and-bound technique, the
complexity is still exponential in k, since it is an exact
solution. The heuristic greedy algorithm always
terminates fast and outperforms the others. For a fixed k, the results of
different query sizes all contribute to the average query time, and
queries with small |Q| contribute a large share of
it. We therefore also test different fixed |Q| to get a more
accurate evaluation of various k, shown in Fig. 8. The results
on the DBLP dataset are listed and the others are omitted. As we can
see from Fig. 7, when $k = 1$, the query time is no larger than
0.1 s. When k is increased to 4, the time consumed
increases to hundreds of seconds. This is due to the
exponential time complexity of MCKPQ in k.
7.2.3 Quality of Heuristic
The quality of the output of the heuristic greedy algorithm is also
tested, as shown in Fig. 9. Since all other methods
output the optimal k-plex community, only one exact
algorithm, bb_basic, is compared. For each specific k, the
average size of the optimal k-plex is calculated. We feed the
same set of queries to the heuristic solution and compute the average
size of its output. The results show that the heuristic solution
has, on average, a size larger than 50% of the optimal. The results
also imply the influence of k on the size of the output
community. On average, a smaller k results in a smaller
8 Related Work
8.1 Densest Subgraph Problem
The densest subgraph problem is a major research topic in
graph analysis. Given a graph G, the task is to find the
densest subgraph. Average degree is one of the most frequently
used measurements in dense subgraph mining. [ ]
gives a polynomial-time O(mn) algorithm for the densest
subgraph problem, by transforming it into a min-cut instance
and using binary search to find the optimal density. A
2-approximation algorithm is proposed in [
]; it works by
greedily removing the node of minimum degree. This
greedy algorithm runs in $O(m + n)$ time. [ ]
introduces the densest subgraph problem for directed graphs and
gives an $O(\log n)$-approximation algorithm. [ ]
first gives a max-flow based exact algorithm and then
improves the greedy one to $O(m + n)$.
Fig. 8 DBLP: query time varying k. a $|Q| = 1$, b $|Q| = 2$, c $|Q| = 3$ and d $|Q| = 4$
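The greedy peeling idea mentioned above (repeatedly remove a minimum-degree node and keep the densest intermediate subgraph) can be sketched as follows. This is a naive illustration with density taken as |E| / |V|; the function name `greedy_densest_subgraph` and the adjacency dictionary `adj` are assumptions, and because the sketch rescans for the minimum-degree node it does not reach the $O(m + n)$ bound of the cited algorithm.

```python
def greedy_densest_subgraph(adj):
    """Peel minimum-degree nodes and return the densest intermediate subgraph."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    edges = sum(degree.values()) // 2
    best_density, best_nodes = -1.0, set()
    while remaining:
        density = edges / len(remaining)
        if density > best_density:
            best_density, best_nodes = density, set(remaining)
        v = min(remaining, key=degree.__getitem__)
        remaining.remove(v)
        edges -= degree[v]               # v's remaining edges disappear with it
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return best_nodes, best_density
```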
Even though the densest subgraph problem (DSP) admits a
polynomial-time solution, it becomes NP-hard when
there is a size constraint. There are three variants: (a) the
k-densest subgraph problem (DkS), which requires
$|S| = k$; (b) densest subgraph at least k (DalkS), which requires
$|S| \ge k$; and (c) densest subgraph at most
k (DamkS), which requires $|S| \le k$. It is hard to
approximate the DkS and DamkS problems within a constant
factor. [ ] introduces an $O(n^{1/3})$-approximate algorithm for
DkS. However, DalkS can be approximated within a constant
factor. A 3-approximate solution is shown in [ ].
[ ] provides a 2-approximate algorithm for
Our work differs in that it is query specific and its
goodness metric is the size of the k-plex, instead of subgraph
8.2 Community Search
The community search problem was recently proposed in [ ].
The task is to find a high-quality community given initial
query members. There is no uniform goodness metric for
this problem. The minimum degree measurement is used in
[4, 11, 28]. K-truss based variants are introduced in
[ ]. [ ] requires that results be edge connected to
avoid disjoint communities. They further define the optimal
k-truss community by largest k and smallest diameter in [ ].
Since the problem becomes NP-hard, they present ð2
Þapproximate algorithm to solve it. [
] addresses the free
rider effect and proposes the query-density based community
query problem. [
] studies the triangle densest community
problem and gives a max-flow based algorithm. However,
the connectivity of the answer community is not guaranteed.
[ ] studies searching overlapping communities given a
single query node. Their model is based on quasi-cliques.
Our work differs from the above works in the definition of the
objective community and in the unrestricted size of the query node set.
8.3 Community Detection
Community detection is a well-studied topic in graph and
social network analysis. The task is to identify and list all
communities of a given graph. A commonly used measurement
is modularity [
]. Modularity maximization is shown to be
NP-hard in [ ]. [
] introduces a rounding technique
and efficient algorithms. However, optimizing modularity
in large networks has a limitation in resolving small
communities, as shown in [
]. The above works focus on detecting
disjoint communities by partitioning graphs. Since each
person can be involved in multiple groups in a social
network, there are also works on detecting all overlapping
communities [ ]. Our work differs in that it is query
oriented, and the query result can be different from that of
maximizing modularity (a globally optimal graph partition
may not be a locally optimal community for the query set).
In this paper, we study the problem of querying the optimal k-plex
community, that is, given a set of nodes Q, finding the optimal
k-plex community containing Q. We show that our
proposed community model can theoretically guarantee the good quality
of query results. Based on the fact that the
problem is NP-hard and hard to approximate, we design an
efficient branch-and-bound method and further improve it by
techniques for fast candidate generation. We then give a
heuristic solution of low time cost. Experiments show the
effectiveness of our k-plex model and the efficiency of our
Open Access This article is distributed under the terms of the
Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
distribution, and reproduction in any medium, provided you give
appropriate credit to the original author(s) and the source, provide a
link to the Creative Commons license, and indicate if changes were made.
1. Agarwal G, Kempe D (2008) Modularity-maximizing graph communities via mathematical programming. Eur Phys J B 66(3):409-418
2. Andersen R, Chellapilla K (2009) Finding dense subgraphs with size bounds. In: Proceedings of algorithms and models for the web-graph, 6th international workshop, WAW 2009, Barcelona, Spain, February 12-13, 2009, pp 25-37
3. Balasundaram B, Butenko S, Hicks IV (2011) Clique relaxations in social network analysis: the maximum k-plex problem. Oper Res 59(1):133-142
4. Barbieri N, Bonchi F, Galimberti E, Gullo F (2015) Efficient and effective community search. Data Min Knowl Discov 29(5):1406-1433
5. Batagelj V, Zaversnik M (2003) An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049
6. Berlowitz D, Cohen S, Kimelfeld B (2015) Efficient enumeration of maximal k-plexes. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, Victoria, Australia, May 31-June 4, 2015, pp 431-444
7. Brandes U, Delling D, Gaertler M, Gorke R, Hoefer M, Nikoloski Z, Wagner D (2008) On modularity clustering. IEEE Trans Knowl Data Eng 20(2):172-188
8. Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575-577
9. Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: Proceedings of approximation algorithms for combinatorial optimization, third international workshop, APPROX 2000, Saarbrücken, Germany, September 5-8, 2000, pp 84-95
10. Cui W, Xiao Y, Wang H, Lu Y, Wang W (2013) Online search of overlapping communities. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, ACM, pp 277-288
11. Cui W, Xiao Y, Wang H, Wang W (2014) Local search of communities in large graphs. In: International conference on management of data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pp 991-1002
12. Feige U, Kortsarz G, Peleg D (2001) The dense k-subgraph problem. Algorithmica 29(3):410-421
13. Fortunato S (2010) Community detection in graphs. Phys Rep 486(3):75-174
14. Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Nat Acad Sci 104(1):36-41
15. Goldberg AV (1984) Finding a maximum density subgraph. University of California, Berkeley, Berkeley
16. Håstad J (1999) Clique is hard to approximate within $n^{1-\varepsilon}$. Acta Math 182(1):105-142
17. Huang X, Cheng H, Qin L, Tian W, Yu JX (2014) Querying k-truss community in large and dynamic graphs. In: International conference on management of data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pp 1311-1322
18. Huang X, Lakshmanan LVS, Yu JX, Cheng H (2015) Approximate closest community search in networks. PVLDB 9(4):276-287
19. Kannan R, Vinay V (1999) Analyzing the structure of large graphs. Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn
20. Khuller S, Saha B (2009) On finding dense subgraphs. In: Proceedings of automata, languages and programming, 36th international colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Part I, pp 597-608
21. Li K, Lu W, Bhagat S, Lakshmanan LVS, Yu C (2014) On social event organization. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '14, New York, NY, USA, August 24-27, 2014, pp 1206-1215
22. Li R, Qin L, Yu JX, Mao R (2015) Influential community search in large networks. PVLDB 8(5):509-520
23. McDermott R (2000) Knowing in community. IHRIM 19
24. Newman ME (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066133
25. Rowe L, Nadeau J, Turner R, Frankel W, Letts V, Eppig J, Ko M, Thurston S, Birkenmeier E (1994) Maps from two interspecific backcross DNA panels available as a community genetic mapping resource. Mamm Genome 5(5):253-274
26. Seidman SB, Foster BL (1978) A graph-theoretic generalization of the clique concept. J Math Sociol 6(1):139-154
27. Song DM, Tumminello M, Zhou WX, Mantegna RN (2011) Evolution of worldwide stock markets, correlation structure, and correlation-based graphs. Phys Rev E 84(2):026108
28. Sozio M, Gionis A (2010) The community-search problem and how to plan a successful cocktail party. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 939-948
29. Tsourakakis C (2015) The k-clique densest subgraph problem. In: Proceedings of the 24th international conference on World Wide Web, International World Wide Web Conferences Steering Committee, pp 1122-1132
30. Wu Y, Jin R, Li J, Zhang X (2015a) Robust local community detection: on free rider effect and its elimination. Proc VLDB Endow 8(7):798-809
31. Wu Y, Jin R, Li J, Zhang X (2015b) Robust local community detection: on free rider effect and its elimination. PVLDB 8(7):798-809
32. Xie J, Kelley S, Szymanski BK (2013) Overlapping community detection in networks: the state-of-the-art and comparative study. ACM Comput Surv 45(4):43