Generating realistic scaled complex networks
Staudt et al. Applied Network Science
Christian L. Staudt, Michael Hamann, Alexander Gutfraind, Ilya Safro and Henning Meyerhenke
Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks could be generated by formal rules. Output from these generative models then serves as the basis for designing and evaluating computational methods on networks, including verification and simulation studies. During the last two decades, a variety of models have been proposed with the ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica; (c) use ReCoN to produce networks much larger than the original exemplar; and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.
Keywords: Network generation; Multiscale modeling; Network modeling; Communities
Introduction
Networks are widely used to represent connections between entities, because they
provide intuitive windows into the function, dynamics, and evolution of natural and
man-made systems. However, high-quality, large-scale network data is often unavailable
because of economic, legal, technological, or other obstacles (Chakrabarti and Faloutsos
2006; Brase and Brown 2009). For example, human contact networks in the context of
infectious disease spread are notoriously difficult to estimate, and thus our understanding
of the dynamics and control of epidemics stems from models that make highly simplifying
assumptions or simulate contact networks from incomplete or proxy data (Eubank et al.
2004; Keeling and Rohani 2008; Meyers et al. 2005). In another domain, the development
of cybersecurity systems requires testing across diverse threat scenarios and validation
across diverse network structures that are not yet known, in anticipation of the computer
networks of the future (Dunlavy et al. 2009). In both examples, the systems of interest
cannot be represented by a single exemplar network, but must instead be modeled as
collections of networks in which the variation among them may be just as important as their
common features. Such cases point to the importance of data-driven methods for
synthesizing networks that capture both the essential features of a system and its realistic
variability, so that they can be used in tasks such as simulation, analysis, and decision making.
A good network generator must meet two primary criteria: realism and diversity.
The first, realism, must account for any properties of the network that govern the
domain-specific processes of interest, such as system function, dynamics, and
evolution. Hence, realism may depend on both structural network features and the more
subtle emergent features of the network. Consider the following examples in potential
applications:
• Models of social networks should be able not only to reproduce structural features
such as small-world properties, but also, and perhaps more importantly, to emulate
emergent sociological phenomena such as interactions between individuals in a
community, as driven by their psychological needs and daily routines. That is, the
generated network should show similar interactions by its artificial individuals, as
determined by implicit psychological and social rules.
• Models of connected solar energy collectors of different sizes and capacities should
simulate realistic energy outputs influenced by the weather.
• Models of metabolic interactions should ultimately reflect biochemical properties of
a cell.
The second criterion, diversity, requires that synthetic networks reflect the naturally
occurring stochasticity in a system, without systematic bias that departs from reality.
This feature is important for benchmarking and evaluating the robustness of network-based
algorithms, anonymizing networks, and generating plausible hypothetical scenarios. In
particular, when engineering algorithms, the ability to create good synthetic test data sets
is valuable for estimating the effectiveness and scalability of the proposed methods.
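The two criteria can be made concrete with a small experiment. The following Python sketch is an illustration only and is not ReCoN: it uses plain degree-preserving edge switching as a stand-in generator, a ring lattice as a hypothetical exemplar, and helper names (degree_sequence, edge_switching_replica, jaccard) invented for this example. Realism is checked narrowly, by comparing degree sequences, and diversity by measuring how much the edge sets of independent replicas overlap.

```python
import random
from itertools import combinations


def degree_sequence(edges):
    """Return the sorted node degrees of an undirected edge set."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sorted(degree.values())


def edge_switching_replica(edges, n_swaps, rng):
    """Randomize a simple undirected graph with double-edge swaps.

    Each accepted swap replaces edges (a, b), (c, d) by (a, d), (c, b),
    so every node keeps its degree. Stand-in baseline, not ReCoN.
    """
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:  # would create a self-loop
            continue
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:  # would create a multi-edge
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_set


def jaccard(a, b):
    """Edge-set overlap of two graphs: 1.0 means identical edge sets."""
    return len(a & b) / len(a | b)


# Hypothetical exemplar: ring lattice on 30 nodes, each linked to its
# two nearest neighbours on either side (every node has degree 4).
n = 30
exemplar = {tuple(sorted((i, (i + k) % n))) for i in range(n) for k in (1, 2)}

# Three independent replicas, one per random seed.
replicas = [edge_switching_replica(exemplar, 200, random.Random(seed))
            for seed in range(3)]

# Realism (narrow sense): do replicas keep the exemplar's degree sequence?
same_degrees = all(degree_sequence(r) == degree_sequence(exemplar)
                   for r in replicas)

# Diversity: pairwise edge overlap between replicas (lower = more diverse).
overlaps = [jaccard(a, b) for a, b in combinations(replicas, 2)]

print("degree sequence preserved:", same_degrees)
print("pairwise edge overlap:", [round(o, 2) for o in overlaps])
```

Matching the degree sequence is only one facet of realism. Edge switching typically destroys clustering and community structure, which is why a realistic generator has to preserve properties at several scales and not just at the level of individual nodes.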
In addition, a network generator should be effective in tasks such as obfuscation
(replacing restricted real data with similar synthetic data), compression (storing only a generator
and its parameters instead of large graphs), as well as extrapolation and sampling
(generating data at larger or smaller scales). Finally, the running time and memory requirements
of the generator should be acceptable for realistic (...truncated)