Subgraph-Indexed Sequential Subdivision for Continuous Subgraph Matching on Dynamic Knowledge Graph
Hindawi
Complexity
Volume 2020, Article ID 8871756, 18 pages
https://doi.org/10.1155/2020/8871756
Research Article
Yunhao Sun, Guanyu Li, Mengmeng Guan, and Bo Ning
Faculty of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
Correspondence should be addressed to Guanyu Li;
Received 22 September 2020; Revised 24 October 2020; Accepted 9 November 2020; Published 22 December 2020
Academic Editor: Weitong Chen
Copyright © 2020 Yunhao Sun et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Continuous subgraph matching on dynamic graphs has become a popular research topic in graph analysis,
with a wide range of applications including information retrieval and community detection. Specifically, given a query graph
q, an initial graph G0, and a graph update stream △Gi, the problem of continuous subgraph matching is to sequentially report all
subgraphs of Gi (= G0 ⊕ △Gi) that are isomorphic to q and cover △Gi. Since a knowledge graph is a directed labeled multigraph
with multiple edges between a pair of vertices, it brings new challenges for this problem on dynamic knowledge graphs.
One challenge is that the multigraph characteristic of knowledge graphs intensifies the complexity of candidate calculation, which
combines complex topological and attributed structures. Another challenge is that the isomorphic subgraphs covering
a given region are computed over a huge search space of seed candidates, so much time is spent searching
unpromising candidates. To address these challenges, a method of subgraph-indexed sequential subdivision is proposed to
accelerate continuous subgraph matching on dynamic knowledge graphs. Firstly, a flow graph index is proposed to arrange
the search space of seed candidates in the topological knowledge graph, and an adjacent index is designed to accelerate the
identification of candidate activation states in the attributed knowledge graph. Secondly, the sequential subdivision of the flow graph index
and the transition state model are employed to incrementally conduct subgraph matching and maintain the regional influence of
changed candidates, respectively. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our
techniques outperform the state-of-the-art algorithms.
1. Introduction
Subgraph matching is a fundamental problem in graph search
and is NP-complete [1]. Specifically, given a query graph q and
a large data graph G, the problem of subgraph matching is to
extract all subgraphs of G that are isomorphic to q. In the real
world, data in social networks usually arrives as a stream and
thus forms a graph stream. Recently, continuous subgraph matching on
dynamic graphs has become a popular research topic in
graph analysis, with a wide range of applications including
query answering [2], information retrieval
[3, 4], and community detection [5, 6]. Specifically, given a
query graph q, an initial graph G0, and a graph update
stream △Gi, the problem of continuous subgraph matching
is to sequentially report all subgraphs of Gi (= G0 ⊕ △Gi)
that are isomorphic to q and cover △Gi. In this paper, we study
continuous subgraph matching on the special graph
structure of the knowledge graph (KG-CSM).
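The problem statement above can be made concrete with a brute-force baseline. The following is only an illustrative sketch, not the method proposed in this paper: it assumes a single inserted edge as the update, represents edges as hypothetical (source, label, target) triples, and enumerates the injective vertex mappings of q into Gi that use the inserted edge. The names `bind` and `matches_covering` are invented for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target); an assumed encoding


def bind(mapping: Dict[str, str], qv: str, gv: str) -> Optional[Dict[str, str]]:
    """Map query vertex qv to data vertex gv, keeping the mapping injective."""
    if qv in mapping:
        return mapping if mapping[qv] == gv else None
    if gv in mapping.values():
        return None  # gv is already the image of another query vertex
    extended = dict(mapping)
    extended[qv] = gv
    return extended


def matches_covering(query: List[Edge], graph: Set[Edge],
                     new_edge: Edge) -> List[Dict[str, str]]:
    """Return all matches of `query` in `graph` that use `new_edge`.

    `graph` must already contain `new_edge` (it plays the role of Gi).
    """
    results: List[Dict[str, str]] = []
    seen = set()

    def extend(mapping: Dict[str, str], remaining: List[Edge]) -> None:
        if not remaining:
            key = tuple(sorted(mapping.items()))
            if key not in seen:  # drop mappings found from several seeds
                seen.add(key)
                results.append(mapping)
            return
        qs, ql, qt = remaining[0]
        for s, l, t in graph:
            if l != ql:
                continue
            trial = bind(mapping, qs, s)
            trial = bind(trial, qt, t) if trial is not None else None
            if trial is not None:
                extend(trial, remaining[1:])

    s, l, t = new_edge
    for i, (qs, ql, qt) in enumerate(query):
        if ql != l:
            continue
        # Seed the search by pinning query edge i onto the inserted edge.
        seed = bind({}, qs, s)
        seed = bind(seed, qt, t) if seed is not None else None
        if seed is not None:
            extend(seed, query[:i] + query[i + 1:])
    return results
```

Every query edge whose label agrees with the inserted edge acts as a seed, and each seed triggers a full backtracking search. This gives a rough picture of why the search space of seed candidates matters for this problem.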
Despite the complex multigraph characteristic of
knowledge graphs and the NP-completeness of
subgraph matching [1], recent studies
have made significant advances in developing
computational paradigms for KG-CSM.
One line of work stores and indexes RDF triple data
using relational approaches. Weiss et al. [7] and Pérez
et al. [8] employed an index-based solution that stores triples
directly in a B+-tree index over multiple redundant
〈s, p, o〉 permutations. Abadi et al. [9] vertically partitioned
the RDF triples into a set of tables bounded by the labels of
patterns and used an index structure on top of them to locate the
required tables. Broekstra et al. [10] built on the idea of a
graph database and abstracted the concepts of RDF triples with
multiple properties. The same pattern matching strategy was
used to provide a pattern selectivity approach, which can
determine the search space for data tables. This strategy used
a tree-pattern structure to filter RDF data into tables, which
stored partially processed data units. Then, the partially processed
data units were incrementally joined by searching the
tree-pattern structure. However, relational approaches require
extensive indexing and data preprocessing because they
are coupled with sophisticated statistics, deep
joins, and query-optimization techniques.
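To illustrate the idea of indexing triples over redundant 〈s, p, o〉 permutations, here is a minimal sketch. It is not the implementation of [7] or [8]: sorted Python lists searched with `bisect` stand in for B+-trees, and the class name `PermutationIndex` and its `lookup` method are assumptions made for this example.

```python
from bisect import bisect_left
from itertools import permutations
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class PermutationIndex:
    """Keeps one sorted copy of the triples per ordering of (s, p, o)."""

    def __init__(self, triples: List[Triple]) -> None:
        self.indexes = {}
        for order in permutations((0, 1, 2)):
            # Sorted list as a stand-in for a B+-tree keyed on this ordering.
            self.indexes[order] = sorted(
                tuple(t[i] for i in order) for t in triples
            )

    def lookup(self, s: Optional[str] = None, p: Optional[str] = None,
               o: Optional[str] = None) -> List[Triple]:
        """Answer a triple pattern; None marks an unbound position."""
        pattern = (s, p, o)
        bound = [i for i in range(3) if pattern[i] is not None]
        free = [i for i in range(3) if pattern[i] is None]
        order = tuple(bound + free)  # bound positions form the key prefix
        prefix = tuple(pattern[i] for i in bound)
        rows = self.indexes[order]
        start = bisect_left(rows, prefix)  # first row >= prefix
        matches = []
        for row in rows[start:]:
            if row[:len(prefix)] != prefix:
                break
            triple = [""] * 3
            for pos, i in enumerate(order):
                triple[i] = row[pos]  # restore (s, p, o) order
            matches.append(tuple(triple))
        return matches
```

Because some ordering always places the bound positions first, any triple pattern becomes a prefix range scan. The price is six copies of the data, which reflects the extensive indexing cost noted above.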
Another line of work avoids recalculating
matches with the aid of intermediate results. Incremental
solutions have been employed in a variety of applications [11–13].
These solutions aim to generate results incrementally
without incurring the expensive cost of recomputing over the
whole data. However, most
incremental methods are approximate algorithms based on
relaxed graph simulations and only work for a small number
of graphs. Moreover, incremental solutions are hard to
apply in the context of KG-CSM because of the inherent
complexity and large-scale nature of the knowledge multigraph
structure.
1.1. Challenge 1: Multigraph Characteristic of Knowledge
Graph Intensifies the Complexity of Candidate Calculation.
A knowledge graph is a directed labeled multigraph that may have
multiple edges between a pair of vertices; each vertex
represents an entity with attributes, and each edge denotes
an interentity relationship. The model of a
knowledge multigraph in Figure 1 is composed of attributed
and topological structures. The attributed structure describes
the attributes and type of an entity, where an
attribute is taken as the label of an edge coupled with a value
and a type is taken as the label of an entity. The topological
structure describes the relationships between a pair of entities,
and some relationships coexist, e.g., a partnership
and a couple relationship between two persons. The
multigraph characteristic of knowledge graphs leads to a
denser adjacency structure than that of a general graph, which
brings a new challenge to the KG-CSM problem.
Furthermore, the KG-CSM problem still contains the
traditional challenge on general graphs.
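The model described above can be sketched as a small data structure. This is an illustrative assumption rather than the storage layout used in this paper: the class `KnowledgeMultigraph`, its method names, and the sample labels are hypothetical and are not taken from Figure 1.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class KnowledgeMultigraph:
    """Directed labeled multigraph with typed, attributed entities."""

    def __init__(self) -> None:
        # Attributed structure: one type label and attribute/value pairs.
        self.types: Dict[str, str] = {}
        self.attributes: Dict[str, Dict[str, str]] = defaultdict(dict)
        # Topological structure: an ordered pair of entities may carry
        # several coexisting relationship labels (the multigraph property).
        self.relations: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_entity(self, entity: str, entity_type: str, **attrs: str) -> None:
        self.types[entity] = entity_type
        self.attributes[entity].update(attrs)

    def add_relation(self, src: str, label: str, dst: str) -> None:
        self.relations[(src, dst)].add(label)

    def labels_between(self, src: str, dst: str) -> Set[str]:
        # .get avoids creating empty entries for unrelated pairs.
        return set(self.relations.get((src, dst), set()))

    def covers(self, src: str, dst: str, required: Set[str]) -> bool:
        """True if (src, dst) carries every relationship label in `required`."""
        return required <= self.labels_between(src, dst)
```

With this layout, checking whether a data vertex pair can serve as a candidate for a query vertex pair becomes a label-set containment test rather than a single label comparison, which hints at how multiple edges complicate candidate calculation.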
1.2. Challenge 2: Subgraph Isomorphic Mappings Covering a
Given Region Are Conducted on a Huge Search Space of Seed
Candidates. The traditional challenge on general graph is
that the isomorphic subgraphs covering a given region are
conducted on a huge search space of seed candidates, which
causes a lot of time consu (...truncated)