Subgraph-Indexed Sequential Subdivision for Continuous Subgraph Matching on Dynamic Knowledge Graph

Hindawi
Complexity
Volume 2020, Article ID 8871756, 18 pages
https://doi.org/10.1155/2020/8871756

Research Article

Yunhao Sun, Guanyu Li, Mengmeng Guan, and Bo Ning

Faculty of Information Science and Technology, Dalian Maritime University, Dalian 116026, China

Correspondence should be addressed to Guanyu Li.

Received 22 September 2020; Revised 24 October 2020; Accepted 9 November 2020; Published 22 December 2020

Academic Editor: Weitong Chen

Copyright © 2020 Yunhao Sun et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The continuous subgraph matching problem on dynamic graphs has become a popular research topic in the field of graph analysis, with a wide range of applications including information retrieval and community detection. Specifically, given a query graph q, an initial graph G0, and a graph update stream △Gi, the problem of continuous subgraph matching is to sequentially find all subgraphs of Gi (= G0 ⊕ △Gi) that are isomorphic to q and cover △Gi. Because a knowledge graph is a directed labeled multigraph, in which a pair of vertices may be connected by multiple edges, the problem poses new challenges on dynamic knowledge graphs. One challenge is that the multigraph characteristic of a knowledge graph intensifies the complexity of candidate calculation, since candidates must satisfy a combination of complex topological and attributed structures. Another challenge is that the isomorphic subgraphs covering a given region are searched for in a huge space of seed candidates, so much time is spent exploring unpromising candidates.
To address these challenges, a method of subgraph-indexed sequential subdivision is proposed to accelerate continuous subgraph matching on dynamic knowledge graphs. Firstly, a flow graph index is proposed to arrange the search space of seed candidates in the topological knowledge graph, and an adjacent index is designed to accelerate the identification of candidate activation states in the attributed knowledge graph. Secondly, the sequential subdivision of the flow graph index and the transition state model are employed to incrementally conduct subgraph matching and to maintain the regional influence of changed candidates, respectively. Finally, extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

1. Introduction

Subgraph matching is a fundamental problem in graph search and is NP-complete [1]. Specifically, given a query graph q and a large data graph G, the problem of subgraph matching is to extract all subgraphs of G that are isomorphic to q. In the real world, data in social networks usually arrives as a stream, which forms a graph stream. Recently, continuous subgraph matching on dynamic graphs has become a popular research topic in the field of graph analysis, with a wide range of applications including query answering [2], information retrieval [3, 4], and community detection [5, 6]. Specifically, given a query graph q, an initial graph G0, and a graph update stream △Gi, the problem of continuous subgraph matching is to sequentially find all subgraphs of Gi (= G0 ⊕ △Gi) that are isomorphic to q and cover △Gi. In this paper, we study continuous subgraph matching on the special graph structure of the knowledge graph (KG-CSM).
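To make the problem statement above concrete, the following is a minimal brute-force sketch in Python. It is not the method proposed in this paper. It assumes that a graph is a set of (source, label, target) triples, that the update stream contains only single edge insertions, and that vertex labels and attributes are ignored. The function names matches_covering and continuous_matching are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def matches_covering(query_edges, graph_edges, new_edge):
    """Return every injective vertex mapping of the query into the graph
    whose image contains new_edge (brute force, exponential time)."""
    query_vertices = sorted({v for s, _, d in query_edges for v in (s, d)})
    graph_vertices = sorted({v for s, _, d in graph_edges for v in (s, d)})
    results = []
    # Try every injective assignment of query vertices to graph vertices.
    for image in permutations(graph_vertices, len(query_vertices)):
        mapping = dict(zip(query_vertices, image))
        mapped = {(mapping[s], label, mapping[d]) for s, label, d in query_edges}
        # Every query edge must exist in the graph with the same label,
        # and the match must cover the newly inserted edge.
        if mapped <= graph_edges and new_edge in mapped:
            results.append(mapping)
    return results


def continuous_matching(query_edges, initial_edges, insert_stream):
    """Apply each inserted edge in turn and yield the matches that cover it."""
    graph_edges = set(initial_edges)
    for new_edge in insert_stream:
        graph_edges.add(new_edge)
        yield new_edge, matches_covering(query_edges, graph_edges, new_edge)


# Hypothetical example: the query asks for two parallel edges between a and b.
query = {("a", "knows", "b"), ("a", "worksWith", "b")}
initial = {("alice", "knows", "bob")}
stream = [("alice", "worksWith", "bob"), ("bob", "knows", "carol")]
for edge, found in continuous_matching(query, initial, stream):
    print(edge, found)
```

Because edges are stored as labeled triples, several edges between the same pair of vertices are represented naturally, which is the multigraph property discussed in this paper. The sketch re-enumerates all vertex assignments after every update, which is exactly the cost that index-based and incremental approaches try to avoid.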
Despite the complex multigraph characteristic of knowledge graphs and the NP-hardness of continuous subgraph matching [1], recent studies have made significant advances in developing computational paradigms for KG-CSM. One line of work stores and indexes RDF triple data using relational approaches. Weiss et al. [7] and Pérez et al. [8] employed an index-based solution that stores triples directly in B+-tree indexes over multiple redundant 〈s, p, o〉 permutations. Abadi et al. [9] vertically partitioned the RDF triples into a set of tables bounded by the labels of patterns and used an index structure on top of them to locate the required tables. Broekstra et al. [10] built on the idea of a graph database and abstracted the concepts of RDF triples with multiple properties. The same pattern matching strategy was used to provide a pattern selectivity approach, which can determine the search space for data tables. This strategy used a tree-pattern structure to filter RDF data into tables, which stored partially operated data units. Then, the partially operated data units were incrementally joined by searching the tree-pattern structure. However, relational approaches result in extensive indexing and data preprocessing, because they are coupled with sophisticated statistics, high join depth, and query-optimization techniques. Another line of work avoids recalculating matches with the aid of intermediate results. Such incremental solutions have been employed in a variety of applications [11–13]. They aim to generate results incrementally without incurring the expensive cost of recomputing over the data. However, most incremental methods are approximate algorithms based on relaxed graph simulations and only work for small numbers of graphs.
Moreover, incremental solutions are hard to apply in the context of KG-CSM because of the inherent complexity and large-scale nature of the knowledge multigraph structure.

1.1. Challenge 1: Multigraph Characteristic of Knowledge Graph Intensifies the Complexity of Candidate Calculation. A knowledge graph is a directed labeled multigraph in which a pair of vertices may be connected by multiple edges; each vertex represents an entity with attributes, and each edge denotes an interentity relationship. The model of the knowledge multigraph in Figure 1 is composed of attributed and topological structures. The attributed structure describes the attributes and type of an entity, where an attribute is taken as the label of an edge coupled with a value and the type is taken as the label of the entity. The topological structure describes the relationships between a pair of entities, and some relationships coexist, e.g., a partnership and a couple relationship between two persons. The multigraph characteristic of a knowledge graph leads to a denser adjacency structure than that of a general graph, which brings a new challenge to research on the KG-CSM problem. Furthermore, the KG-CSM problem still inherits the traditional challenge of general graphs.

1.2. Challenge 2: Subgraph Isomorphic Mappings Covering a Given Region Are Conducted on a Huge Search Space of Seed Candidates. The traditional challenge on general graphs is that the isomorphic subgraphs covering a given region are searched for in a huge space of seed candidates, which causes a lot of time consu (...truncated)