4.3 Article

Inexact tree pattern matching with 1-degree edit distance using finite automata

Journal

DISCRETE APPLIED MATHEMATICS
Volume 330, Issue -, Pages 78-97

Publisher

ELSEVIER
DOI: 10.1016/j.dam.2023.01.003

Keywords

Inexact tree pattern matching; Approximate tree pattern matching; 1-degree edit distance; Finite automaton; Dynamic programming

Ask authors/readers for more resources

Given an input tree and a tree pattern, the inexact tree pattern matching problem is to find all subtrees in the input tree that match the tree pattern with up to k errors. The proposed solution is based on a finite automaton that reads the input tree represented in linear, prefix bar notation. The deterministic version of the finite automaton finds all inexact occurrences of the tree pattern in linear time to the size of the input tree.
Given an input tree and a tree pattern, the inexact tree pattern matching problem is about finding all subtrees in the input tree that match the tree pattern with up to k errors. To measure the number of errors between two labeled ordered trees, we use the 1-degree edit distance that uses operations node relabel, leaf insert, and leaf delete. The proposed solution is based on a finite automaton that reads a given input tree represented in linear, prefix bar, notation. First, we show its construction for a constrained variant of 1-degree edit distance where leaf insert/delete operations cannot be applied to the tree pattern recursively. Then, we extend this approach to both unit cost and non-unit cost 1-degree edit distance. The deterministic version of the proposed finite automaton finds all inexact occurrences of the tree pattern in time linear to the size of the input tree. However, since the size of such automaton can be exponential in the number of nodes of the tree pattern, it is practical only for small patterns. Therefore, we also consider the dynamic programming approach as a way of simulating the nondeterministic finite automaton. This approach comes with the space complexity O(mk) and time complexity O(kmn) where m is the number of nodes of the tree pattern, n is the number of nodes of the input tree, and k is the maximum number of errors allowed.(c) 2023 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.3
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available