King's College London

Research portal

A faster and more accurate heuristic for cyclic edit distance computation

Research output: Contribution to journalArticle

Standard

A faster and more accurate heuristic for cyclic edit distance computation. / Ayad, Lorraine A.K.; Barton, Carl; Pissis, Solon P.

In: PATTERN RECOGNITION LETTERS, Vol. 88, 01.03.2017, p. 81-87.

Research output: Contribution to journalArticle

Harvard

Ayad, LAK, Barton, C & Pissis, SP 2017, 'A faster and more accurate heuristic for cyclic edit distance computation', PATTERN RECOGNITION LETTERS, vol. 88, pp. 81-87. https://doi.org/10.1016/j.patrec.2017.01.018

APA

Ayad, L. A. K., Barton, C., & Pissis, S. P. (2017). A faster and more accurate heuristic for cyclic edit distance computation. PATTERN RECOGNITION LETTERS, 88, 81-87. https://doi.org/10.1016/j.patrec.2017.01.018

Vancouver

Ayad LAK, Barton C, Pissis SP. A faster and more accurate heuristic for cyclic edit distance computation. PATTERN RECOGNITION LETTERS. 2017 Mar 1;88:81-87. https://doi.org/10.1016/j.patrec.2017.01.018

Author

Ayad, Lorraine A.K. ; Barton, Carl ; Pissis, Solon P. / A faster and more accurate heuristic for cyclic edit distance computation. In: PATTERN RECOGNITION LETTERS. 2017 ; Vol. 88. pp. 81-87.

Bibtex Download

@article{0feb54c9d3b44b4d8800174dd47c6cc4,
title = "A faster and more accurate heuristic for cyclic edit distance computation",
abstract = "Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O ( m n ) . In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time O ( m n log m ) was proposed (Maes, Inf. Proc. Let. 2003) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., Algorithms Mol Biol. 2016). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community.",
keywords = "Cyclic edit distance, circular sequences, cyclic strings, chain code, q-gram distance",
author = "Ayad, {Lorraine A.K.} and Carl Barton and Pissis, {Solon P.}",
year = "2017",
month = mar,
day = "1",
doi = "10.1016/j.patrec.2017.01.018",
language = "English",
volume = "88",
pages = "81--87",
journal = "PATTERN RECOGNITION LETTERS",
issn = "0167-8655",
publisher = "Elsevier",

}

RIS (suitable for import to EndNote) Download

TY - JOUR

T1 - A faster and more accurate heuristic for cyclic edit distance computation

AU - Ayad, Lorraine A.K.

AU - Barton, Carl

AU - Pissis, Solon P.

PY - 2017/3/1

Y1 - 2017/3/1

N2 - Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O ( m n ) . In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time O ( m n log m ) was proposed (Maes, Inf. Proc. Let. 2003) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., Algorithms Mol Biol. 2016). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community.

AB - Sequence comparison is the core computation of many applications involving textual representations of data. Edit distance is the most widely used measure to quantify the similarity of two sequences. Edit distance can be defined as the minimal total cost of a sequence of edit operations to transform one sequence into the other; for a sequence x of length m and a sequence y of length n, it can be computed in time O ( m n ) . In many applications, it is common to consider sequences with circular structure: for instance, the orientation of two images or the leftmost position of two linearised circular DNA sequences may be irrelevant. To this end, an algorithm to compute the cyclic edit distance in time O ( m n log m ) was proposed (Maes, Inf. Proc. Let. 2003) and several heuristics have been proposed to speed up this computation. Recently, a new algorithm based on q-grams was proposed for circular sequence comparison (Grossi et al., Algorithms Mol Biol. 2016). We extend this algorithm for cyclic edit distance computation and show that this new heuristic is faster and more accurate than the state of the art. The aim of this letter is to give visibility to this idea in the pattern recognition community.

KW - Cyclic edit distance

KW - circular sequences

KW - cyclic strings

KW - chain code

KW - q-gram distance

U2 - 10.1016/j.patrec.2017.01.018

DO - 10.1016/j.patrec.2017.01.018

M3 - Article

VL - 88

SP - 81

EP - 87

JO - PATTERN RECOGNITION LETTERS

JF - PATTERN RECOGNITION LETTERS

SN - 0167-8655

ER -

View graph of relations

© 2020 King's College London | Strand | London WC2R 2LS | England | United Kingdom | Tel +44 (0)20 7836 5454