Approximate matching of run-length compressed strings
Abstract
We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n, compressed to m? and n? runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m?n + n?m) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m? runs) in a text of n letters (n? runs) in O(mm? n?) time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m? n?) expected case complexity. Experimental results are provided to support the conjecture.
Más información
Título según WOS: | Approximate matching of run-length compressed strings |
Título según SCOPUS: | Approximate matching of run-length compressed strings |
Título de la Revista: | ALGORITHMICA |
Volumen: | 35 |
Número: | 4 |
Editorial: | Springer |
Fecha de publicación: | 2003 |
Página de inicio: | 347 |
Página final: | 369 |
Idioma: | English |
URL: | http://link.springer.com/10.1007/s00453-002-1005-2 |
DOI: |
10.1007/s00453-002-1005-2 |
Notas: | ISI, SCOPUS |