Large deviations-based upper bounds on the expected relative length of longest common subsequences
Abstract
Consider the random variable Ln defined as the length of a longest common subsequence of two random strings of length n and whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit γ = limn→∞ E[Ln]/n is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, (qm)m∈ℕ, on γ, and how Monte Carlo simulation can be used in theory to compute estimates, q̂m, of the qm such that, for given Θ > 0 and Λ ∈ (0, 1), we have P[γ < q̂ < γ + Θ] ≥ Λ. In other words, with high probability the result is an upper bound that approximates γ to high precision. We establish O ((1 - Λ)-1 Θ-(4+ε)) as a theoretical upper bound on the complexity of computing q̂m to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior. © Applied Probability Trust 2006.
Más información
Título según WOS: | Large deviations-based upper bounds on the expected relative length of longest common subsequences |
Título según SCOPUS: | Large deviations-based upper bounds on the expected relative length of longest common subsequences |
Título de la Revista: | ADVANCES IN APPLIED PROBABILITY |
Volumen: | 38 |
Número: | 3 |
Editorial: | Applied Probability Trust |
Fecha de publicación: | 2006 |
Página de inicio: | 827 |
Página final: | 852 |
Idioma: | English |
URL: | http://projecteuclid.org/getRecord?id=euclid.aap/1158685004 |
DOI: |
10.1239/aap/1158685004 |
Notas: | ISI, SCOPUS |