Lempel-Ziv compression of highly structured documents
Abstract
The authors describe Lempel-Ziv to Compress Structure (LZCS), a novel Lempel-Ziv approach suitable for compressing structured documents. LZCS takes advantage of repeated substructures that may appear in the documents by replacing them with a backward reference to their previous occurrence. The result of the LZCS transformation is still a valid structured document, which is human-readable and can be transmitted over ASCII channels. Moreover, LZCS-transformed documents are easy to search, display, access at random, and navigate. In a second stage, the transformed documents can be further compressed either with any semistatic technique, so that all of those operations remain efficient, or with any adaptive technique, to boost compression. LZCS is especially efficient in the compression of collections of highly structured data, such as extensible markup language (XML) forms, invoices, e-commerce, and Web-service exchange documents. The comparison with other structure-aware and standard compressors shows that LZCS is a competitive choice for this type of document, whereas the others are not well suited to support navigation or random access. When joined to an adaptive compressor, LZCS obtains by far the best compression ratios.
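The core idea in the abstract, replacing a repeated substructure with a backward reference to its earlier occurrence, can be illustrated with a minimal Python sketch. This is not the authors' LZCS implementation: the `ref` element, its `to` attribute, the pre-order numbering, and the function names `canon`, `transform`, and `restore` are all assumptions made for illustration, and the sketch ignores whitespace-only differences and assumes the input has no element already named `ref`.

```python
import copy
import xml.etree.ElementTree as ET

def canon(e):
    """Hashable canonical form of a subtree: tag, attributes, text, children."""
    return (e.tag, tuple(sorted(e.attrib.items())), (e.text or "").strip(),
            tuple(canon(c) for c in e))

def transform(xml_text):
    """Replace each repeated subtree with a hypothetical <ref to="k"/> element."""
    root = ET.fromstring(xml_text)
    first_seen = {}  # canonical subtree -> number of its first occurrence
    next_id = 0

    def visit(node):
        nonlocal next_id
        for i, child in enumerate(list(node)):
            key = canon(child)
            if key in first_seen:
                # Repeated substructure: emit a backward reference instead.
                ref = ET.Element("ref", {"to": str(first_seen[key])})
                ref.tail = child.tail
                node[i] = ref
            else:
                # First occurrence: number it in pre-order and descend.
                first_seen[key] = next_id
                next_id += 1
                visit(child)

    visit(root)
    return ET.tostring(root, encoding="unicode")

def restore(xml_text):
    """Invert transform() by expanding every <ref> in pre-order."""
    root = ET.fromstring(xml_text)
    by_id = {}
    next_id = 0

    def visit(node):
        nonlocal next_id
        for i, child in enumerate(list(node)):
            if child.tag == "ref":
                # The target was visited earlier, so it is already expanded.
                expanded = copy.deepcopy(by_id[int(child.get("to"))])
                expanded.tail = child.tail
                node[i] = expanded
            else:
                by_id[next_id] = child
                next_id += 1
                visit(child)

    visit(root)
    return ET.tostring(root, encoding="unicode")

# Toy document with one repeated <order> substructure.
doc = ("<orders><order><item>pen</item><qty>2</qty></order>"
       "<order><item>pen</item><qty>2</qty></order></orders>")
compressed = transform(doc)
restored = restore(compressed)
```

In this toy run the second `order` element is replaced by a single reference to the first, and the transformed text is itself a well-formed XML document, which mirrors the property the abstract highlights: the output can still be parsed, searched, and navigated before any second-stage compressor is applied.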
More information
| Title according to WOS: | Lempel-Ziv compression of highly structured documents |
| Title according to SCOPUS: | Lempel-Ziv compression of highly structured documents |
| Journal title: | JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY |
| Volume: | 58 |
| Issue: | 4 |
| Publisher: | John Wiley & Sons Inc. |
| Publication date: | 2007 |
| Start page: | 461 |
| End page: | 478 |
| Language: | English |
| URL: | http://doi.wiley.com/10.1002/asi.20496 |
| DOI: | 10.1002/asi.20496 |
| Notes: | ISI, SCOPUS |